CN116562359B - CTR prediction model training method and device based on contrastive learning and electronic equipment - Google Patents


Info

Publication number
CN116562359B
CN116562359B (application CN202310834678.9A)
Authority
CN
China
Prior art keywords
feature
loss
vectors
original
ctr prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310834678.9A
Other languages
Chinese (zh)
Other versions
CN116562359A (en)
Inventor
董辉
王芳
Current Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310834678.9A
Publication of CN116562359A
Application granted
Publication of CN116562359B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00: Commerce
    • G06Q 30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q 30/0202: Market predictions or forecasting for commercial activities
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a CTR prediction model training method and device based on contrastive learning, and an electronic device. The method comprises the following steps: mapping discrete features used for CTR prediction model training into low-dimensional dense vectors to obtain original feature vectors; applying regularization constraints to the original feature vectors to obtain a feature alignment loss and a feature consistency loss; performing data augmentation on the original feature vectors to obtain first feature vectors and second feature vectors, performing feature crossing on the first and second feature vectors to obtain intermediate vectors, and computing the distance between the intermediate vectors to obtain a contrastive learning loss; and generating a composite loss function from the feature alignment loss, the feature consistency loss, the contrastive learning loss, and the loss function of the original CTR prediction task, and back-propagating the composite loss function to update the model parameters and train the CTR prediction model. The method improves the generalization of CTR prediction model training, and thereby the prediction performance and accuracy of the CTR prediction model.

Description

CTR prediction model training method and device based on contrastive learning, and electronic device
Technical Field
The present application relates to the field of computer technology, and in particular to a CTR prediction model training method and device based on contrastive learning, and an electronic device.
Background
CTR prediction estimates the probability that an item (such as a commodity or an advertisement) is clicked, and is widely applied in recommendation systems, computational advertising, and related fields. Many recent approaches achieve performance improvements by modeling complex interactions between features (referred to as feature interactions).
Current CTR prediction methods fall into two categories. The first adopts traditional methods, such as logistic regression (LR) and models based on the factorization machine (FM), which mainly model simple low-order feature interactions. The second adopts deep-learning-based methods (such as DeepFM), which further improve CTR prediction accuracy by capturing high-order feature interactions. However, existing CTR prediction methods still have the following problem: high-frequency features have a greater chance of being trained than low-frequency features, resulting in sub-optimal representations of the low-frequency features. Because most CTR prediction models learn feature representations by back-propagation, low-frequency features occur too rarely to be sufficiently trained, leading to sub-optimal feature representations and thus sub-optimal CTR prediction performance. This reduces both the generalization of model training and the prediction performance and accuracy of the model.
Disclosure of Invention
In view of the above, embodiments of the present application provide a CTR prediction model training method and device based on contrastive learning, and an electronic device, to solve the prior-art problems of reduced generalization during model training and reduced prediction performance and accuracy of the model.
In a first aspect of the embodiments of the present application, a CTR prediction model training method based on contrastive learning is provided, including: obtaining discrete features used for training a CTR prediction model, and mapping the discrete features into low-dimensional dense vectors to obtain original feature vectors; regularizing the original feature vectors with a preset feature alignment constraint and a preset feature consistency constraint, respectively, to obtain a feature alignment loss and a feature consistency loss; performing data augmentation on the original feature vectors to obtain first feature vectors and second feature vectors, performing feature crossing on the first and second feature vectors to obtain intermediate vectors, and computing the distance between the intermediate vectors to obtain a contrastive learning loss; and generating a composite loss function from the feature alignment loss, the feature consistency loss, the contrastive learning loss, and the loss function of the original CTR prediction task, and back-propagating the composite loss function to update the model parameters and train the CTR prediction model.
In a second aspect of the embodiments of the present application, a CTR prediction model training apparatus based on contrastive learning is provided, including: a mapping module configured to obtain discrete features for training a CTR prediction model and map them into low-dimensional dense vectors to obtain original feature vectors; a regularization module configured to regularize the original feature vectors with a preset feature alignment constraint and a preset feature consistency constraint, respectively, to obtain a feature alignment loss and a feature consistency loss; a contrastive learning module configured to perform data augmentation on the original feature vectors to obtain first and second feature vectors, perform feature crossing on them to obtain intermediate vectors, and compute the distance between the intermediate vectors to obtain a contrastive learning loss; and a training module configured to generate a composite loss function from the feature alignment loss, the feature consistency loss, the contrastive learning loss, and the loss function of the original CTR prediction task, and to back-propagate the composite loss function to update the model parameters and train the CTR prediction model.
In a third aspect of the embodiments of the present application, there is provided an electronic device including a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
The technical schemes adopted by the embodiments of the present application can achieve at least the following beneficial effects:
obtaining discrete features used for training a CTR prediction model and mapping them into low-dimensional dense vectors to obtain original feature vectors; regularizing the original feature vectors with preset feature alignment and feature consistency constraints to obtain a feature alignment loss and a feature consistency loss; performing data augmentation on the original feature vectors to obtain first and second feature vectors, performing feature crossing on them to obtain intermediate vectors, and computing the distance between the intermediate vectors to obtain a contrastive learning loss; and generating a composite loss function from the feature alignment loss, the feature consistency loss, the contrastive learning loss, and the loss function of the original CTR prediction task, and back-propagating it to update the model parameters and train the CTR prediction model. This improves the robustness of the feature representations and the generalization of CTR prediction model training, and thereby further improves the prediction performance and accuracy of the CTR prediction model.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a CTR prediction model training method based on contrastive learning according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a CTR prediction model training apparatus based on contrastive learning according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation rather than limitation, specific details such as particular system architectures and techniques are set forth to provide a thorough understanding of the embodiments of the present application. It will be apparent to those skilled in the art, however, that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description with unnecessary detail.
With the digital transformation of enterprises, taking real-estate enterprises as an example, online applets and apps have been developed for viewing properties online, covering new development projects, second-hand listings, rental listings, and other scenarios, and are updated and promoted through various operational activities. Beyond offering a new way to view properties, such systems make it easier to collect user behavior data, so that customers can be understood from multiple angles and a more comprehensive user profile can be built. Meeting the business goals of increasing browsing time and user retention, helping users quickly locate listings of interest, and improving the product experience requires a CTR model. However, much existing CTR prediction work focuses on designing complex models to capture intricate feature interactions while neglecting the importance of feature representation learning, resulting in poor prediction performance.
CTR prediction (Click-Through Rate prediction) is a task in online advertising that predicts the probability that a user clicks on a particular advertisement after seeing it. Click-through rate (CTR) is a key indicator of advertisement effectiveness, representing the ratio of the number of clicks an advertisement receives to the number of times it is shown. Click-through rate prediction helps advertising platforms and marketers better understand the potential performance of advertisements, thereby optimizing advertisement placement and improving return on investment (ROI) and user experience.
In practice, CTR prediction is typically implemented with machine learning algorithms. By analyzing historical advertisement impression and click data, these algorithms learn how factors such as advertisement characteristics, user behavior characteristics, and context affect click probability. The algorithms can then predict whether a user will click on a particular advertisement in a new placement scenario. CTR prediction is a key component of advertising systems such as real-time bidding (RTB), keyword advertisement bidding (e.g., AdWords), and social media advertising.
CTR prediction estimates the probability that an item (e.g., an advertisement) is clicked and is widely used in recommendation systems, computational advertising, and related fields. Many recent approaches achieve performance improvements by modeling complex interactions between features (referred to as feature interactions). Current CTR prediction model training methods can be divided into the following two categories:
the first category adopts traditional methods, such as Logistic Regression (LR) and models based on a Factorization Machine (FM), and mainly models simple low-order feature interactions;
the second category adopts a deep learning-based method, such as deep FM, and the accuracy of CTR prediction can be further improved by capturing high-order feature interaction.
In addition, many novel techniques (e.g., self-attention, CIN, PIN) have been proposed and widely used to capture complex, arbitrary-order feature interactions. Most feature-interaction-based CTR prediction methods follow a similar design pattern: an embedding layer, a feature interaction layer, and a prediction layer. Because feature interactions are so important in CTR prediction, much research has focused on designing novel structures for the feature interaction layer to capture more informative and complex interactions. The Wide & Deep model is a typical example: it jointly trains a wide linear unit and a deep neural network, balancing memorization of historical data with generalization to new data.
Further, DeepFM fuses a deep neural network (DNN) and a factorization machine (FM). Building on DeepFM, xDeepFM proposes a Compressed Interaction Network (CIN) to explicitly model higher-order feature interactions. DCN and DCN-V2 improve the accuracy and efficiency of DNN models by using cross-vector/cross-matrix networks. In addition, the attention mechanism is a highly effective structure, as demonstrated by models such as AFM and DIN, and has been widely used to improve performance.
While current methods already perform fairly well, many existing CTR prediction model training methods share an inherent problem: high-frequency features (features that occur more often) have a greater chance of being trained than low-frequency features (features that occur less often), resulting in sub-optimal representations of the low-frequency features. This is because most CTR prediction models learn feature representations through back-propagation, and low-frequency features occur too rarely to be sufficiently trained, leading to sub-optimal feature representations that in turn hurt CTR prediction performance.
In view of these problems in the prior art, the present application focuses on learning accurate feature representations directly at the embedding layer to improve prediction performance. A self-supervised learning method regularizes the learned feature representations during model training, improving model performance. Thus, even low-frequency features that occur rarely can be well represented and optimized, improving overall CTR prediction performance. In addition, contrastive learning strengthens the feature representations, improving the prediction accuracy of the CTR prediction model.
Fig. 1 is a flowchart of a CTR prediction model training method based on contrastive learning according to an embodiment of the present application. The method of Fig. 1 may be performed by a server. As shown in Fig. 1, the method may specifically include:
S101, obtaining discrete features used for training a CTR prediction model, and mapping the discrete features into low-dimensional dense vectors to obtain original feature vectors;
S102, regularizing the original feature vectors with a preset feature alignment constraint and a preset feature consistency constraint, respectively, to obtain a feature alignment loss and a feature consistency loss;
S103, performing data augmentation on the original feature vectors to obtain first feature vectors and second feature vectors, performing feature crossing on the first and second feature vectors to obtain intermediate vectors, and computing the distance between the intermediate vectors to obtain a contrastive learning loss;
S104, generating a composite loss function from the feature alignment loss, the feature consistency loss, the contrastive learning loss, and the loss function of the original CTR prediction task, and back-propagating the composite loss function to update the model parameters and train the CTR prediction model.
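The steps above can be sketched in code. The following is a non-authoritative illustration of how the composite loss of S104 might combine the four terms; the use of binary cross-entropy for the CTR task loss and the weights alpha, beta, and gamma are assumptions for illustration, not values given by the patent:

```python
import numpy as np

def bce_loss(p, y):
    """Binary cross-entropy, standing in for the original CTR task loss."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def composite_loss(ctr_loss, align_loss, consistency_loss, contrast_loss,
                   alpha=0.1, beta=0.1, gamma=0.1):
    """Weighted sum of the four loss terms described in S101-S104.

    alpha, beta, gamma are hypothetical hyperparameters weighting the
    feature alignment, feature consistency, and contrastive losses.
    """
    return (ctr_loss
            + alpha * align_loss
            + beta * consistency_loss
            + gamma * contrast_loss)
```

In a training loop, the scalar returned by `composite_loss` would be the quantity whose gradient is back-propagated to update the model parameters.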
The CTR prediction model of the embodiments of the present application comprises three parts: a basic CTR prediction module, a feature regularization module, and a vector comparison module. The details and principles of these three parts are described below in connection with specific embodiments.
In some embodiments, after mapping the discrete features into the low-dimensional dense vector, resulting in the original feature vector, the method of embodiments of the present application further comprises:
and carrying out feature interdigitation combination on the original feature vectors by using a CTR prediction module of the CTR prediction model, predicting the probability of clicking the object by using a multi-layer perceptron to obtain the probability value of clicking the object by the user, and determining the loss function of the original CTR prediction task.
Specifically, CTR prediction in the CTR prediction module is a binary classification task: its input may be features of the user and the item (such as a commodity or an advertisement), and its output is the probability that the user clicks on the item. In the CTR prediction module, the input discrete features (e.g., category attributes) or continuous features (e.g., numerical attributes) are typically first mapped into low-dimensional dense vectors. In short, features of different types are converted into a more manageable form.
The CTR prediction module then performs feature crossing on these vectors, allowing different features to interact and combine with each other to generate new feature representations. This helps mine the relevance and latent regularities between features. Finally, the CTR prediction module predicts the probability of the user clicking on the item through a multi-layer perceptron (MLP, a neural network structure), based on the learned feature representations.
The CTR prediction module of the embodiments of the present application therefore comprises three main steps: converting features into vectors, cross-combining the features, and predicting the click probability with a neural network.
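The three steps can be sketched in NumPy as follows. All sizes, the inner-product feature crossing, and the one-layer perceptron head are illustrative assumptions, not the patent's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 feature fields, a vocabulary of 100 ids per field,
# each id embedded as an 8-dimensional dense vector.
NUM_FIELDS, VOCAB, DIM = 3, 100, 8
embedding_table = rng.normal(size=(NUM_FIELDS, VOCAB, DIM))

def embed(feature_ids):
    """Step 1: map one discrete id per field to its dense embedding vector."""
    return np.stack([embedding_table[f, i] for f, i in enumerate(feature_ids)])

def cross_features(E):
    """Step 2: second-order feature crossing via pairwise inner products."""
    return np.array([E[i] @ E[j]
                     for i in range(len(E)) for j in range(i + 1, len(E))])

def mlp_predict(x, W, b):
    """Step 3: a one-layer perceptron head with a sigmoid (stand-in for the MLP)."""
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))

E = embed([7, 42, 99])    # features -> vectors
h = cross_features(E)     # feature crossing
W = rng.normal(size=h.shape[0])
b = 0.0
p = mlp_predict(h, W, b)  # click probability in (0, 1)
```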
In some embodiments, regularizing the original feature vector with a feature alignment constraint to obtain a feature alignment loss, comprising:
and calculating the distance between the feature vectors in the same feature domain by using a feature regularization module of the CTR prediction model, and minimizing the distance between the feature vectors in the same feature domain so as to align the feature vectors in the same feature domain, wherein the sum of the distances of the feature vectors in the same feature domain after alignment is taken as a feature alignment loss.
Specifically, among the many features of users and items, some occur frequently in the training samples while others occur rarely. To ensure that low-frequency and high-frequency features are trained equally, embodiments of the present application introduce two regularization constraints on the feature vectors. The core idea is to pull feature vectors within the same feature domain closer together while pushing vectors from different feature domains farther apart.
Pulling vectors within the same feature domain closer aligns the features. In practice this may be achieved with a formula that computes the distance between vectors within the same feature domain and minimizes it. Pushing apart the features of different feature domains ensures feature consistency. In practice this may be achieved by penalizing similarity between features, for example with a formula that computes the distance between vectors from different feature domains and maximizes it. With these two regularization constraints, low-frequency and high-frequency features can be trained equally, improving the performance of the CTR prediction model. The distances may be measured with the Euclidean distance, the cosine distance, or another suitable metric.
Further, the feature alignment constraint aims to pull together the feature vectors within the same feature domain. To this end, embodiments of the present application compute the distance between vectors within the same feature domain and minimize it, making the vectors within a domain closer. The sum of the within-domain distances is used as a penalty term (the feature alignment loss). Feature alignment increases the sensitivity of the model to specific feature domains, improving model performance.
In practical applications, the feature alignment loss may be computed by summing the pairwise distances between the embedding vectors within each feature domain.
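The patent's own formula is not reproduced in this text. As a plausible reconstruction (an assumption, not the patent's exact definition), the alignment term for one feature domain can be written as the sum of squared Euclidean distances between all pairs of its embedding vectors:

```python
import numpy as np

def feature_alignment_loss(field_vectors):
    """Sum of squared Euclidean distances over all vector pairs in one domain.

    Minimizing this term pulls the embeddings of the same feature domain
    closer together (feature alignment).
    """
    total = 0.0
    n = len(field_vectors)
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sum((field_vectors[i] - field_vectors[j]) ** 2)
    return total
```

Summing this quantity over every feature domain would give the overall feature alignment loss.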
in some embodiments, regularizing the original feature vector with feature consistency constraints to obtain feature consistency loss, comprising:
and calculating the distances of the feature vectors between different feature domains by using a feature regularization module of the CTR prediction model, and maximizing the distances of the feature vectors between the different feature domains so as to pull the distances between the feature vectors between the different feature domains far, and taking the negative value of the sum of the distances of the feature vectors between the different feature domains as the feature consistency loss.
In particular, feature consistency constraints aim at pulling feature vector distances between different feature domains. To achieve this objective, embodiments of the present application calculate the distance between vectors of different feature domains by using a mathematical formula. This distance is then maximized to ensure that the features of the different feature domains have a greater degree of differentiation. Finally, embodiments of the present application take the negative value of the sum of the distances between vectors between different feature domains as another penalty term (i.e., feature consistency penalty). The feature consistency can enhance the capturing capability of the model on the difference between different feature domains, so that the generalization performance of the model is improved.
In practical applications, the feature consistency loss may be computed as the negative of the sum of the pairwise distances between embedding vectors from different feature domains.
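Again as a plausible reconstruction rather than the patent's exact formula, the consistency term for two feature domains can be written as the negative sum of squared distances between their vectors, so that minimizing it maximizes the separation:

```python
import numpy as np

def feature_consistency_loss(field_a, field_b):
    """Negative sum of squared distances between vectors of two domains.

    Minimizing this term (i.e., maximizing the distances) pushes the two
    feature domains apart, keeping their representations distinguishable.
    """
    total = 0.0
    for u in field_a:
        for v in field_b:
            total += np.sum((u - v) ** 2)
    return -total
```

Summing over every pair of feature domains would give the overall feature consistency loss.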
according to the technical scheme provided by the embodiment of the application, the two regularization constraints (namely the feature alignment constraint and the feature consistency constraint) are fused into the training process of the CTR prediction model, so that the low-frequency features and the high-frequency features can be trained equally. This helps balance the attention of the model to different features, avoiding that the model only focuses on high frequency features and ignores low frequency features. By adopting two feature vector regularization constraint methods, the performance of the CTR prediction model in practical application is improved, and the CTR prediction model shows better generalization capability in the face of diversified data.
In some embodiments, data augmentation of the original feature vectors includes:
performing data augmentation by random masking, randomly masking elements of the original feature vectors with a preset probability to generate new training data; or
performing data augmentation by feature masking, randomly masking features in the original feature vectors with a preset probability to generate new training data; or
performing data augmentation by dimension masking, randomly masking components of the original feature vectors with a preset probability to generate new training data.
In particular, the vector representation of the features is important to the overall CTR prediction task. To improve the robustness of the feature representations (i.e., the model's stability under changes to the input data), embodiments of the present application also build a vector comparison module that borrows the idea of contrastive learning. The contents and principles of the vector comparison module are described below in connection with specific embodiments.
Further, to improve the generalization ability of the model, i.e., its predictive ability on unseen data, the original vectors are first processed with a data augmentation method. Data augmentation generates new training samples by applying random transformations to the original vectors, thereby improving the model's generalization ability. Specifically, the data augmentation in embodiments of the present application may take the following three forms:
the first way is: the elements initially embedded in the matrix E (i.e. the original feature vector) are masked randomly, i.e. by a certain probability p, such as to zero. The method can increase the robustness of the model to the feature deletion, so that the model has more generalization capability.
The second way is: feature masking, i.e. randomly masking certain rows (i.e. features) initially embedded in the matrix E (i.e. original feature vectors) with a certain probability p. This approach improves the processing power of incomplete data by allowing the model to train in the absence of partial features.
Third mode: the dimension mask, i.e. the random masking of certain columns (i.e. certain components of the vector) initially embedded in the matrix E (i.e. the original feature vector) by a certain probability p. This approach helps the model focus on fewer dimensions, thereby improving the generalization ability of the model to high-dimensional data.
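The three masking modes can be sketched in NumPy as follows; the element-wise multiplication by a Bernoulli keep-mask is an illustrative implementation choice, not the patent's specified mechanism:

```python
import numpy as np

def random_mask(E, p, rng):
    """Mode 1: zero out individual elements of E with probability p."""
    return E * (rng.random(E.shape) >= p)

def feature_mask(E, p, rng):
    """Mode 2: zero out whole rows (features) of E with probability p."""
    keep = rng.random(E.shape[0]) >= p
    return E * keep[:, None]

def dimension_mask(E, p, rng):
    """Mode 3: zero out whole columns (embedding dimensions) of E with probability p."""
    keep = rng.random(E.shape[1]) >= p
    return E * keep[None, :]
```

Applying any one of these twice with independent randomness to the same E yields the two augmented views E1 and E2 described next.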
Further, by any of the three data enhancement methods described above, two different vectors E1 (i.e., the first feature vector) and E2 (i.e., the second feature vector) can be generated from the original vectors. The two vectors then undergo the same subsequent processing flow: feature crossing is performed first, producing the feature-crossed results (i.e., the intermediate vectors) h1 and h2. Feature crossing helps mine the correlations and latent regularities between features, thereby improving model performance.
In some embodiments, after feature-intersecting the first feature vector and the second feature vector to obtain an intermediate vector, the method according to the embodiment of the present application further includes:
Mapping, by using two parallel networks in a vector contrast module of the CTR prediction model, the intermediate vectors corresponding to the first feature vector and the second feature vector respectively into a low-dimensional space, and calculating the distance between the output vectors of the parallel networks to obtain the contrast learning loss.
Specifically, to reduce the feature dimension, the embodiments of the present application introduce a mapping function (Projector) that maps the intermediate vectors h1 and h2 into a lower-dimensional space. This helps reduce computational complexity and improve the generalization ability of the model.
Further, the contrast learning loss is obtained by calculating the L2 distance between the results produced by the two twin networks (i.e., the two parallel networks that process E1 and E2). The L2 distance measures the similarity between the two augmented vector representations, and the contrast learning loss becomes part of the overall loss function ultimately used to train the CTR prediction model. By minimizing this distance, the model learns a more robust feature representation, improving overall prediction performance. This helps the model remain stable in the face of data changes or noise, improving its performance in practical applications.
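The projection-and-distance step can be sketched as follows, assuming a shared linear Projector for both twin branches (the patent does not specify the Projector's architecture; all names here are illustrative):

```python
import numpy as np

def projector(h: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Linear map into a lower-dimensional space (a minimal stand-in for the Projector)."""
    return h @ W

def contrast_loss(h1: np.ndarray, h2: np.ndarray, W: np.ndarray) -> float:
    """L2 distance between the projections of the two views; the twin branches share W."""
    z1, z2 = projector(h1, W), projector(h2, W)
    return float(np.linalg.norm(z1 - z2))

rng = np.random.default_rng(42)
h1 = rng.normal(size=16)              # intermediate vector from view E1
h2 = h1 + 0.01 * rng.normal(size=16)  # view E2: a slight perturbation of the same input
W = rng.normal(size=(16, 4))          # shared projection from 16 to 4 dimensions
loss = contrast_loss(h1, h2, W)
```

Minimizing this distance during training pushes the two augmented views of the same sample toward the same point in the projected space.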
In some embodiments, the integrated loss function is calculated from the loss function of the feature alignment loss, the feature consistency loss, the contrast learning loss, and the original CTR prediction task using the following formula:
L = L_ctr + λ1·L_cl + λ2·(L_align + L_con)

wherein L represents the integrated loss function, L_ctr represents the loss function of the original CTR prediction task, L_cl represents the contrast learning loss, L_align represents the feature alignment loss, L_con represents the feature consistency loss, and λ1 and λ2 represent the weights.
Specifically, the contrast learning loss and the feature regularization loss mentioned in the foregoing embodiments are added to the original CTR prediction model training task in a multitask manner. Multitask learning refers to optimizing multiple target tasks simultaneously during training. In the embodiment of the present application, the original CTR training task is to predict the probability of a user clicking on an item, while the contrast learning loss and the feature regularization are two additional tasks, which focus respectively on the robustness of the feature representation and the distinction between different feature domains.
Further, by training these tasks together, the richness of the features and the distinction between different feature domains can be fully exploited, yielding a better overall prediction effect. In each training iteration, the model parameters are updated based on a weighted sum of the original loss function, the contrast learning loss, the feature alignment loss, and the feature consistency loss. In this way, the method simultaneously optimizes the original task and the regularization constraints during the training of the CTR prediction model, achieving balanced training of low-frequency and high-frequency features.
It should be noted that the calculation formula of the comprehensive loss function combines the loss function of the original CTR prediction task, the contrast learning loss, and the feature regularization loss. That is, the formula represents a comprehensive loss function combining the losses of these three tasks. During training, the model seeks to minimize this comprehensive loss function, thereby optimizing the three tasks simultaneously.
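Assuming the two weights scale the contrast learning loss and the combined regularization losses respectively, the weighted multi-task objective could be sketched as follows (the weight names lam1 and lam2 are hypothetical; the patent only states that two weights exist):

```python
def total_loss(l_ctr: float, l_cl: float, l_align: float, l_con: float,
               lam1: float = 0.1, lam2: float = 0.1) -> float:
    """Weighted multi-task objective: the original CTR loss plus the two
    auxiliary terms (contrast learning loss and feature regularization losses)."""
    return l_ctr + lam1 * l_cl + lam2 * (l_align + l_con)

# With zero auxiliary losses the objective reduces to the original CTR loss.
base = total_loss(0.7, 0.0, 0.0, 0.0)
```

Setting lam1 = lam2 = 0 recovers the plain CTR training task, which makes the weights a convenient knob for ablating the two auxiliary tasks.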
According to the technical scheme provided by the embodiment of the application, a data enhancement module is introduced into the CTR prediction model training task, improving the robustness of the feature representation; a contrast learning branch is added to the CTR prediction model training task, improving the generalization performance of model training in an end-to-end manner; and two feature regularization methods are designed, effectively alleviating the insufficient training of low-frequency features while improving the distinction between the feature representations of different feature domains.
It should be noted that, in the embodiment of the present application, the CTR prediction module may be any general model, such as WDL, DeepFM, etc.; the metric distance used by the vector contrast module for contrast learning may be the L2 distance, or alternatively the KL divergence, the InfoNCE loss, etc.; and the data enhancement module may combine the three data enhancement methods to produce further data enhancement methods, which are applied to model training.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Fig. 2 is a schematic structural diagram of a CTR prediction model training device based on contrast learning according to an embodiment of the present application. As shown in fig. 2, the CTR prediction model training apparatus based on contrast learning includes:
a mapping module 201, configured to obtain discrete features for training the CTR prediction model, and map the discrete features into dense vectors with low dimensionality to obtain original feature vectors;
a regularization module 202 configured to perform regularization constraint on the original feature vector with a predetermined feature alignment constraint and a feature consistency constraint, respectively, resulting in a feature alignment loss and a feature consistency loss;
the contrast learning module 203 is configured to perform data enhancement on the original feature vectors to obtain first feature vectors and second feature vectors, perform feature intersection on the first feature vectors and the second feature vectors to obtain intermediate vectors, and calculate the distance between the intermediate vectors to obtain contrast learning loss;
the training module 204 is configured to generate a comprehensive loss function according to the feature alignment loss, the feature consistency loss, the contrast learning loss, and the loss function of the original CTR prediction task, and to reversely update model parameters by using the comprehensive loss function so as to train the CTR prediction model.
In some embodiments, after the mapping module 201 of Fig. 2 maps the discrete features into low-dimensional dense vectors to obtain the original feature vectors, the CTR prediction module of the CTR prediction model performs feature cross combination on the original feature vectors, a multi-layer perceptron is used to predict the probability of a user clicking on an item to obtain the probability value, and the loss function of the original CTR prediction task is determined.
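As a rough illustration of this module's flow, the following sketch builds explicit second-order feature crosses and feeds them through a one-hidden-layer perceptron; real CTR models such as WDL or DeepFM learn far richer crosses, and all weight names here are hypothetical, untrained stand-ins:

```python
import numpy as np

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))

def ctr_head(e, W1, b1, w2, b2) -> float:
    """Explicit second-order feature crosses followed by a one-hidden-layer MLP."""
    cross = np.outer(e, e)[np.triu_indices(len(e), k=1)]  # all pairwise products e_i * e_j
    x = np.concatenate([e, cross])       # raw embedding plus crossed features
    h = np.maximum(0.0, x @ W1 + b1)     # ReLU hidden layer of the perceptron
    return float(sigmoid(h @ w2 + b2))   # predicted click probability in (0, 1)

rng = np.random.default_rng(0)
e = rng.normal(size=4)                         # flattened original feature vector
d = 4 + 6                                      # 4 raw dims + C(4, 2) = 6 crosses
W1, b1 = rng.normal(size=(d, 8)), np.zeros(8)  # hypothetical, untrained weights
w2, b2 = rng.normal(size=8), 0.0
p_click = ctr_head(e, W1, b1, w2, b2)
```

The sigmoid output plays the role of the predicted click probability, against which the original CTR loss (typically a binary cross-entropy) would be computed.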
In some embodiments, the regularization module 202 of fig. 2 calculates the distances between feature vectors within the same feature domain using the feature regularization module of the CTR prediction model and minimizes the distances between feature vectors within the same feature domain to align feature vectors within the same feature domain, taking the sum of the distances of feature vectors within the same feature domain after alignment as the feature alignment penalty.
In some embodiments, regularization module 202 of fig. 2 calculates the distances of feature vectors between different feature domains using a feature regularization module of the CTR prediction model and maximizes the distances of feature vectors between different feature domains to pull the distances between feature vectors between different feature domains apart, taking the negative of the sum of the distances of feature vectors between different feature domains as a feature consistency penalty.
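A toy NumPy sketch of the two regularization terms described above, assuming plain Euclidean distances over all pairs (the patent does not fix the distance metric or the pairing scheme; the function names are illustrative):

```python
import numpy as np

def alignment_loss(domain_vecs: np.ndarray) -> float:
    """Sum of pairwise L2 distances between feature vectors of the SAME feature
    domain; minimising it pulls the domain's vectors together (aligns them)."""
    n = len(domain_vecs)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(np.linalg.norm(domain_vecs[i] - domain_vecs[j]))
    return total

def consistency_loss(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """Negative L2 distance between vectors from DIFFERENT feature domains;
    minimising the negative value maximises the inter-domain separation."""
    return -float(np.linalg.norm(vec_a - vec_b))
```

Because the consistency term is the negative of a distance, gradient descent on the combined objective simultaneously shrinks intra-domain distances and grows inter-domain ones.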
In some embodiments, the contrast learning module 203 of Fig. 2 performs data enhancement by random masking, randomly masking elements in the original feature vectors with a preset probability to generate new training data; or performs data enhancement by feature masking, randomly masking features in the original feature vectors with a preset probability to generate new training data; or performs data enhancement by dimension masking, randomly masking components in the original feature vectors with a preset probability to generate new training data.
In some embodiments, after the contrast learning module 203 of fig. 2 performs feature intersection on the first feature vector and the second feature vector to obtain an intermediate vector, two parallel networks in the vector contrast module of the CTR prediction model are used to map the intermediate vector corresponding to the first feature vector and the second feature vector into a low-dimensional space, and calculate the distance between the output vectors of the parallel networks, so as to obtain the contrast learning loss.
In some embodiments, training module 204 of FIG. 2 calculates the composite loss function using the following formula:
L = L_ctr + λ1·L_cl + λ2·(L_align + L_con)

wherein L represents the integrated loss function, L_ctr represents the loss function of the original CTR prediction task, L_cl represents the contrast learning loss, L_align represents the feature alignment loss, L_con represents the feature consistency loss, and λ1 and λ2 represent the weights.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present application.
Fig. 3 is a schematic structural diagram of an electronic device 3 according to an embodiment of the present application. As shown in fig. 3, the electronic apparatus 3 of this embodiment includes: a processor 301, a memory 302 and a computer program 303 stored in the memory 302 and executable on the processor 301. The steps of the various method embodiments described above are implemented when the processor 301 executes the computer program 303. Alternatively, the processor 301, when executing the computer program 303, performs the functions of the modules/units in the above-described apparatus embodiments.
Illustratively, the computer program 303 may be partitioned into one or more modules/units, which are stored in the memory 302 and executed by the processor 301 to complete the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program 303 in the electronic device 3.
The electronic device 3 may be an electronic device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The electronic device 3 may include, but is not limited to, a processor 301 and a memory 302. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may also include an input-output device, a network access device, a bus, etc.
The processor 301 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 302 may be an internal storage unit of the electronic device 3, for example, a hard disk or a memory of the electronic device 3. The memory 302 may also be an external storage device of the electronic device 3, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 3. Further, the memory 302 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 302 is used to store computer programs and other programs and data required by the electronic device. The memory 302 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided by the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or elements is merely a logical functional division, and there may be additional divisions of actual implementations, multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor may implement the steps of each of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. The CTR prediction model training method based on contrast learning is characterized by comprising the following steps of:
discrete features used for CTR prediction model training are obtained, the discrete features are mapped into dense vectors with low dimensionality, and original feature vectors are obtained, wherein the discrete features comprise category attributes of articles;
regularization constraint is carried out on the original feature vector by utilizing preset feature alignment constraint and feature consistency constraint, so that feature alignment loss and feature consistency loss are obtained;
performing data enhancement on the original feature vectors to obtain first feature vectors and second feature vectors, performing feature intersection on the first feature vectors and the second feature vectors to obtain intermediate vectors, and calculating the distance between the intermediate vectors to obtain contrast learning loss;
Generating a comprehensive loss function according to the feature alignment loss, the feature consistency loss, the contrast learning loss and the loss function of the original CTR prediction task, and reversely updating model parameters by utilizing the comprehensive loss function so as to train the CTR prediction model, wherein the trained CTR prediction model is used for predicting the probability of clicking an article by a user;
wherein after said mapping the discrete features into low-dimensional dense vectors resulting in original feature vectors, the method further comprises:
the CTR prediction module of the CTR prediction model is utilized to carry out feature cross combination on the original feature vectors, the probability of clicking the object by the user is predicted by utilizing a multi-layer perceptron, the probability value of clicking the object by the user is obtained, and the loss function of the original CTR prediction task is determined;
the data enhancement of the original feature vector comprises the following steps:
performing data enhancement by random masking, and randomly masking elements in the original feature vectors with a preset probability to generate new training data; or,
performing data enhancement by feature masking, and randomly masking features in the original feature vectors with a preset probability to generate new training data; or,
performing data enhancement by dimension masking, and randomly masking components in the original feature vectors with a preset probability to generate new training data.
2. The method of claim 1, wherein regularizing the original feature vector with a predetermined feature alignment constraint and a feature consistency constraint, respectively, to obtain a feature alignment loss and a feature consistency loss, comprises:
and calculating the distance between the feature vectors in the same feature domain by utilizing a feature regularization module of the CTR prediction model, and minimizing the distance between the feature vectors in the same feature domain so as to align the feature vectors in the same feature domain, wherein the sum of the distances of the feature vectors in the same feature domain after alignment is taken as the feature alignment loss.
3. The method of claim 1, wherein regularizing the original feature vector with a predetermined feature alignment constraint and a feature consistency constraint, respectively, to obtain a feature alignment loss and a feature consistency loss, comprises:
and calculating the distance of the feature vectors among different feature domains by utilizing a feature regularization module of the CTR prediction model, and maximizing the distance of the feature vectors among different feature domains so as to pull the distance among the feature vectors among different feature domains far, and taking the negative value of the sum of the distances of the feature vectors among different feature domains as the feature consistency loss.
4. The method of claim 1, wherein after feature-crossing the first feature vector and the second feature vector to obtain an intermediate vector, the method further comprises:
and respectively mapping the intermediate vectors corresponding to the first characteristic vector and the second characteristic vector into a low-dimensional space by utilizing two parallel networks in a vector comparison module of the CTR prediction model, and calculating the distance between the parallel network output vectors to obtain the comparison learning loss.
5. The method of claim 1, wherein the integrated loss function is calculated from the feature alignment loss, the feature consistency loss, the contrast learning loss, and a loss function of an original CTR prediction task using the following formula:
L = L_ctr + λ1·L_cl + λ2·(L_align + L_con)

wherein L represents the integrated loss function, L_ctr represents the loss function of the original CTR prediction task, L_cl represents the contrast learning loss, L_align represents the feature alignment loss, L_con represents the feature consistency loss, and λ1 and λ2 represent the weights.
6. CTR predictive model training device based on contrast study, characterized by comprising:
the mapping module is configured to acquire discrete features for training a CTR prediction model, map the discrete features into low-dimensional dense vectors to obtain original feature vectors, wherein the discrete features comprise category attributes of articles;
The regularization module is configured to respectively perform regularization constraint on the original feature vector by utilizing a preset feature alignment constraint and a feature consistency constraint to obtain a feature alignment loss and a feature consistency loss;
the contrast learning module is configured to perform data enhancement on the original feature vectors to obtain first feature vectors and second feature vectors, perform feature intersection on the first feature vectors and the second feature vectors to obtain intermediate vectors, and calculate the distance between the intermediate vectors to obtain contrast learning loss;
the training module is configured to generate a comprehensive loss function according to the characteristic alignment loss, the characteristic consistency loss, the contrast learning loss and the loss function of the original CTR prediction task, and reversely update model parameters by utilizing the comprehensive loss function so as to train the CTR prediction model, wherein the trained CTR prediction model is used for predicting the probability of clicking an article by a user;
the mapping module is further configured to, after the discrete features are mapped to the dense vectors with low dimensionality to obtain original feature vectors, perform feature cross combination on the original feature vectors by using a CTR prediction module of the CTR prediction model, predict probability of clicking an object by a user by using a multi-layer perceptron, obtain probability values of clicking the object by the user, and determine a loss function of the original CTR prediction task;
The contrast learning module is also used for performing data enhancement by random masking, randomly masking elements in the original feature vectors with a preset probability to generate new training data; or performing data enhancement by feature masking, randomly masking features in the original feature vectors with a preset probability to generate new training data; or performing data enhancement by dimension masking, randomly masking components in the original feature vectors with a preset probability to generate new training data.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 5 when the program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN202310834678.9A 2023-07-10 2023-07-10 CTR prediction model training method and device based on contrast learning and electronic equipment Active CN116562359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310834678.9A CN116562359B (en) 2023-07-10 2023-07-10 CTR prediction model training method and device based on contrast learning and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310834678.9A CN116562359B (en) 2023-07-10 2023-07-10 CTR prediction model training method and device based on contrast learning and electronic equipment

Publications (2)

Publication Number Publication Date
CN116562359A CN116562359A (en) 2023-08-08
CN116562359B true CN116562359B (en) 2023-11-10

Family

ID=87488317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310834678.9A Active CN116562359B (en) 2023-07-10 2023-07-10 CTR prediction model training method and device based on contrast learning and electronic equipment

Country Status (1)

Country Link
CN (1) CN116562359B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420166A (en) * 2021-03-26 2021-09-21 阿里巴巴新加坡控股有限公司 Commodity mounting, retrieving, recommending and training processing method and device and electronic equipment
CN116257798A (en) * 2022-12-23 2023-06-13 微梦创科网络科技(中国)有限公司 Click rate prediction model training and click rate prediction method, system and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11756309B2 (en) * 2020-11-23 2023-09-12 Waymo Llc Contrastive learning for object detection

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420166A (en) * 2021-03-26 2021-09-21 阿里巴巴新加坡控股有限公司 Commodity mounting, retrieving, recommending and training processing method and device and electronic equipment
CN116257798A (en) * 2022-12-23 2023-06-13 微梦创科网络科技(中国)有限公司 Click rate prediction model training and click rate prediction method, system and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HCE: Hierarchical Context Embedding for Region-Based Object Detection; Xin Jin et al.; 《Database Systems for Advanced Applications: 28th International Conference, DASFAA 2023, Proceedings》; pp. 74-83 *

Also Published As

Publication number Publication date
CN116562359A (en) 2023-08-08

Similar Documents

Publication Publication Date Title
Chen et al. Deep reinforcement learning in recommender systems: A survey and new perspectives
Yin et al. Mobile marketing recommendation method based on user location feedback
Guo et al. A stock market forecasting model combining two-directional two-dimensional principal component analysis and radial basis function neural network
He et al. A game-theoretic machine learning approach for revenue maximization in sponsored search
Li et al. A CTR prediction model based on user interest via attention mechanism
Feng et al. Graph neural networks with global noise filtering for session-based recommendation
Cong Personalized recommendation of film and television culture based on an intelligent classification algorithm
Liu et al. POI Recommendation Method Using Deep Learning in Location‐Based Social Networks
Chen et al. Session-based recommendation: Learning multi-dimension interests via a multi-head attention graph neural network
Wang et al. Interval-enhanced graph transformer solution for session-based recommendation
Wang et al. A multi-view time series model for share turnover prediction
Li et al. Probability matrix factorization algorithm for course recommendation system fusing the influence of nearest neighbor users based on cloud model
Long et al. Hierarchical attention factorization machine for CTR prediction
CN111859117A (en) Information recommendation method and device, electronic equipment and readable storage medium
CN116562359B (en) CTR prediction model training method and device based on contrast learning and electronic equipment
CN116186541A (en) Training method and device for recommendation model
Zhao et al. Combining unsupervised and supervised classification for customer value discovery in the telecom industry: a deep learning approach
CN111507366B (en) Training method of recommendation probability model, intelligent completion method and related device
Zhou et al. Research on investment strategies of stock market based on sentiment indicators and deep reinforcement learning
Li et al. A parameter optimization method in predicting algorithms for smart living
CN116501993B (en) House source data recommendation method and device
CN113011921B (en) Information pushing method and device, electronic equipment and readable storage medium
Pan et al. Predicting the price of second-hand housing based on lambda architecture and kd tree
Yu et al. Deep Factorization Machines network with Non-linear interaction for Recommender System
Li et al. Correlation Analysis of Network Big Data and Film Time‐Series Data Based on Machine Learning Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant