CN113902473A - Training method and device of business prediction system - Google Patents

Training method and device of business prediction system

Info

Publication number
CN113902473A
Authority
CN
China
Prior art keywords
model
local
training
common
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111155580.8A
Other languages
Chinese (zh)
Inventor
杨哲
杨一鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111155580.8A priority Critical patent/CN113902473A/en
Publication of CN113902473A publication Critical patent/CN113902473A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0222During e-commerce, i.e. online transactions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0224Discounts or incentives, e.g. coupons or rebates based on user history

Abstract

The embodiments of the specification provide a training method for a business prediction system, where the business prediction system includes a common model maintained by a server and a plurality of local models deployed in a plurality of user terminals. The method is performed by the server and includes the following steps: obtaining a plurality of common sample sets corresponding to a plurality of users, and training the common model using the common sample sets; then, based on the trained common model, performing joint training including multiple iterations on the plurality of local models, where any iteration includes: for any first user, processing the corresponding common sample with the common model to obtain a common prediction result, and sending the common prediction result to the corresponding first terminal; receiving from the first terminal parameter update data of a first local model, determined based on the common prediction result and a local sample of the first terminal; and then, based on the plurality of parameter update data received from the plurality of terminals, determining the parameters after the current round of joint update and providing them to the plurality of terminals.

Description

Training method and device of business prediction system
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for training a business prediction system.
Background
With the progress of science and technology and the development of society, a large number of service platforms have emerged to provide various services that meet people's needs in work and life. Meanwhile, machine learning has become a focus of current research, and many service platforms attempt to provide better services by predicting user behavior with machine learning models. For example, a service platform may decide whether to issue a coupon to a certain user by predicting whether that user will redeem it, so that as many users as possible can actually enjoy the limited benefits the platform distributes.
However, current model-based prediction of user behavior still has room for improvement: for example, the prediction accuracy is limited, and the model training process carries a risk of privacy leakage. A scheme is therefore needed that can effectively improve the prediction accuracy for user behavior while effectively reducing the risk of privacy leakage.
Disclosure of Invention
One or more embodiments of the present specification describe a training method and apparatus for a business prediction system. The user samples already collected in the server are combined with the local samples of each user, and a different behavior prediction model is constructed for each user, realizing "thousands of users, thousands of models" (a personalized model per user), which effectively improves the prediction performance of the models; moreover, since each user trains the local model with local samples, the risk of privacy leakage of the data collected by the terminal is effectively reduced.
According to a first aspect, a method for training a business prediction system is provided, where the business prediction system includes a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users. The method is performed by the server and includes: acquiring a plurality of common sample sets corresponding to the plurality of users, where the label of each common sample indicates whether the corresponding user performs a target behavior on the business object; training the common model using the plurality of common sample sets; and performing, based on the trained common model, joint training including multiple iterations on the plurality of local models, where any iteration includes: for any first user, processing the corresponding common sample with the common model to obtain a common prediction result, and sending the common prediction result to the corresponding first terminal; receiving, from the first terminal, parameter update data for a first local model, determined based on the common prediction result and a local sample of the first terminal; and determining, based on the plurality of parameter update data received from the plurality of terminals, the parameters after the current round of joint update, to be provided to the plurality of terminals.
In one embodiment, the business object is a user interest (i.e., a benefit such as a coupon), and the target behavior includes a verification-and-cancellation (redemption) behavior; or the business object is a business link, and the target behavior includes a click behavior; or the business object is a commodity, and the target behavior includes a purchase behavior.
In one embodiment, the business prediction system further includes a plurality of task models corresponding to the plurality of users, and the training of the common model involves multiple rounds of iterative training performed jointly with the plurality of task models, where any round of iterative training includes: selecting, from the plurality of task models, several task models participating in the current round of iterative training; for a first task model of any one of the several task models, setting its model parameters to the current parameters of the common model, and determining a batch of training samples and a batch of test samples based on the common sample set of the user corresponding to the first task model; training the first task model based on the batch of training samples, and calculating a first test gradient based on the trained first task model and the batch of test samples; and updating the current parameters of the common model based on the several test gradients calculated for the several task models.
In a specific embodiment, training the first task model based on the batch of training samples includes: calculating a first training gradient based on the batch of training samples and the first task model; and updating the model parameters of the first task model based on the first training gradient and the learning rate set for the first task model; where, for two task models among the plurality of task models whose corresponding common sample sets have different numbers of samples, the task model corresponding to the smaller number of samples is set with a larger learning rate.
In one embodiment, performing the joint training including multiple iterations on the plurality of local models based on the trained common model includes: sending the parameters of the common model to the plurality of user terminals, respectively, so that the plurality of user terminals each initialize the parameters of their local model based on the parameters of the common model.
In one embodiment, the feature space corresponding to the local samples is a subspace of the feature space corresponding to the common samples, and the plurality of local models are sub-models of a global model; determining the parameters after the current round of joint update based on the parameter update data received from the terminals includes: updating the current global model based on the plurality of parameter update data; and after updating the current global model, the method further includes: determining, based on the current global model, a first parameter portion corresponding to the first local model, and sending the first parameter portion to the first terminal.
In a specific embodiment, the parameter update data includes a parameter gradient portion corresponding to the first local model; wherein updating the current global model based on the plurality of parameter update data comprises: determining a global parameter gradient corresponding to the global model based on a plurality of parameter gradient portions in the plurality of parameter update data; updating the current global model based on the global parameter gradient.
According to a second aspect, a training method for a business prediction system is provided, where the business prediction system includes a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users. The method involves performing joint training including multiple iterations on the plurality of local models and is performed by a first terminal, which is any one of the plurality of terminals, where any iteration includes: receiving from the server a common prediction result for a first user, the common prediction result being determined by processing a common sample corresponding to the first user with the common model, the common model being trained based on a plurality of common sample sets corresponding to the plurality of users; training a first local model based on the local samples in the first terminal and the common prediction result to obtain corresponding parameter update data; sending the parameter update data to the server; acquiring from the server the parameters after the current round of joint update, which are determined by the server based on the plurality of parameter update data received from the plurality of terminals; and updating the first local model with the parameters after the current round of joint update.
In one embodiment, the local model includes a prediction layer and a weighting processing layer; training the first local model based on the local samples in the first terminal and the common prediction result includes: processing the local samples with the prediction layer to obtain a local prediction result; processing the local prediction result and the common prediction result with the weighting processing layer to obtain a comprehensive prediction result; and training the first local model based on the comprehensive prediction result and the labels of the local samples.
In a specific embodiment, where the iteration is the first iteration, before training the local model, the method further includes: receiving the parameters of the common model from the server; and initializing the parameters of the prediction layer based on the parameters of the common model.
In one embodiment, training a first local model based on the local samples in the first terminal and the common prediction result to obtain corresponding parameter update data includes: determining a batch of training samples and a batch of testing samples to be used in the iterative training based on the local sample set; training the first local model based on the batch of training samples; and calculating a first test gradient based on the trained first local model and the batch of test samples to serve as the parameter updating data.
In one embodiment, the feature space corresponding to the local samples is a subspace of the feature space corresponding to the common samples, and the plurality of local models are sub-models of a global model; acquiring the parameters after the current round of joint update from the server includes: obtaining, from the server, a first parameter portion of the current global model corresponding to the first local model, the current global model being determined by the server based on the plurality of parameter update data; and updating the local model with the parameters after the current round of joint update includes: updating the parameters of the local model to the first parameter portion.
According to a third aspect, a training apparatus for a business prediction system is provided, where the business prediction system includes a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users. The apparatus is integrated in the server and includes: a common sample acquiring unit configured to acquire a plurality of common sample sets corresponding to the plurality of users, where the label of each common sample indicates whether the corresponding user performs a target behavior on the business object; a common model training unit configured to train the common model using the plurality of common sample sets; and a local model training unit configured to perform joint training including multiple iterations on the plurality of local models based on the trained common model, where any iteration is performed by the following sub-units: a common prediction subunit configured to, for any first user, process the corresponding common sample with the common model to obtain a common prediction result; a result sending subunit configured to send the common prediction result to the corresponding first terminal; an update receiving subunit configured to receive, from the first terminal, parameter update data of a first local model, determined based on the common prediction result and a local sample of the first terminal; and an aggregation subunit configured to determine, based on the plurality of parameter update data received from the plurality of terminals, the parameters after the current round of joint update, to be provided to the plurality of terminals.
According to a fourth aspect, a training apparatus for a business prediction system is provided, where the business prediction system includes a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users; the training includes joint training of multiple iterations of the plurality of local models, and the apparatus is integrated in a first terminal, which is any one of the plurality of terminals. The apparatus performs any one of the multiple iterations through the following units: a result receiving unit configured to receive, from the server, a common prediction result for a first user, the common prediction result being determined by processing a common sample corresponding to the first user with the common model, the common model being trained based on a plurality of common sample sets corresponding to the plurality of users; a determining unit configured to train a first local model based on the local samples in the first terminal and the common prediction result to obtain corresponding parameter update data; a sending unit configured to send the parameter update data to the server; a receiving unit configured to acquire from the server the parameters after the current round of joint update, which are determined by the server based on the plurality of parameter update data received from the plurality of terminals; and a local model updating unit configured to update the first local model with the parameters after the current round of joint update.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method provided by the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method provided by the first or second aspect.
With the method and apparatus provided by the embodiments of the present specification, the common model maintained in the server is trained first, and the plurality of local models deployed in the plurality of terminals are then jointly trained on the basis of the trained common model. The common samples in the server and the local samples in the user terminals are thus used in combination, and a different behavior prediction model is constructed for each user, achieving per-user personalization at both the feature level and the model level and effectively improving the accuracy of the prediction system. Meanwhile, since the local data collected by a terminal is used only locally, the risk of privacy leakage of the terminal's local data is effectively reduced.
Furthermore, on the one hand, the concept of meta-learning is introduced: the training of the common model in the server serves as pre-training for the training of the local models in the user terminals, which effectively improves the training efficiency and accuracy of the prediction system; in addition, the characteristics of long-tail users are fully considered in the pre-training, and different learning rates are set for different tasks, which prevents the model from over-fitting and further improves its prediction accuracy. On the other hand, each local model is designed as a sub-model of the global model, so each terminal only needs to perform local computation and communicate with the server for the parameter portion corresponding to its local model, rather than updating and communicating the full parameters of the global model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating an implementation scenario of training a business prediction system according to an embodiment;
FIG. 2 illustrates a flow diagram of a method of training a common model according to one embodiment;
FIG. 3 shows a flow diagram of a method of training a common model according to another embodiment;
FIG. 4 illustrates a multi-party interaction diagram for jointly training multiple local models, according to one embodiment;
FIG. 5 illustrates a scenario diagram for jointly training multiple local models, according to one embodiment;
FIG. 6 illustrates a schematic structural diagram of a training apparatus of a business prediction system according to one embodiment;
FIG. 7 illustrates a schematic structural diagram of a training apparatus of a business prediction system according to another embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As mentioned above, current user behavior prediction still has room for improvement: for example, the prediction accuracy is limited, and the model training process carries a risk of privacy leakage.
In view of this, the inventors propose a scheme that can at least effectively improve the prediction accuracy and effectively reduce the risk of privacy leakage.
Fig. 1 is a schematic diagram illustrating an implementation scenario of training a business prediction system according to an embodiment. As shown in Fig. 1, the business prediction system includes a common model maintained by the server and a plurality of local models deployed in a plurality of terminals (e.g., a local model m1 deployed in a terminal t1, etc.).
The training process of the business prediction system includes two phases, A and B. First, in phase A, the server trains a common model for predicting user behavior based on a plurality of collected sample sets corresponding to a plurality of users (e.g., a sample set s1 collected for a user u1, etc.). Then, in phase B, joint training with multiple iterations is performed on the plurality of local models. Specifically, in any iteration, the terminal ti trains its local model mi using its local samples ki and the received common score gi, which the common model outputs for the common sample corresponding to the user ui; the server then aggregates the model parameter update information uploaded by the plurality of terminals and feeds the aggregated information back to each terminal (this process is not shown in Fig. 1). After multiple rounds of iterative joint training, a plurality of trained local models are obtained.
Therefore, the local sample collected by the user terminal can be kept locally without being transmitted outwards, so that the risk of privacy disclosure of the user can be effectively reduced; meanwhile, the training of each local model substantially uses the user data legally collected in the server and the local samples respectively collected by the plurality of terminals, and the prediction accuracy of the model can be effectively improved through abundant training data.
The implementation steps of the above scheme are described below with reference to specific embodiments. For clarity, phase A is introduced first. Fig. 2 shows a flowchart of a method for training a common model according to an embodiment; the execution subject of the method may be any device, apparatus, server, or server cluster with computing and processing capabilities, such as the server shown in Fig. 1. As shown in Fig. 2, the method includes the following steps:
Step S210: acquire a plurality of common samples, where the label of each common sample indicates whether the corresponding user performs a target behavior on the business object; Step S220: train a common model using the plurality of common samples.
These steps are described in detail as follows:
first, in step S210, a plurality of common samples are acquired. It should be understood that, for the purpose of describing the samples collected in the server and the samples collected in the user terminal in a differentiated manner, the former are referred to as common samples and the latter are referred to as local samples.
Each common sample includes sample features and a sample label, where the sample features at least include the user features of the corresponding user, and the sample label indicates whether the corresponding user performs the target behavior on the business object. In one embodiment, the user features may include basic user attributes such as age, gender, region, and terminal device model. In another embodiment, the user features may also include the user's operating characteristics in the service platform providing the business object, such as login frequency or the frequency of performing other specific operations.
On the other hand, when there is a difference between the business objects targeted by different samples, the sample features may further include object features of corresponding business objects. In one embodiment, the business object is a business link, and accordingly, the characteristics of the business link may include: link carriers (such as text or pictures), contents displayed after clicking links, and the like, and the target behavior may be click behavior. In another embodiment, the business object is a commodity, and accordingly, the commodity characteristics may include: the source, cost, sales, name, category, etc., and the target behavior may be a purchasing behavior. In another embodiment, the business object is a user interest, and accordingly, the interest characteristics may include: the amount of the interest, the kind of the interest (such as a full discount coupon, a coupon or a redemption coupon, etc.), and the target behavior may be a verification and cancellation behavior.
According to a specific embodiment, the sample label indicates whether the corresponding user performs a verification-and-cancellation (redemption) behavior on the user interest issued to that user, and the sample features include basic user attributes (such as age and gender), user activity attributes (such as payment frequency and login frequency), user payment capacity (such as whether payment is made using the account balance, Huabei credit, a bank card, or a family card), user payment preferences (such as a preference for spending on e-commerce, entertainment, games, or travel), transaction attributes (such as order amount, order time, and merchant name), terminal usage characteristics (such as terminal operation traces, application list, and page access content), and interest features (such as the interest amount and interest type).
In the above, the common samples have been introduced. After the common samples are obtained, the common model may be trained using the plurality of common samples in step S220. It should be understood that the common model may be implemented as a deep neural network (DNN), a convolutional neural network (CNN), or the like. In one embodiment, in each training iteration, sampling is performed on the plurality of common samples, and a batch of sampled samples is processed with the common model; a training loss is then determined based on the corresponding model prediction results and the sample labels, and the parameters of the common model are adjusted based on the training loss, until a trained common model is obtained after multiple training iterations.
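As an aid to the reader only, the following minimal sketch (PyTorch-style; the model shape, data loader, and hyper-parameters are assumptions for illustration, not taken from the patent) shows this kind of per-batch training of the common model:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train_common_model(model: torch.nn.Module, dataset, epochs: int = 10,
                       batch_size: int = 64, lr: float = 1e-3) -> torch.nn.Module:
    """Sketch: train the common model on batches sampled from the common samples.

    `dataset` is assumed to yield (features, label) pairs, with label in {0, 1}
    indicating whether the user performed the target behavior; the model is
    assumed to end with a sigmoid so its output is a probability.
    """
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, labels in loader:
            preds = model(feats).squeeze(-1)                      # model prediction results
            loss = F.binary_cross_entropy(preds, labels.float())  # training loss vs. labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                      # adjust common-model parameters
    return model
```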
In another embodiment, the inventors consider that the parameters of the local models can later be initialized with the parameters of the common model in phase B, to optimize the effect of the joint training of the local models. To this end, the idea of meta-learning is introduced to train the common model. Specifically, the plurality of common samples relate to different users; on this basis, they are divided into common sample sets corresponding to the different users, so that the behavior prediction for each of the several users involved is treated as a separate prediction task, and a plurality of task models for the plurality of users are designed. Further, multiple rounds of iterative training are performed on the plurality of task models and the common model; for brevity, an arbitrary round of iterative training is described below.
Fig. 3 shows a flowchart of a training method of a common model according to another embodiment, as shown in fig. 3, the following steps are included in any round of iterative training:
in step S301, a plurality of task models participating in the iterative training of the current round are selected from the plurality of task models. It is to be understood that several of these references refer to one or more of these. In one embodiment, a predetermined number of task models may be randomly selected as the plurality of task models. In another embodiment, a predetermined number of task models may be selected in sequence as the plurality of task models.
Based on the selected task models, in step S302, for any one of the several task models (referred to as the first task model for brevity), its model parameters are set to the current parameters of the common model. It should be understood that, if the current round of iterative training is the first round, the parameters of the common model also need to be randomly initialized before step S302, and the current parameters of the common model are then the randomly initialized parameters; if the current round is not the first round, the current parameters of the common model are the parameters updated in the previous iteration.
For brevity, the user corresponding to the first task model is referred to as the first user, and the common sample set corresponding to the first user is referred to as the first common sample set. This step further includes: determining, based on the first common sample set, a batch of training samples and a batch of test samples for the current iteration; the determination of the training batch and the test batch can be done with existing sampling methods and is not described in detail.
Thereafter, in step S303, the first task model is trained based on the batch of training samples. Specifically, the sample features of a training sample are input into the first task model to obtain a prediction result, and a first training gradient is then calculated based on the prediction result and the sample label of the training sample; the model parameters of the first task model are then updated based on the first training gradient and the learning rate set for the first task model. Illustratively, the parameter update may be expressed as:

$$\theta' = \theta - \alpha \nabla_\theta \mathcal{L}(f_\theta)$$

where $\theta$ and $\theta'$ denote the parameters of the first task model before and after the update, respectively, $\alpha$ denotes the learning rate set for the first task model, $f_\theta$ denotes the first task model before the update, $\mathcal{L}(f_\theta)$ denotes the loss function of the first task model, and $\nabla_\theta \mathcal{L}(f_\theta)$ denotes the first training gradient.
It should be noted that, in step S302, a plurality of batches of training samples may also be sampled, and accordingly, in this step, a plurality of batches of training samples may be used to train the first task model multiple times.
Thus, the trained first task model can be obtained. In step S304, a first test gradient is calculated based on the trained first task model and the batch of test samples. Specifically, the sample features of each test sample are input into the trained first task model $f_{\theta'}$ to obtain a prediction result, and the first test gradient $\nabla_{\theta'} \mathcal{L}(f_{\theta'})$ is then calculated based on the prediction result and the sample label of the test sample.
In this way, a first test gradient is obtained for the first task model, which is any one of the several task models; this means that several test gradients, calculated for the several task models, can be obtained. Further, in step S305, the current parameters of the common model are updated based on these several test gradients.
In one embodiment, the model parameters of the common model may be updated based on the mean of the several test gradients and the learning rate set for the common model. Illustratively, the parameter update may be expressed as:

$$\theta \leftarrow \theta - \beta \, \bar{g}_{\text{test}}$$

where $\theta$ denotes the model parameters of the common model, $\beta$ denotes the learning rate set for the common model, and $\bar{g}_{\text{test}}$ denotes the mean of the several test gradients.
In another embodiment, the parameter update of the common model may also use the first training gradients described above. Specifically, the current parameters of the common model are updated based on the mean of the training gradients corresponding to the several task models, the mean of the several test gradients, and the learning rate set for the common model. Illustratively, the update of the current parameters may be expressed as:

$$\theta \leftarrow \theta - \beta \, (\bar{g}_{\text{test}} + \bar{g}_{\text{train}})$$

where $\theta$ denotes the model parameters of the common model, $\beta$ denotes the learning rate set for the common model, $\bar{g}_{\text{test}}$ denotes the mean of the test gradients described above, and $\bar{g}_{\text{train}}$ denotes the mean of the training gradients described above.
In this way, the parameter update of the common model is driven by the training of the plurality of task models, so that the common model learns how the task models learn; when the common model is later needed for a specific user's prediction task, it can achieve excellent prediction capability after being fine-tuned with training samples of the corresponding task.
It should be noted that, in the collection of common samples, a small proportion (e.g., 20%) of highly active users contribute a large proportion (e.g., 80%) of the samples, which causes the prediction results of highly active users to dominate the model and reduces its accuracy. To prevent the model from over-fitting, a personalized learning rate α may be set when training the several tasks; for example, long-tail users use a larger learning rate, so that the model pays more attention to the gradient directions of their tasks, improving the model's accuracy. It should be understood that long-tail users refer to those users, among the plurality of users, whose corresponding common sample sets contain fewer samples.
Thus, by repeatedly executing steps S301 to S305, multiple rounds of iterative training of the common model and the task models can be completed, and a well-trained common model with excellent performance can be obtained.
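For illustration, the sketch below outlines one round of steps S301–S305 in PyTorch-style Python, using a first-order approximation of the meta-gradient; the functional forward pass `model_fn`, the per-task sampling API, and all other names are assumptions rather than the patent's specification:

```python
import random
import torch

def meta_training_round(common_params, task_data, task_lrs, beta, model_fn, loss_fn, k=5):
    """One round of steps S301-S305 (illustrative, first-order sketch).

    common_params: dict name -> tensor, current parameters theta of the common model.
    task_data[i].sample(): assumed to return (train_batch, test_batch) for task i,
        each batch being a (features, labels) pair from that user's common sample set.
    task_lrs[i]: learning rate alpha_i set for task i (larger for long-tail users).
    model_fn(params, x): functional forward pass shared by common and task models.
    """
    selected = random.sample(range(len(task_data)), k)   # S301: pick several tasks
    test_grads = []
    for i in selected:
        # S302: the task model starts from the common model's current parameters
        theta = {n: p.detach().clone().requires_grad_(True)
                 for n, p in common_params.items()}
        (x_tr, y_tr), (x_te, y_te) = task_data[i].sample()
        # S303: inner-loop update with the task-specific learning rate alpha_i
        grads = torch.autograd.grad(loss_fn(model_fn(theta, x_tr), y_tr),
                                    list(theta.values()))
        theta = {n: (p - task_lrs[i] * g).detach().requires_grad_(True)
                 for (n, p), g in zip(theta.items(), grads)}
        # S304: test gradient of the updated task model on the test batch
        test_grads.append(torch.autograd.grad(loss_fn(model_fn(theta, x_te), y_te),
                                              list(theta.values())))
    # S305: update the common model with the mean test gradient and learning rate beta
    new_params = {}
    for j, name in enumerate(common_params):
        mean_g = torch.stack([g[j] for g in test_grads]).mean(dim=0)
        new_params[name] = common_params[name] - beta * mean_g
    return new_params
```

The second embodiment described above would additionally fold the mean training gradient into the S305 update.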
In the above, the training of the common model in the business prediction system in phase A has been introduced. Next, the joint training, in phase B, of the plurality of local models in the business prediction system based on the trained common model is described.
Fig. 4 is a schematic diagram illustrating interaction among multiple parties for jointly training multiple local models according to an embodiment, where multiple parties include the server and multiple terminals corresponding to the multiple users, and the first terminal illustrated in fig. 4 is any one of the multiple terminals. It should be understood that the above server may be implemented instead as any other device, equipment cluster, etc. with computing and processing capabilities, and the terminal of the user may be a smartphone, a wearable device, a tablet, a notebook, a desktop, etc. that binds the corresponding user identity.
The joint training of the local models involves multiple rounds of iterative training, where any one round of iterative training is accomplished through the multi-party interaction shown in fig. 4. As shown in FIG. 4, the multi-party interaction process includes the following steps:
step S401, the server processes the public sample corresponding to the first user of any one of the plurality of users by using the trained public model to obtain a public prediction result. It should be understood that the server stores a first common sample set corresponding to the first user, and based on this, sample characteristics of some or all common samples in the first common sample set may be input into the trained common model, respectively, to obtain a common prediction result corresponding to the common samples. Further, in one example, the common prediction result may be a prediction score indicating a probability that the first user made a target behavior for the business object; in another example, the common prediction result may be a two class value (e.g., 1 or 0) indicating that the first user made a target behavior, or did not make a target behavior, on the business object.
In this way, the server may obtain the common prediction result corresponding to the first user, so that the common prediction result is transmitted to the first terminal corresponding to the first user in step S402.
Step S403, the first terminal trains a local model (or called a first local model) of the first terminal based on the local sample and the common prediction result, and obtains corresponding parameter update data.
For ease of understanding, the determination of the initial parameters of the first local model in the first iteration is described. In one embodiment, the initial parameters of the first local model are random values determined by a random algorithm. In another embodiment, the first local model includes a prediction layer and a weighting processing layer, where the prediction layer processes the local samples to obtain a local prediction result, and the weighting processing layer processes the local prediction result and the common prediction result to obtain a comprehensive prediction result. On this basis, in one possible case, the local samples processed by the prediction layer and the common samples processed by the common model have the same feature space, so the prediction layer can be designed with the same model structure as the common model, and the parameters of the trained common model are used as the initial parameters of the prediction layer.
In another possible case, for various reasons, the features collected locally by a user terminal are limited compared with those collected by the server. Specifically, the server can collect user-authorized data through multiple channels, for example, collecting the user's registration data and payment data through a payment platform, the user's transaction data through an e-commerce platform, and the user's credit data through a credit investigation platform; for a user terminal, by contrast, the collectable data is limited, and terminals with different operating systems often differ in what data they are authorized to collect. This means that the feature space corresponding to the local samples in each user terminal is usually a subspace of the complete feature space corresponding to the common samples; in other words, the set of feature items included in a local sample is a subset of the set of feature items included in a common sample. On this basis, the prediction layer can be designed as a sub-model of the common model, whose parameters are a portion of the parameters of the common model. Further, the first terminal may obtain, from the server, the parameter portion of the common model corresponding to the prediction layer of the first local model, and initialize the parameters of its prediction layer to that parameter portion.
In addition, in both of the above possible cases, the parameters of the weighting processing layer can be initialized randomly or set to manually specified initial values; moreover, the weighting processing layers of the respective local models have the same network structure.
The source of the initial parameters of the first local model is described above.
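For concreteness, the sketch below shows one way a terminal could derive its prediction-layer initial parameters when the prediction layer is a sub-model of the common model; the `input_layer.weight` naming and the feature-to-column bookkeeping are assumptions for illustration only, not the patent's specification:

```python
import torch

def init_prediction_layer_params(common_params, local_items, feature_cols):
    """Sketch: derive prediction-layer initial parameters as a portion of the
    common model's parameters, keeping only the feature items the terminal
    actually collects.

    common_params: dict name -> tensor, trained common-model parameters.
    local_items: the feature items available in the terminal's local samples.
    feature_cols: maps each feature item to its input-column indices in the
        common model (an assumed bookkeeping structure).
    """
    cols = sorted(c for item in local_items for c in feature_cols[item])
    sub_params = {}
    for name, tensor in common_params.items():
        if name == "input_layer.weight":
            # Keep only the input-weight columns for locally available features
            sub_params[name] = tensor[:, cols].clone()
        else:
            # Deeper layers are taken over unchanged in this simplified sketch
            sub_params[name] = tensor.clone()
    return sub_params
```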
Regarding the execution of this step: in one implementation, the first terminal collects a batch of local training samples from its local sample set. Then, for any one of the local training samples, the sample features and the common prediction result are used together as the input features of the first local model to obtain a corresponding training prediction result; a training loss is determined based on the training prediction result and the sample label of the local training sample, and a training gradient corresponding to the local training sample is determined based on the training loss. Further, in one embodiment, the mean of the training gradients corresponding to the batch of local training samples is used as the parameter update data; in another embodiment, the parameters of the first local model are adjusted based on the training gradients of the batch of local training samples, and the model parameters obtained after adjustment are used as the parameter update data. It should be noted that, similarly to the common samples, the sample label of a local sample indicates whether the corresponding user performs the target behavior on the business object.
On the other hand, in one embodiment, the first terminal receives a plurality of common prediction results from the server, corresponding to a plurality of common samples of the first user. Accordingly, for any one local training sample, the portion of the common prediction results whose corresponding common samples were collected before the collection time of that local training sample may be selected from the plurality of common prediction results and input, together with the sample features of the local training sample, into the first local model to obtain the training prediction result.
In another aspect, in an embodiment, the first local model includes a prediction layer and a weighting processing layer, and determining the training prediction result corresponding to any one local training sample may include: processing the sample features of the local training sample with the prediction layer to obtain a local prediction result, and processing the local prediction result and the common prediction result with the weighting processing layer to obtain a comprehensive prediction result, which serves as the corresponding training prediction result. It should be understood that, in the weighting processing layer, the local prediction result and the common prediction result are combined by weighted summation, using the weight parameters that the weighting processing layer holds for the two results, to obtain the comprehensive prediction result.
Therefore, a training prediction result can be obtained, and the parameter updating data can be obtained based on the training prediction result and the sample label of the local training sample.
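As a concrete illustration of this fusion (a sketch only: the layer sizes, the sigmoid output, and the scalar fusion weights are assumptions, since the patent does not fix the network shapes):

```python
import torch
import torch.nn as nn

class LocalModel(nn.Module):
    """Sketch of a local model: a prediction layer plus a weighting processing
    layer that fuses the local and common prediction results."""

    def __init__(self, in_dim: int, hidden_dim: int = 32):
        super().__init__()
        # Prediction layer: maps local sample features to a local prediction result
        self.prediction_layer = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )
        # Weighting processing layer: learnable weights for the two results
        self.w_local = nn.Parameter(torch.tensor(0.5))
        self.w_common = nn.Parameter(torch.tensor(0.5))

    def forward(self, local_feats: torch.Tensor, common_pred: torch.Tensor):
        local_pred = self.prediction_layer(local_feats)
        # Weighted summation of local and common predictions -> comprehensive result
        return self.w_local * local_pred + self.w_common * common_pred
```

A training step would then compare the comprehensive prediction result with the local sample's label, e.g. with a binary cross-entropy loss, and back-propagate through both layers.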
In another embodiment, the idea of meta-learning can be introduced to train the plurality of local models. Specifically, the first terminal collects a batch of local training samples and a batch of local test samples from its local sample set; it then trains the first local model based on the batch of local training samples to obtain a trained first local model. Then, in one specific embodiment, a local test gradient is calculated based on the trained first local model and the batch of local test samples, and is used as the parameter update data; in another specific embodiment, model parameters obtained by further adjusting the first local model based on the local test gradient are used as the parameter update data. It should be noted that, for the calculation of the local test gradient, reference may be made to the above description of calculating the first test gradient.
Therefore, the first terminal obtains corresponding parameter updating data by training the local model. Then, the first terminal transmits the parameter update data to the server at step S404. Thus, in step S405, the server determines the parameters after the current round of joint update based on the plurality of parameter update data received from the plurality of terminals.
In one embodiment, the prediction layers in different local models are different submodels of a common model, and the weighting processing layer of each local model has the same network structure and parameter items, so that each local model is a submodel of a global model obtained by aggregating a plurality of local models; based on this, the parameter update data sent by the first terminal corresponds to the partial model parameters of the global model, and the server can update the global model through the partial model parameters sent by the plurality of terminals. In this regard, reference may be made to FIG. 5, which illustrates a training scenario for jointly training a plurality of local models, i.e., a plurality of sub-models of a global model.
Further, in a specific embodiment, the parameter update data includes parameter gradients corresponding to the local model, and thus the server determines a plurality of gradient values corresponding to each parameter in the global model based on the plurality of parameter gradients in the plurality of parameter update data, and obtains an aggregated gradient of a corresponding parameter item by aggregating the plurality of gradient values, thereby updating the current parameter of the global model in combination with a preset learning rate. In another specific embodiment, the parameter update data includes updated parameters corresponding to the local model, and thus the server determines a plurality of parameter values corresponding to each parameter in the global model based on the plurality of updated parameters in the plurality of pieces of parameter update data, and updates the current parameter of the global model to the parameter obtained by aggregation by aggregating the plurality of parameter values.
Further, regarding the aggregation mentioned here: in one example it may be averaging, and in another example it may be weighted summation; specifically, the weighted summation of the parameter update data may be performed based on weights preset for the plurality of local models. On the other hand, in one example, the plurality of parameter update data may first be filtered and the remaining parameter update data then aggregated, where the filtering may include removing parameter update data that reflects only a small parameter change.
Based on the above, in a specific embodiment, the current parameters of the updated global model may be used as the parameters after the current round of joint update; in another specific embodiment, the parameter portion corresponding to each local model may be determined according to the updated global model parameter, and is used as the parameter after the current round of joint update.
In another embodiment, each local model has the same model structure and parameter terms, and accordingly, the global models corresponding to the plurality of local models also have the same model structure and parameter terms. Therefore, the server can aggregate the plurality of parameter updating data to obtain the current parameters of the updated global model as the parameters after the current round of combined updating.
In this way, the server obtains the parameters after the current round of joint update by aggregating the plurality of parameter update data sent by the plurality of terminals.
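By way of illustration, the following sketch shows the server-side aggregation just described, covering both the filtering and the weighted variants; the dict-of-tensors representation and all names are assumptions, not the patent's specification:

```python
import torch

def aggregate_updates(param_updates, weights=None, min_change=0.0):
    """Sketch: aggregate the parameter update data uploaded by the terminals.

    param_updates: list of dicts mapping global-parameter names to the gradient
        (or updated-parameter) tensors covered by each terminal's local sub-model.
    weights: optional per-terminal weights for weighted aggregation.
    min_change: updates whose largest entry is below this threshold are filtered out.
    """
    if weights is None:
        weights = [1.0] * len(param_updates)          # plain averaging
    # Optional filtering: drop updates that reflect only a small parameter change
    kept = [(u, w) for u, w in zip(param_updates, weights)
            if max(t.abs().max().item() for t in u.values()) >= min_change]
    names = set().union(*(u.keys() for u, _ in kept))
    aggregated = {}
    for name in names:
        # Only terminals whose sub-model contains this parameter item contribute
        vals = [w * u[name] for u, w in kept if name in u]
        total_w = sum(w for u, w in kept if name in u)
        aggregated[name] = torch.stack(vals).sum(dim=0) / total_w  # weighted mean
    return aggregated
```

The server would then apply the aggregated gradient to the current global model with a preset learning rate (or, in the updated-parameter variant, adopt the aggregated values directly), and return to each terminal the parameter portion matching its sub-model.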
Then, in step S406, the first terminal obtains the parameters after the current round of joint update from the server; in step S407, the first terminal updates the first local model using the parameters updated in the current round of the joint update.
In an embodiment, the first local model is a sub-model of the global model, and in this case, in a specific embodiment, the first terminal may obtain a current parameter of the global model from the server, and update the first local model by using a parameter portion corresponding to the first local model; in another specific embodiment, the first terminal may obtain, from the server, a parameter portion corresponding to the first local model in the global model, and update the first local model with the parameter portion. In another embodiment, the first local model has the same model structure and model parameters as the global model, and in this case, the first terminal may obtain the current parameters of the global model from the server and update the model parameters of the first local model to the current parameters.
From above, the updating of the first local model may be done. By repeatedly executing the steps S401 to S407 for multiple times, multiple joint iterative training of multiple local models can be realized.
To sum up, in the training method of the business prediction system disclosed in the embodiments of the present specification, the common model maintained in the server is trained in phase A, and the trained common model is used in phase B to jointly train the plurality of local models deployed in the plurality of terminals. The common samples in the server and the local samples in the user terminals are thus used in combination, and a different behavior prediction model is constructed for each user, achieving per-user personalization at both the feature level and the model level and effectively improving the accuracy of the prediction system. Meanwhile, since the local data collected by a terminal is used only locally, the risk of privacy leakage of the terminal's local data is effectively reduced.
Furthermore, on the one hand, the concept of meta-learning is introduced: the training of the common model in the server serves as pre-training for the training of the local models in the user terminals, which effectively improves the training efficiency and accuracy of the prediction system; in addition, the characteristics of long-tail users are fully considered in the pre-training, and different learning rates are set for different tasks, which prevents the model from over-fitting and further improves its prediction accuracy. On the other hand, each local model is designed as a sub-model of the global model, so each terminal only needs to perform local computation and communicate with the server for the parameter portion corresponding to its local model, rather than updating and communicating the full parameters of the global model.
In the above, the training process of the business prediction system has been introduced. For ease of understanding, the use of the trained business prediction system is briefly described below. In one usage scenario, in response to a certain user triggering a predetermined operation associated with a certain business object, the server reads a plurality of common samples collected for that user within a predetermined time period (e.g., the last month) and processes them respectively with the common model in the business prediction system to obtain a plurality of common scores; the user's terminal then inputs the common scores, the locally collected operation data of the user, and the object features of the business object together into the local model to obtain a final prediction result, which indicates whether to push the business object to the user. Further, in a more specific scenario, the predetermined operation is a payment operation, the business object is a payment coupon, and the final prediction result indicates whether to push the payment coupon to the user.
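Purely to make the scenario concrete, a sketch of this inference flow might read as follows (every object and method name here is hypothetical):

```python
def decide_coupon_push(server, terminal, user_id, coupon_feats, days=30):
    """Sketch of the usage scenario: decide whether to push a payment coupon."""
    # Server side: score the user's recent common samples with the common model
    samples = server.read_common_samples(user_id, last_days=days)
    common_scores = [server.common_model(s.features) for s in samples]
    # Terminal side: fuse the common scores with locally collected operation data
    local_feats = terminal.collect_operation_features()
    score = terminal.local_model(local_feats, coupon_feats, common_scores)
    return score > 0.5   # push the coupon only if redemption is predicted
```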
Corresponding to the training method of the business prediction system, the embodiment of the specification also discloses a training device. The method comprises the following specific steps:
FIG. 6 shows a schematic structural diagram of a training apparatus of a business prediction system according to one embodiment; the business prediction system includes a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users, and the apparatus is integrated in the server. As shown in Fig. 6, the apparatus 600 includes:
a common sample acquiring unit 610 configured to acquire a plurality of common sample sets corresponding to the plurality of users, wherein a label of each common sample indicates whether the corresponding user performs a target behavior on the business object. A common model training unit 620 configured to train the common model using the plurality of common sample sets. A local model training unit 630, configured to perform joint training including multiple iterations on the multiple local models based on the trained common model, where any iteration is performed through the following sub-units: the common prediction subunit 631 is configured to, for any first user, process the corresponding common sample by using the common model to obtain a common prediction result; a result transmitting subunit 632 configured to transmit the common prediction result to the corresponding first terminal; an update receiving subunit 633 configured to receive, from the first terminal, parameter update data of a first local model, the parameter update data being determined based on the common prediction result and a local sample of the first terminal; an aggregation sub-unit 634 configured to determine parameters after the current round of joint update based on a plurality of parameter update data received from the plurality of terminals, for providing to the plurality of terminals.
In one embodiment, the business object is a user interest, and the target behavior comprises a verification and cancellation behavior; or, the business object is a business link, and the target behavior comprises a click behavior; or, the business object is a commodity, and the target behavior comprises a purchasing behavior.
In one embodiment, the business prediction system further includes a plurality of task models corresponding to the plurality of users, and the training of the common model involves multiple rounds of iterative training in combination with the plurality of task models, wherein any round of iterative training is performed by the following subunits in the common model training unit 620: a model selection subunit 621 configured to select, from the plurality of task models, several task models to participate in the current round of iterative training; a test gradient calculation subunit 622 configured to, for a first task model, which is any one of the several task models, determine its model parameters as the current parameters of the common model, and determine a batch of training samples and a batch of test samples based on the common sample set of the user corresponding to the first task model, then train the first task model based on the batch of training samples, and calculate a first test gradient based on the trained first task model and the batch of test samples; and a parameter updating subunit 623 configured to update the current parameters of the common model based on the several test gradients calculated for the several task models.
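The round described above resembles first-order meta-learning (MAML/Reptile-style); the sketch below is one hedged reading of it, with a linear model and mean-squared-error loss standing in for the unspecified task models.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_mse(w, X, y):
    """Gradient of mean squared error for a linear model (illustrative loss)."""
    return 2 * X.T @ (X @ w - y) / len(y)

def meta_round(common_w, task_data, task_lrs, meta_lr=0.05, n_tasks=2):
    """One hypothetical round: inner-train selected task models starting from
    the common parameters, then update the common parameters with the averaged
    test gradients (subunits 621, 622, and 623 respectively)."""
    picked = rng.choice(len(task_data), size=n_tasks, replace=False)
    test_grads = []
    for i in picked:
        (X_tr, y_tr), (X_te, y_te) = task_data[i]
        w_task = common_w.copy()                              # task params := common params
        w_task -= task_lrs[i] * grad_mse(w_task, X_tr, y_tr)  # train on the training batch
        test_grads.append(grad_mse(w_task, X_te, y_te))       # first test gradient
    return common_w - meta_lr * np.mean(test_grads, axis=0)

# Five tasks, each with a training batch and a test batch of common samples:
make = lambda n: (rng.normal(size=(n, 3)), rng.normal(size=n))
task_data = [(make(8), make(4)) for _ in range(5)]
w = meta_round(np.zeros(3), task_data, task_lrs=[0.1, 0.1, 0.2, 0.2, 0.3])
```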
In a specific embodiment, the test gradient calculation subunit 622 is configured to train the first task model based on the batch of training samples, specifically by: calculating a first training gradient based on the batch of training samples and the first task model; and updating the model parameters of the first task model based on the first training gradient and a learning rate set for the first task model; wherein, for any two task models among the plurality of task models whose corresponding common sample sets have different numbers of samples, the task model corresponding to the smaller sample set is set with the larger learning rate.
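One simple way to realize the rule that a smaller sample set gets a larger learning rate is to scale a base rate by an inverse function of each task's sample count; the inverse-square-root schedule below is an assumption chosen for illustration, not one prescribed by the embodiments.

```python
def task_learning_rates(sample_counts, base_lr=0.1):
    """Assign larger learning rates to task models with fewer common samples;
    inverse-square-root scaling is one illustrative schedule."""
    max_n = max(sample_counts)
    return [base_lr * (max_n / n) ** 0.5 for n in sample_counts]

# counts [1000, 100, 10] -> rates [0.1, ~0.32, 1.0]: the long-tail user with
# the smallest sample set is given the largest learning rate, as stated above.
print(task_learning_rates([1000, 100, 10]))
```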
In a specific embodiment, the local model training unit 630 further includes: an initial parameter sending subunit 635, configured to send the parameters of the common model to the multiple user terminals, respectively, so that the multiple user terminals initialize the parameters of the local model based on the parameters of the common model, respectively.
In one embodiment, the feature space corresponding to the local samples is a subspace of the feature space corresponding to the common samples, and the local models are sub-models of a global model; the aggregation subunit 634 is specifically configured to: update a current global model based on the plurality of parameter update data; and the local model training unit 630 further comprises: an update parameter sending subunit 636 configured to determine, based on the current global model, a first parameter portion corresponding to the first local model, and send the first parameter portion to the first terminal.
In a specific embodiment, the parameter update data comprises a parameter gradient portion corresponding to the first local model; the aggregation subunit 634 is specifically configured to: determine a global parameter gradient corresponding to the global model based on the plurality of parameter gradient portions in the plurality of parameter update data; and update the current global model based on the global parameter gradient.
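The aggregation performed by the aggregation subunit 634 can be pictured as a scatter-add of each terminal's parameter gradient portion into a global parameter gradient, averaging coordinates where sub-models overlap; the index sets and the averaging rule in this sketch are assumptions.

```python
import numpy as np

def global_gradient(dim, portions):
    """Scatter per-terminal gradient portions into one global parameter
    gradient, averaging coordinates covered by several sub-models."""
    grad = np.zeros(dim)
    count = np.zeros(dim)
    for idx, g in portions:            # idx: sub-model's index set, g: its gradient
        grad[idx] += g
        count[idx] += 1
    return grad / np.maximum(count, 1)  # average overlapping slices; 0 elsewhere

# Two overlapping sub-models of a 6-parameter global model:
portions = [(np.array([0, 1, 2]), np.array([1.0, 1.0, 1.0])),
            (np.array([2, 3, 4]), np.array([3.0, 3.0, 3.0]))]
g = global_gradient(6, portions)        # -> [1, 1, 2, 3, 3, 0]
```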
FIG. 7 shows a schematic structural diagram of a training apparatus of a business prediction system according to another embodiment, the business prediction system including a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users; the training comprises joint training of multiple iterations on the plurality of local models, and the apparatus is integrated in a first terminal, which is any one of the plurality of terminals. As shown in FIG. 7, the apparatus 700 performs any one of the multiple iterations through the following units:
a result receiving unit 710 configured to receive, from the server, a common prediction result for a first user, the common prediction result being determined by processing a common sample corresponding to the first user using a common model, the common model being trained based on a plurality of common sample sets corresponding to the plurality of users; a determining unit 720, configured to train a first local model based on the local samples in the first terminal and the common prediction result, so as to obtain corresponding parameter update data; a transmitting unit 730 configured to transmit the parameter update data to the server; a receiving unit 740 configured to acquire, from the server, parameters after the round of joint update, which are determined by the server based on a plurality of parameter update data received from the plurality of terminals; a local model updating unit 750 configured to update the first local model by using the current round of jointly updated parameters.
In one embodiment, the local model comprises a prediction layer and a weighting processing layer; the determining unit 720 is specifically configured to: process the local sample by using the prediction layer to obtain a local prediction result; process the local prediction result and the common prediction result by using the weighting processing layer to obtain a comprehensive prediction result; and train the first local model based on the comprehensive prediction result and the label of the local sample.
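A compact sketch of such a two-layer local model follows, with a logistic prediction layer and a single learnable mixing weight standing in for the weighting processing layer; both choices are illustrative assumptions, not the specification's design.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LocalModel:
    """Prediction layer (logistic) + weighting processing layer (scalar mix)."""
    def __init__(self, dim):
        self.w = np.zeros(dim)   # prediction-layer weights
        self.alpha = 0.5         # weighting-layer mixing coefficient

    def forward(self, x, common_pred):
        local_pred = sigmoid(self.w @ x)                                  # prediction layer
        return self.alpha * local_pred + (1 - self.alpha) * common_pred  # weighting layer

    def train_step(self, x, y, common_pred, lr=0.1):
        """One gradient step on the squared error of the comprehensive prediction."""
        local_pred = sigmoid(self.w @ x)
        comp = self.alpha * local_pred + (1 - self.alpha) * common_pred
        err = comp - y
        self.w -= lr * err * self.alpha * local_pred * (1 - local_pred) * x
        self.alpha -= lr * err * (local_pred - common_pred)

m = LocalModel(3)
m.train_step(np.array([1.0, 0.5, -0.2]), y=1.0, common_pred=0.7)
```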
In one embodiment, the apparatus 700 further comprises: an initialization unit 760 configured to receive parameters of the common model from the server if the any iteration is a first iteration; and initializing parameters of the prediction layer based on the parameters of the common model.
In one embodiment, the determining unit 720 is specifically configured to: determine, based on the local sample set, a batch of training samples and a batch of test samples to be used in the current iteration; train the first local model based on the batch of training samples; and calculate a first test gradient based on the trained first local model and the batch of test samples, to serve as the parameter update data.
In one embodiment, the feature space corresponding to the local samples is a subspace of the feature space corresponding to the common samples, and the local models are sub-models of a global model; the receiving unit 740 is specifically configured to: obtain, from the server, a first parameter portion of a current global model corresponding to the first local model, the current global model being determined by the server based on the plurality of parameter update data; and the local model updating unit 750 is specifically configured to: update the parameters of the first local model to the first parameter portion.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with FIG. 2, FIG. 3, or FIG. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method described in connection with FIG. 2, FIG. 3, or FIG. 4.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing specific embodiments further describe the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only exemplary embodiments and are not intended to limit the scope of the present invention; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (25)

1. A training method of a business prediction system, wherein the business prediction system comprises a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users; the method is performed by the server and comprises:
acquiring a plurality of common sample sets corresponding to the plurality of users, wherein the label of each common sample indicates whether the corresponding user performs a target behavior on a business object;
training the common model using the plurality of common sample sets;
performing, based on the trained common model, joint training comprising multiple iterations on the plurality of local models, wherein any iteration comprises:
for any first user, processing a corresponding common sample by using the common model to obtain a common prediction result, and sending the common prediction result to a corresponding first terminal;
receiving, from the first terminal, parameter update data for a first local model, the parameter update data being determined based on the common prediction result and a local sample of the first terminal;
determining, based on a plurality of parameter update data received from the plurality of terminals, the parameters after the current round of joint update, for providing to the plurality of terminals.
2. The method of claim 1, wherein the business object is a user interest, the target behavior comprises a verification and cancellation behavior;
or, the business object is a business link, and the target behavior comprises a click behavior;
or, the business object is a commodity, and the target behavior comprises a purchasing behavior.
3. The method of claim 1, wherein the business prediction system further comprises a plurality of task models corresponding to the plurality of users, and the training of the common model involves multiple rounds of iterative training in combination with the plurality of task models, wherein any round of iterative training comprises:
selecting, from the plurality of task models, several task models to participate in the current round of iterative training;
for a first task model, which is any one of the several task models, determining its model parameters as the current parameters of the common model, and determining a batch of training samples and a batch of test samples based on the common sample set of the user corresponding to the first task model; training the first task model based on the batch of training samples, and calculating a first test gradient based on the trained first task model and the batch of test samples;
updating the current parameters of the common model based on the several test gradients calculated for the several task models.
4. The method of claim 3, wherein training the first task model based on the batch of training samples comprises:
calculating a first training gradient based on the batch of training samples and the first task model;
updating model parameters of the first task model based on the first training gradient and a learning rate set for the first task model;
wherein, for any two task models among the plurality of task models whose corresponding common sample sets have different numbers of samples, the task model corresponding to the smaller sample set is set with the larger learning rate.
5. The method of claim 1, wherein performing, based on the trained common model, joint training comprising multiple iterations on the plurality of local models further comprises:
sending the parameters of the common model to the plurality of terminals respectively, so that each terminal initializes the parameters of its local model based on the parameters of the common model.
6. The method of claim 1, wherein the feature space corresponding to the local samples is a subspace of the feature space corresponding to the common samples, and the plurality of local models are sub-models of a global model; wherein determining the parameters after the current round of joint update based on the plurality of parameter update data received from the plurality of terminals comprises:
updating a current global model based on the plurality of parameter update data;
and after updating the current global model, the method further comprises:
determining, based on the current global model, a first parameter portion corresponding to the first local model, and sending the first parameter portion to the first terminal.
7. The method of claim 6, wherein the parameter update data comprises a parameter gradient portion corresponding to the first local model; wherein updating the current global model based on the plurality of parameter update data comprises:
determining a global parameter gradient corresponding to the global model based on a plurality of parameter gradient portions in the plurality of parameter update data;
updating the current global model based on the global parameter gradient.
8. A training method of a business prediction system, wherein the business prediction system comprises a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users; the training comprises joint training of multiple iterations on the plurality of local models; the method is performed by a first terminal, which is any one of the plurality of terminals, wherein any iteration comprises:
receiving a common prediction result for a first user from the server, wherein the common prediction result is determined by processing a common sample corresponding to the first user by using a common model, and the common model is obtained by training based on a plurality of common sample sets corresponding to the plurality of users;
training a first local model based on the local sample in the first terminal and the common prediction result to obtain corresponding parameter update data;
sending the parameter update data to the server;
acquiring, from the server, the parameters after the current round of joint update, which are determined by the server based on a plurality of parameter update data received from the plurality of terminals;
updating the first local model by using the parameters after the current round of joint update.
9. The method of claim 8, wherein the local model comprises a prediction layer and a weighting processing layer; wherein training the first local model based on the local sample in the first terminal and the common prediction result comprises:
processing the local sample by using the prediction layer to obtain a local prediction result;
processing the local prediction result and the common prediction result by using the weighting processing layer to obtain a comprehensive prediction result;
training the first local model based on the comprehensive prediction result and the label of the local sample.
10. The method of claim 9, wherein said any iteration is the first iteration; and prior to training the first local model, the method further comprises:
receiving parameters of the common model from the server;
initializing parameters of the prediction layer based on parameters of the common model.
11. The method of claim 8, wherein training a first local model based on the local sample in the first terminal and the common prediction result to obtain corresponding parameter update data comprises:
determining a batch of training samples and a batch of testing samples to be used in the iterative training based on the local sample set;
training the first local model based on the batch of training samples;
calculating a first test gradient based on the trained first local model and the batch of test samples, to serve as the parameter update data.
12. The method of claim 8, wherein the feature space corresponding to the local samples is a subspace of the feature space corresponding to the common samples, and the plurality of local models are sub-models of a global model; wherein acquiring, from the server, the parameters after the current round of joint update comprises:
obtaining, from the server, a first parameter portion of a current global model corresponding to the first local model, the current global model being determined by the server based on the plurality of parameter update data;
and wherein updating the first local model by using the parameters after the current round of joint update comprises:
updating the parameters of the first local model to the first parameter portion.
13. A training apparatus of a business prediction system, wherein the business prediction system comprises a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users; the apparatus is integrated in the server and comprises:
a common sample acquiring unit configured to acquire a plurality of common sample sets corresponding to the plurality of users, wherein the label of each common sample indicates whether the corresponding user performs a target behavior on a business object;
a common model training unit configured to train the common model using the plurality of common sample sets;
a local model training unit configured to perform joint training including multiple iterations on the plurality of local models based on the trained common model, wherein any iteration is performed by the following sub-units:
a common prediction subunit configured to, for any first user, process the corresponding common sample by using the common model to obtain a common prediction result;
a result transmitting subunit configured to transmit the common prediction result to the corresponding first terminal;
an update receiving subunit configured to receive, from the first terminal, parameter update data of a first local model, the parameter update data being determined based on the common prediction result and a local sample of the first terminal;
an aggregation subunit configured to determine, based on a plurality of parameter update data received from the plurality of terminals, parameters after the current round of joint update for providing to the plurality of terminals.
14. The apparatus of claim 13, wherein the business object is a user interest, the target behavior comprises a verification and cancellation behavior;
or, the business object is a business link, and the target behavior comprises a click behavior;
or, the business object is a commodity, and the target behavior comprises a purchasing behavior.
15. The apparatus of claim 13, wherein the business prediction system further comprises a plurality of task models corresponding to the plurality of users, and the training of the common model involves multiple rounds of iterative training in combination with the plurality of task models, wherein any round of iterative training is performed by the following subunits in the common model training unit:
a model selection subunit configured to select, from the plurality of task models, several task models to participate in the current round of iterative training;
a test gradient calculation subunit configured to, for a first task model, which is any one of the several task models, determine its model parameters as the current parameters of the common model, and determine a batch of training samples and a batch of test samples based on the common sample set of the user corresponding to the first task model; train the first task model based on the batch of training samples, and calculate a first test gradient based on the trained first task model and the batch of test samples;
a parameter updating subunit configured to update the current parameters of the common model based on the several test gradients calculated for the several task models.
16. The apparatus of claim 15, wherein the test gradient calculation subunit is configured to train the first task model based on the batch of training samples, specifically by:
calculating a first training gradient based on the batch of training samples and the first task model;
updating the model parameters of the first task model based on the first training gradient and a learning rate set for the first task model;
wherein, for any two task models among the plurality of task models whose corresponding common sample sets have different numbers of samples, the task model corresponding to the smaller sample set is set with the larger learning rate.
17. The apparatus of claim 13, wherein the local model training unit further comprises:
an initial parameter sending subunit, configured to send the parameters of the common model to the plurality of user terminals, respectively, so that the plurality of user terminals initialize the parameters of the local model based on the parameters of the common model, respectively.
18. The apparatus of claim 13, wherein the feature space corresponding to the local samples is a subspace of the feature space corresponding to the common samples, and the plurality of local models are sub-models of a global model; wherein the aggregation subunit is specifically configured to:
updating a current global model based on the plurality of parameter update data;
the local model training unit further comprises:
and the update parameter sending subunit is configured to determine a first parameter part corresponding to the first local model based on the current global model, and send the first parameter part to the first terminal.
19. The apparatus of claim 18, wherein the parameter update data comprises a parameter gradient portion corresponding to the first local model; wherein the aggregation subunit is specifically configured to:
determining a global parameter gradient corresponding to the global model based on a plurality of parameter gradient portions in the plurality of parameter update data;
updating the current global model based on the global parameter gradient.
20. A training apparatus of a business prediction system, wherein the business prediction system comprises a common model maintained in a server and a plurality of local models deployed in a plurality of terminals corresponding to a plurality of users; the training comprises joint training of multiple iterations on the plurality of local models; the apparatus is integrated in a first terminal, which is any one of the plurality of terminals, and performs any one of the multiple iterations through the following units:
a result receiving unit configured to receive, from the server, a common prediction result for a first user, the common prediction result being determined by processing a common sample corresponding to the first user using a common model, the common model being trained based on a plurality of common sample sets corresponding to the plurality of users;
a determining unit configured to train a first local model based on the local sample in the first terminal and the common prediction result, to obtain corresponding parameter update data;
a transmitting unit configured to send the parameter update data to the server;
a receiving unit configured to acquire, from the server, the parameters after the current round of joint update, which are determined by the server based on a plurality of parameter update data received from the plurality of terminals;
a local model updating unit configured to update the first local model by using the parameters after the current round of joint update.
21. The apparatus of claim 20, wherein the local model comprises a prediction layer and a weighting processing layer; and the determining unit is specifically configured to:
process the local sample by using the prediction layer to obtain a local prediction result;
process the local prediction result and the common prediction result by using the weighting processing layer to obtain a comprehensive prediction result;
train the first local model based on the comprehensive prediction result and the label of the local sample.
22. The apparatus of claim 21, wherein the apparatus further comprises:
an initialization unit configured to receive parameters of the common model from the server if the any iteration is a first iteration; and initializing parameters of the prediction layer based on the parameters of the common model.
23. The apparatus of claim 20, wherein the determining unit is specifically configured to:
determine, based on the local sample set, a batch of training samples and a batch of test samples to be used in the current iteration;
train the first local model based on the batch of training samples;
calculate a first test gradient based on the trained first local model and the batch of test samples, to serve as the parameter update data.
24. The apparatus of claim 20, wherein the feature space corresponding to the local samples is a subspace of the feature space corresponding to the common samples, and the plurality of local models are sub-models of a global model; the receiving unit is specifically configured to:
obtain, from the server, a first parameter portion of a current global model corresponding to the first local model, the current global model being determined by the server based on the plurality of parameter update data;
and the local model updating unit is specifically configured to:
update the parameters of the first local model to the first parameter portion.
25. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-12.
CN202111155580.8A 2021-09-29 2021-09-29 Training method and device of business prediction system Pending CN113902473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111155580.8A CN113902473A (en) 2021-09-29 2021-09-29 Training method and device of business prediction system

Publications (1)

Publication Number Publication Date
CN113902473A true CN113902473A (en) 2022-01-07

Family

ID=79189641

Patent Citations (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180114334A1 (en) * 2016-10-24 2018-04-26 International Business Machines Corporation Edge-based adaptive machine learning for object recognition
US20190272553A1 (en) * 2018-03-01 2019-09-05 Adobe Inc. Predictive Modeling with Entity Representations Computed from Neural Network Models Simultaneously Trained on Multiple Tasks
CN109656529A (en) * 2018-10-31 2019-04-19 北京大学 A kind of on-line customization method and system for client deep learning
US20210248499A1 (en) * 2019-02-01 2021-08-12 Advanced New Technologies Co., Ltd. Model training methods, apparatuses, and systems
US10510002B1 (en) * 2019-02-14 2019-12-17 Capital One Services, Llc Stochastic gradient boosting for deep neural networks
US20210034971A1 (en) * 2019-08-02 2021-02-04 Samsung Electronics Co., Ltd. Method and system with neural network model updating
US20210073677A1 (en) * 2019-09-06 2021-03-11 Oracle International Corporation Privacy preserving collaborative learning with domain adaptation
WO2021073234A1 (en) * 2019-10-16 2021-04-22 支付宝(杭州)信息技术有限公司 Method and device for jointly training logistic regression model by multiple computing units
CN110991512A (en) * 2019-11-26 2020-04-10 广东美的白色家电技术创新中心有限公司 Joint training method of object recognition model, server and electrical equipment
WO2021121106A1 (en) * 2019-12-20 2021-06-24 深圳前海微众银行股份有限公司 Federated learning-based personalized recommendation method, apparatus and device, and medium
CN111190487A (en) * 2019-12-30 2020-05-22 中国科学院计算技术研究所 Method for establishing data analysis model
CN111275491A (en) * 2020-01-21 2020-06-12 深圳前海微众银行股份有限公司 Data processing method and device
WO2021169577A1 (en) * 2020-02-27 2021-09-02 山东大学 Wireless service traffic prediction method based on weighted federated learning
CN111461309A (en) * 2020-04-17 2020-07-28 支付宝(杭州)信息技术有限公司 Method and device for updating reinforcement learning system for realizing privacy protection
CN111325417A (en) * 2020-05-15 2020-06-23 支付宝(杭州)信息技术有限公司 Method and device for realizing privacy protection and realizing multi-party collaborative updating of business prediction model
CN111651792A (en) * 2020-07-17 2020-09-11 支付宝(杭州)信息技术有限公司 Risk detection and model enhancement method and device in multi-party collaborative learning
CN111813869A (en) * 2020-08-21 2020-10-23 支付宝(杭州)信息技术有限公司 Distributed data-based multi-task model training method and system
CN112101555A (en) * 2020-11-13 2020-12-18 支付宝(杭州)信息技术有限公司 Method and device for multi-party combined training model
CN112231742A (en) * 2020-12-14 2021-01-15 支付宝(杭州)信息技术有限公司 Model joint training method and device based on privacy protection
CN112766514A (en) * 2021-01-22 2021-05-07 支付宝(杭州)信息技术有限公司 Method, system and device for joint training of machine learning model
CN112785002A (en) * 2021-03-15 2021-05-11 深圳前海微众银行股份有限公司 Model construction optimization method, device, medium, and computer program product
CN112950221A (en) * 2021-03-26 2021-06-11 支付宝(杭州)信息技术有限公司 Method and device for establishing wind control model and risk control method and device
CN113240127A (en) * 2021-04-07 2021-08-10 睿蜂群(北京)科技有限公司 Federal learning-based training method and device, electronic equipment and storage medium
CN113033825A (en) * 2021-04-21 2021-06-25 支付宝(杭州)信息技术有限公司 Privacy protection model training method, system and device
CN113222141A (en) * 2021-05-13 2021-08-06 支付宝(杭州)信息技术有限公司 Model supervision training method, device and equipment
CN113240177A (en) * 2021-05-13 2021-08-10 北京百度网讯科技有限公司 Method for training prediction model, prediction method, prediction device, electronic device and medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081642A (en) * 2022-07-19 2022-09-20 浙江大学 Method and system for updating service prediction model in multi-party cooperation manner
CN115081642B (en) * 2022-07-19 2022-11-15 浙江大学 Method and system for updating service prediction model in multi-party cooperation manner
CN115617827A (en) * 2022-11-18 2023-01-17 浙江大学 Service model joint updating method and system based on parameter compression
CN115617827B (en) * 2022-11-18 2023-04-07 浙江大学 Service model joint updating method and system based on parameter compression
CN116432039A (en) * 2023-06-13 2023-07-14 支付宝(杭州)信息技术有限公司 Collaborative training method and device, business prediction method and device
CN116432039B (en) * 2023-06-13 2023-09-05 支付宝(杭州)信息技术有限公司 Collaborative training method and device, business prediction method and device
CN116629388A (en) * 2023-07-25 2023-08-22 京东科技信息技术有限公司 Differential privacy federal learning training method, device and computer readable storage medium
CN116629388B (en) * 2023-07-25 2023-12-05 京东科技信息技术有限公司 Differential privacy federal learning training method, device and computer readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination