CN112465043A - Model training method, device and equipment - Google Patents

Model training method, device and equipment Download PDF

Info

Publication number
CN112465043A
Authority
CN
China
Prior art keywords
model
data source
parameters
aggregation
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011392884.1A
Other languages
Chinese (zh)
Other versions
CN112465043B (en)
Inventor
王健宗
李泽远
朱星华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011392884.1A
Publication of CN112465043A
Priority to PCT/CN2021/083807 (published as WO2022116440A1)
Application granted
Publication of CN112465043B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a model training method, a device and equipment. In the method, a first data source obtains a first training set, trains a first base model with samples in the first training set, obtains first model parameters when the first base model converges, and sends the first model parameters to a server. The first data source then obtains aggregation parameters from the server, where the aggregation parameters are obtained by the server by aggregating the first model parameters and j-1 second model parameters. The first data source updates the samples in the first training set according to the aggregation parameters and trains the first base model with the updated samples, until the first data source has obtained T aggregation parameters, from which it obtains a final classification model. According to this technical scheme, multiple rounds of iterative model training are performed by combining the training sets of multiple data sources, so the requirement for training based on a large amount of data can be met and the classification accuracy of the final classification model is improved.

Description

Model training method, device and equipment
Technical Field
The invention relates to the field of data mining, in particular to a model training method, a model training device and model training equipment.
Background
Adaboost is an iterative algorithm mainly applied to classification; its basic principle is to reasonably combine a plurality of weak classifiers so that together they become a strong classifier. Classification is widely applied, for example to identify spam, and the amount of available data directly influences the accuracy of the classification model.
The traditional Adaboost algorithm iteratively trains a model on the data set of a single data source. The data volume of a single data source's data set is limited, so the algorithm cannot meet the requirement of training a classification model based on a large amount of data, and the classification results are therefore inaccurate.
Disclosure of Invention
The embodiment of the invention provides a model training method, a model training device and model training equipment, which can meet the requirement of training a final classification model based on a large amount of data during model training, thereby improving the classification accuracy of the final classification model.
In a first aspect, a model training method is provided, where the method is applied to a communication system including a server and j data sources, where j is an integer greater than or equal to 2, and the method includes:
a first data source obtains a first training set, and trains a first basic model by using samples in the first training set to obtain a first model parameter when the first basic model converges, wherein the first data source is any one of the j data sources;
the first data source sends the first model parameters to the server;
the first data source obtains aggregation parameters from the server, wherein the aggregation parameters are obtained by the server through aggregation of the first model parameters and j-1 second model parameters, the j-1 second model parameters come from the j-1 second data sources other than the first data source among the j data sources, one second model parameter comes from one second data source, a second model parameter is the parameter obtained when a second base model converges as the corresponding second data source trains the second base model with samples in a second training set, and the second base model and the first base model are models of the same type;
the first data source updates the samples in the first training set according to the aggregation parameters, and trains the first base model by using the updated samples in the first training set until the first data source obtains T aggregation parameters, wherein T is an integer greater than or equal to 2;
and the first data source obtains a final classification model according to the T aggregation parameters.
With reference to the first aspect, in a possible implementation manner, the updating, by the first data source, the samples in the first training set according to the aggregation parameter includes: the first data source constructs a test base model according to the aggregation parameters; the first data source performs class testing on the samples in the first training set by adopting the testing base model, and determines a prediction class corresponding to the samples in the first training set; the first data source determines a training error corresponding to the aggregation parameter according to an error between a prediction category corresponding to the samples in the first training set and an actual category corresponding to the samples in the first training set; and the first data source updates the samples in the first training set according to the training errors corresponding to the aggregation parameters.
With reference to the first aspect, in a possible implementation manner, the updating, by the first data source, the samples in the first training set according to the training error corresponding to the aggregation parameter includes: calculating a model weight corresponding to the aggregation parameter by a first data source according to a training error corresponding to the aggregation parameter, wherein the model weight is used for representing the importance degree of the aggregation parameter in the final classification model; and the first data source updates the weight value of the current weight of the sample in the first training set according to the model weight corresponding to the aggregation parameter and the first weight value.
With reference to the first aspect, in a possible implementation manner, the first training set includes at least two samples, and one sample corresponds to one first weight value; the class testing of the samples in the first training set is performed by the first data source by using the test base model, and the determining of the prediction class corresponding to the samples in the first training set includes: the first data source respectively carries out category testing on each sample in the at least two samples by adopting the testing base model to obtain a prediction category corresponding to each sample; the determining, by the first data source, a training error corresponding to the aggregation parameter according to an error between a prediction class corresponding to the sample in the first training set and an actual class corresponding to the sample in the first training set includes: for each sample in the at least two samples, if an error exists between a prediction type corresponding to the sample and an actual type corresponding to the sample, determining that the sample is a prediction error sample; the first data source obtains at least one misprediction sample in the at least two samples, and determines the sum of at least one first weight value corresponding to the at least one misprediction sample as a training error corresponding to the aggregation parameter.
With reference to the first aspect, in a possible implementation manner, the obtaining, by the first data source, a final classification model according to the T aggregation parameters includes: the first data source respectively obtains a model weight corresponding to each aggregation parameter in the T aggregation parameters, and the model weight is used for representing the importance degree of the aggregation parameters in the final classification model; the first data source obtains integrated model parameters according to each aggregation parameter in the T aggregation parameters and the model weight corresponding to each aggregation parameter; and the first data source generates a final classification model according to the integrated model parameters.
With reference to the first aspect, in a possible implementation manner, after the obtaining the final classification model, the method further includes: the first data source obtains a test set, the final classification model is adopted to carry out class testing on the samples in the test set, and the prediction classes corresponding to the samples in the test set are determined; the first data source determines the classification accuracy of the final classification model according to the error between the prediction category corresponding to the sample in the test set and the actual category corresponding to the sample in the test set; and if the classification accuracy is greater than a first threshold value, the first data source outputs prompt information, and the prompt information is used for prompting that the final classification model is trained completely.
With reference to the first aspect, in one possible implementation manner, the first base model and the second base model are two-layer neural network models.
In a second aspect, a model training apparatus is provided, where the apparatus is applied to a first data source in a communication system, the communication system includes a server and j data sources, j is an integer greater than or equal to 2, and the model training apparatus includes:
the training module is used for acquiring a first training set, training a first basic model by adopting samples in the first training set and acquiring a first model parameter when the first basic model is converged;
a sending module, configured to send the first model parameter to the server;
a receiving module, configured to obtain aggregation parameters from the server, where the aggregation parameters are obtained by the server by aggregating the first model parameters and j-1 second model parameters, the j-1 second model parameters come from the j-1 second data sources other than the first data source among the j data sources, one second model parameter comes from one second data source, a second model parameter is the parameter obtained when a second base model converges as the corresponding second data source trains it with samples in a second training set, and the second base model and the first base model are models of the same type;
an updating module, configured to update the samples in the first training set according to the aggregation parameters, and train the first base model by using the updated samples in the first training set until the first data source obtains T aggregation parameters, where T is an integer greater than or equal to 2;
and the aggregation module is used for obtaining a final classification model according to the T aggregation parameters.
With reference to the second aspect, in a possible implementation manner, the updating module is configured to update the samples in the first training set according to the aggregation parameter, which includes: constructing a test base model according to the aggregation parameters; performing class testing on the samples in the first training set by adopting the test base model, and determining the prediction classes corresponding to the samples in the first training set; and determining a training error corresponding to the aggregation parameter according to an error between the prediction category corresponding to the samples in the first training set and the actual category corresponding to the samples in the first training set, and updating the samples in the first training set according to the training error corresponding to the aggregation parameter.
With reference to the second aspect, in a possible implementation manner, a weight value of a current weight of a sample in the first training set is a first weight value, and the updating module is configured to calculate a model weight corresponding to the aggregation parameter according to a training error corresponding to the aggregation parameter, where the model weight is used to represent an importance degree of the aggregation parameter in the final classification model; and the first data source updates the weight value of the current weight of the sample in the first training set according to the model weight corresponding to the aggregation parameter and the first weight value.
With reference to the second aspect, in a possible implementation manner, the first training set includes at least two samples, one sample corresponds to one first weight value, and the updating module is further configured to perform a class test on each of the at least two samples by using the test base model, so as to obtain a prediction class corresponding to each of the samples. For each sample in the at least two samples, if an error exists between a prediction type corresponding to the sample and an actual type corresponding to the sample, determining that the sample is a prediction error sample; the updating module is further configured to obtain at least one misprediction sample of the at least two samples, and determine a sum of at least one first weight value corresponding to the at least one misprediction sample as a training error corresponding to the aggregation parameter.
With reference to the second aspect, in a possible implementation manner, the aggregation module is configured to obtain a final classification model according to the T aggregation parameters; respectively obtaining a model weight corresponding to each aggregation parameter in the T aggregation parameters, wherein the model weight is used for representing the importance degree of the aggregation parameters in the final classification model; obtaining an integrated model parameter according to each aggregation parameter in the T aggregation parameters and the model weight corresponding to each aggregation parameter; and the aggregation module generates a final classification model according to the integrated model parameters.
With reference to the second aspect, in a possible implementation manner, the apparatus further includes: the test module is used for obtaining a test set, performing class test on the samples in the test set by adopting the final classification model and determining the prediction classes corresponding to the samples in the test set; determining the classification accuracy of the final classification model according to the error between the prediction category corresponding to the sample in the test set and the actual category corresponding to the sample in the test set; and if the classification accuracy is greater than a first threshold value, the first data source outputs prompt information, and the prompt information is used for prompting that the final classification model is trained completely.
With reference to the second aspect, in one possible implementation manner, the first base model and the second base model are two-layer neural network models.
In a third aspect, a model training apparatus is provided, which includes a processor, a memory, and an input/output interface, where the processor, the memory, and the input/output interface are connected to each other, where the input/output interface is used for inputting or outputting data, the memory is used for storing program codes, and the processor is configured to execute the method of the first aspect.
In a fourth aspect, there is provided a computer storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect described above.
In the embodiment of the invention, the first data source trains a first base model with samples in a first training set to obtain first model parameters and sends the first model parameters to the server. The server aggregates the first model parameters and the j-1 second model parameters to obtain an aggregation parameter and sends the aggregation parameter to the first data source, and this repeats until the first data source has obtained T aggregation parameters, from which it obtains the final classification model. The final classification model is therefore trained on the data of the first training set of the first data source and on the data of the j-1 second training sets of the j-1 second data sources, so it can meet the requirement of training based on a large amount of data, and the classification accuracy of the final classification model is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another model training method provided by the embodiment of the invention;
FIG. 3 is a schematic diagram of data parameter interaction in a model training process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a structure of a model training apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a model training device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
According to the embodiment of the invention, the model is trained to obtain the final classification model, and the final classification model is adopted to classify the object to be classified. The final classification model of the present application can be applicable to various classification scenarios, and optionally, the final classification model of the present application can implement a scenario of classifying emails, for example, the emails can be classified into spam emails and non-spam emails. Alternatively, a scenario to which the final classification model of the present application is applied may also be a scenario in which the communication account is classified, for example, the communication account may be classified into a false account and a non-false account. Alternatively, the final classification model of the present application may classify whether the user is ill or not, and so on. It can be understood that the final classification model of the present application is applicable to different scenes, and correspondingly, training data in the data set are different, for example, in a scene of classifying whether an email is a spam email, the training data in the data set is a word vector corresponding to a text in the email to be classified; in the scene of classifying whether the user is ill, the training data in the data set includes age, gender, eating habits, feature vectors corresponding to the physical examination results, and the like.
The model training method is applicable to a communication system comprising a server and j data sources, wherein j is an integer greater than or equal to 2, the first data source obtains a first training set, and samples in the first training set are adopted to train a first basic model to obtain a first model parameter when the first basic model converges, and the first data source is any one of the j data sources;
the first data source sends the first model parameters to the server;
the first data source obtains aggregation parameters from the server, where the aggregation parameters are obtained by the server through aggregation of the first model parameters and j-1 second model parameters; the j-1 second model parameters come from the j-1 second data sources other than the first data source among the j data sources, one second model parameter comes from one second data source, a second model parameter is the parameter obtained when a second base model converges as the corresponding second data source trains it with samples in a second training set, and the second base model and the first base model are models of the same type;
the first data source updates samples in the first training set according to the aggregation parameters, and trains the first base model by using the updated samples in the first training set until the first data source obtains T aggregation parameters, wherein T is an integer greater than or equal to 2;
and the first data source obtains a final classification model according to the T aggregation parameters.
Because the aggregation parameters are obtained based on the parameters of the training models in the multiple data sources, the training models of the training sets of the multiple different data sources are aggregated, and the classification model has better performance and higher accuracy.
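As a rough illustration only, the following Python sketch outlines one data source's side of this loop; the helper functions named here (train_until_convergence, send_to_server, receive_aggregation, update_training_set, build_final_model) are hypothetical placeholders for the steps described in this application, not functions it defines.

```python
# Minimal structural sketch of the first data source's training loop (hypothetical helpers).
def client_training_loop(first_training_set, first_base_model, server, T):
    aggregation_params = []
    for t in range(T):
        # Train the local base model to convergence and extract its parameters.
        first_model_params = train_until_convergence(first_base_model, first_training_set)
        # Upload local parameters; the server aggregates them with those of the j-1 other sources.
        send_to_server(server, first_model_params)
        g_t = receive_aggregation(server)            # aggregation parameter of round t
        aggregation_params.append(g_t)
        # Re-weight the samples of the first training set according to the aggregation parameter.
        first_training_set = update_training_set(first_training_set, g_t)
    # Integrate the T aggregation parameters into the final classification model.
    return build_final_model(aggregation_params)
```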
Referring to fig. 1, fig. 1 is a schematic flow chart of a model training method provided in an embodiment of the present invention, including:
s101, a first data source obtains a first training set, and a first basic model is trained by using samples in the first training set to obtain a first model parameter when the first basic model converges.
Before training the first base model, the first data source firstly processes a mail data set owned by the first data source, and the processing steps are as follows:
firstly, dividing a data set into a training set and a testing set, and ensuring that the data distribution of the training set and the data distribution of the testing set are approximately consistent.
The sample weights in the divided training set are then initialized. If the number of samples in the training set of the first data source is m, with m an integer, the initial weight of each sample in the training set is 1/m; that is, in the first round of training, the initial weight of every sample in the training set is the same. This initial weight is the first weight value in the first round of training and is the reciprocal of the number of samples in the training set. The initialized training set is called the first training set.
A double-layer neural network model is deployed on the first data source to serve as the first base model. The first data source trains the first base model on the first training set, updates the parameters of the first base model through an optimization algorithm such as gradient descent, and records the parameters of the first base model at convergence as the first model parameters.
Because the first data source is any one of the j data sources and the j data sources own different data sets, for ease of distinction the j-1 data sources other than the first data source among the j data sources are denoted second data sources, and a double-layer neural network model in a second data source having the same network structure as the first base model is denoted a second base model. Like the first data source, each second data source processes the data set it owns with the same steps as the first data source to obtain a second training set, trains its second base model on that second training set, and records the parameters of the second base model at convergence as the second model parameters.
S102, the first data source sends the first model parameters to a server.
And after obtaining the first model parameter, the first data source sends the first model parameter to the server. j-1 second data sources except the first data source in the j data sources send j-1 second model parameters obtained based on training of the second training set to the server.
S103, the first data source obtains aggregation parameters from the server, and the aggregation parameters are obtained by the server through aggregation according to the first model parameters and the j-1 second model parameters.
After each round of local training, the server has received j model parameters: the first model parameters from the first data source and the j-1 second model parameters from the j-1 second data sources. The server obtains the aggregation parameter by applying the average aggregation mode of the FedAvg algorithm to the received j model parameters.
After obtaining the aggregation parameter, the server sends the aggregation parameter to the j data sources, and the j data sources receive the aggregation parameter from the server, where the first data source and the j-1 data sources both receive the same aggregation parameter from the server.
And S104, the first data source updates the samples in the first training set according to the aggregation parameters, and trains the first base model by adopting the updated samples in the first training set until the first data source obtains T aggregation parameters, wherein T is an integer greater than or equal to 2.
The method comprises the following specific steps:
the method comprises the following steps that a first data source receives aggregation parameters from a server, and a test base model is built according to the aggregation parameters and a first base model, wherein the mode of building the test base model is as follows: model parameters in the first base model are updated using the aggregated parameters. Since the aggregation parameters have all parameters of the model, and the first base model only provides the network structure in the process of constructing the test base model, the construction method here may be that the aggregation parameters are combined with any neural network model having the same network structure to generate the test base model, such as the neural network model of the first base model at any time in the training process. The process is the same as the process of generating the test base model in the first data source, and J-1 test base models are constructed by J-1 second data sources except the first data source in the J data sources according to the aggregation parameters and the second base models.
It should be understood that the second base model is also a neural network model having the same network structure as the first base model. That is, the j data sources construct the test base model based on the aggregation parameters and a neural network model.
Second, the first data source performs class testing on the samples in the first training set with the test base model to determine the prediction class corresponding to each sample. The first training set contains at least two samples, each sample corresponds to a first weight value, and each sample corresponds to a real class. A sample whose real class does not match its prediction class has been predicted wrongly by the test base model; accumulating the first weight values of the wrongly predicted samples then gives the training error of the test base model on the training set of the first data source, i.e. the training error corresponding to the aggregation parameter.
Third, after the training error corresponding to the aggregation parameter is obtained, the model weight corresponding to the aggregation parameter can be calculated from that training error. The training error reflects the prediction accuracy of the model generated from the aggregation parameter, and the model weight represents the importance degree of the aggregation parameter in the final classification model; obviously, the higher the relative accuracy, the more important the aggregation parameter is in the final classification model.
Fourth, the weight of each sample in the first training set is updated based on the model weight and on the first weight value of the sample from the previous round of training: the weight of a sample wrongly predicted by the test base model is increased for the next round of model training, and the weight of a correctly predicted sample is decreased, yielding the first training set with updated sample weights.
Fifth, the next round of model training is performed on the first base model with the updated samples in the first training set, and steps S102-S104 are repeated until the first data source obtains T aggregation parameters, where T is an integer greater than or equal to 2.
It should be understood that the j-1 second data sources other than the first data source among the j data sources will also obtain T aggregation parameters.
And S105, the first data source obtains a final classification model according to the T aggregation parameters.
The T aggregation parameters are aggregated according to their importance in the respective training rounds, i.e. according to the model weight corresponding to each aggregation parameter, to obtain the integrated model parameters. Because test base models generated from different aggregation parameters predict the first training set differently, they yield different training errors and therefore different model weights; the final classification model is then generated from the integrated model parameters.
It should be understood that, since the training sets in the j data sources are different, the training effect of the test base model generated according to each round of aggregation parameters is different, the model weight of each data source for the same aggregation parameter is different, and finally the j data sources generate different final classification models suitable for the data sets of the data sources.
Optionally, step S106 is further included after steps S101-S105.
And S106, performing class test on the samples in the test set by adopting the final classification model.
The first data source performs class testing on the test set of step S101 using the adaptive final classification model it generated, and from the resulting errors obtains the classification accuracy of the final classification model.
For the final classification model, if the classification accuracy is greater than the first threshold, the first data source outputs prompt information for prompting that the training of the final classification model is completed, and it should be understood that the first threshold may be selected as needed and may be any value between 50% and 100%.
Each of the j data sources can generate its own adaptive final classification model, so each of the j data sources can use its final classification model to perform class testing on the test set it owns and obtain the model accuracy. Different data sources may set different first thresholds according to their own needs, and the prompt information indicating that training of the final classification model is completed is output when the classification accuracies of the j data sources are all greater than the corresponding first thresholds.
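As an illustration of this test stage, assuming the final classification model exposes a predict function that returns class labels and taking 0.9 as an arbitrary example of the first threshold, the check could look like:

```python
import numpy as np

def check_final_model(final_model, test_samples, test_labels, first_threshold=0.9):
    """Class-test the final classification model and emit the completion prompt (sketch)."""
    predictions = final_model.predict(test_samples)
    accuracy = float(np.mean(predictions == test_labels))   # classification accuracy on the test set
    if accuracy > first_threshold:
        print("Training of the final classification model is completed.")   # prompt information
    return accuracy
```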
First, in each round of model training an aggregation parameter is generated by combining data of a plurality of data sources; the data volume of the plurality of data sources is greater than that of a single data source, and the larger the data volume used for model training, the higher the accuracy of the obtained model. Meanwhile, no original data is transmitted between the data sources, so the security of the local data is well guaranteed. Because the test base models generated from the same aggregation parameter have different training effects on different data sources, the model weights corresponding to each aggregation parameter differ, so final classification models suited to the different data sources can be generated while still being supported by a large amount of data. In addition, the base model used for local model training at each data source is simple and easy to implement, and the requirement on the computing capacity of each data source is low, so the threshold for participating in the joint training is not high and the scheme is easy to realize. Finally, the first threshold used in the test stage can be preset differently as needed, fully meeting the different requirements of each data source.
The above description mainly explains the whole model training process from the perspective of the first data source. The whole process will now be explained in terms of the interaction between the j data sources and the server; referring to fig. 2, fig. 2 is a schematic flow chart of another model training method according to an embodiment of the present invention. Taking the recognition task of spam as an example, the j data sources participating in model training respectively own different mail data sets. The process includes:
s201, data preparation and model deployment
The j data sources respectively own mail data sets S_k, k = 1, 2, ..., j, and the number of samples in each data set is |S_k| = n_k. For convenience of explanation, k is used to denote the number of a data source and refers to any one of the j data sources, i.e. the first data source. Each data source divides its data set, according to a self-defined proportion, into a training set D_k^{train} and a test set D_k^{test}, the data distributions of the training set and the test set being approximately consistent. Assume the sample numbers of the training set and the test set are m_k and m_k' respectively; then D_k^{train} = {(x_{ki}, y_{ki}) | i = 1, ..., m_k} with y_{ki} ∈ {0, 1}, where 0 is the non-spam label and 1 is the spam label.
Referring to fig. 3, fig. 3 is a schematic diagram of data parameter interaction in the model training method, which provides a data parameter interaction process of the data source and the server in the process from step S201 to step S205.
Each data source has a double-layer neural network model whose number of input neurons is D, number of output neurons is 1 and number of hidden-layer neurons is L; this model is the first base model in the embodiment of the invention. The output-layer activation function of the double-layer neural network model is the sigmoid function, which is mainly used to map the output value to the (0, 1) interval. The sigmoid function is an activation function commonly used in neural networks and is defined by the following formula:

σ(x) = 1 / (1 + e^{-x})
it should be understood that, in the embodiment of the present invention, the base models used in each training round are not required to be consistent, and for convenience of description, the present solution sets the base models in one training round as a two-layer neural network model.
Before model training, each participant initializes the distribution of its own training set. In the embodiment of the invention, the data distribution of the training set is parameterized as the weight of each training sample, and in order to achieve the goal that different data sources generate personalized integrated models, the data distribution of each data source is initialized separately. That is, for the k-th data source with training set D_k^{train}, the weights of the training samples are initialized to

w_{ki}^1 = 1 / m_k, i = 1, 2, ..., m_k.

In the above formula, the superscript 1 indicates that the model is performing the 1st round of training; during the first round of training the weights w_{ki}^1 of all training samples are the same. The sample weight is the first weight value corresponding to the sample. Step S201 corresponds to the steps in S101 of processing the mail data set owned by the first data source and deploying the double-layer neural network model.
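Illustratively, the data preparation and weight initialization of step S201 might be written as follows; the 80/20 split ratio and the use of scikit-learn's stratified split are assumptions for the sketch, since the application only requires a self-defined proportion with approximately consistent distributions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def prepare_data(samples, labels, train_ratio=0.8, seed=0):
    """Split one mail data set into train/test and initialize uniform sample weights w_ki^1 = 1/m_k."""
    x_train, x_test, y_train, y_test = train_test_split(
        samples, labels, train_size=train_ratio, stratify=labels, random_state=seed)
    m_k = len(x_train)
    weights = np.full(m_k, 1.0 / m_k)    # first weight values for round 1
    return (x_train, y_train, weights), (x_test, y_test)
```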
And S202, performing model training on the j data sources based on training sets owned by the respective data sources.
Each data source trains its double-layer neural network base model on the training set it owns; the base model of the k-th data source in the t-th round of training is denoted h_k^t, and its loss function is defined over the training samples (x_{ki}, y_{ki}), where x_{ki} represents the i-th sample of the k-th data source's training set. The parameters of h_k^t are updated through an optimization algorithm such as gradient descent until the value of the loss function is minimal, and the model parameters at that point are recorded as θ_k^t. The model parameters θ_k^t at this time are the first model parameters; see the process of training the first base model in step S101.
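A plain-NumPy sketch of such a base model and its gradient-descent training is shown below. The tanh hidden activation, the cross-entropy-style gradient, the learning rate and the fixed epoch count are illustrative assumptions; the application only specifies a double-layer network with D inputs, L hidden neurons, one sigmoid output and training until the loss is minimal.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_base_model(x, y, hidden=16, lr=0.1, epochs=500, seed=0):
    """Two-layer network (D inputs -> L hidden -> 1 sigmoid output) trained by gradient descent."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w1 = rng.normal(scale=0.1, size=(d, hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=(hidden, 1)); b2 = np.zeros(1)
    y_col = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                     # hidden layer (activation is an assumption)
        out = sigmoid(h @ w2 + b2)                   # output mapped to (0, 1)
        grad_z = (out - y_col) / len(y_col)          # cross-entropy gradient w.r.t. the output logit
        gw2 = h.T @ grad_z; gb2 = grad_z.sum(axis=0)
        gh = (grad_z @ w2.T) * (1.0 - h ** 2)        # back-propagate through tanh
        gw1 = x.T @ gh; gb1 = gh.sum(axis=0)
        w1 -= lr * gw1; b1 -= lr * gb1; w2 -= lr * gw2; b2 -= lr * gb2
    return {"w1": w1, "b1": b1, "w2": w2, "b2": b2}  # recorded as theta_k^t at convergence
```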
And S203, the j data sources send the model parameters obtained by training to a server.
As shown in fig. 3, after the j data sources train their base models with the loss function in step S202, each data source transmits the resulting base model parameters θ_k^t to the central server, as detailed in step S102.
S204, the server aggregates the received base model parameters and sends the aggregation parameter to the j data sources.
The server aggregates the base model parameters θ_k^t received from the data sources. Using the average aggregation mode of the FedAvg algorithm, the aggregation parameter of the t-th round of training is obtained as

G^t = (1/j) · Σ_{k=1}^{j} θ_k^t.

As shown in fig. 3, the server sends the aggregation parameter G^t to the j data sources. See step S103 for details.
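The average aggregation can be sketched in a few lines, assuming each data source uploads its parameters as a dict of NumPy arrays like the one returned by the training sketch above:

```python
import numpy as np

def fedavg_aggregate(uploaded_params):
    """G^t: element-wise average of the j uploaded base-model parameter sets."""
    keys = uploaded_params[0].keys()
    return {k: np.mean([p[k] for p in uploaded_params], axis=0) for k in keys}
```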
S205, the j data sources adjust the data distribution of their training sets based on the received aggregation parameter G^t.
For each data source, the received aggregation parameter G^t and the double-layer neural network model deployed in step S201 are used to generate a test base model. Since the double-layer neural network model only provides the network structure, the test base model may equally be generated by combining the aggregation parameter G^t with any neural network model having the same network structure as the double-layer neural network model. Each data source then predicts its training set with the test base model and, following the method in Adaboost, adjusts the data distribution of the training set using the training error. This mainly comprises the following 3 steps:

First, the prediction error of the test base model on the training set owned by the data source is calculated. For the k-th data source, the training error after the t-th round of training is

ε_k^t = Σ_{i=1}^{m_k} w_{ki}^t · 1{ 1{ G^t(x_{ki}) > th } ≠ y_{ki} },

where 1{·} is an indicator function and th is a preset classification threshold whose value range is greater than 0 and less than 1.

For the inner indicator function 1{ G^t(x_{ki}) > th }: when the model output G^t(x_{ki}) is greater than th, the prediction class is 1 (i.e. the mail is judged to be spam); otherwise the prediction class is 0 (i.e. non-spam).

For the outer indicator function 1{ 1{ G^t(x_{ki}) > th } ≠ y_{ki} }: when the prediction class does not match the real class, the class of the sample is predicted wrongly; when the prediction class matches the real class, the class prediction for the sample is correct.

Summing the weights of the training samples with prediction errors gives the training error of the test base model on this data source.
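Under this notation, the weighted training error can be computed as below; test_outputs stands for the test base model's sigmoid outputs G^t(x_ki) on the training samples, and th = 0.5 is only an example value of the preset classification threshold within (0, 1).

```python
import numpy as np

def training_error(test_outputs, labels, weights, th=0.5):
    """epsilon_k^t: sum of the weights of the wrongly predicted training samples."""
    predicted_class = (test_outputs > th).astype(int)   # inner indicator: class 1 (spam) if output > th
    wrong = predicted_class != labels                    # outer indicator: prediction error
    return float(np.sum(weights[wrong]))
```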
Second, the importance degree, in the final integrated classification model, of the aggregation parameter G^t corresponding to the t-th test base model is calculated, i.e. the model weight corresponding to the aggregation parameter:

α_k^t = (1/2) · ln( (1 − ε_k^t) / ε_k^t ).

The model weight coefficient α_k^t indicates the importance, in the final integrated model, of the G^t obtained in this round of training: ε_k^t is the training error, 1 − ε_k^t is the accuracy, and (1 − ε_k^t) / ε_k^t is the relative accuracy. Obviously, the higher the relative accuracy of the test base model, the greater the model weight corresponding to the aggregation parameter.
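A sketch of this calculation, using the Adaboost-style formula assumed above (the clipping of the error is added only to avoid division by zero):

```python
import numpy as np

def model_weight(training_error, eps=1e-10):
    """alpha_k^t: grows with the relative accuracy (1 - error) / error."""
    err = np.clip(training_error, eps, 1.0 - eps)
    return 0.5 * np.log((1.0 - err) / err)
```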
Third, the sample weights of the training set are updated, i.e. the first weight values corresponding to the samples are updated in the Adaboost manner:

w_{ki}^{t+1} = (w_{ki}^t / Z_k^t) · e^{α_k^t}  if the class of sample x_{ki} was predicted wrongly,
w_{ki}^{t+1} = (w_{ki}^t / Z_k^t) · e^{−α_k^t}  if the class of sample x_{ki} was predicted correctly,

where Z_k^t is the normalization factor that makes the updated weights sum to 1.

According to this formula, for a sample whose class was predicted wrongly by the test base model in this round of training, the corresponding weight is increased; for a sample whose class was predicted correctly by the test base model in this round of training, the corresponding weight is reduced.
After the first weight values corresponding to the samples are updated, steps S202 to S205 are performed again based on the training set with the updated first weight values, that is, the next round of model training is started, until T aggregation parameters are obtained; see step S104 for details.
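The weight update and normalization of the third sub-step might be sketched as follows; the wrong flags are the per-sample prediction-error indicators computed for the training error above.

```python
import numpy as np

def update_sample_weights(weights, wrong, alpha):
    """w_ki^{t+1}: raise the weights of wrongly predicted samples, lower the rest, then renormalize."""
    new_weights = weights * np.exp(np.where(wrong, alpha, -alpha))
    return new_weights / new_weights.sum()      # division by Z_k^t keeps the weights summing to 1
```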
S206, model integration
T aggregation parameters are obtained through T rounds of training, and for each aggregation parameter the model weight α_k^t in the final integrated classification model has been calculated by each data source during the corresponding round of training. For the k-th data source, the final integrated model parameters are obtained by combining the T aggregation parameters G^1, ..., G^T weighted by the corresponding model weights α_k^1, ..., α_k^T. The integrated model parameters are combined with a neural network model to generate the final classification model; for a detailed description, refer to step S105.
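A sketch of this integration follows; the normalized weighted combination used here, and the reuse of the two-layer structure for the final classifier, are assumptions about the exact combination formula rather than something this description fixes.

```python
import numpy as np

def integrate_parameters(aggregation_params, model_weights):
    """Theta_k: combination of the T aggregation parameters G^t weighted by alpha_k^t (assumed form)."""
    total = float(sum(model_weights))
    keys = aggregation_params[0].keys()
    return {k: sum(a * g[k] for a, g in zip(model_weights, aggregation_params)) / total
            for k in keys}

def final_classify(theta, x, th=0.5):
    """Run the integrated parameters through the same two-layer structure and threshold at th."""
    h = np.tanh(x @ theta["w1"] + theta["b1"])
    out = 1.0 / (1.0 + np.exp(-(h @ theta["w2"] + theta["b2"])))
    return (out.ravel() > th).astype(int)    # 1 = spam, 0 = non-spam
```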
And S207, performing class test on the test set divided in the step S201 by using the final classification model.
In this embodiment, for a specific description of step S207, refer to step S106, which is not described herein again.
The method of the embodiments of the present invention is described above, and the apparatus of the embodiments of the present invention is described below.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a model training apparatus provided in an embodiment of the present invention, where the apparatus 40 includes:
a training module 401, configured to obtain a first training set, train a first base model by using samples in the first training set, and obtain a first model parameter when the first base model converges;
a sending module 402, configured to send the first model parameter to the server;
a receiving module 403, configured to obtain aggregation parameters from the server, where the aggregation parameters are obtained by the server by aggregating the first model parameters and j-1 second model parameters, the j-1 second model parameters come from the j-1 second data sources other than the first data source among the j data sources, one second model parameter comes from one second data source, a second model parameter is the parameter obtained when a second base model converges as the corresponding second data source trains it with samples in a second training set, and the second base model and the first base model are models of the same type;
an updating module 404, configured to update the samples in the first training set according to the aggregation parameters, and train the first base model by using the updated samples in the first training set until the first data source obtains T aggregation parameters, where T is an integer greater than or equal to 2;
and an aggregation module 405, configured to obtain a final classification model according to the T aggregation parameters.
In a possible design, the updating module 404 is configured to update the samples in the first training set according to the aggregation parameter, which includes: constructing a test base model according to the aggregation parameters; performing class testing on the samples in the first training set by adopting the test base model, and determining the prediction classes corresponding to the samples in the first training set; and determining a training error corresponding to the aggregation parameter according to an error between the prediction category corresponding to the samples in the first training set and the actual category corresponding to the samples in the first training set, and updating the samples in the first training set according to the training error corresponding to the aggregation parameter.
In a possible design, a weight value of a current weight of a sample in the first training set is a first weight value, and the updating module 404 is configured to calculate a model weight corresponding to the aggregation parameter according to a training error corresponding to the aggregation parameter, where the model weight is used to represent an importance degree of the aggregation parameter in the final classification model; and the first data source updates the weight value of the current weight of the sample in the first training set according to the model weight corresponding to the aggregation parameter and the first weight value.
In a possible design, the first training set includes at least two samples, one sample corresponds to one first weight value, and the updating module 404 is further configured to perform a category test on each sample of the at least two samples by using the test base model, so as to obtain a prediction category corresponding to each sample. For each sample in the at least two samples, if an error exists between a prediction type corresponding to the sample and an actual type corresponding to the sample, determining that the sample is a prediction error sample; the updating module 404 is further configured to obtain at least one misprediction sample of the at least two samples, and determine a sum of at least one first weight value corresponding to the at least one misprediction sample as a training error corresponding to the aggregation parameter.
In one possible design, the aggregation module 405 is configured to obtain a final classification model according to the T aggregation parameters; respectively obtaining a model weight corresponding to each aggregation parameter in the T aggregation parameters, wherein the model weight is used for representing the importance degree of the aggregation parameters in the final classification model; obtaining an integrated model parameter according to each aggregation parameter in the T aggregation parameters and the model weight corresponding to each aggregation parameter; the aggregation module 405 generates a final classification model based on the integrated model parameters.
In one possible design, the apparatus further includes: the testing module 406 is configured to obtain a test set, perform category testing on the samples in the test set by using the final classification model, and determine a prediction category corresponding to the samples in the test set; determining the classification accuracy of the final classification model according to the error between the prediction category corresponding to the sample in the test set and the actual category corresponding to the sample in the test set; and if the classification accuracy is greater than a first threshold value, the first data source outputs prompt information, and the prompt information is used for prompting that the final classification model is trained completely.
In one possible design, the first base model and the second base model are two-layer neural network models.
It should be noted that, for the content that is not mentioned in the embodiment corresponding to fig. 4, reference may be made to the description of the method embodiment, and details are not described here again.
In the embodiment of the invention, a first training set is obtained, a first base model is trained with samples in the first training set, first model parameters are obtained when the first base model converges, and the first model parameters are sent to the server. Aggregation parameters are received from the server, where the aggregation parameters are obtained by the server from the first model parameters and j-1 second model parameters. The samples in the first training set are updated according to the aggregation parameters, and the first base model is trained with the updated samples in the first training set until T aggregation parameters are obtained; a final classification model is then obtained according to the T aggregation parameters. In each round of model training an aggregation parameter is generated by combining the data of a plurality of data sources, so the data volume is large and the classification results of the obtained model are more accurate. Meanwhile, because a test base model generated from the same aggregation parameter has different training effects on different data sources, the importance degree of the same aggregation parameter in the final model parameters differs from data source to data source, so finally a personalized final classification model suited to each data source can be generated to meet the requirements of different data sources.
Referring to fig. 5, fig. 5 is a schematic diagram of a structure of a model training apparatus according to an embodiment of the present invention, where the apparatus 50 includes a processor 501, a memory 502, and an input/output interface 503. The processor 501 is connected to the memory 502 and the input/output interface 503, for example, the processor 501 may be connected to the memory 502 and the input/output interface 503 through a bus.
The processor 501 is configured to support the model training apparatus to perform corresponding functions in the model training methods described in fig. 1-3. The processor 501 may be a Central Processing Unit (CPU), a Network Processor (NP), a hardware chip, or any combination thereof. The hardware chip may be an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
The memory 502 is used to store program codes and the like. Memory 502 may include Volatile Memory (VM), such as Random Access Memory (RAM); the memory 502 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory (flash memory), a Hard Disk Drive (HDD) or a solid-state drive (SSD); the memory 502 may also comprise a combination of memories of the kind described above.
The input/output interface 503 is used for inputting or outputting data.
The processor 501 may call the program code to perform the following operations:
acquiring a first training set, and training a first basic model by using samples in the first training set to obtain a first model parameter when the first basic model is converged;
sending the first model parameters to the server;
acquiring aggregation parameters from a server, wherein the aggregation parameters are obtained by the server through aggregation according to the first model parameters and the j-1 second model parameters;
updating samples in the first training set according to the aggregation parameters, and training the first base model by using the updated samples in the first training set until the first data source obtains T aggregation parameters, wherein T is an integer greater than or equal to 2;
and obtaining a final classification model according to the T aggregation parameters.
It should be noted that, the implementation of each operation may also correspond to the corresponding description with reference to the above method embodiment; the processor 501 may also cooperate with the input-output interface 503 to perform other operations in the above-described method embodiments.
Embodiments of the present invention also provide a computer storage medium storing a computer program, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of the foregoing embodiments. The computer may be part of the aforementioned model training apparatus, for example the processor 501 described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed, may include the processes of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and is, of course, not intended to limit the scope of the invention; the scope of protection of the invention is defined by the appended claims.

Claims (10)

1. A method for model training, the method being applicable to a communication system comprising a server and j data sources, j being an integer greater than or equal to 2, the method comprising:
a first data source obtains a first training set, and trains a first base model by using samples in the first training set to obtain a first model parameter when the first base model converges, wherein the first data source is any one of the j data sources;
the first data source sends the first model parameters to the server;
the first data source obtains aggregation parameters from the server, wherein the aggregation parameters are obtained by the server through aggregation according to the first model parameters and j-1 second model parameters, the j-1 second model parameters are from the j-1 second data sources, other than the first data source, among the j data sources, one second model parameter is from one second data source, each second model parameter is a parameter obtained when a second base model converges after the corresponding second data source trains the second base model by using samples in a second training set, and the second base model and the first base model are models of the same type;
the first data source updates the samples in the first training set according to the aggregation parameters, and trains the first base model by using the updated samples in the first training set until the first data source obtains T aggregation parameters, wherein T is an integer greater than or equal to 2;
and the first data source obtains a final classification model according to the T aggregation parameters.
2. The method of claim 1, wherein the first data source updating the samples in the first training set according to the aggregation parameter comprises:
the first data source constructs a test base model according to the aggregation parameter;
the first data source performs category testing on the samples in the first training set by using the test base model, and determines a prediction category corresponding to the samples in the first training set;
the first data source determines a training error corresponding to the aggregation parameter according to an error between a prediction category corresponding to the samples in the first training set and an actual category corresponding to the samples in the first training set;
and the first data source updates the samples in the first training set according to the training errors corresponding to the aggregation parameters.
3. The method of claim 2, wherein the weight value of the current weight of a sample in the first training set is a first weight value, and the first data source updating the samples in the first training set according to the training error corresponding to the aggregation parameter comprises:
the first data source calculates a model weight corresponding to the aggregation parameter according to the training error corresponding to the aggregation parameter, wherein the model weight is used for representing the importance degree of the aggregation parameter in the final classification model;
and the first data source updates the weight value of the current weight of the sample in the first training set according to the model weight corresponding to the aggregation parameter and the first weight value.
4. The method of claim 2 or 3, wherein the first training set comprises at least two samples, one sample corresponding to a first weight value;
the first data source performing category testing on the samples in the first training set by using the test base model and determining the prediction category corresponding to the samples in the first training set comprises:
the first data source respectively performs category testing on each sample in the at least two samples by using the test base model to obtain a prediction category corresponding to each sample;
the first data source determining the training error corresponding to the aggregation parameter according to the error between the prediction category corresponding to the samples in the first training set and the actual category corresponding to the samples in the first training set comprises:
for each sample in the at least two samples, if an error exists between the prediction category corresponding to the sample and the actual category corresponding to the sample, the first data source determines that the sample is a misprediction sample;
the first data source obtains at least one misprediction sample in the at least two samples, and determines the sum of the at least one first weight value corresponding to the at least one misprediction sample as the training error corresponding to the aggregation parameter.
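As a rough illustration of the error and re-weighting step set out in claims 2 to 4, the sketch below assumes an AdaBoost-style update rule (the claims do not fix a particular formula); the names update_sample_weights, pred, actual and weights are hypothetical.

```python
import numpy as np

def update_sample_weights(pred, actual, weights):
    """pred/actual: per-sample categories; weights: the current first weight values."""
    wrong = pred != actual                                    # misprediction samples
    err = np.clip(weights[wrong].sum(), 1e-12, 1 - 1e-12)     # training error: sum of their weight values
    alpha = 0.5 * np.log((1.0 - err) / err)                   # model weight of this aggregation parameter
    new_weights = weights * np.exp(alpha * np.where(wrong, 1.0, -1.0))
    return new_weights / new_weights.sum(), alpha

# Example: three samples, the second one mispredicted, so its weight grows.
w, a = update_sample_weights(np.array([1, 0, 1]), np.array([1, 1, 1]),
                             np.array([1 / 3, 1 / 3, 1 / 3]))
```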
5. The method of claim 1, wherein the first data source obtains a final classification model based on the T aggregation parameters, comprising:
the first data source respectively obtains a model weight corresponding to each aggregation parameter in the T aggregation parameters, and the model weight is used for representing the importance degree of the aggregation parameters in the final classification model;
the first data source obtains integrated model parameters according to each aggregation parameter in the T aggregation parameters and the model weight corresponding to each aggregation parameter;
and the first data source generates a final classification model according to the integrated model parameters.
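Claim 5 leaves open how the T aggregation parameters are combined with their model weights; one plausible reading, shown below purely as an assumption, is a normalized weighted sum of the parameter sets.

```python
import numpy as np

def integrate(aggregation_params, model_weights):
    """aggregation_params: T parameter arrays; model_weights: T importance values."""
    w = np.asarray(model_weights, dtype=float)
    w = w / w.sum()                                   # normalize the importance degrees
    return sum(wi * np.asarray(p) for wi, p in zip(w, aggregation_params))
```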
6. The method of claim 1, wherein after the final classification model is obtained, the method further comprises:
the first data source obtains a test set, performs category testing on the samples in the test set by using the final classification model, and determines prediction categories corresponding to the samples in the test set;
the first data source determines the classification accuracy of the final classification model according to the error between the prediction category corresponding to the sample in the test set and the actual category corresponding to the sample in the test set;
and if the classification accuracy is greater than a first threshold value, the first data source outputs prompt information, wherein the prompt information is used for prompting that training of the final classification model is complete.
7. The method of claim 1, wherein the first base model and the second base model are two-layer neural network models.
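Claim 7 only requires the base models to be two-layer neural networks. A minimal sketch of such a model is given below; the layer sizes, ReLU activation, and NumPy implementation are assumptions, not details taken from the claims.

```python
import numpy as np

class TwoLayerNet:
    """A two-layer neural network: one hidden layer plus an output layer."""

    def __init__(self, d_in, d_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(scale=0.1, size=(d_hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def forward(self, X):
        hidden = np.maximum(0.0, X @ self.W1 + self.b1)   # ReLU hidden layer
        return hidden @ self.W2 + self.b2                 # class scores

    def predict(self, X):
        return self.forward(X).argmax(axis=1)             # predicted category per sample
```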
8. A model training apparatus, applied to a first data source in a communication system, the communication system including a server and j data sources, j being an integer greater than or equal to 2, the model training apparatus comprising:
the training module is used for acquiring a first training set, training a first base model by using samples in the first training set, and obtaining a first model parameter when the first base model converges;
a sending module, configured to send the first model parameter to the server;
a receiving module, configured to obtain aggregation parameters from the server, wherein the aggregation parameters are obtained by the server through aggregation according to the first model parameters and j-1 second model parameters, the j-1 second model parameters are from the j-1 second data sources, other than the first data source, among the j data sources, one second model parameter is from one second data source, each second model parameter is a parameter obtained when a second base model converges after the corresponding second data source trains the second base model by using samples in a second training set, and the second base model and the first base model are models of the same type;
an updating module, configured to update the samples in the first training set according to the aggregation parameters, and train the first base model by using the updated samples in the first training set until the first data source obtains T aggregation parameters, where T is an integer greater than or equal to 2;
and the aggregation module is used for obtaining a final classification model according to the T aggregation parameters.
9. A model training device, comprising a processor, a memory and an input-output interface, wherein the processor, the memory and the input-output interface are connected with each other, wherein the input-output interface is used for inputting or outputting data, the memory is used for storing program codes, and the processor is used for calling the program codes and executing the method according to any one of claims 1-7.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.
CN202011392884.1A 2020-12-02 2020-12-02 Model training method, device and equipment Active CN112465043B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011392884.1A CN112465043B (en) 2020-12-02 2020-12-02 Model training method, device and equipment
PCT/CN2021/083807 WO2022116440A1 (en) 2020-12-02 2021-03-30 Model training method, apparatus and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011392884.1A CN112465043B (en) 2020-12-02 2020-12-02 Model training method, device and equipment

Publications (2)

Publication Number Publication Date
CN112465043A (en) 2021-03-09
CN112465043B (en) 2024-05-14

Family

ID=74805552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011392884.1A Active CN112465043B (en) 2020-12-02 2020-12-02 Model training method, device and equipment

Country Status (2)

Country Link
CN (1) CN112465043B (en)
WO (1) WO2022116440A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081642B (en) * 2022-07-19 2022-11-15 浙江大学 Method and system for updating service prediction model in multi-party cooperation manner
CN115860718B (en) * 2022-12-02 2023-07-25 深圳市黑金工业制造有限公司 Integrated touch all-in-one machine comprehensive maintenance management system and method based on big data
CN116540627B (en) * 2023-02-07 2024-04-12 广东工业大学 Machine tool thermal error prediction compensation group control method and system based on deep transfer learning
CN117036870B (en) * 2023-10-09 2024-01-09 之江实验室 Model training and image recognition method based on integral gradient diversity

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171280A (en) * 2018-01-31 2018-06-15 国信优易数据有限公司 A kind of grader construction method and the method for prediction classification
CN109857845A (en) * 2019-01-03 2019-06-07 北京奇艺世纪科技有限公司 Model training and data retrieval method, device, terminal and computer readable storage medium
CN109871702A (en) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 Federal model training method, system, equipment and computer readable storage medium
CN110837862A (en) * 2019-11-06 2020-02-25 腾讯科技(深圳)有限公司 User classification method and device
CN110995488A (en) * 2019-12-03 2020-04-10 电子科技大学 Multi-mechanism collaborative learning system and method based on hierarchical parameter server
CN111062495A (en) * 2019-11-28 2020-04-24 深圳市华尊科技股份有限公司 Machine learning method and related device
CN111353600A (en) * 2020-02-20 2020-06-30 第四范式(北京)技术有限公司 Abnormal behavior detection method and device
CN111881948A (en) * 2020-07-10 2020-11-03 马上消费金融股份有限公司 Training method and device of neural network model, and data classification method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108615044A (en) * 2016-12-12 2018-10-02 腾讯科技(深圳)有限公司 A kind of method of disaggregated model training, the method and device of data classification
US20200380369A1 (en) * 2019-05-31 2020-12-03 Nvidia Corporation Training a neural network using selective weight updates
CN112465043B (en) * 2020-12-02 2024-05-14 平安科技(深圳)有限公司 Model training method, device and equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022116440A1 (en) * 2020-12-02 2022-06-09 平安科技(深圳)有限公司 Model training method, apparatus and device
CN113177595A (en) * 2021-04-29 2021-07-27 北京明朝万达科技股份有限公司 Document classification model construction, training and testing method and model construction system
CN113379039A (en) * 2021-07-02 2021-09-10 支付宝(杭州)信息技术有限公司 Model training method, system and device
CN113379039B (en) * 2021-07-02 2022-05-17 支付宝(杭州)信息技术有限公司 Model training method, system and device

Also Published As

Publication number Publication date
CN112465043B (en) 2024-05-14
WO2022116440A1 (en) 2022-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant