CN112766455A - Learning model training method and system - Google Patents
Learning model training method and system
- Publication number
- CN112766455A (application CN202011570866.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- participant
- branch
- host
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioethics (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a learning model training method and system, wherein the method comprises the following steps: sending, by a distributor, the host-side branch model to a plurality of participant ends; inputting, at the plurality of participant ends, local training samples into the respective branch models to obtain output features; feeding back the output features of each participant end to the host end by a pusher; performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information; and updating the host-side branch model based on the parameter update information. The method balances data privacy with model-training effectiveness in multi-domain joint training.
Description
Technical Field
The present invention relates to the field of artificial intelligence, and more particularly, to a learning model training method and system.
Background
In multi-domain joint training, each participant conventionally sends its local data to the host to train the model. In some cases, however, the local data is the participant's own private or sensitive data, and transmitting it to the host risks privacy disclosure or compromise. In the prior art, each participant typically desensitizes its local data before taking part in training, but desensitization erodes the intrinsic features of the data, which in turn degrades the effectiveness of the trained model.
For the prior-art problem that multi-domain joint training cannot protect data privacy without degrading training effectiveness, no effective solution is currently available.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a learning model training method and system that balance data privacy with model-training effectiveness in multi-domain joint training.
In view of the above, a first aspect of the embodiments of the present invention provides a learning model training method, including the following steps:
sending, by the distributor, the host-side branch model to a plurality of participant ends;
inputting, at the plurality of participant ends, local training samples into the respective branch models to obtain output features;
feeding back the output features of each participant end to the host end by the pusher;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
and updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and manager are located on the host side; the pusher is arranged at a plurality of participant ends.
In some embodiments, the training samples local to the plurality of participant terminals include private data that is not public.
In some embodiments, inputting the local training samples into the respective branch models at the plurality of participant ends to obtain the output features comprises: inputting, at each participant end, the local training samples into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information includes: parameters in the branch model are updated based on the parameter update information.
A second aspect of an embodiment of the present invention provides a learning model training system, including:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by the distributor, the host-side branch model to a plurality of participant ends;
inputting, at the plurality of participant ends, local training samples into the respective branch models to obtain output features;
feeding back the output features of each participant end to the host end by the pusher;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
and updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and manager are located on the host side; the pusher is arranged at a plurality of participant ends.
In some embodiments, the training samples local to the plurality of participant terminals include private data that is not public.
In some embodiments, inputting the local training samples into the respective branch models at the plurality of participant ends to obtain the output features comprises: inputting, at each participant end, the local training samples into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information includes: parameters in the branch model are updated based on the parameter update information.
The invention has the following beneficial technical effects. In the learning model training method and system provided by the embodiments of the invention, a distributor sends the host-side branch model to a plurality of participant ends; the participant ends input their local training samples into the respective branch models to obtain output features; a pusher feeds the output features of each participant end back to the host end; a manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy with model-training effectiveness in multi-domain joint training.
Drawings
In order to illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a learning model training method provided by the present invention;
FIG. 2 is a branch flow diagram of a learning model training method provided by the present invention;
fig. 3 is an overall structure diagram of the learning model training method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are not identical. They are used merely for convenience of description, should not be construed as limiting the embodiments, and are not described further in the embodiments below.
In view of the foregoing, a first aspect of the embodiments of the present invention provides an embodiment of a learning model training method that balances data privacy with model-training effectiveness in multi-domain joint training. Fig. 1 is a schematic flow chart of the learning model training method provided by the present invention.
The learning model training method, as shown in fig. 1, includes the following steps:
step S101, a distributor sends the host-side branch model to a plurality of participant ends;
step S103, the plurality of participant ends input their local training samples into the respective branch models to obtain output features;
step S105, a pusher feeds the output features of each participant end back to the host end;
step S107, a manager performs forward propagation and gradient operations based on the output features to obtain parameter update information;
step S109, the host-side branch model is updated based on the parameter update information.
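The loop of steps S101 to S109 can be sketched as follows. This is a minimal toy illustration, not the patented implementation: the class and function names (`Participant`, `manager_round`) are invented for the example, a single scalar weight stands in for each convolutional branch model, the supervision labels are assumed to reside on the host, and the redistribution of updated branch parameters in step S109 is reduced to resending a fixed branch weight.

```python
# Toy sketch of the joint-training loop (steps S101-S109). All names are
# illustrative assumptions, not from the patent or any real federated-
# learning framework; a scalar weight stands in for each branch model.

class Participant:
    def __init__(self, samples):
        self.samples = samples      # private local data, never sent to the host
        self.branch_weight = None   # branch model received from the distributor

    def receive(self, weight):      # distributor -> participant (step S101)
        self.branch_weight = weight

    def push_features(self):        # branch inference + pusher (steps S103, S105)
        # Only inference outputs leave the participant, not the raw samples.
        return [self.branch_weight * x for x in self.samples]

def manager_round(head_weight, participants, labels, lr=0.05):
    """Host-side manager (step S107): fuse the pushed features, run the
    forward pass of a one-weight 'head', and take a gradient step on it."""
    feats = [f for p in participants for f in p.push_features()]
    grad = sum(2.0 * (head_weight * f - y) * f for f, y in zip(feats, labels))
    return head_weight - lr * grad / len(feats)

# One training run: distribute, infer locally, push, fuse, update.
parts = [Participant([1.0, 2.0]), Participant([3.0])]
labels = [3.0, 6.0, 9.0]   # host-side supervision; labels follow y = 3x
v = 0.0                    # host-side head weight
for _ in range(200):
    for p in parts:
        p.receive(1.0)     # distributor keeps the branch models synchronized
    v = manager_round(v, parts, labels)
```

Running the loop drives the host-side head weight toward the value that best maps the fused features to the labels, while the raw participant samples never cross the participant boundary.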
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. Embodiments of the computer program may achieve the same or similar effects as any of the preceding method embodiments to which it corresponds.
In some embodiments, the distributor and manager are located on the host side; the pusher is arranged at a plurality of participant ends.
In some embodiments, the training samples local to the plurality of participant terminals include private data that is not public.
In some embodiments, inputting the local training samples into the respective branch models at the plurality of participant ends to obtain the output features comprises: inputting, at each participant end, the local training samples into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
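The multi-layer convolution of such a training branch can be sketched in plain Python. The function names and kernel values below are illustrative assumptions; a real branch model would use a deep-learning framework.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as in most
    deep-learning frameworks)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def branch_features(sample, kernels):
    """Run a local sample through stacked conv layers; the output of the
    last convolution layer is the feature pushed to the host, so the raw
    sample itself never leaves the participant."""
    out = sample
    for kernel in kernels:
        out = conv1d(out, kernel)
    return out

# A 5-element sample through two toy conv layers.
feats = branch_features([1.0, 2.0, 3.0, 4.0, 5.0],
                        kernels=[[1.0, -1.0], [0.5, 0.5]])
# feats == [-1.0, -1.0, -1.0]
```

Note that only `feats`, not the sample, would be transmitted to the host.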
In some embodiments, updating the host-side branch model based on the parameter update information includes: parameters in the branch model are updated based on the parameter update information.
The following further illustrates embodiments of the invention in accordance with the specific example shown in fig. 2.
The main server deploys the algorithm model to be trained, and the federated learning distributor distributes each branch network to the corresponding participant according to its definition. After receiving the branch model, a participant inputs the labeled training samples on its local server into the branch model and infers the output features of the branch model's last layer, which the federated learning pusher pushes to the manager on the main server. Once the model manager has collected the output features pushed by all participants, it fuses them to complete the forward propagation and gradient calculation of the whole model and to update the model's parameters. After the model is updated, the branch models are distributed to the participants again through the federated model distributor, ensuring that the training models stay synchronized.
Referring to fig. 2 and fig. 3, the federated learning manager is deployed on the main server. During joint training, it calls the distributor to send each branch model to the participant servers, then collects the output features of each participant's branch model, and the main model fuses these features to complete the gradient update of the model. After the update, the manager calls the distributor to distribute the updated branch models to the participant servers in preparation for the next round of training. Fig. 2 corresponds to the small branch-model diagram shown within fig. 3.
The federated learning manager first establishes a deep learning model based on the data structures of the joint participant ends and initializes its parameters. The model has three branches, each corresponding to the training task required by one participant. The manager invokes the federated learning distributor to distribute the branch models to the participant servers. After the branch models have been distributed, the manager waits to collect the output data from each participant end. Once it has collected the output features pushed by all participants, it fuses them to complete the forward propagation and gradient calculation of the whole model and to update the model's parameters. The manager then notifies the distributor to redistribute the updated branch models to the participants to begin the next round of training.
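The manager's fusion step can be sketched as follows. The concatenation-then-linear-head structure and all names here are assumptions for illustration; the patent does not fix how the three branch outputs are fused.

```python
def fuse_and_forward(branch_features, head_weights):
    """Host-side forward pass: concatenate the per-branch output features,
    then apply a linear head as the remainder of the whole model."""
    fused = [f for feats in branch_features for f in feats]  # concatenation
    return sum(w * f for w, f in zip(head_weights, fused))

# Features pushed by three participant branches (one per training task).
out = fuse_and_forward([[1.0, 2.0], [3.0], [4.0]],
                       head_weights=[0.5, 0.5, 0.5, 0.5])
# out == 5.0
```

Backpropagating a loss through this fused forward pass yields both the head gradients and, per branch feature, the gradients that form the parameter update information for each branch model.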
The federated learning distributor exists only on the host server, and it automatically redistributes the branch models to the participant servers when training begins or after the model is updated. After the manager obtains the output data of all branch models, it fuses the data to complete the forward propagation and backward gradient calculation of the whole model, and the computed gradients are used to update the model's weight parameters. To keep the branch models synchronized, the distributor redistributes the updated branch models to the participant servers for the next round of training.
The federated learning pusher exists only on the participant servers. After a participant receives the branch model sent by the distributor, it inputs its local training data into the branch model and pushes the model's inference result to the manager, which uses the inference data of all participant ends to complete the gradient calculation and update of the whole model. In this way, the participant-end data is decoupled from model training, guaranteeing data security.
It can be seen from the foregoing embodiments that, in the learning model training method provided by the embodiments of the present invention, a distributor sends the host-side branch model to a plurality of participant ends; the participant ends input their local training samples into the respective branch models to obtain output features; a pusher feeds the output features of each participant end back to the host end; a manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy with model-training effectiveness in multi-domain joint training.
It should be particularly noted that, the steps in the embodiments of the learning model training method described above can be mutually intersected, replaced, added, or deleted, and therefore, the learning model training method based on these reasonable permutation and combination transformations shall also belong to the scope of the present invention, and shall not limit the scope of the present invention to the described embodiments.
In view of the foregoing, a second aspect of the embodiments of the present invention provides an embodiment of a learning model training system that balances data privacy with model-training effectiveness in multi-domain joint training. The system comprises:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by the distributor, the host-side branch model to a plurality of participant ends;
inputting, at the plurality of participant ends, local training samples into the respective branch models to obtain output features;
feeding back the output features of each participant end to the host end by the pusher;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
and updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and manager are located on the host side; the pusher is arranged at a plurality of participant ends.
In some embodiments, the training samples local to the plurality of participant terminals include private data that is not public.
In some embodiments, inputting the local training samples into the respective branch models at the plurality of participant ends to obtain the output features comprises: inputting, at each participant end, the local training samples into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information includes: parameters in the branch model are updated based on the parameter update information.
As can be seen from the foregoing embodiments, in the system provided by the embodiments of the present invention, the distributor sends the host-side branch model to a plurality of participant ends; the participant ends input their local training samples into the respective branch models to obtain output features; the pusher feeds the output features of each participant end back to the host end; the manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy with model-training effectiveness in multi-domain joint training.
It should be particularly noted that the embodiment of the system described above employs the embodiment of the learning model training method to specifically describe the working process of each module, and those skilled in the art can easily think that these modules are applied to other embodiments of the learning model training method. Of course, since the steps in the embodiment of the learning model training method can be mutually intersected, replaced, added, and deleted, these reasonable permutation and combination transformations should also belong to the scope of the present invention for the system, and should not limit the scope of the present invention to the embodiment.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of an embodiment of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (10)
1. A learning model training method, comprising the following steps:
sending, by the distributor, the host-side branch model to the plurality of participant ends;
inputting local training samples of the participants into the branch models to obtain output features;
feeding back the output features of each participant end to the host end by a pusher;
performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
updating the host-side branch model based on the parameter update information.
2. The method of claim 1, wherein the distributor and the manager are located at the host side; the pusher is arranged at a plurality of participant ends.
3. The method of claim 1, wherein the training samples local to a plurality of the participant terminals include private data that is not public.
4. The method of claim 1, wherein inputting the local training samples into the respective branch models at the plurality of participant ends to obtain output features comprises: inputting, at each participant end, the local training samples of the participant into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
5. The method of claim 1, wherein updating the branch model at the host based on the parameter update information comprises: updating parameters in the branch model based on the parameter update information.
6. A learning model training system, comprising:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by the distributor, the host-side branch model to the plurality of participant ends;
inputting local training samples of the participants into the branch models to obtain output features;
feeding back the output features of each participant end to the host end by a pusher;
performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
updating the host-side branch model based on the parameter update information.
7. The system of claim 6, wherein the distributor and the manager are disposed at the host side; the pusher is arranged at a plurality of participant ends.
8. The system of claim 6, wherein the training samples local to a plurality of the participant terminals include private data that is not public.
9. The system of claim 6, wherein inputting the local training samples into the respective branch models at the plurality of participant ends to obtain output features comprises: inputting, at each participant end, the local training samples of the participant into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
10. The system of claim 6, wherein updating the branch model at the host based on the parameter update information comprises: updating parameters in the branch model based on the parameter update information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011570866.8A CN112766455A (en) | 2020-12-26 | 2020-12-26 | Learning model training method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011570866.8A CN112766455A (en) | 2020-12-26 | 2020-12-26 | Learning model training method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112766455A true CN112766455A (en) | 2021-05-07 |
Family
ID=75695845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011570866.8A Withdrawn CN112766455A (en) | 2020-12-26 | 2020-12-26 | Learning model training method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112766455A (en) |
-
2020
- 2020-12-26 CN CN202011570866.8A patent/CN112766455A/en not_active Withdrawn
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113449329A (en) * | 2021-08-31 | 2021-09-28 | 国网浙江省电力有限公司信息通信分公司 | Energy data fusion calculation method under federal learning scene based on safe sharing |
CN115098885A (en) * | 2022-07-28 | 2022-09-23 | 清华大学 | Data processing method and system and electronic equipment |
CN115098885B (en) * | 2022-07-28 | 2022-11-04 | 清华大学 | Data processing method and system and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112118565B (en) | Multi-tenant service gray level publishing method, device, computer equipment and storage medium | |
CN110443375B (en) | Method and device for federated learning | |
CN107330522B (en) | Method, device and system for updating deep learning model | |
CN109951547B (en) | Transaction request parallel processing method, device, equipment and medium | |
CN103905508B (en) | Cloud platform application dispositions method and device | |
CN109146490A (en) | block generation method, device and system | |
CN110221872A (en) | Method for page jump, device, electronic equipment and storage medium | |
CN112766455A (en) | Learning model training method and system | |
CN110708358B (en) | Session message processing method, electronic device and computer-readable storage medium | |
CN108712491A (en) | Block chain node, exchange information processing method, terminal device and medium | |
CN111737755A (en) | Joint training method and device for business model | |
CN111797999A (en) | Longitudinal federal modeling optimization method, device, equipment and readable storage medium | |
CN108737105A (en) | Method for retrieving, device, private key equipment and the medium of private key | |
CN110276060A (en) | The method and device of data processing | |
CN108897559A (en) | System and method are realized in a kind of software upgrading under Network Isolation | |
CN109582289A (en) | The processing method of regular flow, system, storage medium and processor in regulation engine | |
CN107231400A (en) | The synchronous method and device of a kind of data | |
CN109492049A (en) | Data processing, block generation and synchronous method for block chain network | |
CN112650812A (en) | Data fragment storage method and device, computer equipment and storage medium | |
CN109150981B (en) | Block chain network networking method, device, equipment and computer readable storage medium | |
CN111008249A (en) | Parallel chain block synchronization method, device and storage medium | |
CN111951112A (en) | Intelligent contract execution method based on block chain, terminal equipment and storage medium | |
CN109104472B (en) | Block chain network networking method, device, equipment and computer readable storage medium | |
CN113361236A (en) | Method and device for editing document | |
CN107968798A (en) | A kind of network management resources label acquisition method, cache synchronization method, apparatus and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20210507 |
|
WW01 | Invention patent application withdrawn after publication |