CN112766455A - Learning model training method and system - Google Patents
Learning model training method and system
- Publication number
- CN112766455A (application CN202011570866.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- participant
- branch
- host
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioethics (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a learning model training method and system, wherein the method comprises the following steps: sending, by a distributor, the host-side branch model to a plurality of participant ends; inputting, at the plurality of participant ends, local training samples into the respective branch models to obtain output features; feeding back the output features of each participant end to the host end by a pusher; performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information; and updating the host-side branch model based on the parameter update information. The method balances data privacy with model-training effectiveness in multi-domain joint training.
Description
Technical Field
The present invention relates to the field of artificial intelligence, and more particularly, to a learning model training method and system.
Background
In multi-domain joint training, each participant conventionally sends its local data to the host to train the model. In some cases, however, the local data is the participant's own private or sensitive data, and transmitting it to the host risks privacy disclosure or compromise. In the prior art, each participant typically desensitizes its local data before taking part in training, but desensitization erodes the intrinsic features of the data, which in turn degrades the effectiveness of the trained model.
For the prior-art problem that multi-domain joint training cannot protect data privacy without degrading training effectiveness, no effective solution is currently available.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a learning model training method and system that balance data privacy with model-training effectiveness in multi-domain joint training.
In view of the above, a first aspect of the embodiments of the present invention provides a learning model training method, including the following steps:
sending, by the distributor, the host-side branch model to a plurality of participant ends;
inputting, at the plurality of participant ends, local training samples into the respective branch models to obtain output features;
feeding back the output features of each participant end to the host end by the pusher;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
and updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and manager are located on the host side; the pusher is arranged at a plurality of participant ends.
In some embodiments, the training samples local to the plurality of participant terminals include private data that is not public.
In some embodiments, inputting the local training samples into the respective branch models at the plurality of participant ends to obtain the output features comprises: inputting, at each participant end, the local training samples into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information includes: parameters in the branch model are updated based on the parameter update information.
A second aspect of an embodiment of the present invention provides a learning model training system, including:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by the distributor, the host-side branch model to a plurality of participant ends;
inputting, at the plurality of participant ends, local training samples into the respective branch models to obtain output features;
feeding back the output features of each participant end to the host end by the pusher;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
and updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and manager are located on the host side; the pusher is arranged at a plurality of participant ends.
In some embodiments, the training samples local to the plurality of participant terminals include private data that is not public.
In some embodiments, inputting the local training samples into the respective branch models at the plurality of participant ends to obtain the output features comprises: inputting, at each participant end, the local training samples into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information includes: parameters in the branch model are updated based on the parameter update information.
The invention has the following beneficial technical effects. In the learning model training method and system provided by the embodiments of the invention, a distributor sends the host-side branch model to a plurality of participant ends; the participant ends input their local training samples into the respective branch models to obtain output features; a pusher feeds the output features of each participant end back to the host end; a manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy with model-training effectiveness in multi-domain joint training.
Drawings
In order to illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a learning model training method provided by the present invention;
FIG. 2 is a branch flow diagram of a learning model training method provided by the present invention;
fig. 3 is an overall structure diagram of the learning model training method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are not identical. They are used merely for convenience of description, should not be construed as limiting the embodiments, and are not described further in the embodiments below.
In view of the foregoing, a first aspect of the embodiments of the present invention provides an embodiment of a learning model training method that balances data privacy with model-training effectiveness in multi-domain joint training. Fig. 1 is a schematic flow chart of the learning model training method provided by the present invention.
The learning model training method, as shown in fig. 1, includes the following steps:
step S101, a distributor sends the host-side branch model to a plurality of participant ends;
step S103, the plurality of participant ends input their local training samples into the respective branch models to obtain output features;
step S105, a pusher feeds the output features of each participant end back to the host end;
step S107, a manager performs forward propagation and gradient operations based on the output features to obtain parameter update information;
step S109, the host-side branch model is updated based on the parameter update information.
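The loop of steps S101 to S109 can be sketched as follows. This is a minimal toy illustration, not the patented implementation: the class and function names (`Participant`, `manager_round`) are invented for the example, a single scalar weight stands in for each convolutional branch model, the supervision labels are assumed to reside on the host, and the redistribution of updated branch parameters in step S109 is reduced to resending a fixed branch weight.

```python
# Toy sketch of the joint-training loop (steps S101-S109). All names are
# illustrative assumptions, not from the patent or any real federated-
# learning framework; a scalar weight stands in for each branch model.

class Participant:
    def __init__(self, samples):
        self.samples = samples      # private local data, never sent to the host
        self.branch_weight = None   # branch model received from the distributor

    def receive(self, weight):      # distributor -> participant (step S101)
        self.branch_weight = weight

    def push_features(self):        # branch inference + pusher (steps S103, S105)
        # Only inference outputs leave the participant, not the raw samples.
        return [self.branch_weight * x for x in self.samples]

def manager_round(head_weight, participants, labels, lr=0.05):
    """Host-side manager (step S107): fuse the pushed features, run the
    forward pass of a one-weight 'head', and take a gradient step on it."""
    feats = [f for p in participants for f in p.push_features()]
    grad = sum(2.0 * (head_weight * f - y) * f for f, y in zip(feats, labels))
    return head_weight - lr * grad / len(feats)

# One training run: distribute, infer locally, push, fuse, update.
parts = [Participant([1.0, 2.0]), Participant([3.0])]
labels = [3.0, 6.0, 9.0]   # host-side supervision; labels follow y = 3x
v = 0.0                    # host-side head weight
for _ in range(200):
    for p in parts:
        p.receive(1.0)     # distributor keeps the branch models synchronized
    v = manager_round(v, parts, labels)
```

Running the loop drives the host-side head weight toward the value that best maps the fused features to the labels, while the raw participant samples never cross the participant boundary.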
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. Embodiments of the computer program may achieve the same or similar effects as any of the preceding method embodiments to which it corresponds.
In some embodiments, the distributor and manager are located on the host side; the pusher is arranged at a plurality of participant ends.
In some embodiments, the training samples local to the plurality of participant terminals include private data that is not public.
In some embodiments, inputting the local training samples into the respective branch models at the plurality of participant ends to obtain the output features comprises: inputting, at each participant end, the local training samples into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
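The multi-layer convolution of such a training branch can be sketched in plain Python. The function names and kernel values below are illustrative assumptions; a real branch model would use a deep-learning framework.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as in most
    deep-learning frameworks)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def branch_features(sample, kernels):
    """Run a local sample through stacked conv layers; the output of the
    last convolution layer is the feature pushed to the host, so the raw
    sample itself never leaves the participant."""
    out = sample
    for kernel in kernels:
        out = conv1d(out, kernel)
    return out

# A 5-element sample through two toy conv layers.
feats = branch_features([1.0, 2.0, 3.0, 4.0, 5.0],
                        kernels=[[1.0, -1.0], [0.5, 0.5]])
# feats == [-1.0, -1.0, -1.0]
```

Note that only `feats`, not the sample, would be transmitted to the host.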
In some embodiments, updating the host-side branch model based on the parameter update information includes: parameters in the branch model are updated based on the parameter update information.
The following further illustrates embodiments of the invention in accordance with the specific example shown in fig. 2.
The main server deploys the algorithm model to be trained, and the federated learning distributor distributes each branch network to the corresponding participant according to its definition. After receiving the branch model, a participant inputs the labeled training samples on its local server into the branch model and infers the output features of the branch model's last layer, which the federated learning pusher pushes to the manager on the main server. Once the model manager has collected the output features pushed by all participants, it fuses them to complete the forward propagation and gradient calculation of the whole model and to update the model's parameters. After the model is updated, the branch models are distributed to the participants again through the federated model distributor, ensuring that the training models stay synchronized.
Referring to fig. 2 and fig. 3, the federated learning manager is deployed on the main server. During joint training, it calls the distributor to send each branch model to the participant servers, then collects the output features of each participant's branch model, and the main model fuses these features to complete the gradient update of the model. After the update, the manager calls the distributor to distribute the updated branch models to the participant servers in preparation for the next round of training. Fig. 2 corresponds to the small branch-model diagram shown within fig. 3.
The federated learning manager first establishes a deep learning model based on the data structures of the joint participant ends and initializes its parameters. The model has three branches, each corresponding to the training task required by one participant. The manager invokes the federated learning distributor to distribute the branch models to the participant servers. After the branch models have been distributed, the manager waits to collect the output data from each participant end. Once it has collected the output features pushed by all participants, it fuses them to complete the forward propagation and gradient calculation of the whole model and to update the model's parameters. The manager then notifies the distributor to redistribute the updated branch models to the participants to begin the next round of training.
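The manager's fusion step can be sketched as follows. The concatenation-then-linear-head structure and all names here are assumptions for illustration; the patent does not fix how the three branch outputs are fused.

```python
def fuse_and_forward(branch_features, head_weights):
    """Host-side forward pass: concatenate the per-branch output features,
    then apply a linear head as the remainder of the whole model."""
    fused = [f for feats in branch_features for f in feats]  # concatenation
    return sum(w * f for w, f in zip(head_weights, fused))

# Features pushed by three participant branches (one per training task).
out = fuse_and_forward([[1.0, 2.0], [3.0], [4.0]],
                       head_weights=[0.5, 0.5, 0.5, 0.5])
# out == 5.0
```

Backpropagating a loss through this fused forward pass yields both the head gradients and, per branch feature, the gradients that form the parameter update information for each branch model.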
The federated learning distributor exists only on the host server, and it automatically redistributes the branch models to the participant servers when training begins or after the model is updated. After the manager obtains the output data of all branch models, it fuses the data to complete the forward propagation and backward gradient calculation of the whole model, and the computed gradients are used to update the model's weight parameters. To keep the branch models synchronized, the distributor redistributes the updated branch models to the participant servers for the next round of training.
The federated learning pusher exists only on the participant servers. After a participant receives the branch model sent by the distributor, it inputs its local training data into the branch model and pushes the model's inference result to the manager, which uses the inference data of all participant ends to complete the gradient calculation and update of the whole model. In this way, the participant-end data is decoupled from model training, guaranteeing data security.
It can be seen from the foregoing embodiments that, in the learning model training method provided by the embodiments of the present invention, a distributor sends the host-side branch model to a plurality of participant ends; the participant ends input their local training samples into the respective branch models to obtain output features; a pusher feeds the output features of each participant end back to the host end; a manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy with model-training effectiveness in multi-domain joint training.
It should be particularly noted that, the steps in the embodiments of the learning model training method described above can be mutually intersected, replaced, added, or deleted, and therefore, the learning model training method based on these reasonable permutation and combination transformations shall also belong to the scope of the present invention, and shall not limit the scope of the present invention to the described embodiments.
In view of the foregoing, a second aspect of the embodiments of the present invention provides an embodiment of a learning model training system that balances data privacy with model-training effectiveness in multi-domain joint training. The system comprises:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by the distributor, the host-side branch model to a plurality of participant ends;
inputting, at the plurality of participant ends, local training samples into the respective branch models to obtain output features;
feeding back the output features of each participant end to the host end by the pusher;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
and updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and manager are located on the host side; the pusher is arranged at a plurality of participant ends.
In some embodiments, the training samples local to the plurality of participant terminals include private data that is not public.
In some embodiments, inputting the local training samples into the respective branch models at the plurality of participant ends to obtain the output features comprises: inputting, at each participant end, the local training samples into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information includes: parameters in the branch model are updated based on the parameter update information.
As can be seen from the foregoing embodiments, in the system provided by the embodiments of the present invention, the distributor sends the host-side branch model to a plurality of participant ends; the participant ends input their local training samples into the respective branch models to obtain output features; the pusher feeds the output features of each participant end back to the host end; the manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy with model-training effectiveness in multi-domain joint training.
It should be particularly noted that the embodiment of the system described above employs the embodiment of the learning model training method to specifically describe the working process of each module, and those skilled in the art can easily think that these modules are applied to other embodiments of the learning model training method. Of course, since the steps in the embodiment of the learning model training method can be mutually intersected, replaced, added, and deleted, these reasonable permutation and combination transformations should also belong to the scope of the present invention for the system, and should not limit the scope of the present invention to the embodiment.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of an embodiment of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (10)
1. A learning model training method, comprising the following steps:
sending, by the distributor, the host-side branch model to the plurality of participant ends;
inputting local training samples of the participants into the branch models to obtain output features;
feeding back the output features of each participant end to the host end by a pusher;
performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
updating the host-side branch model based on the parameter update information.
2. The method of claim 1, wherein the distributor and the manager are located at the host side; the pusher is arranged at a plurality of participant ends.
3. The method of claim 1, wherein the training samples local to a plurality of the participant terminals include private data that is not public.
4. The method of claim 1, wherein inputting the local training samples into the respective branch models at the plurality of participant ends to obtain output features comprises: inputting, at each participant end, the local training samples of the participant into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
5. The method of claim 1, wherein updating the branch model at the host based on the parameter update information comprises: updating parameters in the branch model based on the parameter update information.
6. A learning model training system, comprising:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by the distributor, the host-side branch model to the plurality of participant ends;
inputting local training samples of the participants into the branch models to obtain output features;
feeding back the output features of each participant end to the host end by a pusher;
performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information;
updating the host-side branch model based on the parameter update information.
7. The system of claim 6, wherein the distributor and the manager are disposed at the host side; the pusher is arranged at a plurality of participant ends.
8. The system of claim 6, wherein the training samples local to a plurality of the participant terminals include private data that is not public.
9. The system of claim 6, wherein inputting the local training samples into the respective branch models at the plurality of participant ends to obtain output features comprises: inputting, at each participant end, the local training samples of the participant into the training branch of the branch model to perform multi-layer convolution, and taking the output of the last convolution layer as the output feature.
10. The system of claim 6, wherein updating the branch model at the host based on the parameter update information comprises: updating parameters in the branch model based on the parameter update information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011570866.8A CN112766455A (en) | 2020-12-26 | 2020-12-26 | Learning model training method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011570866.8A CN112766455A (en) | 2020-12-26 | 2020-12-26 | Learning model training method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112766455A true CN112766455A (en) | 2021-05-07 |
Family
ID=75695845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011570866.8A Withdrawn CN112766455A (en) | 2020-12-26 | 2020-12-26 | Learning model training method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112766455A (en) |
-
2020
- 2020-12-26 CN CN202011570866.8A patent/CN112766455A/en not_active Withdrawn
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113449329A (en) * | 2021-08-31 | 2021-09-28 | 国网浙江省电力有限公司信息通信分公司 | Energy data fusion calculation method under federal learning scene based on safe sharing |
CN115098885A (en) * | 2022-07-28 | 2022-09-23 | 清华大学 | Data processing method and system and electronic equipment |
CN115098885B (en) * | 2022-07-28 | 2022-11-04 | 清华大学 | Data processing method and system and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112118565B (en) | Multi-tenant service gray level publishing method, device, computer equipment and storage medium | |
CN110443375B (en) | Method and device for federated learning | |
CN107330522B (en) | Method, device and system for updating deep learning model | |
CN109951547B (en) | Transaction request parallel processing method, device, equipment and medium | |
CN103905508B (en) | Cloud platform application dispositions method and device | |
CN109146490A (en) | block generation method, device and system | |
CN110221872A (en) | Method for page jump, device, electronic equipment and storage medium | |
CN112766455A (en) | Learning model training method and system | |
CN110708358B (en) | Session message processing method, electronic device and computer-readable storage medium | |
CN108712491A (en) | Block chain node, exchange information processing method, terminal device and medium | |
CN111737755A (en) | Joint training method and device for business model | |
CN111797999A (en) | Longitudinal federal modeling optimization method, device, equipment and readable storage medium | |
CN108737105A (en) | Method for retrieving, device, private key equipment and the medium of private key | |
CN110276060A (en) | The method and device of data processing | |
CN108897559A (en) | System and method are realized in a kind of software upgrading under Network Isolation | |
CN109582289A (en) | The processing method of regular flow, system, storage medium and processor in regulation engine | |
CN107231400A (en) | The synchronous method and device of a kind of data | |
CN109492049A (en) | Data processing, block generation and synchronous method for block chain network | |
CN112650812A (en) | Data fragment storage method and device, computer equipment and storage medium | |
CN109150981B (en) | Block chain network networking method, device, equipment and computer readable storage medium | |
CN111008249A (en) | Parallel chain block synchronization method, device and storage medium | |
CN111951112A (en) | Intelligent contract execution method based on block chain, terminal equipment and storage medium | |
CN109104472B (en) | Block chain network networking method, device, equipment and computer readable storage medium | |
CN113361236A (en) | Method and device for editing document | |
CN107968798A (en) | A kind of network management resources label acquisition method, cache synchronization method, apparatus and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20210507 |
|
WW01 | Invention patent application withdrawn after publication |