CN112766455A - Learning model training method and system - Google Patents

Learning model training method and system

Info

Publication number
CN112766455A
CN112766455A (application number CN202011570866.8A)
Authority
CN
China
Prior art keywords
model
participant
branch
host
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011570866.8A
Other languages
Chinese (zh)
Inventor
徐亚鹏
秦凯新
刘黎
陈天石
王小珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011570866.8A
Publication of CN112766455A
Current legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a learning model training method and system. The method comprises the following steps: sending, by a distributor, the host-side branch model to a plurality of participant sides; inputting, at the plurality of participant sides, local training samples into the respective branch models to obtain output features; feeding back, by a pusher, the output features of each participant side to the host side; performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information; and updating the host-side branch model based on the parameter update information. The method balances data privacy and model training quality in multi-domain joint training.

Description

Learning model training method and system
Technical Field
The present invention relates to the field of artificial intelligence, and more particularly, to a learning model training method and system.
Background
In multi-domain joint training, each participant conventionally has to send its local data to a host to train the model. In many cases, however, that local data is private or sensitive to the participant itself, and transmitting it to the host creates a risk of privacy disclosure or compromise. In the prior art, each participant usually desensitizes its local data before taking part in training, but desensitization erodes the intrinsic characteristics of the data and thereby degrades the quality of the trained model.
No effective solution is currently available for the prior-art problem that, in multi-domain joint training, data privacy cannot be protected without degrading the training effect.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a learning model training method and system that preserve data privacy while maintaining model training quality in multi-domain joint training.
In view of the above, a first aspect of the embodiments of the present invention provides a learning model training method, including the following steps:
sending, by the distributor, the host-side branch model to a plurality of participant sides;
inputting, at the plurality of participant sides, local training samples into the respective branch models to obtain output features;
feeding back, by the pusher, the output features of each participant side to the host side;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information; and
updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and the manager are deployed on the host side, and the pusher is deployed on each of the plurality of participant sides.
In some embodiments, the training samples local to the plurality of participant sides include private data that is not publicly available.
In some embodiments, inputting local training samples into the respective branch models at the plurality of participant sides to obtain the output features comprises: inputting, at each participant side, the local training samples into the training branch of its branch model to perform multilayer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information comprises: updating the parameters of the branch model according to the parameter update information.
A second aspect of an embodiment of the present invention provides a learning model training system, including:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by the distributor, the host-side branch model to a plurality of participant sides;
inputting, at the plurality of participant sides, local training samples into the respective branch models to obtain output features;
feeding back, by the pusher, the output features of each participant side to the host side;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information; and
updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and the manager are deployed on the host side, and the pusher is deployed on each of the plurality of participant sides.
In some embodiments, the training samples local to the plurality of participant sides include private data that is not publicly available.
In some embodiments, inputting local training samples into the respective branch models at the plurality of participant sides to obtain the output features comprises: inputting, at each participant side, the local training samples into the training branch of its branch model to perform multilayer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information comprises: updating the parameters of the branch model according to the parameter update information.
The invention has the following beneficial technical effects: in the learning model training method and system provided by the embodiments of the invention, the distributor sends the host-side branch model to a plurality of participant sides; each participant side inputs its local training samples into its branch model to obtain output features; the pusher feeds the output features of each participant side back to the host side; the manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy and model training quality in multi-domain joint training.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a learning model training method provided by the present invention;
FIG. 2 is a branch flow diagram of a learning model training method provided by the present invention;
FIG. 3 is an overall structure diagram of the learning model training method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities or parameters that share the same name. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and this is not repeated in the subsequent embodiments.
In view of the foregoing, a first aspect of the embodiments of the present invention provides an embodiment of a learning model training method that balances data privacy and model training quality in multi-domain joint training. FIG. 1 is a schematic flow chart of the learning model training method provided by the present invention.
The learning model training method, as shown in fig. 1, includes the following steps:
Step S101, the distributor sends the host-side branch model to a plurality of participant sides;
Step S103, each participant side inputs its local training samples into its respective branch model to obtain output features;
Step S105, the pusher feeds the output features of each participant side back to the host side;
Step S107, the manager performs forward propagation and gradient operations based on the output features to obtain parameter update information;
Step S109, the host-side branch model is updated based on the parameter update information.
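For concreteness, one round of steps S101 to S109 can be sketched in Python (PyTorch) as follows. This is a minimal single-process simulation under assumptions: the layer sizes, variable names and the choice of PyTorch are illustrative only, the network transport between the host side and the participant sides is omitted, and the host-side branch copies simply remain in the autograd graph; the disclosure itself does not prescribe how gradient information propagates back across the feature interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged single-process sketch of one round of steps S101-S109. All names and
# sizes are illustrative assumptions; in a real deployment each participant runs
# on its own server and only branch weights and output features cross the network.

n_participants, in_dim, feat_dim, n_classes = 3, 8, 16, 4
branches = [nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
            for _ in range(n_participants)]                    # host-side branch models
head = nn.Linear(n_participants * feat_dim, n_classes)         # host-side fusion model
params = [p for b in branches for p in b.parameters()] + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

# S101: the distributor sends each host-side branch model to its participant
#       (a no-op here; a real deployment would serialize and transmit the weights).

# S103: each participant runs its private local samples through its branch model.
local_batches = [torch.randn(32, in_dim) for _ in range(n_participants)]  # stand-ins for private data
labels = torch.randint(0, n_classes, (32,))   # loss labels, assumed available to the host

# S105: the pusher feeds the output features of every participant back to the host.
features = [branch(x) for branch, x in zip(branches, local_batches)]

# S107: the manager fuses the features and performs forward propagation and
#       gradient operations to obtain parameter-update information.
logits = head(torch.cat(features, dim=1))
loss = F.cross_entropy(logits, labels)
optimizer.zero_grad()
loss.backward()   # gradient information for the whole model (branches + head)

# S109: the host-side branch models (and the fusion head) are updated, after
#       which the distributor would redistribute them for the next round.
optimizer.step()
print(f"round loss: {loss.item():.4f}")
```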
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. Embodiments of the computer program may achieve the same or similar effects as any of the preceding method embodiments to which it corresponds.
In some embodiments, the distributor and the manager are deployed on the host side, and the pusher is deployed on each of the plurality of participant sides.
In some embodiments, the training samples local to the plurality of participant sides include private data that is not publicly available.
In some embodiments, inputting local training samples into the respective branch models at the plurality of participant sides to obtain the output features comprises: inputting, at each participant side, the local training samples into the training branch of its branch model to perform multilayer convolution, and taking the output of the last convolution layer as the output feature.
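As a hedged illustration of this embodiment only, a participant-side training branch built from stacked convolution layers might look like the following PyTorch sketch; the channel counts, kernel sizes and input resolution are assumptions rather than values from the disclosure.

```python
import torch
import torch.nn as nn

class BranchModel(nn.Module):
    """Participant-side training branch: stacked convolution layers whose last
    convolution output is taken as the pushed feature (sizes are assumptions)."""
    def __init__(self, in_channels: int = 3, feat_channels: int = 64):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_channels, kernel_size=3, stride=2, padding=1),  # last convolution layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The result of the final convolution layer is returned as the output feature.
        return self.convs(x)

branch = BranchModel()
local_batch = torch.randn(8, 3, 32, 32)   # stand-in for a participant's private samples
features = branch(local_batch)            # shape (8, 64, 8, 8): the pushed output feature
```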
In some embodiments, updating the host-side branch model based on the parameter update information comprises: updating the parameters of the branch model according to the parameter update information.
The following further illustrates the embodiments of the invention with reference to the specific example shown in FIG. 2.
The main server deploys the algorithm model to be trained, and the federated learning distributor then distributes each branch network to its corresponding participant according to the model definition. After receiving the branch model, a participant inputs the labeled training samples stored on its local server into the branch model and infers the output features of the last layer of the branch model, and the federated learning pusher pushes these output features to the manager on the main server. Once the model manager has collected the output features pushed by all participants, it integrates them to complete the forward propagation and gradient calculation of the whole model and to update the model parameters. After the model is updated, the branch models are distributed to the participants again through the federated model distributor, which keeps the training models synchronized.
Referring to FIG. 2 and FIG. 3, the federated learning manager is deployed on the main server. During joint training, the distributor is called to send each branch model to its participant server; the manager then collects the output features of each participant's branch model, and the main model completes the gradient update by fusing these feature results. After the update, the manager calls the distributor to distribute the updated branch models to the participant servers in preparation for the next round of training. The structure shown in FIG. 2 corresponds to the branch model blocks in FIG. 3.
The federated learning manager first establishes a deep learning model based on the data structures of the joint participant sides and initializes its parameters. The model structure has three branches, each branch corresponding to the training task of one participant. The manager invokes the federated learning distributor to distribute the branch models to the participant servers for use. After the branch models have been distributed to the participants, the manager waits to collect the output data of each participant side. Once the model manager has collected the output features pushed by all participants, it integrates them to complete the forward propagation and gradient calculation of the whole model and to update the model parameters. The manager then notifies the distributor to redistribute the updated branch models to the participants to begin the next round of training.
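A minimal sketch of such a three-branch host-side model and of the parameter initialization performed by the manager is given below (PyTorch); the input dimensions, layer widths and class count are assumptions derived only from the three-branch structure described above.

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """Host-side model sketch: three branches (one per joint participant) feeding
    a fusion head; the input dimensions, width, and class count are assumptions."""
    def __init__(self, in_dims=(8, 12, 10), feat_dim=16, n_classes=4):
        super().__init__()
        # One branch per joint participant, built from that participant's data structure.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(d, feat_dim), nn.ReLU()) for d in in_dims
        )
        self.head = nn.Linear(len(in_dims) * feat_dim, n_classes)

    def forward(self, inputs):
        # In deployment the branch outputs arrive from the participants via the pusher;
        # here they are computed in-process purely for illustration.
        feats = [branch(x) for branch, x in zip(self.branches, inputs)]
        return self.head(torch.cat(feats, dim=1))

def init_parameters(model: nn.Module) -> None:
    """Parameter initialization performed by the federated learning manager."""
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)

model = JointModel()
init_parameters(model)
sample_inputs = [torch.randn(32, d) for d in (8, 12, 10)]   # stand-ins for the participants' data
logits = model(sample_inputs)                                # shape (32, 4)
```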
The federated learning distributor exists only on the host server and automatically redistributes the branch models to the participant servers when training begins or after the model is updated. After the manager has obtained the output data of all branch models, it fuses the data to complete the forward propagation and backward gradient calculation of the whole model, and the computed gradients are used to update the weight parameters of the whole model. To keep the branch models synchronized, the distributor then redistributes the updated branch models to the participant servers for the next round of training.
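The distributor can be sketched as follows, under the assumption that each branch's weights are serialized and handed to a transport callback; the function name distribute_branches and the callback interface are hypothetical, and the actual network transport is not specified by the disclosure.

```python
import io
import torch
import torch.nn as nn

def distribute_branches(branches, send_fn):
    """Hypothetical host-side distributor: serialize each branch model's weights
    and hand them to a transport callback, keeping the branch models synchronized
    after every update. Network transport itself is outside the scope of this sketch."""
    for idx, branch in enumerate(branches):
        buf = io.BytesIO()
        torch.save(branch.state_dict(), buf)        # only weights leave the host, never training data
        send_fn(idx, buf.getvalue())

# Usage with stand-in branch models and a transport stub that only reports payload sizes.
branches = [nn.Linear(8, 16), nn.Linear(12, 16), nn.Linear(10, 16)]
distribute_branches(branches, lambda pid, payload: print(f"participant {pid}: {len(payload)} bytes"))
```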
The federated learning pusher exists only on a participant server. After a participant receives the branch model sent by the distributor, it feeds its local training data into the branch model and pushes the model inference results to the manager, and the manager completes the gradient calculation and update of the whole model using the inference data from all participant sides. In this way the participants' data is decoupled from the model training, which safeguards data security.
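Correspondingly, the participant-side pusher might be sketched as below; the helper name run_and_push, the branch structure and the transport callback are all illustrative assumptions, and only the derived output features, never the raw local samples, are handed to the callback.

```python
import io
import torch
import torch.nn as nn

def run_and_push(branch_payload: bytes, local_batch: torch.Tensor, push_fn):
    """Hypothetical participant-side pusher: load the received branch weights,
    run the private local batch through the branch, and hand only the resulting
    features to the transport callback; the raw samples never leave the participant."""
    branch = nn.Linear(8, 16)                       # branch structure agreed with the host (assumption)
    branch.load_state_dict(torch.load(io.BytesIO(branch_payload)))
    with torch.no_grad():                           # local inference only
        features = branch(local_batch)
    push_fn(features)                               # transport stub; real network code is omitted

# Usage with stand-in weights, private data, and a callback that reports the feature shape.
payload_buf = io.BytesIO()
torch.save(nn.Linear(8, 16).state_dict(), payload_buf)
private_batch = torch.randn(32, 8)                  # private local data, never transmitted
run_and_push(payload_buf.getvalue(), private_batch, lambda f: print("pushed features:", tuple(f.shape)))
```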
It can be seen from the foregoing embodiments that, in the learning model training method provided by the embodiments of the present invention, the distributor sends the host-side branch model to a plurality of participant sides; each participant side inputs its local training samples into its branch model to obtain output features; the pusher feeds the output features of each participant side back to the host side; the manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy and model training quality in multi-domain joint training.
It should be particularly noted that the steps in the embodiments of the learning model training method described above can be interleaved, replaced, added, or deleted. Learning model training methods obtained by such reasonable permutations and combinations therefore also fall within the scope of the present invention, and the scope of the present invention is not limited to the described embodiments.
In view of the foregoing, a second aspect of the embodiments of the present invention provides an embodiment of a learning model training system that balances data privacy and model training quality in multi-domain joint training. The system comprises:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by the distributor, the host-side branch model to a plurality of participant sides;
inputting, at the plurality of participant sides, local training samples into the respective branch models to obtain output features;
feeding back, by the pusher, the output features of each participant side to the host side;
performing, by the manager, forward propagation and gradient operations based on the output features to obtain parameter update information; and
updating the host-side branch model based on the parameter update information.
In some embodiments, the distributor and the manager are deployed on the host side, and the pusher is deployed on each of the plurality of participant sides.
In some embodiments, the training samples local to the plurality of participant sides include private data that is not publicly available.
In some embodiments, inputting local training samples into the respective branch models at the plurality of participant sides to obtain the output features comprises: inputting, at each participant side, the local training samples into the training branch of its branch model to perform multilayer convolution, and taking the output of the last convolution layer as the output feature.
In some embodiments, updating the host-side branch model based on the parameter update information comprises: updating the parameters of the branch model according to the parameter update information.
As can be seen from the foregoing embodiments, in the system provided by the embodiments of the present invention, the distributor sends the host-side branch model to a plurality of participant sides; each participant side inputs its local training samples into its branch model to obtain output features; the pusher feeds the output features of each participant side back to the host side; the manager performs forward propagation and gradient operations based on the output features to obtain parameter update information; and the host-side branch model is updated based on the parameter update information. This technical scheme balances data privacy and model training quality in multi-domain joint training.
It should be particularly noted that the system embodiment described above uses the embodiment of the learning model training method to explain the working process of each module, and those skilled in the art can readily apply these modules to other embodiments of the learning model training method. Of course, since the steps in the embodiment of the learning model training method can be interleaved, replaced, added, or deleted, such reasonable permutations and combinations of the system also fall within the scope of the present invention, and the scope of the present invention is not limited to the described embodiment.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the spirit of the embodiments of the invention, technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments of the invention exist which, for brevity, are not described in detail. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.

Claims (10)

1. A learning model training method, comprising the following steps:
sending, by a distributor, a host-side branch model to a plurality of participant sides;
inputting, at the plurality of participant sides, local training samples into the respective branch models to obtain output features;
feeding back, by a pusher, the output features of each participant side to the host side;
performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information; and
updating the host-side branch model based on the parameter update information.
2. The method of claim 1, wherein the distributor and the manager are deployed on the host side, and the pusher is deployed on each of the plurality of participant sides.
3. The method of claim 1, wherein the training samples local to the plurality of participant sides include private data that is not publicly available.
4. The method of claim 1, wherein inputting, at the plurality of participant sides, local training samples into the respective branch models to obtain output features comprises: inputting, at each participant side, the local training samples into the training branch of its branch model to perform multilayer convolution, and taking the output of the last convolution layer as the output feature.
5. The method of claim 1, wherein updating the branch model at the host based on the parameter update information comprises: updating parameters in the branch model based on the parameter update information.
6. A learning model training system, comprising:
a processor; and
a memory storing program code executable by the processor, the program code when executed performing the steps of:
sending, by a distributor, a host-side branch model to a plurality of participant sides;
inputting, at the plurality of participant sides, local training samples into the respective branch models to obtain output features;
feeding back, by a pusher, the output features of each participant side to the host side;
performing, by a manager, forward propagation and gradient operations based on the output features to obtain parameter update information; and
updating the host-side branch model based on the parameter update information.
7. The system of claim 6, wherein the distributor and the manager are deployed on the host side, and the pusher is deployed on each of the plurality of participant sides.
8. The system of claim 6, wherein the training samples local to the plurality of participant sides include private data that is not publicly available.
9. The system of claim 6, wherein inputting, at the plurality of participant sides, local training samples into the respective branch models to obtain output features comprises: inputting, at each participant side, the local training samples into the training branch of its branch model to perform multilayer convolution, and taking the output of the last convolution layer as the output feature.
10. The system of claim 6, wherein updating the branch model at the host based on the parameter update information comprises: updating parameters in the branch model based on the parameter update information.
CN202011570866.8A 2020-12-26 2020-12-26 Learning model training method and system Withdrawn CN112766455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011570866.8A CN112766455A (en) 2020-12-26 2020-12-26 Learning model training method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011570866.8A CN112766455A (en) 2020-12-26 2020-12-26 Learning model training method and system

Publications (1)

Publication Number Publication Date
CN112766455A true CN112766455A (en) 2021-05-07

Family

ID=75695845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011570866.8A Withdrawn CN112766455A (en) 2020-12-26 2020-12-26 Learning model training method and system

Country Status (1)

Country Link
CN (1) CN112766455A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449329A * 2021-08-31 2021-09-28 Information and Communication Branch of State Grid Zhejiang Electric Power Co Ltd Energy data fusion calculation method under federal learning scene based on safe sharing
CN115098885A * 2022-07-28 2022-09-23 Tsinghua University Data processing method and system and electronic equipment
CN115098885B * 2022-07-28 2022-11-04 Tsinghua University Data processing method and system and electronic equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication (application publication date: 2021-05-07)