CN116030317A

CN116030317A - Model training method and system based on DMA-MaaS federal learning platform

Info

Publication number: CN116030317A
Application number: CN202211716818.4A
Authority: CN
Inventors: 陈益强; 蒋鑫龙; 闫冰洁; 王志睿
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-04-28

Abstract

The invention provides a model training method and system based on a DMA-MaaS federated learning platform. The method comprises the following steps. A user side uploads training data to the federated learning platform, which checks the data and adds it to a data pool. The user side uploads a task to the platform, which adds public tasks to a public task pool for other user sides to select. The user side then selects either a task it initiated or a task from the public task pool, and it is judged whether the selected task type is federated learning. If so, federated learning is executed on the user equipment where the user side is located, based on the training data, and the learned model parameters and results are returned to the platform for parameter aggregation until the aggregated model reaches the required performance. Otherwise, the platform executes non-federated learning in the cloud based on the training data. Through its MaaS functions, the invention mitigates the heterogeneity of the federated platform's user sides, supports the management and creation of data, tasks, algorithms and models, and realizes the value of federated models.

Description

Model training method and system based on DMA-MaaS federal learning platform

Technical Field

The invention relates to the technical field of federated learning, and in particular to a model training method and system based on a DMA-MaaS federated learning platform.

Background

In recent years, machine learning and deep learning technologies have developed rapidly and are widely applied in many fields. However, competition between companies, security concerns, complex approval processes and similar factors severely hinder data cooperation between companies, and even data sharing within a single company, producing the phenomenon of "data islands". Meanwhile, following leaks of user privacy data at some large enterprises, data privacy protection has become a worldwide trend, and countries continue to refine and strengthen their data privacy regulations. This poses great challenges to machine learning, a technology for which data is the essential raw material.

To address the two major problems of "data islands" and "data privacy", many scholars and institutions have proposed solutions, among which the federated learning (Federated Learning) framework proposed in 2016 has received a great deal of attention. Over several years, federated learning technology and its industrialization have gradually matured. Numerous platforms and products have emerged, the technology has begun to reach large-scale commercial deployment, and it is applied in fields including finance, healthcare and smart cities. However, a federated learning platform operates in a new mode, unlike earlier machine learning platforms that only provided training and inference services: the user equipment of a federated platform is heterogeneous (for example in data, models and execution environments). The data, models and algorithms of a federated platform therefore need new management methods that solve these heterogeneity problems while keeping the platform convenient to use.

At present, federated platforms focus mainly on communication, privacy and workflow, while the management of data, tasks, algorithms and models lacks a scientific and reasonable approach to the heterogeneity problems of federated learning.

Disclosure of Invention

The invention applies to the scientific management of the data, tasks, algorithms and models of a federated platform. It designs a service center based on DMA (Data/Mission/Algorithm Pool) and MaaS (Model as a Service) and realizes an object-oriented federation mechanism, so that more convenient, easy-to-use federated services can be provided directly to various users. To achieve these technical effects, the invention comprises the following key technical points:

Key point 1: a federated platform architecture design based on a DMA-MaaS service center;

Key point 2: a data management method in the Data Pool that performs data quality assessment and applies an incentive mechanism;

Key point 3: the Mission Pool assesses the execution environments of the participating user clusters and then configures personalized federated or asynchronous federated tasks to address federated learning heterogeneity;

Key point 4: an algorithm management method in the Algorithm Pool that divides the platform's algorithms into federated and non-federated algorithms;

Key point 5: based on MaaS (Model as a Service), routing control of incremental model learning for model flows, together with model loading, downloading, use and charging.

Specifically, the invention provides a model training method based on a DMA-MaaS federated learning platform, comprising the following steps:

a data preparation step, in which a user side uploads training data to the federated learning platform and sets whether the training data is public, and the platform checks the training data and adds it to a data pool;

a task initiation step, in which the user side uploads a task to the federated learning platform, the task content comprising a task name, whether the task is public, whether the task type is federated learning, the algorithm and parameters required by the task, the model and parameters required by the task, and the number of task participants; the platform adds public tasks to a public task pool for other user sides to select;

a task preparation step, in which the user side selects a task it initiated or a task in the public task pool, and it is judged whether the selected task type is federated learning; if so, the federated learning step is executed, otherwise the non-federated learning step is executed;

the federated learning step, in which federated learning is performed based on the training data on the user equipment where the user side is located, and the learned model parameters and results are returned to the federated learning platform for parameter aggregation until the aggregated model reaches the required performance;

and the non-federated learning step, in which the federated learning platform performs non-federated learning in the cloud based on the training data to obtain a model that meets the required performance.
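The task preparation step reduces to a branch on the task type. The Python sketch below is illustrative only: the `Task` fields mirror the task content listed in the task initiation step, while `prepare_and_run` and the two callbacks `run_federated` and `run_cloud` are hypothetical placeholders for the platform's on-device and cloud training services, not an interface disclosed in this document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    # Hypothetical record of the task content listed in the task initiation step.
    name: str
    is_public: bool
    is_federated: bool
    algorithm: str
    model: str
    num_participants: int


def prepare_and_run(task: Task,
                    run_federated: Callable[[Task], str],
                    run_cloud: Callable[[Task], str]) -> str:
    """Route a selected task to the federated or the non-federated learning step."""
    if task.is_federated:
        # Federated learning step: training runs on each participant's own device,
        # and only model parameters and results are returned for aggregation.
        return run_federated(task)
    # Non-federated learning step: the platform trains the model in the cloud.
    return run_cloud(task)


task = Task("fundus-dx", is_public=True, is_federated=True,
            algorithm="FedAvg", model="CNN", num_participants=3)
print(prepare_and_run(task, lambda t: "federated:" + t.name, lambda t: "cloud:" + t.name))
# federated:fundus-dx
```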

In the model training method based on the DMA-MaaS federated learning platform, the platform comprises an algorithm pool containing federated learning algorithms and non-federated algorithms. The federated learning algorithms comprise horizontal federated learning algorithms, vertical federated learning algorithms, user-preset federated learning methods and secure aggregation methods; the non-federated algorithms comprise stochastic gradient descent model optimization algorithms.

In the model training method based on the DMA-MaaS federated learning platform, the platform further comprises a model pool containing a plurality of preset models. Model management on the platform comprises model loading, model downloading, model use, model charging, the model market and model promotion; loading, downloading, use, charging and promotion are all carried out through the model market, providing the user side with diverse model management options;

among the existing data, tasks, algorithms and models of the federated learning platform, a task is initiated in one of the following ways, and the platform recommends a corresponding parameter configuration according to the user's usage frequency:

Initiating from the data pool: after finding the desired public data in the data pool, the user can initiate a task on the data page and then select the corresponding algorithm and model, or the algorithm and model recommended by the platform; or

Initiating from the algorithm pool: a user who wants to train with a particular algorithm in the algorithm pool can initiate a task on the algorithm page and then select the corresponding data and model, or the data and model recommended by the platform; or

Initiating from the model market: after finding a desired model in the model market, the user can initiate a task on the model page, either an inference task or a federated task as needed. For an inference task, the user only needs to select their own data to run model inference and obtain the inference result; for a federated task, the user must also select the data and algorithm to use, or the data and algorithm recommended by the platform.
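Whichever entry point is used, a task still needs data, an algorithm and a model, so the entry point determines what remains to be chosen. A minimal sketch, assuming hypothetical entry-point identifiers (`data_pool`, `algorithm_pool`, `model_market`) and a hypothetical function name:

```python
def items_to_select(entry, task_kind="federated"):
    """Return the items the user still has to choose, given where the task was initiated.

    entry: "data_pool", "algorithm_pool" or "model_market" (hypothetical identifiers).
    task_kind: only relevant for the model market, "inference" or "federated".
    """
    if entry == "data_pool":
        return {"algorithm", "model"}
    if entry == "algorithm_pool":
        return {"data", "model"}
    if entry == "model_market":
        # An inference task only needs the user's own data;
        # a federated task also needs a training algorithm.
        return {"data"} if task_kind == "inference" else {"data", "algorithm"}
    raise ValueError(f"unknown entry point: {entry}")


print(sorted(items_to_select("model_market", "inference")))  # ['data']
```

A real platform would then fill the returned slots either with the user's own choices or with the platform's recommendations.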

In the model training method based on the DMA-MaaS federated learning platform, the model is an image classification model, the training data are images labeled with categories, and the task is an image classification task;

the method further comprises a model execution step of inputting an image to be recognized into the image classification model to perform the image classification task and obtain the category of that image.
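As a rough illustration of the model execution step, the sketch below scores a flattened grayscale image with a linear model and returns the highest-scoring category. The linear scorer, its weights and the class names are hypothetical stand-ins; with a convolutional neural network (one of the platform's preset models) only the scoring line would change.

```python
def classify_image(pixels, weights, bias, class_names):
    """Predict the category of one grayscale image with a linear scoring model.

    pixels: 2-D list (H x W); weights: one row of H*W coefficients per class;
    bias: one value per class. All are hypothetical stand-ins for a trained model.
    """
    x = [p for row in pixels for p in row]  # flatten the image row by row
    scores = [sum(w * v for w, v in zip(w_row, x)) + b
              for w_row, b in zip(weights, bias)]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best]


image = [[0.9, 0.8], [0.1, 0.0]]            # bright top half
weights = [[1, 1, -1, -1], [-1, -1, 1, 1]]  # class 0 favours a bright top, class 1 a bright bottom
print(classify_image(image, weights, [0.0, 0.0], ["top", "bottom"]))  # top
```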

The invention also provides a model training system based on the DMA-MaaS federated learning platform, comprising the following modules:

the data preparation module, used for uploading training data to the federated learning platform and setting whether the training data is public, the platform checking the training data and adding it to the data pool;

the task initiation module, used for uploading a task to the federated learning platform, the task content comprising a task name, whether the task is public, whether the task type is federated learning, the algorithm and parameters required by the task, the model and parameters required by the task, and the number of task participants; the platform adds public tasks to a public task pool for other user sides to select;

the task preparation module, used for selecting a task initiated by the user side or a task in the public task pool and judging whether the selected task type is federated learning; if so, the federated learning module is executed, otherwise the non-federated learning module is executed;

the federated learning module, used for performing federated learning based on the training data on the user equipment where the user side is located, and returning the learned model parameters and results to the federated learning platform for parameter aggregation until the aggregated model reaches the required performance;

the non-federated learning module, used for having the federated learning platform perform non-federated learning in the cloud based on the training data to obtain a model that meets the required performance.

In the model training system based on the DMA-MaaS federated learning platform, the platform comprises an algorithm pool containing federated learning algorithms and non-federated algorithms. The federated learning algorithms comprise horizontal federated learning algorithms, vertical federated learning algorithms, user-preset federated learning methods and secure aggregation methods; the non-federated algorithms comprise stochastic gradient descent model optimization algorithms.

In the model training system based on the DMA-MaaS federated learning platform, the platform further comprises a model pool containing a plurality of preset models. Model management on the platform comprises model loading, model downloading, model use, model charging, the model market and model promotion; loading, downloading, use, charging and promotion are all carried out through the model market, providing the user side with diverse model management options;

In the model training system based on the DMA-MaaS federated learning platform, the model is an image classification model, the training data are images labeled with categories, and the task is an image classification task;

the system further comprises a model execution module for inputting an image to be recognized into the image classification model to perform the image classification task and obtain the category of that image.

The invention also proposes a storage medium for storing a program for executing any of the model training methods based on the DMA-MaaS federal learning platform as claimed in claims 1 to 4.

The invention also provides a client which is used for the model training system based on the DMA-MaaS federal learning platform in any one of claims 5 to 8.

The advantages of the invention are as follows:

Compared with the prior art, the invention provides a federated platform architecture based on DMA-MaaS, including DMA-related optimizations for heterogeneous federated learning scenarios, and Model as a Service (MaaS) functions covering routing control of incremental model learning for model flows and model management.

Drawings

FIG. 1 is a diagram of the federated platform architecture based on DMA-MaaS;

FIG. 2 is a block diagram of the overall system of the present invention;

FIG. 3 is a flow chart of the federal learning method of the present invention;

FIG. 4 is a schematic diagram of a data processing apparatus of the present invention.

Detailed Description

The invention aims to solve two problems of existing federated platforms: they are insufficiently targeted at federated models, and they lack a new management mode for data, tasks, algorithms and models. To this end it provides a federated platform architecture based on DMA-MaaS. Through the newly designed Data Pool, Mission Pool and Algorithm Pool and their linkage, it creates Model as a Service (MaaS) functions such as the service platform, the model market, model fusion and incremental model learning. This mitigates the heterogeneity of the federated platform's user sides, completes the management and rapid creation of data, tasks, algorithms and models, and realizes the value of federated models.

The invention is a new method for managing the data, tasks, algorithms and models of a DMA-MaaS-based service center platform, aimed at federated heterogeneity. It sits between the federated platform's underlying driver services and the application layer; the overall architecture is shown in FIG. 1.

Apart from the underlying drivers and the upper-layer applications, the DMA-MaaS-based federated platform architecture consists of a DMA service center, a MaaS service layer, and the linkage between the two.

1. The DMA service center comprises three parts: a data pool, a task pool and an algorithm pool.

1) Data Pool: the data pool is mainly used for managing users' data and contains data meta-information and federated data processing methods.

The meta-information of data includes the data name, data type, data permissions and so on. For tabular data it contains information such as the number of records, column names and column data types; for image data it contains information such as picture names, picture sizes and labels.

Federated data suffers from heterogeneity problems such as large differences in data distribution and in data quantity across parties, which lead to poor federated model performance, insufficient generalization and unfairness. Federated data therefore needs quality evaluation and further processing; quantitative data quality evaluation based on prior-posterior fusion can be used to ensure the generalization capability and fairness of the federated model.
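This section does not specify how the prior-posterior fusion evaluation works. As a simpler stand-in, the sketch below quantifies the two heterogeneity symptoms named above: each party's share of the samples (quantity difference) and the total variation distance between its label distribution and the pooled one (distribution difference). All function names are hypothetical, and in a real deployment each party would compute its label histogram locally rather than sending raw labels.

```python
from collections import Counter


def label_distribution(labels):
    """Relative frequency of each label in one party's data."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def heterogeneity_report(client_labels):
    """For each party: its share of all samples and how far its label mix is from the pooled mix."""
    pooled = [y for labels in client_labels.values() for y in labels]
    global_dist = label_distribution(pooled)
    return {
        party: {
            "share": len(labels) / len(pooled),
            "label_skew": total_variation(label_distribution(labels), global_dist),
        }
        for party, labels in client_labels.items()
    }


report = heterogeneity_report({"A": ["normal"] * 90 + ["glaucoma"] * 10,
                               "B": ["normal"] * 10 + ["glaucoma"] * 10})
print(report["B"])
```

Here party B holds only a sixth of the samples and its balanced label mix sits about 0.33 away from the pooled distribution, which is dominated by party A.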

2) Task Pool (Mission Pool): the task pool contains users' tasks and their corresponding federation modes.

The task pool covers the whole task lifecycle: initiation, modification, preparation, distribution, execution and completion.

When a task is constructed, different algorithms call for different federated task modes. For data heterogeneity, when the federated model needs to be adapted to improve performance on an individual user's data, a personalized federated task is used. When the execution environments of the federated parties differ greatly, for example because of network conditions, computing power or interruptions, an asynchronous federated learning task is used to mitigate execution environment heterogeneity, so that long waits for slow parties do not waste time and computing power. For use in the model market, only an inference task needs to be established.
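This section does not state which asynchronous aggregation rule is used. One well-known option, shown here purely as an assumption, is the staleness-weighted mixing used by FedAsync: the server merges each client update as soon as it arrives and gives updates computed from older global versions a smaller weight. `alpha` and `a` below are hypothetical defaults.

```python
def staleness_weight(current_version, client_version, alpha=0.6, a=0.5):
    """Mixing weight that decays polynomially with staleness (alpha, a are hypothetical defaults)."""
    staleness = current_version - client_version
    if staleness < 0:
        raise ValueError("client cannot be ahead of the server")
    return alpha * (staleness + 1) ** (-a)


def async_update(global_params, client_params, current_version, client_version):
    """Merge one client's parameters as soon as they arrive, without waiting for the others."""
    w = staleness_weight(current_version, client_version)
    return [(1 - w) * g + w * c for g, c in zip(global_params, client_params)]


# A fresh update moves the global model further than one computed three versions ago.
print(async_update([0.0], [1.0], current_version=5, client_version=5))  # [0.6]
print(async_update([0.0], [1.0], current_version=5, client_version=2))  # [0.3]
```

Because no round waits for the slowest party, a device that drops out or lags only reduces the weight of its own late updates.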

3) Algorithm Pool: the algorithm pool contains platform-preset algorithms and user-defined algorithms, covering both federated and non-federated algorithms. The federated algorithms include horizontal federated learning algorithms (such as FedAvg), vertical federated learning algorithms, user-preset personalized federated learning, and various secure aggregation methods. The non-federated algorithms include model optimization methods such as stochastic gradient descent.
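FedAvg, the horizontal algorithm named above, aggregates client models by averaging their parameters weighted by local sample counts. A minimal pure-Python sketch of that aggregation step, with parameters flattened into plain lists and a hypothetical function name:

```python
def fedavg(client_params, client_sizes):
    """FedAvg aggregation: average client parameter vectors weighted by local sample counts."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    return [sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
            for i in range(dim)]


# Two hospitals with 100 and 300 labelled images: the larger party counts three times as much.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```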

2. The MaaS service layer covers model management, model application and inference.

Model as a Service (MaaS) is an important means of revenue and application for the federated platform: each model co-built by federated or non-federated users is treated as a separate product and listed on the MaaS layer, which exposes it and provides interfaces for initiating training or inference tasks.

For model management on the platform side, each user's model must be stored together with information such as the model structure, the deep learning framework it uses, its training process parameters (training-related parameters such as the pre-trained model, batch size, number of iterations and learning rate, so that users can reuse these settings to fine-tune the model more quickly), and its input data format. A distributed storage scheme is used so that important model assets are not lost. When a model is listed on the market, a reference base cost for model inference is calculated from the model's parameters and required computing power, and security protection is applied according to the model's public or non-public permission level, preventing leakage of model parameters and the resulting loss of user assets.
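The reference inference cost is said to depend on the model's parameters and required computing power, but no formula is given. The sketch below assumes a simple one: roughly 2 FLOPs (one multiply-add) per parameter for a dense forward pass, priced at a hypothetical rate plus a hypothetical base fee, together with a permission check reflecting the public and non-public levels. For convolutional models the FLOP count also depends on input size, so a measured FLOP figure would be a better input.

```python
def reference_inference_cost(num_params, price_per_gflop=0.001, base_fee=0.01):
    """Reference price for one inference call (all prices are hypothetical).

    A dense forward pass needs roughly one multiply-add, i.e. about 2 FLOPs,
    per parameter, so compute is estimated as 2 * num_params FLOPs.
    """
    gflops = 2 * num_params / 1e9
    return base_fee + gflops * price_per_gflop


def can_download(is_public, owners, requester):
    """Non-public models are only available to the parties that built them."""
    return is_public or requester in owners


print(round(reference_inference_cost(25_000_000), 6))  # 0.01005
print(can_download(False, {"A", "B", "C"}, "D"))       # False
```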

For model management on the user side, there are model loading, model downloading, model use, model charging and the model market. Loading, downloading, use and charging are all carried out through the model market, providing users with flexible and diverse model management options. Platform-preset models include decision trees, random forests, linear regression, convolutional neural networks and so on.

For model application on the user side, pre-trained models provided by the platform, combined with services such as incremental model learning and model transfer, let users quickly start model training and the exploration of initial model parameters, reducing the number of task iterations and speeding up training. Meanwhile, to make full use of models, intelligent model routing based on opportunistic incremental learning can be used to optimize the flow control of incremental model learning.

For model promotion on the user side, the platform can provide an API for model inference, with inference carried out on the user's own equipment, making it easier for users to promote their own business.

In the model market, users list the models they built through federated or non-federated tasks in the task pool and have permitted to be listed, alongside trained models provided by the platform. Both users and the platform can offer model services through the model market, generating revenue from models, realizing their value and completing model enablement.

3. DMA-MaaS linkage.

DMA-MaaS linkage refers to the organic combination and interplay of all parts of the framework: the data pool, task pool, algorithm pool and model market. The whole DMA-MaaS framework is organized around the task. When a task is created, the data, algorithm and model it uses must be selected. A task can be initiated directly from any of the algorithm pool, the data pool or the model market, in which case only the two missing items need to be selected. The platform also recommends choices according to how frequently the user has made each pairing, so that the user can search and initiate tasks quickly.
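The frequency-based recommendation can be sketched as a counter over the (data, algorithm, model) combinations a user has chosen before: given the item the task was initiated from, the most frequent compatible combinations are suggested. This is an assumed implementation with hypothetical names, kept as one instance per user.

```python
from collections import Counter


class PairingRecommender:
    """Recommend data/algorithm/model combinations by how often a user has chosen them."""

    FIELDS = ("data", "algorithm", "model")

    def __init__(self):
        self._counts = Counter()

    def record(self, data, algorithm, model):
        self._counts[(data, algorithm, model)] += 1

    def recommend(self, k=1, **chosen):
        """Return the k most frequent combinations consistent with the items already chosen."""
        unknown = set(chosen) - set(self.FIELDS)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        matches = Counter({
            combo: n for combo, n in self._counts.items()
            if all(dict(zip(self.FIELDS, combo))[f] == v for f, v in chosen.items())
        })
        return [dict(zip(self.FIELDS, combo)) for combo, _ in matches.most_common(k)]


rec = PairingRecommender()
rec.record("fundus-images", "FedAvg", "CNN")
rec.record("fundus-images", "FedAvg", "CNN")
rec.record("fundus-images", "SGD", "random-forest")
print(rec.recommend(data="fundus-images"))
# [{'data': 'fundus-images', 'algorithm': 'FedAvg', 'model': 'CNN'}]
```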

When the user pairs an algorithm, data and a model, the platform provides a default parameter configuration file according to the algorithm type (federated algorithm, security algorithm), the data type (image multi-class classification, multi-label classification, image segmentation, text, speech and so on) and similar factors, so that the user can conveniently tune and modify the corresponding parameters.
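The default parameter configuration can be sketched as layered dictionaries: algorithm-type defaults first, then data-type defaults, then the user's own edits. Every key and value below is a hypothetical example; the document does not disclose the actual configuration files.

```python
# Hypothetical defaults; the invention does not disclose concrete values.
DEFAULTS_BY_ALGORITHM = {
    "federated": {"rounds": 50, "local_epochs": 1, "client_fraction": 0.5},
    "secure_aggregation": {"rounds": 50, "local_epochs": 1, "dropout_tolerance": 0.2},
}
DEFAULTS_BY_DATA = {
    "image_multiclass": {"loss": "cross_entropy", "batch_size": 32, "learning_rate": 0.01},
    "multilabel": {"loss": "binary_cross_entropy", "batch_size": 32, "learning_rate": 0.01},
    "image_segmentation": {"loss": "dice", "batch_size": 8, "learning_rate": 0.001},
}


def default_config(algorithm_type, data_type, overrides=None):
    """Merge algorithm defaults, then data-type defaults, then the user's own edits."""
    config = {}
    config.update(DEFAULTS_BY_ALGORITHM.get(algorithm_type, {}))
    config.update(DEFAULTS_BY_DATA.get(data_type, {}))
    config.update(overrides or {})  # user edits always take precedence
    return config


print(default_config("federated", "image_multiclass", {"learning_rate": 0.05}))
```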

In order to make the above features and effects of the present invention more clearly understood, the following specific examples are given with reference to the accompanying drawings.

In the invention, each user trains the model locally, and the trained model participates in global (federated) model aggregation, which keeps user data secure. Following the user's operation flow, the invention is divided into the following stages, as shown in FIG. 3.

1. Data preparation stage:

when a user starts using a platform built on the architecture of the invention, the user is in the data preparation stage, which proceeds as follows:

1) Data import: when the user clicks the "New data" button on the interactive interface, the system prompts the preparations and steps required for uploading data. After checking the data, the user sets the data permission (public or non-public), fills in a data description, and uploads the data through the interactive interface to the cluster data management interface provided by the underlying framework. Using the data upload service provided by the federated platform, the data is uploaded to the K3S cluster of the underlying driver.

2) Data pool processing: the system checks the relevant attributes of the submitted data to ensure that the data formats uploaded by all parties are consistent, so that subsequent model training can run normally. Once the check passes, the data is added to the data pool; public data is visible to other users in the public data pool. A data quality evaluation program is then executed on the user side, and a corresponding incentive mechanism is applied.

The data quality evaluation program runs on the user side. A user's data may contain many duplicates, redundant records or a lot of noise, or may be very small in quantity; in these cases its quality is low. Combined with the subsequent incentive mechanism, users who provide high-quality data samples are rewarded.
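The two checks in data pool processing, format consistency and data quality, could look roughly as follows. This is a sketch under assumptions: tabular schemas are represented as `{column: dtype}` dictionaries, and the quality score only penalises exact duplicates and small datasets, leaving noise detection out. Function names, the `min_count` threshold and the scoring formula are hypothetical; the resulting score is the kind of value an incentive mechanism could reward.

```python
def check_table_schema(columns, required):
    """Compare an uploaded table's {column: dtype} mapping with the schema a task requires.

    Returns a list of human-readable problems; an empty list means the check passed.
    """
    problems = []
    for name, dtype in required.items():
        if name not in columns:
            problems.append(f"missing column: {name}")
        elif columns[name] != dtype:
            problems.append(f"column {name}: expected {dtype}, got {columns[name]}")
    return problems


def quality_score(sample_hashes, min_count=1000):
    """Hypothetical score in [0, 1] combining duplicate ratio and dataset size.

    sample_hashes: one hashable fingerprint per sample (e.g. a file digest computed
    on the user side), so exact duplicates collapse.
    """
    n = len(sample_hashes)
    if n == 0:
        return 0.0
    unique_ratio = len(set(sample_hashes)) / n  # 1.0 when there are no exact duplicates
    size_factor = min(1.0, n / min_count)       # small datasets are discounted
    return unique_ratio * size_factor


required = {"patient_age": "int", "iop_mmhg": "float", "label": "str"}
print(check_table_schema({"patient_age": "int", "iop_mmhg": "str", "label": "str"}, required))
# ['column iop_mmhg: expected float, got str']
print(quality_score(["a", "a", "b", "c"], min_count=8))  # 0.375
```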

2. Task initiation phase:

after completing the data preparation phase, the user enters a task initiation phase, and the process of the phase is as follows:

1) Task initiation: the user clicks "Initiate task" on the interactive interface, and the system prompts the preparations and steps required for initiating the task. Following the prompts, the user fills in the task name; selects the task's visibility (public or private), its type (federated or non-federated), the algorithm and parameters it requires, the model and parameters it requires, the number of participants and the required data format; fills in a task description; and completes the task initiation.

2) Task pool processing: the system receives the task initiated by the user and checks its visibility; a public task is added to the public task pool, where other users can choose to join it.
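The visibility rule for the task pool, together with the invitation codes used to join other users' tasks, can be sketched as a filter: a user sees every public task plus the private tasks they own or were invited to. The classes and fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PoolTask:
    task_id: str
    owner: str
    is_public: bool
    invited: set = field(default_factory=set)  # users holding an invitation code


class TaskPool:
    def __init__(self):
        self._tasks = {}

    def submit(self, task):
        self._tasks[task.task_id] = task

    def visible_to(self, user):
        """Public tasks, plus private tasks the user owns or has been invited to."""
        return sorted(t.task_id for t in self._tasks.values()
                      if t.is_public or t.owner == user or user in t.invited)


pool = TaskPool()
pool.submit(PoolTask("t1", owner="A", is_public=True))
pool.submit(PoolTask("t2", owner="A", is_public=False, invited={"B"}))
pool.submit(PoolTask("t3", owner="C", is_public=False))
print(pool.visible_to("B"))  # ['t1', 't2']
```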

3. Task preparation stage:

after the user has initiated the task, the task preparation stage begins, which proceeds as follows:

1) By clicking "My tasks", the user can view the tasks they initiated, select a task to join from the public task pool, or join another user's task with an invitation code. Each task comprises a task name, task visibility, task flow type, task description, task-related parameters, task data, task model, task algorithm, the readiness of each participant and their equipment, the task state and task operations. By visibility, tasks are classified as public or non-public; by task flow, as federated or non-federated.

2) After clicking "Join task", the user selects data: if the user has completed the data preparation stage, the required data can be selected directly; otherwise the data preparation stage must be performed first to complete data selection.

3) Once the user has joined the task and granted any relevant permissions, joining is complete.

4) After the user joins the task, the system checks whether the user's device is ready, including data preparation on the device, the network environment and the connection status. It then assesses the execution environment and applies the corresponding federated task mode, such as an asynchronous or a personalized federated task.
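The execution environment assessment can be sketched as a rule that maps measurements to a federated task mode. The sketch assumes two inputs that the document does not define precisely: per-device round times, as a proxy for differences in network and computing power, and per-party label skew. The thresholds and the order of the checks are hypothetical.

```python
from statistics import mean, pstdev


def choose_federated_mode(round_times_s, label_skews,
                          time_cv_threshold=0.5, skew_threshold=0.3):
    """Pick a federated task mode from an execution-environment assessment.

    round_times_s: measured (or estimated) seconds per local round for each device.
    label_skews: per-party label skew, e.g. a total variation distance in [0, 1].
    """
    spread = pstdev(round_times_s) / mean(round_times_s)  # coefficient of variation
    if spread > time_cv_threshold:
        return "asynchronous"   # slow or unstable devices would stall synchronous rounds
    if max(label_skews) > skew_threshold:
        return "personalized"   # strongly non-IID data: adapt the model per party
    return "synchronous"


print(choose_federated_mode([12, 15, 90], [0.05, 0.1, 0.08]))   # asynchronous
print(choose_federated_mode([12, 15, 14], [0.05, 0.45, 0.08]))  # personalized
```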

4. Task execution and completion phase:

when all task participants complete the task preparation stage, the task can be started, and the stage flow is as follows:

1) After the corresponding incremental model adjustment, each user trains the model on their own data, and the trained model parameters and results are returned to the platform side through the cluster for parameter aggregation until the model reaches the required performance.

2) During the task, each user can see the training progress and status. If the model has not reached the required performance when a round of training ends, optimization training continues using the model-flow-based incremental model method.

3) When all training is complete, the task state changes to "completed" and the resulting model is provided.
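The round structure of the execution stage (local training on each user's data, parameter aggregation on the platform, repetition until the required performance is reached) is illustrated below on a deliberately tiny problem: a one-parameter linear model y = w * x trained with full-batch gradient descent. The learning rate, number of local epochs and loss target are hypothetical, and the model-flow-based incremental method is not modelled.

```python
def local_train(w, data, lr=0.1, epochs=5):
    """A few full-batch gradient steps on mean squared error for the model y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def local_loss(w, data):
    """Mean squared error of the current global model on one party's data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def federated_training(clients, target_loss=1e-4, max_rounds=100):
    """Repeat local training and sample-weighted aggregation until the target loss is met."""
    w = 0.0
    sizes = [len(d) for d in clients]
    for round_no in range(1, max_rounds + 1):
        local_models = [local_train(w, d) for d in clients]  # runs on each user's device
        w = sum(m * n for m, n in zip(local_models, sizes)) / sum(sizes)
        # Each party reports only its loss; the platform combines them by sample count.
        loss = sum(local_loss(w, d) * n for d, n in zip(clients, sizes)) / sum(sizes)
        if loss <= target_loss:  # required performance reached
            return w, round_no
    return w, max_rounds


clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # both parties follow y = 2x
w, rounds = federated_training(clients)
print(round(w, 3), rounds)  # 2.001 3
```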

5. Model enabling stage:

after the task is completed, the user can enter a model enabling stage according to the requirement, and the process of the stage is as follows:

1) The user can select the model produced by the completed task, fill in a model description, and list it on the model market; the description includes information such as the model's participants and the attributes and quantity of its data.

2) Other users can view the model and its description in the model market, purchase and download it as needed, or use the online inference service.

3) The users who listed the model earn revenue when other users purchase and use it; accounts are settled, completing model enablement.
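Settlement can be sketched as splitting each sale between the platform and the model's contributors. The document only says that participants earn revenue according to their engagement; weighting by contributed sample counts and the 10% platform fee below are assumptions.

```python
def settle_sale(price, contributions, platform_fee_rate=0.1):
    """Split one model sale: the platform keeps a fee, the rest goes to contributors pro rata.

    contributions: e.g. {party: number of training samples contributed}; the fee rate
    and the choice of sample counts as the weight are hypothetical.
    """
    if not 0 <= platform_fee_rate < 1:
        raise ValueError("fee rate must be in [0, 1)")
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("contributions must sum to a positive number")
    payout_pool = price * (1 - platform_fee_rate)
    payouts = {party: payout_pool * w / total for party, w in contributions.items()}
    payouts["platform"] = price - payout_pool
    return payouts


print(settle_sale(100.0, {"A": 500, "B": 300, "C": 200}))
```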

6. DMA-MaaS linkage phase:

among the platform's existing data, tasks, algorithms and models, tasks can be initiated rapidly through the DMA-MaaS linkage method.

1) Initiating from the data pool: after finding the desired public data in the data pool, the user can initiate a task on the data page and then selects the corresponding algorithm and model.

2) Initiating from the algorithm pool: a user who wants to train with a particular algorithm in the algorithm pool can initiate a task on the algorithm page and must then select the corresponding data and model.

3) Initiating from the model market: after finding a desired model in the model market, the user can initiate a task on the model page, either an inference task or a federated task as needed. For an inference task, the user only needs to select their own data to run model inference and obtain the result. For a federated task, the user must also select the data and algorithm to use.

A specific application case is as follows. Hospital A wants to build an auxiliary diagnosis model for ophthalmic fundus images, but because the data requires expert labeling, it has little labeled data of its own (possibly fewer than 1000 images), which is not enough to build a robust diagnosis model on its own; alternatively, because the hospital has diagnosed few cases of a certain disease, a model built on its own data may overfit for that disease. Hospital A knows that Hospitals B and C have the same needs and similar data, and each of the three parties can upload its data to its own local cluster through the federated platform. In the federated platform system, Hospital A clicks on the data it uploaded and, using the DMA linkage feature, creates a task for building an ophthalmic auxiliary diagnosis model; it quickly locates and selects the model to use based on the platform's recommendations and sends task invitations to Hospitals B and C. Hospitals B and C check the task pool, find the task initiated by Hospital A, join it and upload their own data, and the federated platform checks the data against Hospital A's data requirements. Each party then prepares its equipment, and the task starts after Hospital A confirms.

When the training task is completed, Hospitals A, B and C can see the resulting model under "My models" and can choose whether to make it public; once it is public, other users can see it in the model pool.

Hospital D has a related need but did not take part in the task because it lacks data. After the model is made public, Hospital D can find it in the model pool, obtain it through the model enabling feature of the platform's MaaS, initiate an inference task after payment, and run inference on its own data to obtain auxiliary diagnosis results. Hospitals A, B and C in turn earn revenue from the model's use according to their respective contributions.

If Hospital A wants to promote its own work using its own computing power, it can use the model promotion service of the platform's MaaS function to bind its computing resources. The platform then provides Hospital A with a corresponding model inference service interface, which Hospital A can use to build its own inference service and offer model inference to external users.

The following is a system embodiment corresponding to the above method embodiment, and it can be implemented in cooperation with that embodiment. The related technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the related technical details mentioned in this embodiment can also be applied to the above embodiment.

As shown in fig. 4, the present invention also proposes a storage medium for storing a program for executing the model training method based on the DMA-MaaS federal learning platform according to any one of claims 1 to 4.

Claims

1. A model training method based on a DMA-MaaS federal learning platform, characterized by comprising the following steps:

2. The model training method based on the DMA-MaaS federal learning platform according to claim 1, wherein the federal learning platform comprises an algorithm pool containing federal learning algorithms and non-federal learning algorithms; the federal learning algorithms comprise a horizontal federal learning algorithm, a vertical federal learning algorithm, user-preset federal learning methods and a secure aggregation method; the non-federal learning algorithms comprise a stochastic gradient descent model optimization algorithm.

3. The model training method based on the DMA-MaaS federal learning platform according to claim 1, wherein the federal learning platform comprises a model pool containing a plurality of preset models; model management on the federal learning platform comprises model uploading, model downloading, model use, model charging, a model market and model promotion; model uploading, downloading, use, charging and promotion are all carried out through the model market, thereby providing the user side with a range of model management functions;

4. The model training method based on the DMA-MaaS federal learning platform according to claim 1, wherein the model is an image classification model, the training data are images labeled with categories, and the task is an image classification task;

5. A model training system based on a DMA-MaaS federal learning platform, characterized by comprising:

6. The model training system based on the DMA-MaaS federal learning platform according to claim 5, wherein the federal learning platform comprises an algorithm pool containing federal learning algorithms and non-federal learning algorithms; the federal learning algorithms comprise a horizontal federal learning algorithm, a vertical federal learning algorithm, user-preset federal learning methods and a secure aggregation method; the non-federal learning algorithms comprise a stochastic gradient descent model optimization algorithm.

7. The model training system based on the DMA-MaaS federal learning platform according to claim 5, wherein the federal learning platform comprises a model pool containing a plurality of preset models; model management on the federal learning platform comprises model uploading, model downloading, model use, model charging, a model market and model promotion; model uploading, downloading, use, charging and promotion are all carried out through the model market, thereby providing the user side with a range of model management functions;

8. The model training system based on the DMA-MaaS federal learning platform according to claim 5, wherein the model is an image classification model, the training data are images labeled with categories, and the task is an image classification task;

9. A storage medium storing a program for executing the model training method based on the DMA-MaaS federal learning platform according to any one of claims 1 to 4.

10. A client for use in the model training system based on a DMA-MaaS federal learning platform according to any one of claims 5 to 8.