WO2021221242A1 - Federated learning system and method - Google Patents

Federated learning system and method

Info

Publication number
WO2021221242A1
WO2021221242A1 (PCT/KR2020/013548)
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning
model
management unit
server
Prior art date
Application number
PCT/KR2020/013548
Other languages
French (fr)
Korean (ko)
Inventor
문재원
금승우
김영기
Original Assignee
한국전자기술연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국전자기술연구원
Publication of WO2021221242A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N10/00Quantum computing, i.e. information processing based on quantum-mechanical phenomena
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning

Definitions

  • The present invention relates to a federated learning system and method.
  • Training an artificial intelligence (AI) model requires substantial computing resources to perform large-scale calculations.
  • Cloud computing services are well suited to providing the computing infrastructure needed to train AI models without complex hardware and software installation.
  • Because cloud computing is based on the centralization of resources, all necessary data must be stored in cloud memory and used for model training. Although data centralization offers many advantages in terms of maximizing efficiency, it carries a risk of leaking users' personal data, which is becoming an increasingly important business issue as data transmission grows.
  • Federated learning is a method in which models trained on user terminals from users' personal data are collected centrally, rather than the personal data itself being collected centrally for training, as in the past. Because federated learning does not centrally collect personal data, the risk of privacy invasion is low.
  • Practical federated learning systems must consider not only algorithmic aspects, such as how parameters are updated and learning schedules, but also system aspects, such as independent data management on each device and efficient communication with heterogeneous systems.
  • The network dependency between the server and the user terminals is another problem to be solved. To perform federated learning, each server and a number of user terminals must be closely connected, and it is then difficult to respond when an unstable network or a connection problem occurs. Moreover, even when resources are scarce and the network is unstable, a user terminal bears the additional burden of maintaining the data to be sent to the server until the transmission completes.
  • An object of the present invention is to provide a federated learning system and method in which the server and the user terminals can perform learning tasks asynchronously.
  • The invention provides a federated learning system including a plurality of user terminals that generate training data by training a global model on user data; a server that creates the global model, collects the training data, and uses it to improve the global model; and a data management unit that stores and manages model data and training data related to the global model, delivers the model data to the plurality of user terminals, and delivers the training data to the server.
  • The model data includes the global parameters of the global model, the learning time for the user terminal, and the type and size of the user data to be used for learning.
  • The user terminal establishes a learning plan based on the model data and performs learning according to that plan.
  • The training data is a local model or the local parameters of a local model.
  • The data management unit generates metadata including the size, creation date and time, and distribution characteristics of the training data.
  • Based on the metadata, the server determines the range and number of training data items, selects the training data to be collected, and establishes or changes a collection plan for the training data.
  • The data management unit manages the model data and the training data by version.
  • The invention also provides a federated learning method comprising: the server creating a global model and registering model data related to the global model with the data management unit; the data management unit delivering the model data to a plurality of user terminals; the plurality of user terminals generating training data by training the global model on user data; the plurality of user terminals registering the training data with the data management unit; the data management unit delivering the training data to the server; and the server aggregating the training data to improve the global model.
  • Registering the model data with the data management unit includes the server requesting the data management unit to register the model data, and the data management unit registering the model data by version.
  • Registering the training data with the data management unit includes the user terminal requesting the data management unit to register the training data, and the data management unit registering the training data by version.
  • Delivering the training data to the server includes the server requesting the training data from the data management unit, and the data management unit delivering the latest version of the training data, or the version requested by the server, to the server.
  • The user terminals and the server perform their tasks independently, without considering each other's work state, so federated learning can be performed flexibly and the performance of the global model can be improved.
  • The server can perform federated learning by fetching only the data stored in the data management unit, regardless of the user terminals and the network connection state, which reduces the burden on the server.
  • By changing the communication connections to run between the server and the data management unit and between the user terminals and the data management unit, the present invention can reduce bandwidth, increase network efficiency, and prepare for network failures.
  • FIG. 1 is a block diagram of a conventional federated learning system.
  • FIG. 2 is a flowchart of a conventional federated learning method.
  • FIG. 3 is a block diagram of a federated learning system according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a federated learning method according to an embodiment of the present invention.
  • FIG. 5 is a detailed flowchart of the step of registering the model data of FIG. 4.
  • FIG. 6 is a detailed flowchart of the step of transmitting the model data of FIG. 4.
  • FIG. 7 is a detailed flowchart of the step of registering the training data of FIG. 4.
  • FIG. 8 is a detailed flowchart of the step of transmitting the training data of FIG. 4.
  • Terms such as 'first' and 'second' may be used to describe various elements, but the elements should not be limited by these terms, which serve only to distinguish one element from another. For example, without departing from the scope of the present invention, a 'first element' may be referred to as a 'second element', and similarly a 'second element' may be referred to as a 'first element'. A singular expression includes the plural unless the context clearly indicates otherwise. Unless otherwise defined, terms used in the embodiments of the present invention may be interpreted with the meanings commonly known to those of ordinary skill in the art.
  • FIG. 1 is a block diagram of a conventional federated learning system, and FIG. 2 is a flowchart of a conventional federated learning method.
  • The conventional federated learning system may comprise a plurality of user terminals 10, a server 20, and storage 30.
  • The server 20 creates a global model and stores it in the storage 30. The server 20 then delivers the global model stored in the storage 30 to the plurality of user terminals 10.
  • The plurality of user terminals 10 generate a learning model by training the global model on user data, and deliver the learning model to the server 20.
  • The server 20 aggregates the training data and uses it to improve the global model. The server 20 then stores the improved global model in the storage 30 and delivers it to the plurality of user terminals 10 again. This process may be repeated until the performance of the global model reaches a certain level.
  • The conventional federated learning method consists of a selection step, a configuration step, and a reporting step.
  • The server 20 stores model data, including the global parameters of the global model, the learning plan, the data structure, and the work to be performed, in the storage 30.
  • A plurality of user terminals 10a to 10e capable of performing federated learning send a message to the server 20 to announce that they are ready to learn (1).
  • The server 20 collects information from the user terminals 10a to 10e and, according to rules such as the number of participating terminals, selects the user terminals 10a to 10c most suitable for participating in learning (selection step).
  • The server 20 reads the model data stored in the storage 30 (2) and transmits it to the selected user terminals 10a to 10c (3). The user terminals 10a to 10c then perform learning by applying user data to the global model according to the model data (4) (configuration step).
  • When learning is complete, the user terminals 10a to 10c transmit training data, for example a local model or the local parameters of the local model, to the server 20.
  • Transmission from some user terminals 10b may fail due to an unstable network or connection problems.
  • When the server 20 receives training data from the user terminals 10a and 10c, it aggregates the training data and uses it to improve the model data of the global model (5). The server 20 then stores the model data of the improved global model in the storage 30 (reporting step).
  • In the conventional method, the storage 30 is used only to store model data of the global model created by the server 20, while the server 20 performs many roles: it checks the status of the plurality of user terminals 10, selects suitable user terminals 10, determines whether a sufficient amount of training data has been collected, and transmits the model data to the plurality of user terminals 10.
  • The conventional federated learning method may be reasonable when the number of user terminals 10 the server 20 must manage is small; however, when the number of user terminals 10 participating in federated learning increases greatly, or when their number and characteristics are fluid, managing them all becomes a great burden on the server 20.
  • Because all responses of the user terminals 10, each with its own environment and specifications, are independent, the server 20 cannot predict the exact number and timing of the individual responses, so it is inefficient for the server 20 to manage the responses of all user terminals 10.
  • The server 20 and the plurality of user terminals 10 are also interdependent. The server 20 can aggregate the training data and update the global model only after all responses from the user terminals 10 have been collected, and federated learning may stop when a failure occurs in a user terminal 10 or the network. This makes it difficult to modify and optimize the learning plan.
  • FIG. 3 is a block diagram of a federated learning system according to an embodiment of the present invention.
  • The federated learning system may comprise a plurality of user terminals 110, a server 120, and a data management unit 130.
  • The user terminals 110 and the server 120 are computing devices capable of training a neural network and may be implemented as various electronic devices.
  • A neural network may be designed to simulate the structure of the human brain on a computer and may include a plurality of network nodes with parameters that simulate the neurons of a human neural network.
  • The network nodes may exchange data according to their connection relationships, simulating the synaptic activity of neurons exchanging signals through synapses.
  • A neural network may include deep learning models developed from neural network models. In a deep learning model, network nodes located in different layers may exchange data according to convolutional connection relationships.
  • Examples of neural network models include deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), restricted Boltzmann machines (RBM), deep belief networks (DBN), and deep Q-networks; they can be applied to fields such as computer vision, speech recognition, natural language processing, and speech signal processing.
  • The plurality of user terminals 110 generate training data by training the global model on user data.
  • The training data may be a local model or the local parameters of a local model.
  • In the conventional system the server 20 selects the user terminals 10 and only the selected terminals participate in learning, whereas in the federated learning system according to an embodiment of the present invention every user terminal 110 with sufficient resources to learn can participate, without any selection. This relieves the server 120 of the burden of selecting user terminals 110.
  • When learning is complete, the plurality of user terminals 110 transmit the training data to the data management unit 130.
  • Each user terminal 110 may transmit the generated local model itself or the local parameters of the local model.
  • The server 120 creates a global model, collects the training data, and uses it to improve the global model.
  • The server 120 transmits the model data of the global model to the data management unit 130 and receives the training data from the data management unit 130.
  • The data management unit 130 stores and manages the model data and training data related to the global model, delivers the model data to the plurality of user terminals 110, and delivers the training data to the server 120.
  • The model data may include the global parameters of the global model, the learning time for the user terminals 110, and the type and size of the user data to be used for learning.
  • The plurality of user terminals 110 may establish a learning plan based on the model data and perform learning according to that plan.
  • The data management unit 130 may generate metadata including the size, creation date and time, and distribution characteristics of the training data, and manage the training data based on this metadata.
  • Based on the metadata, the server 120 may determine the range and amount of training data, select the training data to be collected, and establish or change a collection plan for the training data, then collect the training data according to that plan. For example, the server 120 may select training data that has at least a certain amount and a certain level of reliability based on the metadata.
  • The data management unit 130 may manage the model data and the training data by version, which is described in detail later.
  • The federated learning system performs tasks between the user terminals 110 and the server 120 asynchronously. That is, the user terminals 110 and the server 120 operate independently, without considering each other's work state. Federated learning can therefore be performed flexibly, and the performance of the global model can be improved.
  • The federated learning system stores the data generated by the user terminals 110 and the server 120 in the data management unit 130, which serves as a hub for transferring the stored data between the user terminals 110 and the server 120.
  • The user terminals 110 and the server 120 do not communicate with each other directly.
  • The server 120 can perform federated learning by fetching only the data stored in the data management unit 130, regardless of the state of the user terminals 110 and the network connection, which reduces the burden on the server 120.
  • The federated learning system changes the communication connections from between the conventional server 20 and storage 30 and between the server 20 and user terminals 10, to between the server 120 and the data management unit 130 and between the user terminals 110 and the data management unit 130, thereby reducing bandwidth, increasing network efficiency, and preparing for network failures.
  • FIG. 4 is a flowchart of a federated learning method according to an embodiment of the present invention.
  • FIG. 5 is a detailed flowchart of the step of registering the model data of FIG. 4.
  • FIG. 6 is a detailed flowchart of the step of transmitting the model data of FIG. 4.
  • FIG. 7 is a detailed flowchart of the step of registering the training data of FIG. 4.
  • FIG. 8 is a detailed flowchart of the step of transmitting the training data of FIG. 4.
  • For the user terminal 110 and the server 120 to register or request training data or model data with the data management unit 130, information such as a task name (Task_name), version (Version), model location (Model location), and device name (Device name) must be transmitted to the data management unit 130.
  • The task name is the unique name of the learning task to be solved using federated learning. The data management unit 130 can provide the user terminal 110 and the server 120 with the conditions necessary to perform the learning task corresponding to the task name.
  • The user terminal 110 and the server 120 can access the data management unit 130 through the task name to find the desired learning task.
  • The version is a value used when the user terminal 110 and the server 120 update the model data and training data of the global model, and it has a float format.
  • When a plurality of servers 120 and user terminals 110 jointly work on one learning task, this version becomes the standard for managing the learning results.
  • The model location is information about where model data or training data was generated: the model data of the global model is generated at the server 120, and the training data of a local model is generated at the user terminal 110.
  • The device name is a unique ID or name of a user terminal 110 or the server 120.
  • By providing the performance and characteristics of the device corresponding to each device name, the data management unit 130 can help the server 120 select the training data generated by the user terminals 110.
  • Based on the received information, the data management unit 130 registers the corresponding model data or training data, or forwards it to the user terminal 110 or the server 120.
  • The server 120 creates a global model and registers model data related to the global model with the data management unit 130 (S10).
  • The server 120 requests the data management unit 130 to register the model data (S11), and the data management unit 130 registers the model data by version. That is, the data management unit 130 checks whether storage for the model data exists (S12). If there is no storage, storage is created (S13) and the model data is stored in it (S14). If storage exists, the data management unit 130 checks the version of the model data (S15) and compares it with the latest version stored in the data management unit 130.
  • If the version of the model data is higher than the latest stored version, the version is upgraded (S17); if it is lower than or equal to the latest version, the model data of the corresponding version is updated (S18).
  • The data management unit 130 delivers the model data to the plurality of user terminals 110 (S20).
  • The plurality of user terminals 110 request the model data from the data management unit 130 (S21).
  • The data management unit 130 delivers the latest version of the model data, or the version requested by the user terminals 110, to the plurality of user terminals 110. That is, the data management unit 130 checks whether a specific version of the model data was requested (S22). If a specific version was requested, that version of the model data is searched for (S23); otherwise, the latest version of the model data is searched for (S24).
  • The data management unit 130 then checks whether the requested version or the latest version of the model data has been found (S25). If it has been found, the model data is transmitted to the user terminal 110 (S26); if not, the latest version of the model data is requested from the server 120 (S27) and the received model data is transmitted to the user terminal 110 (S26).
  • The plurality of user terminals 110 generate training data by training the global model on the user data (S30).
  • The plurality of user terminals 110 register the training data with the data management unit 130 (S40).
  • The user terminal 110 requests the data management unit 130 to register the training data (S41), and the data management unit 130 registers the training data by version. That is, the data management unit 130 checks the version of the training data (S42) and checks whether storage for that version of the training data exists (S43). If storage exists, the training data is stored in it (S44); if not, storage is created (S45) and the training data is stored in the created storage (S44).
  • The data management unit 130 delivers the training data to the server 120.
  • The server 120 requests the training data from the data management unit 130 (S51), and the data management unit 130 delivers the latest version of the training data, or the version requested by the server 120, to the server 120. That is, the data management unit 130 searches for the latest version of the training data (S52) and checks whether the training data satisfies the aggregation condition (S53), namely whether there is more than a certain amount of training data and its reliability is at or above a certain level.
  • If the training data does not satisfy the aggregation condition, the data management unit waits until the condition is satisfied (S54); when the training data satisfies the condition, it is delivered to the server 120. The server 120 then aggregates the training data and improves the global model based on it (S60). A minimal sketch of this aggregation gate appears after this list.
  • The server 120 registers the model data of the improved global model with the data management unit 130.
  • The data management unit 130 delivers the model data of the improved global model to the plurality of user terminals 110. This process may be repeated until the performance of the global model reaches a certain level.
  • The federated learning system according to the present invention can be used in various fields, such as artificial intelligence technology.
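
For illustration only, the aggregation gate described above (steps S52 to S54 and S60) could be sketched as follows in Python. The store object, its latest_training_data accessor, and the per-item reliability field are assumptions made for the sketch; the patent specifies only that the data management unit waits until the training data exceeds a certain amount and reliability level.

```python
import time

def wait_for_aggregation(store, min_count: int, min_reliability: float,
                         poll_seconds: float = 5.0):
    """Illustrative gate for steps S52 to S54: poll the latest training data
    and release it to the server only once the aggregation condition holds."""
    while True:
        batch = store.latest_training_data()   # hypothetical accessor (S52)
        if batch and len(batch) >= min_count and \
                all(item.reliability >= min_reliability for item in batch):
            return batch                       # condition satisfied (S53)
        time.sleep(poll_seconds)               # condition not met: wait (S54)
```

In this sketch, the server would call wait_for_aggregation(...) and then aggregate the returned batch into the global model, mirroring step S60.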

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention provides a federated learning system comprising: a plurality of user terminals that generate learning data by learning a global model on the basis of user data; a server that generates a global model, collects learning data, and uses the learning data to improve the global model; and a data management unit that stores and manages the learning data and model data related to the global model, delivers the model data to the plurality of user terminals, and delivers the learning data to the server.

Description

Federated Learning System and Method
The present invention relates to a federated learning system and method.
Recently, with the development of cloud and big-data analysis and processing technologies, artificial intelligence (AI) technology has been widely applied across services. To apply AI technology to a service in this way, a procedure for training an AI model on a large amount of data must come first.
Training an AI model requires substantial computing resources to perform large-scale calculations. Cloud computing services are well suited to providing this computing infrastructure without complex hardware and software installation.
Because cloud computing is based on the centralization of resources, all necessary data must be stored in cloud memory and used for model training. Although data centralization offers many advantages in terms of maximizing efficiency, it carries a risk of leaking users' personal data, which is becoming an increasingly important business issue as data transmission grows.
Recently, federated learning systems and many learning algorithms supporting federated learning architectures have been introduced to overcome these problems.
Federated learning is a method in which models trained on user terminals from users' personal data are collected centrally, rather than the personal data itself being collected centrally for training, as in the past. Because federated learning does not centrally collect personal data, the risk of privacy invasion is low.
Practical use of a federated learning system requires considering not only algorithmic aspects, such as how parameters are updated and learning schedules, but also system aspects, such as independent data management on each device and efficient communication with heterogeneous systems.
In addition, the network dependency between the server and the user terminals is another problem to be solved. To perform federated learning, each server and a number of user terminals must be closely connected, and it is then difficult to respond when an unstable network or a connection problem occurs. Moreover, even when resources are scarce and the network is unstable, a user terminal bears the additional burden of maintaining the data to be sent to the server until the transmission completes.
To solve the problems of the prior art described above, an object of the present invention is to provide a federated learning system and method in which the server and the user terminals can perform learning tasks asynchronously.
Another object of the present invention is to provide a federated learning system and method that introduces a data management unit to relieve the server's workload and performs learning tasks around this data management unit.
The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those of ordinary skill in the art from the description below.
To solve the above problems, the present invention provides a federated learning system comprising: a plurality of user terminals that generate training data by training a global model on user data; a server that creates the global model, collects the training data, and uses it to improve the global model; and a data management unit that stores and manages model data and training data related to the global model, delivers the model data to the plurality of user terminals, and delivers the training data to the server.
Here, the model data includes the global parameters of the global model, the learning time for the user terminal, and the type and size of the user data to be used for learning.
The user terminal establishes a learning plan based on the model data and performs learning according to that plan.
The training data is a local model or the local parameters of a local model.
The data management unit generates metadata including the size, creation date and time, and distribution characteristics of the training data.
Based on the metadata, the server determines the range and number of training data items, selects the training data to be collected, and establishes or changes a collection plan for the training data.
The data management unit manages the model data and the training data by version.
The present invention also provides a federated learning method comprising: the server creating a global model and registering model data related to the global model with the data management unit; the data management unit delivering the model data to a plurality of user terminals; the plurality of user terminals generating training data by training the global model on user data; the plurality of user terminals registering the training data with the data management unit; the data management unit delivering the training data to the server; and the server aggregating the training data to improve the global model.
Here, registering the model data with the data management unit includes the server requesting the data management unit to register the model data, and the data management unit registering the model data by version.
Delivering the model data to the plurality of user terminals includes the plurality of user terminals requesting the model data from the data management unit, and the data management unit delivering the latest version of the model data, or the version requested by the user terminals, to the plurality of user terminals.
Registering the training data with the data management unit includes the user terminal requesting the data management unit to register the training data, and the data management unit registering the training data by version.
Delivering the training data to the server includes the server requesting the training data from the data management unit, and the data management unit delivering the latest version of the training data, or the version requested by the server, to the server.
According to the present invention, the user terminals and the server perform their tasks independently, without considering each other's work state, so federated learning can be performed flexibly and the performance of the global model can be improved.
Also, according to the present invention, the server can perform federated learning by fetching only the data stored in the data management unit, regardless of the user terminals and the network connection state, which reduces the burden on the server.
Also, according to the present invention, by changing the communication connections to run between the server and the data management unit and between the user terminals and the data management unit, it is possible to reduce bandwidth, increase network efficiency, and prepare for network failures.
The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art from the description below.
FIG. 1 is a block diagram of a conventional federated learning system.
FIG. 2 is a flowchart of a conventional federated learning method.
FIG. 3 is a block diagram of a federated learning system according to an embodiment of the present invention.
FIG. 4 is a flowchart of a federated learning method according to an embodiment of the present invention.
FIG. 5 is a detailed flowchart of the step of registering the model data of FIG. 4.
FIG. 6 is a detailed flowchart of the step of transmitting the model data of FIG. 4.
FIG. 7 is a detailed flowchart of the step of registering the training data of FIG. 4.
FIG. 8 is a detailed flowchart of the step of transmitting the training data of FIG. 4.
To provide a full understanding of the configuration and effects of the present invention, preferred embodiments are described with reference to the accompanying drawings. The present invention, however, is not limited to the embodiments disclosed below; it may be embodied in various forms and modified in various ways. The description of these embodiments is provided so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art. In the accompanying drawings, components are drawn larger than actual size for convenience of description, and the proportions of the components may be exaggerated or reduced.
Terms such as 'first' and 'second' may be used to describe various elements, but the elements should not be limited by these terms, which serve only to distinguish one element from another. For example, without departing from the scope of the present invention, a 'first element' may be referred to as a 'second element', and similarly a 'second element' may be referred to as a 'first element'. A singular expression includes the plural unless the context clearly indicates otherwise. Unless otherwise defined, terms used in the embodiments of the present invention may be interpreted with the meanings commonly known to those of ordinary skill in the art.
FIG. 1 is a block diagram of a conventional federated learning system, and FIG. 2 is a flowchart of a conventional federated learning method.
Referring to FIG. 1, the conventional federated learning system may comprise a plurality of user terminals 10, a server 20, and storage 30.
The server 20 creates a global model and stores it in the storage 30. The server 20 then delivers the global model stored in the storage 30 to the plurality of user terminals 10.
The plurality of user terminals 10 generate a learning model by training the global model on user data, and deliver the learning model to the server 20.
The server 20 aggregates the training data and uses it to improve the global model. The server 20 then stores the improved global model in the storage 30 and delivers it to the plurality of user terminals 10 again. This process may be repeated until the performance of the global model reaches a certain level.
Referring to FIG. 2, the conventional federated learning method consists of a selection step, a configuration step, and a reporting step.
First, the server 20 stores model data, including the global parameters of the global model, the learning plan, the data structure, and the work to be performed, in the storage 30.
Next, a plurality of user terminals 10a to 10e capable of performing federated learning send a message to the server 20 to announce that they are ready to learn (①). The server 20 collects information from the user terminals 10a to 10e and, according to rules such as the number of participating terminals, selects the user terminals 10a to 10c most suitable for participating in learning (selection step).
Next, the server 20 reads the model data stored in the storage 30 (②) and transmits it to the selected user terminals 10a to 10c (③). The user terminals 10a to 10c then perform learning by applying user data to the global model according to the model data (④) (configuration step).
Next, when learning is complete, the user terminals 10a to 10c transmit training data, for example a local model or the local parameters of the local model, to the server 20. At this time, transmission from some user terminals 10b may fail due to an unstable network or connection problems. When the server 20 receives training data from the user terminals 10a and 10c, it aggregates the training data and uses it to improve the model data of the global model (⑤). The server 20 then stores the model data of the improved global model in the storage 30 (reporting step).
When one round of federated learning is completed and the model data is improved in this way, the next round can begin. When the global model performance exceeds a certain level after several rounds are repeated, the entire federated learning process ends.
As described above, in the conventional federated learning method, the storage 30 is used only to store model data of the global model created by the server 20. The server 20, meanwhile, performs many roles: it checks the status of the plurality of user terminals 10, selects suitable user terminals 10, determines whether a sufficient amount of training data has been collected, and transmits the model data to the plurality of user terminals 10.
The conventional federated learning method may be reasonable when the number of user terminals 10 the server 20 must manage is small; however, when the number of user terminals 10 participating in federated learning increases greatly, or when their number and characteristics are fluid, managing them all becomes a great burden on the server 20.
Moreover, in the conventional federated learning method, because all responses of the user terminals 10, each with its own environment and specifications, are independent, the server 20 cannot predict the exact number and timing of the individual responses. It is therefore inefficient for the server 20 to manage the responses of all user terminals 10.
In addition, in the conventional federated learning method, the server 20 and the plurality of user terminals 10 are interdependent. That is, the server 20 can aggregate the training data and update the global model only after all responses from the user terminals 10 have been collected, and federated learning may stop when a failure occurs in a user terminal 10 or the network. This makes it difficult to modify and optimize the learning plan.
Therefore, it should be possible to reconfigure for various situations, for example by dynamically changing the range and number of user terminals 10 participating in federated learning. And to carry out various kinds of federated learning flexibly and effectively, the dependency between the server 20 and the user terminals 10 should be relaxed.
FIG. 3 is a block diagram of a federated learning system according to an embodiment of the present invention.
Referring to FIG. 3, the federated learning system according to an embodiment of the present invention may comprise a plurality of user terminals 110, a server 120, and a data management unit 130.
The user terminals 110 and the server 120 are computing devices capable of training a neural network and may be implemented as various electronic devices.
A neural network may be designed to simulate the structure of the human brain on a computer and may include a plurality of network nodes with parameters that simulate the neurons of a human neural network. The network nodes may exchange data according to their connection relationships, simulating the synaptic activity of neurons exchanging signals through synapses. A neural network may include deep learning models developed from neural network models; in a deep learning model, network nodes located in different layers may exchange data according to convolutional connection relationships. Examples of neural network models include deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), restricted Boltzmann machines (RBM), deep belief networks (DBN), and deep Q-networks, and they can be applied to fields such as computer vision, speech recognition, natural language processing, and speech signal processing.
The plurality of user terminals 110 generate training data by training the global model on user data. Here, the training data may be a local model or the local parameters of a local model.
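The patent does not fix a model family or a local training algorithm, so the following is only a placeholder sketch in Python: it trains a linear model with one pass of stochastic gradient descent over a terminal's private data, starting from the delivered global parameters, and returns the resulting local parameters as the training data. All names are hypothetical.

```python
import numpy as np

def train_local_model(global_params: np.ndarray,
                      user_x: np.ndarray,
                      user_y: np.ndarray,
                      lr: float = 0.01) -> np.ndarray:
    """One illustrative local update: start from the global parameters and
    run a single epoch of SGD on this terminal's private user data."""
    w = global_params.copy()
    for x, y in zip(user_x, user_y):
        pred = x @ w               # linear model as a stand-in
        grad = (pred - y) * x      # gradient of the squared error
        w -= lr * grad
    return w                       # the local parameters, i.e. the training data
```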
In the conventional federated learning system, the server 20 selects the user terminals 10 and only the selected terminals participate in learning; in the federated learning system according to an embodiment of the present invention, every user terminal 110 with sufficient resources to learn can participate, without any selection. This relieves the server 120 of the burden of selecting user terminals 110.
When learning is complete, the plurality of user terminals 110 transmit the training data to the data management unit 130. At this time, each user terminal 110 may transmit the generated local model itself or the local parameters of the local model.
The server 120 creates a global model, collects the training data, and uses it to improve the global model.
The server 120 transmits the model data of the global model to the data management unit 130 and receives the training data from the data management unit 130.
The data management unit 130 stores and manages the model data and training data related to the global model, delivers the model data to the plurality of user terminals 110, and delivers the training data to the server 120.
Here, the model data may include the global parameters of the global model, the learning time for the user terminals 110, and the type and size of the user data to be used for learning.
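As a concrete illustration of these fields, the model data might be bundled as below. This is a minimal Python sketch; the class and field names are hypothetical, since the patent does not prescribe a data format.

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class ModelData:
    """Hypothetical container for the model data of the global model."""
    global_parameters: Dict[str, Any]  # global parameters of the global model
    learning_time: str                 # when the user terminal should learn
    user_data_type: str                # type of user data to use for learning
    user_data_size: int                # size of user data to use for learning
```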
Accordingly, the plurality of user terminals 110 may establish a learning plan based on the model data and perform learning according to that plan.
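Reusing the hypothetical ModelData sketch above, a terminal's learning plan could then be derived roughly as follows; again, this is illustrative rather than the patent's prescribed logic.

```python
from dataclasses import dataclass

@dataclass
class LearningPlan:
    """Hypothetical learning plan a terminal derives from the model data."""
    start_time: str
    data_type: str
    data_size: int

def make_learning_plan(model_data: "ModelData") -> LearningPlan:
    # Schedule training for the announced learning time and restrict it to
    # the requested type and size of user data.
    return LearningPlan(
        start_time=model_data.learning_time,
        data_type=model_data.user_data_type,
        data_size=model_data.user_data_size,
    )
```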
The data management unit 130 may generate metadata including the size, creation date and time, and distribution characteristics of the training data, and manage the training data based on this metadata.
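The metadata might likewise be represented as follows; the field names are assumptions, and the distribution characteristics are reduced to a simple summary dictionary for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

@dataclass
class TrainingDataMetadata:
    """Hypothetical metadata generated per training data item."""
    size: int                       # size of the training data
    created_at: datetime            # creation date and time
    distribution: Dict[str, float]  # distribution characteristics,
                                    # e.g. {"mean": ..., "std": ...}
```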
Based on the metadata, the server 120 may determine the range and amount of training data, select the training data to be collected, and establish or change a collection plan for the training data, and may then collect the training data according to that plan. For example, the server 120 may select training data that has at least a certain amount and a certain level of reliability, based on the metadata.
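A minimal sketch of this server-side selection, assuming the metadata sketch above and a hypothetical reliability entry in the distribution summary (the patent does not define how reliability is computed):

```python
from typing import List

def select_training_data(metadata_list: List["TrainingDataMetadata"],
                         min_size: int,
                         min_reliability: float) -> List["TrainingDataMetadata"]:
    """Keep only training data whose amount and reliability meet the
    collection plan's thresholds; everything else is left uncollected."""
    return [md for md in metadata_list
            if md.size >= min_size
            and md.distribution.get("reliability", 0.0) >= min_reliability]
```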
The data management unit 130 may manage the model data and the training data by version, which is described in detail later.
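The per-version management described later (steps S12 to S18 and S21 to S27) might be sketched as a small versioned store; all names here are hypothetical, and the upgrade/update distinction is reduced to comments.

```python
from typing import Any, Dict, Optional

class VersionedStore:
    """Illustrative per-version storage inside the data management unit."""

    def __init__(self) -> None:
        self._storage: Dict[float, Any] = {}

    def register(self, version: float, data: Any) -> None:
        if not self._storage:              # no storage yet (S12): create it (S13)
            self._storage[version] = data  # and store the data (S14)
            return
        latest = max(self._storage)        # compare with the latest version (S15)
        if version > latest:
            self._storage[version] = data  # newer: the version is upgraded (S17)
        else:
            self._storage[version] = data  # older or equal: that version is updated (S18)

    def fetch(self, version: Optional[float] = None) -> Any:
        """Return the requested version, or the latest when none is given."""
        if not self._storage:
            return None
        key = version if version is not None else max(self._storage)
        return self._storage.get(key)
```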
The federated learning system according to the embodiment of the present invention described above performs tasks between the user terminals 110 and the server 120 asynchronously. That is, the user terminals 110 and the server 120 operate independently, without considering each other's work state. Federated learning can therefore be performed flexibly, and the performance of the global model can be improved.
In addition, the federated learning system according to an embodiment of the present invention stores the data generated by the user terminals 110 and the server 120 in the data management unit 130, and the data management unit 130 serves as a hub that transfers the stored data between the user terminals 110 and the server 120. The user terminals 110 and the server 120 do not communicate with each other directly. Accordingly, the server 120 can perform federated learning by fetching only the data stored in the data management unit 130, regardless of the state of the user terminals 110 and the network connection, which reduces the burden on the server 120.
In addition, the federated learning system according to an embodiment of the present invention changes the communication connections from between the conventional server 20 and storage 30 and between the server 20 and user terminals 10, to between the server 120 and the data management unit 130 and between the user terminals 110 and the data management unit 130, thereby reducing bandwidth, increasing network efficiency, and preparing for network failures.
FIG. 4 is a flowchart of a federated learning method according to an embodiment of the present invention; FIG. 5 is a detailed flowchart of the step of registering model data in FIG. 4; FIG. 6 is a detailed flowchart of the step of delivering model data in FIG. 4; FIG. 7 is a detailed flowchart of the step of registering training data in FIG. 4; and FIG. 8 is a detailed flowchart of the step of delivering training data in FIG. 4.
Hereinafter, the federated learning method according to an embodiment of the present invention will be described in detail with reference to FIGS. 3 to 8; descriptions that duplicate the foregoing are omitted.
In order for the user terminal 110 and the server 120 to register or request training data or model data with the data management unit 130, they must transmit information such as a task name (Task_name), a version (Version), a model location (Model location), and a device name (Device name) to the data management unit 130.
Here, the task name is the name of the unique learning task to be solved using federated learning. The data management unit 130 may provide the user terminal 110 and the server 120 with the conditions necessary to perform the learning task corresponding to the task name. In addition, the user terminal 110 and the server 120 may access the data management unit 130 through the task name to find a desired learning task.
The version is a value, in float format, that the user terminal 110 and the server 120 use when updating the model data and training data of the global model. When a plurality of servers 120 and user terminals 110 jointly work on a single learning task, this version serves as the criterion by which the learning results are managed.
The model location is information on where the model data or training data was generated. The model data of the global model is generated at the server 120, and the training data of the local model is generated at the user terminal 110.
The device name is a unique ID or name of the user terminal 110 or the server 120. The data management unit 130 may provide the performance and characteristics of the device corresponding to each device name, helping the server 120 select among the training data generated by the user terminals 110.
When the user terminal 110 or the server 120 transmits the above information to the data management unit 130, the data management unit 130 registers the corresponding model data or training data, or delivers it to the user terminal 110 or the server 120.
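For illustration, the four fields above could form the body of a registration or retrieval request. A sketch, assuming a JSON-over-HTTP style interface that the patent itself does not specify:

    import json

    # Request body carrying the four fields named in the text. The transport
    # (JSON over HTTP) and the example values are assumptions.
    request = {
        "Task_name": "next-word-prediction",  # hypothetical task name
        "Version": 1.3,                       # float, per the text
        "Model location": "server",           # where the data was generated
        "Device name": "server-main",         # unique device ID or name
    }
    payload = json.dumps(request)  # body to send to the data management unit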
In the federated learning method according to an embodiment of the present invention, first, the server 120 generates a global model and registers the model data of the global model with the data management unit 130 (S10).
Specifically, the server 120 requests the data management unit 130 to register the model data (S11), and the data management unit 130 registers the model data by version. That is, the data management unit 130 checks whether storage for the model data exists (S12). If there is no storage, it creates the storage (S13) and stores the model data in the created storage (S14). If the storage exists, the data management unit 130 checks the version of the model data (S15) and compares it with the latest version it has stored. If the version of the model data is higher than the latest version, it upgrades the model data to the new version (S17); if the version is lower than or equal to the latest version, it updates the model data of that version (S18).
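A minimal sketch of this branching (S12 to S18), using an in-memory dictionary in place of real storage; the structure follows the flowchart, while the data types and function names are assumptions:

    # In-memory stand-in for the data management unit's model-data storage:
    # maps task name -> {version: model data}. Real storage is unspecified.
    storage = {}

    def register_model_data(task, version, model_data):
        if task not in storage:                  # S12: does storage exist?
            storage[task] = {}                   # S13: create storage
            storage[task][version] = model_data  # S14: store the model data
            return
        latest = max(storage[task])              # S15: check stored versions
        if version > latest:
            storage[task][version] = model_data  # S17: register as a new, higher version
        else:
            storage[task][version] = model_data  # S18: update the existing version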
Next, the data management unit 130 delivers the model data to the plurality of user terminals 110 (S20).
Specifically, the plurality of user terminals 110 request the model data from the data management unit 130 (S21), and the data management unit 130 delivers either the latest version of the model data or the version requested by the user terminals 110. That is, the data management unit 130 checks whether a specific version of the model data has been requested (S22). If so, it searches for that version of the model data (S23); if not, it searches for the latest version (S24).
The data management unit 130 then checks whether the search for the specific or latest version of the model data has succeeded (S25). If the model data is found, it transmits the found model data to the user terminal 110 (S26); if not, it requests the latest version of the model data from the server 120 (S27), receives it, and delivers the received model data to the user terminal 110 (S26).
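The lookup-and-fallback of steps S22 to S27 might be sketched as follows, reusing the in-memory storage from the previous sketch; request_latest_from_server is a hypothetical stand-in for the call to the server:

    def request_latest_from_server(task):
        # Stub for S27; a real system would call the server's API here.
        return {"Version": 1.0, "global_parameters": []}

    def get_model_data(task, version=None):
        versions = storage.get(task, {})
        if version is not None:                    # S22: specific version requested?
            found = versions.get(version)          # S23: search for that version
        else:                                      # S24: search for the latest version
            found = versions[max(versions)] if versions else None
        if found is not None:                      # S25: search succeeded?
            return found                           # S26: deliver to the terminal
        latest = request_latest_from_server(task)  # S27: fall back to the server
        storage.setdefault(task, {})[latest["Version"]] = latest
        return latest                              # S26: deliver the received data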
Next, the plurality of user terminals 110 train the global model based on their user data and generate training data (S30).
Next, the plurality of user terminals 110 register the training data with the data management unit 130 (S40).
Specifically, the user terminal 110 requests the data management unit 130 to register the training data (S41), and the data management unit 130 registers the training data by version. That is, the data management unit 130 checks the version of the training data (S42) and checks whether storage for that version of the training data exists (S43). If the storage exists, it stores the training data in that storage (S44); if not, it creates the storage (S45) and then stores the training data in the created storage (S44).
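Registration of training data differs from model-data registration mainly in that storage is checked per version (S42 to S45). A sketch, again with in-memory stand-ins and assumed data shapes:

    # Maps task -> version -> {device name: training data (e.g., local parameters)}.
    training_storage = {}

    def register_training_data(task, version, device, training_data):
        buckets = training_storage.setdefault(task, {})  # S42: version of the data
        if version not in buckets:                       # S43: storage for this version?
            buckets[version] = {}                        # S45: create it
        buckets[version][device] = training_data         # S44: store the update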
Next, the data management unit 130 delivers the training data to the server 120.
Specifically, the server 120 requests the training data from the data management unit 130 (S51), and the data management unit 130 delivers the latest version of the training data or the version requested by the server 120. That is, the data management unit 130 searches for the latest version of the training data (S52) and checks whether the training data satisfies the collection condition (S53), that is, whether the amount of training data is at least a certain amount and its reliability is at least a certain level. If the training data does not satisfy the collection condition, the data management unit 130 waits until the condition is satisfied (S54); if it does, the data management unit 130 delivers the training data to the server 120. The server 120 then collects the training data and improves the global model based on it (S60).
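One plausible realization of the collection condition (S53 to S54) and the improvement step (S60) is sketched below, reusing training_storage from the previous sketch. The patent does not name an aggregation rule; the sample-weighted averaging shown is the common FedAvg scheme, used here purely as an assumed example:

    import time

    def collect_when_ready(task, version, min_updates=10):
        # S53/S54: wait (here, by polling) until enough local updates have
        # been registered; min_updates stands in for the unspecified condition.
        while len(training_storage.get(task, {}).get(version, {})) < min_updates:
            time.sleep(1.0)
        return list(training_storage[task][version].values())

    def improve_global_model(updates):
        # S60: sample-weighted average of local parameters (FedAvg-style, assumed).
        # Each update is taken to be {"parameters": [...], "num_samples": int}.
        total = sum(u["num_samples"] for u in updates)
        dim = len(updates[0]["parameters"])
        return [sum(u["parameters"][i] * u["num_samples"] for u in updates) / total
                for i in range(dim)]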
Next, the server 120 registers the model data of the improved global model with the data management unit 130, and the data management unit 130 delivers the model data of the improved global model to the plurality of user terminals 110. This process may be repeated until the performance of the global model reaches a certain level or higher.
Although specific embodiments have been described in the detailed description of the present invention, various modifications are possible without departing from the scope of the present invention. Therefore, the scope of the present invention is not limited to the described embodiments, and should be defined by the following claims and their equivalents.
The federated learning system according to the present invention can be used in various fields, such as artificial intelligence technology.

Claims (12)

  1. A federated learning system comprising:
    a plurality of user terminals configured to generate training data by training a global model based on user data;
    a server configured to generate the global model, to collect the training data, and to improve the global model using the collected training data; and
    a data management unit configured to store and manage model data of the global model and the training data, to deliver the model data to the plurality of user terminals, and to deliver the training data to the server.
  2. The federated learning system of claim 1, wherein the model data includes global parameters of the global model, a learning time of the user terminal, and information on the type and size of the user data to be used for training.
  3. The federated learning system of claim 2, wherein the user terminal establishes a learning plan based on the model data and performs training according to the learning plan.
  4. The federated learning system of claim 1, wherein the training data is a local model or a local parameter of the local model.
  5. The federated learning system of claim 1, wherein the data management unit generates metadata including the size, creation date and time, and distribution characteristics of the training data.
  6. The federated learning system of claim 5, wherein the server, based on the metadata, determines the range and number of the training data, selects the training data to be collected, or establishes or changes a collection plan for the training data.
  7. The federated learning system of claim 1, wherein the data management unit manages the model data and the training data by version.
  8. A federated learning method comprising:
    generating, by a server, a global model, and registering model data of the global model with a data management unit;
    delivering, by the data management unit, the model data to a plurality of user terminals;
    generating, by the plurality of user terminals, training data by training the global model based on user data;
    registering, by the plurality of user terminals, the training data with the data management unit;
    delivering, by the data management unit, the training data to the server; and
    improving, by the server, the global model by collecting the training data.
  9. The method of claim 8, wherein the registering of the model data with the data management unit comprises:
    requesting, by the server, the data management unit to register the model data; and
    registering, by the data management unit, the model data by version.
  10. The method of claim 8, wherein the delivering of the model data to the plurality of user terminals comprises:
    requesting, by the plurality of user terminals, the model data from the data management unit; and
    delivering, by the data management unit, the latest version of the model data or the version of the model data requested by the plurality of user terminals to the plurality of user terminals.
  11. The method of claim 8, wherein the registering of the training data with the data management unit comprises:
    requesting, by the user terminal, the data management unit to register the training data; and
    registering, by the data management unit, the training data by version.
  12. The method of claim 8, wherein the delivering of the training data to the server comprises:
    requesting, by the server, the training data from the data management unit; and
    delivering, by the data management unit, the latest version of the training data or the version of the training data requested by the server to the server.
PCT/KR2020/013548 2020-04-27 2020-10-06 Federated learning system and method WO2021221242A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0050980 2020-04-27
KR1020200050980A KR102544531B1 (en) 2020-04-27 2020-04-27 Federated learning system and method

Publications (1)

Publication Number Publication Date
WO2021221242A1 true WO2021221242A1 (en) 2021-11-04

Family

ID=78332056

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/013548 WO2021221242A1 (en) 2020-04-27 2020-10-06 Federated learning system and method

Country Status (2)

Country Link
KR (1) KR102544531B1 (en)
WO (1) WO2021221242A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230064535A (en) 2021-11-03 2023-05-10 한국과학기술원 System, method, computer-readable storage medium and computer program for federated learning of local model based on learning direction of global model
KR102646338B1 (en) 2021-11-03 2024-03-11 한국과학기술원 System, method, computer-readable storage medium and computer program for personalized federated learning based on individual data of client
KR102413116B1 (en) * 2021-12-02 2022-06-23 세종대학교산학협력단 Federated learning method based on layer characteristic of artificial neural network
KR102485748B1 (en) * 2022-05-30 2023-01-09 주식회사 어니스트펀드 federated learning method and device for statistical model
CN114707430B (en) * 2022-06-02 2022-08-26 青岛鑫晟汇科技有限公司 Multi-user encryption based federated learning visualization system and method
KR102517728B1 (en) * 2022-07-13 2023-04-04 주식회사 애자일소다 Apparatus and method for recommending products based on federated learning
KR102573880B1 (en) * 2022-07-21 2023-09-06 고려대학교 산학협력단 Federated learning system and federated learning method based on multi-width artificial neural network
KR20240045837A (en) 2022-09-30 2024-04-08 한국과학기술원 System, client appatus and methods for federated learning of enhanced representation
KR102585904B1 (en) 2022-12-14 2023-10-06 주식회사 딥노이드 Apparatus for reading X-ray images using artificial intelligence based on self-directed central control and method therefor

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102414602B1 (en) * 2016-11-03 2022-06-30 삼성전자주식회사 Data recognition model construction apparatus and method for constructing data recognition model thereof, and data recognition apparatus and method for recognizing data thereof
WO2018150550A1 (en) * 2017-02-17 2018-08-23 株式会社日立製作所 Learning data management device and learning data management method
KR102369416B1 (en) * 2017-09-18 2022-03-03 삼성전자주식회사 Speech signal recognition system recognizing speech signal of a plurality of users by using personalization layer corresponding to each of the plurality of users
KR20190081373A (en) * 2017-12-29 2019-07-09 (주)제이엘케이인스펙션 Terminal device and data processing method based on artificial neural network
KR20190103088A (en) 2019-08-15 2019-09-04 엘지전자 주식회사 Method and apparatus for recognizing a business card using federated learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150324690A1 (en) * 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
US20170220949A1 (en) * 2016-01-29 2017-08-03 Yahoo! Inc. Method and system for distributed deep machine learning
WO2018057302A1 (en) * 2016-09-26 2018-03-29 Google Llc Communication efficient federated learning
WO2019219846A1 (en) * 2018-05-17 2019-11-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concepts for distributed learning of neural networks and/or transmission of parameterization updates therefor
US20190385043A1 (en) * 2018-06-19 2019-12-19 Adobe Inc. Asynchronously training machine learning models across client devices for adaptive intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KANG, Jiawen; XIONG, Zehui; NIYATO, Dusit; ZOU, Yuze; ZHANG, Yang; GUIZANI, Mohsen: "Reliable Federated Learning for Mobile Networks", IEEE Wireless Communications, vol. 27, no. 2, April 2020, pp. 72-80, XP011786131, ISSN: 1536-1284, DOI: 10.1109/MWC.001.1900119 *

Also Published As

Publication number Publication date
KR102544531B1 (en) 2023-06-16
KR20210132500A (en) 2021-11-04

Similar Documents

Publication Publication Date Title
WO2021221242A1 (en) Federated learning system and method
CN110191148B (en) Statistical function distributed execution method and system for edge calculation
WO2020162680A1 (en) Microservice system and method
WO2021054514A1 (en) User-customized question-answering system based on knowledge graph
WO2021201370A1 (en) Federated learning resource management apparatus and system, and resource efficiency method therefor
WO2021169294A1 (en) Application recognition model updating method and apparatus, and storage medium
WO2019095448A1 (en) Monitoring system for remote education system server farm
WO2024019474A1 (en) Bi-directional inverter with solar inverter function
Huang et al. Enabling dnn acceleration with data and model parallelization over ubiquitous end devices
Li Retracted: Design and implementation of music teaching assistant platform based on Internet of Things
Chen et al. Heterogeneous semi-asynchronous federated learning in Internet of Things: A multi-armed bandit approach
CN110233870A (en) A kind of class of board system client long junction manages method and device
Al-Kasassbeh et al. Analysis of mobile agents in network fault management
CN116095007A (en) Load scheduling method, device, computer equipment and storage medium
CN116089079A (en) Big data-based computer resource allocation management system and method
WO2013085089A1 (en) Method for using communication network resource in m2m cloud environment and system therefor
Qu et al. New analysis on mobile agents based network routing
WO2020075907A1 (en) Compensation method of aggregator for securing distributed energy resources
WO2021101055A1 (en) Method for providing service in edge network including multiple access points, and system therefor
Ma et al. DePo: Dynamically offload expensive event processing to the edge of cyber-physical systems
WO2018216828A1 (en) Energy big data management system and method therefor
WO2023224205A1 (en) Method for generating common model through artificial neural network model training result synthesis
CN111782322A (en) Intranet and extranet message communication server and system based on cloud desktop server
WO2023033229A1 (en) Adaptive batch processing method and system
Yu et al. 5G network education and smart campus based on heterogeneous distributed platform and multi-scheduling optimization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20934220

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20934220

Country of ref document: EP

Kind code of ref document: A1