KR20200083234A

KR20200083234A - Method for Operating Machine Learning Based Federated Distillation, Web Server and Terminal

Info

Publication number: KR20200083234A
Application number: KR1020190168078A
Authority: KR
Inventors: 김성륜; 오승은; 정은정; 김혜성
Original assignee: 연세대학교 산학협력단
Priority date: 2018-12-28
Filing date: 2019-12-16
Publication date: 2020-07-08
Also published as: KR102247322B1

Abstract

According to the present invention, disclosed are a federated distillation-based learning driving method, a learning driving server, and a learning driving terminal that solve the privacy and communication overhead problems occurring in a distributed network: a terminal collects data samples, calculates a local average logit, and transmits the local average logit and seed samples to a server over the uplink, and the server performs distillation of a global model based on the seed samples and the local average logit.

Description

Federated distillation-based learning driving method, learning driving server and learning driving terminal {Method for Operating Machine Learning Based Federated Distillation, Web Server and Terminal}

The present invention relates to a learning driving method, and more particularly to a federated distillation-based method for driving learning and reducing communication overhead.

In a distributed network in which each terminal holds only a limited number of samples, local training produces a model that is biased toward the samples the terminal holds. By exchanging information with one another, the terminals can mitigate the overfitting that arises in local learning and improve overall test accuracy.

When terminals in a distributed network exchange raw data samples directly, the payload size and communication overhead become very large given the size and number of the raw data samples. In addition, privacy is not protected.

An object of the present invention is to solve the privacy and communication overhead problems that arise in a distributed network: a terminal collects data samples, calculates a local average logit, and transmits the local average logit and seed samples to a server over the uplink, and the server performs distillation of a global model based on the seed samples and the local average logit.

Other objects of the present invention that are not stated explicitly may additionally be considered to the extent that they can be readily inferred from the following detailed description and its effects.

To solve the above problem, a learning driving method according to an embodiment of the present invention, for a distributed network composed of a server and a plurality of terminals, includes: a step in which the terminal collects data samples, calculates a local average logit, and transmits the local average logit to the server over the uplink; a step in which the terminal transmits seed samples to the server over the uplink; and a step in which the server performs distillation of a global model based on the seed samples and the local average logit.

Here, the method further includes, before the step in which the server performs distillation of the global model, a step in which the server adds random noise to the seed samples to protect information.

Here, the step in which the server performs distillation of the global model based on the seed samples and the local average logit includes a step of converting the local average logit into global model parameters and a step of training the global model with the global model parameters and the seed samples.

Here, the method further includes a step of transmitting the trained global model from the server over the downlink.

Here, the step in which the terminal collects data samples, calculates a local average logit, and transmits the local average logit to the server over the uplink includes: a step in which the terminal classifies the samples according to the local logits produced by performing local training on the data samples and stores each under its local label; a step in which the terminal calculates a local average logit for each local label; and a step in which the terminal transmits the calculated per-label local average logits to the server.

Here, the plurality of terminals includes first to third terminals, and the method further includes: a step in which the server trains a global model using the per-label local average logits received from the first terminal and the second terminal, respectively; and a step in which the third terminal receives the trained global model from the server and reflects it in its loss function to perform a second local training.

Here, the step in which the server trains the global model using the per-label local average logits received from the first terminal and the second terminal is repeated until the train accuracy reaches a preset target or higher.

A learning driving server of a distributed network according to an embodiment of the present invention is connected to a plurality of terminals through wireless links. The server receives over the uplink the local average logits that the terminals calculate from their collected data samples, receives seed samples from the terminals over the uplink, converts the local average logits into global model parameters, trains the global model with the global model parameters and the seed samples, and transmits the trained global model over the downlink.

A learning driving terminal of a distributed network according to an embodiment of the present invention is connected to a server through a wireless link. The terminal collects data samples, calculates a local average logit, transmits the local average logit to the server over the uplink, and transmits seed samples to the server over the uplink.

A learning driving terminal of a distributed network according to another embodiment of the present invention is connected to a server through a wireless link. The server converts local average logits into global model parameters and trains the global model with the global model parameters and the seed samples; the terminal receives the trained global model over the downlink and reflects it in its loss function to perform local training.

As described above, according to embodiments of the present invention, the terminal collects data samples, calculates a local average logit, and transmits the local average logit and seed samples to the server over the uplink, and the server performs distillation of a global model based on the seed samples and the local average logit. This makes it possible to solve the privacy and communication overhead problems that arise in a distributed network.

Even effects that are not explicitly mentioned here, including the effects described in the following specification that are expected from the technical features of the present invention and their provisional effects, are treated as if they were described in the specification of the present invention.

FIGS. 1 and 2 are diagrams illustrating a distributed network for a federated distillation-based learning driving method according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the format of a logit vector according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an FD algorithm according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an FLD algorithm according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating learning curves according to an embodiment of the present invention.
FIGS. 7 and 8 are flowcharts illustrating a federated distillation-based learning driving method according to an embodiment of the present invention.

Hereinafter, the federated distillation-based learning driving method, learning driving server, and learning driving terminal according to the present invention are described in more detail with reference to the drawings. The present invention may, however, be implemented in various different forms and is not limited to the embodiments described. To describe the present invention clearly, parts irrelevant to the description are omitted, and the same reference numerals in the drawings denote the same members.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The present invention relates to a federated distillation-based learning driving method, a learning driving server, and a learning driving terminal.

FIGS. 1 and 2 are diagrams illustrating a distributed network for a federated distillation-based learning driving method according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the distributed network consists of a plurality of terminals (10) and a server (20). The number of terminals is not limited to that shown in this embodiment, and the network may include any plurality of terminals.

To solve the privacy and communication overhead problems that arise in a distributed network, the exchanged information must have a small payload, must not involve transmitting the held samples directly, and must improve the test accuracy of the overall system when exchanged. The federated distillation (FD) operation according to an embodiment of the present invention uses ground-truth labels to group samples by label and averages the logits of the samples in each group. By using the resulting per-label average logit vectors, it addresses the main problems of the distributed network while raising the test accuracy of each terminal.

The present invention proposes a method in which the terminals in a distributed network exchange information with a low communication cost and drive learning based on that information. This ensures the test accuracy of each terminal and reduces the communication overhead incurred when terminals exchange information. It also addresses the privacy problem that arises in a distributed network.

Conventionally, in a distributed network in which each terminal holds only a limited number of samples, local training produces a model that is biased toward the samples the terminal holds. By exchanging information with one another, the terminals can mitigate the overfitting that arises in local learning and improve overall test accuracy.

A representative approach is for the terminals to exchange their raw data samples directly. Alternatively, in federated learning based on weight averaging, the terminals do not exchange raw data samples directly. Instead, each terminal performs local training and periodically transmits the weights of its trained model to a central server, and the server averages the model weights received from the terminals and transmits the average to each terminal.

In online distillation (co-distillation), the terminals periodically upload to the server their raw data samples together with the logit vectors obtained by feeding those samples into their local learning models, and the server averages and stores the sample-logit pairs. Thereafter, when a terminal performs local training, it queries the server for a sample, and the server transmits the logit corresponding to that sample to the terminal.

In the conventional approach of directly exchanging raw data samples between terminals in a distributed network, the payload size and communication overhead are very large given the size and number of the raw data samples. In addition, privacy is not protected.

In federated learning, model weights are exchanged, so privacy is better protected than when raw data samples are exchanged. The payload size is also comparatively smaller, but transmission over real channels with severe fluctuation remains limited.

In online distillation, the downlink (DL) payload is small and privacy is preserved. In the uplink (UL), however, the payload is very large and privacy is not protected. Moreover, a gain arises only when the server already holds the raw data samples that a terminal requests, so the performance improvement is additionally constrained by the correlation among the samples held by the terminals.

According to embodiments of the present invention, ground-truth labels are used to group samples by label, and the logits of the samples in each group are averaged. Using the resulting per-label average logit vectors solves the privacy and communication overhead problems that arise in a distributed network.

The first terminal (11) and the second terminal (12) each collect data samples and store the resulting local logits.

Each terminal then calculates a local average logit and transmits it to the server over the uplink.

Specifically, the terminal classifies the samples according to the local logits produced by performing local training on the data samples and stores each under its local label. The terminal then calculates a local average logit for each local label and transmits the calculated per-label local average logits to the server.

The first terminal and the second terminal each store, by label, the logits produced during local training.

The logits can be accumulated using Equation 1. Equation 1 covers, for example, the case where a randomly drawn sample x has ground-truth label n.

Here, logit(x) is the output obtained when x is fed into the model, and count(n) stores the number of samples whose ground-truth label is n. This process is repeated for every sample drawn.

The format of the logit vector according to an embodiment of the present invention is described with reference to FIG. 3.

The first terminal and the second terminal (10a, 10b) each store, by label, the logits produced during local training.

Each terminal calculates the per-label local average logit vectors every T_p iterations.

The per-label local average logit vector can be calculated using Equation 2. Equation 2 is written, for example, for terminal d and ground-truth label n.

Here, sum(n) is the vector sum of the logit vectors of the samples whose ground-truth label is n, and local(d, n) is the per-label local average logit vector of terminal d for ground-truth label n.

This process is carried out for all ground-truth labels.
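The accumulation and averaging described above can be sketched in a few lines of Python. Equations 1 and 2 themselves are not reproduced in this text, so reading them as "add logit(x) to sum(n) and increment count(n)" followed by "local(d, n) = sum(n) / count(n)" is an assumption drawn from the surrounding definitions. The function name and the toy data are hypothetical.

```python
from collections import defaultdict

def local_average_logits(logit_vectors, labels):
    """Return {label n: element-wise mean of the logit vectors whose label is n}."""
    sums = {}                  # sum(n): running vector sum per label
    counts = defaultdict(int)  # count(n): number of samples seen per label
    for vec, n in zip(logit_vectors, labels):
        if n not in sums:
            sums[n] = [0.0] * len(vec)
        sums[n] = [s + v for s, v in zip(sums[n], vec)]
        counts[n] += 1
    # Assumed reading of Equation 2: local(d, n) = sum(n) / count(n)
    return {n: [s / counts[n] for s in sums[n]] for n in sums}

# Toy data (hypothetical): three samples, two labels.
toy_logits = [[2.0, 0.0], [4.0, 2.0], [0.0, 6.0]]
toy_labels = [0, 0, 1]
local_avg = local_average_logits(toy_logits, toy_labels)
```

Only the returned dictionary, one vector per label, would need to leave the terminal; the raw samples stay local.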

The first terminal and the second terminal transmit the calculated per-label local average logit vectors to the server.

The server (20) calculates the per-label global average logit vectors from the per-label local average logit vectors received from the terminals.

The per-label global average logit vector can be calculated using Equation 3. Equation 3 is written, for example, for ground-truth label n.

Here, global(n) is the per-label global average logit vector for ground-truth label n, and D is the number of all terminals participating in the distributed network.

This process is carried out for all ground-truth labels.
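A minimal sketch of the server-side step follows. Equation 3 is not reproduced in this text; the code assumes it is the plain mean of local(d, n) over the D terminals, which matches the definitions of global(n) and D given above. The function name and the sample reports are hypothetical.

```python
def global_average_logits(local_avgs):
    """local_avgs: one {label: vector} dict per terminal. Returns {label: vector}."""
    num_terminals = len(local_avgs)  # D in the text
    labels = set()
    for per_label in local_avgs:
        labels.update(per_label)
    result = {}
    for n in labels:
        vectors = [per_label[n] for per_label in local_avgs if n in per_label]
        dim = len(vectors[0])
        # Assumption: divide by D even if a terminal did not report label n,
        # so a missing report counts as a zero contribution.
        result[n] = [sum(v[k] for v in vectors) / num_terminals for k in range(dim)]
    return result

# Hypothetical reports from two terminals.
reports = [
    {0: [2.0, 0.0], 1: [0.0, 6.0]},
    {0: [4.0, 2.0], 1: [0.0, 2.0]},
]
global_avg = global_average_logits(reports)
```

Dividing by the number of terminals that actually reported a label would be an equally plausible choice when data are non-IID; the text does not say which is intended.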

The third terminal (30) performs local training with the per-label global average logit vectors received from the server reflected in its loss function, and the process shown in FIG. 1 is repeated until the terminal's train accuracy reaches the target or higher.

In addition, the first terminal (11) and the second terminal (12) transmit seed samples to the server over the uplink.

The first terminal (11) and the second terminal (12) are connected to the server through wireless links. Each collects data samples, calculates a local average logit, transmits the local average logit to the server over the uplink, and transmits seed samples to the server over the uplink.

Here, to transmit seed samples over the uplink, the terminal randomly selects seed samples having different labels and linearly combines the selected seed samples at a preset mixing ratio.
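The linear combination of seed samples can be illustrated as follows. This is a sketch under assumptions: the text fixes neither how many samples are combined nor how the mixed label is represented, so the example mixes exactly two samples and records the label as a soft weighting. All names (`mixup`, `make_seed`, `lam`) are hypothetical.

```python
import random

def mixup(sample_a, sample_b, lam):
    """Linearly combine two equally sized samples with mixing ratio lam."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(sample_a, sample_b)]

def make_seed(dataset, lam, rng):
    """dataset: list of (features, label) pairs.

    Picks two samples with different labels at random and mixes them.
    Raises IndexError if the dataset contains only one label.
    """
    xa, ya = rng.choice(dataset)
    others = [(x, y) for x, y in dataset if y != ya]
    xb, yb = rng.choice(others)
    soft_label = {ya: lam, yb: 1.0 - lam}  # assumed label representation
    return mixup(xa, xb, lam), soft_label

# Hypothetical two-sample dataset and mixing ratio.
data = [([0.0, 4.0], 0), ([2.0, 0.0], 1)]
seed, soft = make_seed(data, 0.25, random.Random(0))
```

Because only the mixture is uploaded, neither raw sample leaves the terminal in its original form.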

In addition, before the step of performing distillation of the global model, the server may add random noise to the seed samples to protect information.
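As a small illustration of this step, the sketch below perturbs each element of a seed sample independently. The text does not name a noise distribution or scale, so the zero-mean Gaussian and the `sigma` parameter are assumptions, and `add_noise` is a hypothetical helper.

```python
import random

def add_noise(sample, sigma, rng):
    """Return a copy of sample with zero-mean Gaussian noise added per element."""
    return [v + rng.gauss(0.0, sigma) for v in sample]

rng = random.Random(42)
noisy = add_noise([1.0, 2.0, 3.0], 0.1, rng)  # same length, perturbed values
```

A larger `sigma` hides more of the seed sample but also degrades the signal the server trains on, so the scale would have to be tuned for the application.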

The server (20) performs distillation of the global model based on the seed samples and the local average logits.

Specifically, the server is connected to a plurality of terminals through wireless links. It receives over the uplink the local average logits that the terminals calculate from their collected data samples, receives seed samples from the terminals over the uplink, converts the local average logits into global model parameters, trains the global model with the global model parameters and the seed samples, and transmits the trained global model over the downlink.

Performing distillation of the global model means converting the local average logits into global model parameters and training the global model with the global model parameters and the seed samples.
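One way to picture this conversion is to fit a model so that its output on each seed sample matches the averaged output for that seed's label. The sketch below does this for a linear softmax model trained by plain gradient descent. The model family, the learning rate, the step count, and every name in the code are assumptions chosen for illustration; the text does not specify the architecture or the optimizer.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(w, x):
    """w holds one weight row per class; returns softmax(w @ x)."""
    return softmax([sum(wk * xk for wk, xk in zip(row, x)) for row in w])

def distill_global_model(seeds, seed_labels, avg_outputs, lr=0.5, steps=200):
    """Fit w so that predict(w, seed) approaches avg_outputs[label of seed]."""
    num_classes = len(next(iter(avg_outputs.values())))
    dim = len(seeds[0])
    w = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(steps):
        for x, n in zip(seeds, seed_labels):
            p = predict(w, x)
            target = avg_outputs[n]
            # For cross entropy against a target that sums to one, the gradient
            # with respect to the pre-softmax scores is (p - target).
            for c in range(num_classes):
                g = p[c] - target[c]
                for k in range(dim):
                    w[c][k] -= lr * g * x[k]
    return w

# Hypothetical seeds and averaged outputs (already normalized to sum to one).
seeds = [[1.0, 0.0], [0.0, 1.0]]
seed_labels = [0, 1]
avg_outputs = {0: [0.9, 0.1], 1: [0.2, 0.8]}
w_global = distill_global_model(seeds, seed_labels, avg_outputs)
```

After training, `w_global` plays the role of the global model weights that the server sends over the downlink.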

Thereafter, the server transmits the trained global model over the downlink, and the third terminal (13) receives the trained global model from the server, reflects it in its loss function, and performs local training on its training data.

The third terminal (13) is connected to the server through a wireless link. The server converts the local average logits into global model parameters and trains the global model with the global model parameters and the seed samples; the third terminal receives the trained global model over the downlink and reflects it in its loss function to perform local training.

In other words, the server trains the global model using the per-label local average logits received from the first terminal and the second terminal, respectively, and the third terminal receives the trained global model from the server and reflects it in its loss function to perform a second local training.

Compared with conventional federated learning, exchanging average logits makes it possible to reduce the uplink and downlink payload sizes, but at a loss in the final test accuracy of learning.

In a typical cellular system composed of a server and devices, the uplink transmission power of the devices is uniform. By transmitting per-label average logit vectors in the uplink, where channel capacity is scarce, and transmitting model weights as in federated learning in the downlink, where capacity is relatively ample, the downlink and uplink channel capacity constraints are satisfied and improved final test accuracy can be expected. For this structure to hold, the devices additionally send a few seed samples in the uplink, so that the central server can train a global model based on the seed samples and the average logit vector values and transmit its model weights over the downlink.
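The uplink saving can be made concrete by counting transmitted values. The sketch below relies on one fact stated in this document (a logit vector has one element per label, and one averaged vector is sent per label); the specific sizes used in the example, including the model size, are hypothetical and are not taken from the patent.

```python
def fd_uplink_values(num_labels):
    """One averaged logit vector per label, each with num_labels elements."""
    return num_labels * num_labels

def fld_uplink_values(num_labels, num_seeds, sample_size):
    """Averaged logit vectors plus a few seed samples sent alongside them."""
    return fd_uplink_values(num_labels) + num_seeds * sample_size

# Hypothetical sizes: 10 labels, 10 seeds of 784 values, a 100,000-weight model.
logits_only = fd_uplink_values(10)
logits_and_seeds = fld_uplink_values(10, 10, 784)
model_weights = 100_000
```

With these illustrative numbers the uplink carries 7,940 values instead of 100,000 model weights, and the seed samples account for most of that; on rounds where no seeds are sent, only the 100 logit values remain.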

FIG. 3 is a diagram illustrating the format of a logit vector according to an embodiment of the present invention.

The size of the logit vector equals the total number of labels that the terminal intends to classify through supervised learning.

For an input sample with logit vector (110), the value of each element of the vector is the probability that the terminal's current model classifies the sample as the corresponding label (100).

For example, when terminal d has N data samples in total and the set of labels to be classified (120) is {1, 2, 3}, the logit vectors take the form shown in FIG. 3.

FIG. 4 is a diagram illustrating an FD algorithm according to an embodiment of the present invention.

As shown in FIG. 4, the federated distillation algorithm requires a prediction function F(w, input), a loss function φ(F, label), and ground-truth labels y_input.

The set S denotes the entire dataset of all devices, and B denotes the batch drawn at each device.

The function F(w, a) is the logit vector normalized by the softmax function, where w and a are the model's weights and input.

The function φ(p, q) is the cross entropy between p and q, and it is used for both the loss function and the distillation regularizer.

Here, η is a constant learning rate and γ is the weight parameter of the distillation regularizer.

At the i-th device, the per-label local average logit vector is the average, at the k-th iteration, over the training samples whose ground-truth label is the l-th label.

The per-label global average logit vector is given by Equation 4.

Here, M is the number of all terminals participating in the distributed network.

In addition, a counter records the number of samples whose ground-truth label is l.
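The role of φ and γ in local training can be sketched as a loss with two cross-entropy terms: one against the ground-truth label and one, weighted by γ, against the per-label global average vector. The argument order of φ and the exact form of the regularizer are assumptions here, since FIG. 4 and Equation 4 are not reproduced in this text; the function names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(p, q, eps=1e-12):
    """phi(p, q): cross entropy of prediction p against target distribution q."""
    return -sum(qk * math.log(max(pk, eps)) for pk, qk in zip(p, q))

def fd_loss(raw_scores, label, global_avg, gamma):
    """Ground-truth loss plus gamma times the distillation regularizer."""
    p = softmax(raw_scores)  # F(w, a): softmax-normalized output
    one_hot = [1.0 if k == label else 0.0 for k in range(len(p))]
    return cross_entropy(p, one_hot) + gamma * cross_entropy(p, global_avg[label])

# Hypothetical values: two classes, an uninformative model, a flat teacher vector.
loss_plain = fd_loss([0.0, 0.0], 0, {0: [0.5, 0.5]}, gamma=0.0)
loss_distilled = fd_loss([0.0, 0.0], 0, {0: [0.5, 0.5]}, gamma=1.0)
```

Setting γ to zero recovers ordinary local training, which makes the regularizer easy to switch off when no global average has been received yet.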

FIG. 5 is a diagram illustrating an FLD algorithm according to an embodiment of the present invention.

As shown in FIG. 5, the FLD (Federated Learning after Distillation) algorithm includes output upload, mixup, output-to-model conversion, inverse mixup, and model download steps.
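The text names an inverse mixup step without spelling it out. One reading that follows directly from the linearity of mixup is sketched below: if two mixtures carry label weights (λ, 1 − λ) and (1 − λ, λ) on the same pair of classes, a second linear combination with weight μ = λ / (2λ − 1) yields a sample whose label weights are (1, 0). Treat this as an illustrative assumption rather than the patented procedure; the function name is hypothetical.

```python
def inverse_mixup(mixed_i, mixed_j, lam):
    """Combine two symmetric mixtures so the result carries a single label.

    mixed_i has label weights (lam, 1 - lam) on classes (a, b);
    mixed_j has label weights (1 - lam, lam) on the same classes.
    """
    if abs(2.0 * lam - 1.0) < 1e-12:
        raise ValueError("lam = 0.5 makes the two mixtures indistinguishable")
    mu = lam / (2.0 * lam - 1.0)  # solves mu*(1 - lam) + (1 - mu)*lam = 0
    return [mu * xi + (1.0 - mu) * xj for xi, xj in zip(mixed_i, mixed_j)]

# Hypothetical check: both mixtures are built from the same two raw samples.
a, b, lam = [1.0, 0.0], [0.0, 1.0], 0.25
mixed_i = [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]
mixed_j = [(1.0 - lam) * x + lam * y for x, y in zip(a, b)]
recovered = inverse_mixup(mixed_i, mixed_j, lam)
```

When the two mixtures come from different raw samples, the same arithmetic produces a synthetic sample with label weights (1, 0) that does not coincide with any terminal's raw sample.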

The key idea of output-to-model conversion is to transfer the knowledge contained in G^P_out,n into a global model with weight vector G^P_mod.

To enable this, at the start (e.g., p = 1) each terminal uploads N_s seed samples randomly selected from its local data set.
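Random seed selection on a terminal might look like the sketch below. The function name and the optional rng argument are hypothetical; the text states only that N_s samples are chosen at random from the local data set.

```python
import random

def pick_seed_samples(local_dataset, n_s, rng=None):
    """Choose up to n_s distinct samples uniformly at random (without replacement)."""
    rng = rng or random.Random()
    return rng.sample(local_dataset, min(n_s, len(local_dataset)))
```

Passing a seeded random.Random instance makes the selection reproducible, which can help when comparing runs.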

The global weight vector w_s^(k) is given by Equation 5.

Here, F_{s,n}^[ik] is the output vector of the global model when the label is the n-th label.

The server yields G^P_mod = w_s^(ks), which is then downloaded by all devices.
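A minimal sketch of output-to-model conversion on the server follows. Equation 5 is not reproduced in this text, so the update rule here is an assumption: stochastic gradient descent on the cross entropy against the seed sample's label plus γ times the cross entropy against the global average output for that label. The linear softmax model and all names are hypothetical stand-ins for the global model.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_step(w, x, label, global_avg, eta, gamma):
    """One SGD step on a linear softmax model w (one weight row per class).

    For softmax with cross entropy, the gradient with respect to logit c is
    (p_c - target_c); the label term and the regularizer term are summed.
    """
    logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    p = softmax(logits)
    for c, row in enumerate(w):
        y_c = 1.0 if c == label else 0.0
        grad_c = (p[c] - y_c) + gamma * (p[c] - global_avg[label][c])
        for j in range(len(row)):
            row[j] -= eta * grad_c * x[j]
    return w

def convert_outputs_to_model(seeds, global_avg, num_classes,
                             eta=0.1, gamma=1.0, iters=200):
    """Train a fresh global model on (x, label) seed samples for iters steps."""
    dim = len(seeds[0][0])
    w = [[0.0] * dim for _ in range(num_classes)]
    for k in range(iters):
        x, label = seeds[k % len(seeds)]
        distill_step(w, x, label, global_avg, eta, gamma)
    return w
```

In this sketch the weights returned after the final iteration play the role of the global model that the devices download.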

FIG. 6 is a diagram showing learning curves according to an embodiment of the present invention.

FIG. 6 shows the learning curves of a randomly selected device under Mix2FLD, compared with FL, FD, and MixFLD, over asymmetric and symmetric (P_up = P_dn = 40 dBm, W_up = W_dn = 10 MHz) channels with IID and non-IID data sets.

FIG. 6 shows that Mix2FLD achieves the highest accuracy and the fastest convergence under both asymmetric and symmetric channel conditions. Compared with FL, which uploads model weights, Mix2FLD uploads model outputs and thereby reduces the uplink payload size by a factor of up to 622.4. In asymmetric channels with limited uplink capacity (FIG. 6(a) and (c)), this allows more frequent and more successful uploads, yielding up to 12% higher accuracy and 4.6 times faster convergence.

Compared with FD, Mix2FLD exploits the high downlink capacity to download global model weights, which gives higher accuracy than downloading model outputs. In addition, the global information in Mix2FLD is not obtained by simply averaging local outputs as in FD; it is constructed from the collected seed samples and therefore reflects the global data distribution. As a result, Mix2FLD achieves up to 15% higher accuracy and 36% faster convergence than FD.

In a symmetric channel with an IID data set (FIG. 6(b)), Mix2FLD and FL achieve the highest accuracy. Even so, Mix2FLD converges 3.1 times faster than FL owing to its smaller uplink payload size and more frequent updates.

Regarding the trade-off among latency, privacy, and accuracy: in all cases, reducing the number of seed samples (N_s = 10) in Mix2FLD and MixFLD shortens the convergence time at the cost of lower accuracy, which results in a latency-accuracy trade-off.

FIGS. 7 and 8 are flowcharts illustrating a federated distillation based learning driving method according to an embodiment of the present invention.

Referring to FIG. 7, in the federated distillation based learning driving method according to an embodiment of the present invention, in step S110 the terminal collects data samples, calculates a local average logit, and transmits the local average logit to the server over the uplink.

In step S120, the terminal transmits seed samples to the server over the uplink.

In step S130, the server performs distillation of the global model based on the seed samples and the local average logit.

In step S140, the server transmits the trained global model over the downlink.

In step S150, the terminal receives the trained global model from the server, reflects it in the loss function, and performs local training.

More specifically, referring to FIG. 8, in the federated distillation based learning driving method according to an embodiment of the present invention, in step S210 the terminal performs local training on its data samples.

In step S220, the terminal groups the samples by local logit and stores each group under a local label.

In step S230, the terminal calculates a local average logit for each local label.

In step S240, the terminal transmits the calculated per-label local average logits to the server.

In step S250, the server trains the global model using the per-label local average logits received from the first terminal and the second terminal, respectively.

In step S260, the third terminal receives the trained global model from the server, reflects it in the loss function, and performs second local training.

In step S270, it is checked whether the train accuracy has reached a preset target; if it is below the target, steps S210 to S260 are repeated until the target is reached.
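The control flow of steps S210 to S270 can be sketched as a simple loop. Everything here is a hypothetical illustration: each device is modeled as a callable that takes the current global model and returns its upload, and server_train, evaluate, and max_rounds are assumed names that do not appear in the text.

```python
def federated_rounds(devices, server_train, evaluate, target, max_rounds=50):
    """Repeat the upload/train/download cycle until the accuracy target is met.

    Returns (global_model, rounds_run, reached_target).
    """
    global_model = None
    for round_idx in range(1, max_rounds + 1):
        # S210-S240: each device trains locally (using the last downloaded
        # global model, S260) and uploads its per-label average logits.
        uploads = [device(global_model) for device in devices]
        # S250: the server trains the global model from the uploads.
        global_model = server_train(uploads)
        # S270: stop once the accuracy reaches the target.
        if evaluate(global_model) >= target:
            return global_model, round_idx, True
    return global_model, max_rounds, False
```

The max_rounds cap is a safeguard added in this sketch so that the loop terminates even if the target is never reached.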

The above description is merely one embodiment of the present invention, and a person of ordinary skill in the technical field to which the present invention belongs will be able to implement it in modified forms without departing from its essential characteristics. Therefore, the scope of the present invention is not limited to the embodiment described above and should be construed to include various embodiments within a scope equivalent to what is described in the claims.

Claims

A learning driving method in a distributed network consisting of a server and a plurality of terminals, the method comprising:
collecting, by the terminal, data samples, calculating a local average logit, and transmitting the local average logit to the uplink of the server;
transmitting, by the terminal, seed samples to the uplink of the server; and
performing, by the server, distillation of a global model based on the seed samples and the local average logit.

The learning driving method in a distributed network according to claim 1,
further comprising, before the server performs the distillation of the global model,
adding, by the server, random noise to the seed samples for information protection.

The method according to claim 1,
wherein the performing, by the server, of distillation of the global model based on the seed samples and the local average logit comprises:
converting the local average logit into global model parameters; and
training the global model with the global model parameters and the seed samples.

The method according to claim 3,
further comprising transmitting the trained global model to the downlink of the server.

The method according to claim 1,
wherein the collecting, by the terminal, of data samples, the calculating of a local average logit, and the transmitting of the local average logit to the uplink of the server comprise:
performing, by the terminal, local training on the data samples, classifying the samples by local logit, and storing each as a local label;
calculating, by the terminal, a local average logit for each local label; and
transmitting, by the terminal, the calculated per-label local average logits to the server.

The method according to claim 5,
wherein the plurality of terminals includes a first terminal to a third terminal,
the method further comprising:
training, by the server, the global model using the per-label local average logits received from the first terminal and the second terminal, respectively; and
receiving, by the third terminal, the trained global model and reflecting it in the loss function to perform second local training.

The method according to claim 6,
wherein the training, by the server, of the global model using the per-label local average logits received from the first terminal and the second terminal
is repeated until the train accuracy reaches a preset target.

A learning server of a distributed network,
wherein the server is connected to a plurality of terminals through a wireless link,
receives, on the uplink, the local average logit that each terminal calculated by collecting data samples,
receives seed samples from the terminals on the uplink,
and converts the local average logit into global model parameters, trains the global model with the global model parameters and the seed samples, and transmits the trained global model to the downlink of the server.

A learning terminal of a distributed network,
wherein the terminal is connected to a server through a wireless link,
collects data samples to calculate a local average logit and transmits the local average logit to the uplink of the server,
and transmits seed samples to the uplink of the server.

A learning terminal of a distributed network,
wherein the terminal is connected to a server through a wireless link,
receives, on the downlink, the global model that the server trained by converting the local average logit into global model parameters and training the global model with the global model parameters and the seed samples, and performs local training by reflecting the received global model in the loss function.