KR20220107690A

KR20220107690A - Bayesian federated learning driving method over wireless networks and the system thereof

Info

Publication number: KR20220107690A
Application number: KR1020210010619A
Authority: KR
Inventors: 이남윤; 이승훈
Original assignee: 포항공과대학교 산학협력단
Priority date: 2021-01-26
Filing date: 2021-01-26
Publication date: 2022-08-02
Also published as: KR102554676B1

Abstract

The present invention relates to a method and system for driving Bayesian federated learning over a wireless network, in which quantized gradient information is aggregated optimally. The method includes: computing, by a mobile device, a local gradient using the mobile device's training dataset and global model parameters received from a server; compressing, by the mobile device, the local gradient using a 1-bit quantizer and a scalar quantizer and transmitting it to the server; and obtaining, by the server, the sum of the local gradients using a Bayesian aggregation function and updating the global model parameters.

Description

Bayesian federated learning driving method and system through wireless network

The present invention relates to a method and system for driving Bayesian federated learning over a wireless network, and more particularly to a Bayesian federated learning algorithm for optimally aggregating quantized gradient information.

Federated learning is a distributed approach in which a server trains a machine learning model using distributed, heterogeneous training datasets that reside on mobile devices, without the raw data being shared with the server. Federated learning has attracted considerable interest in industry because of applications that require privacy for user-generated data. It performs distributed model training iteratively through two operations: model optimization using local datasets, and model aggregation, that is, model averaging. For example, in every round the server sends the global model to the available mobile devices; each mobile device optimizes the global model with its locally available data and then sends the updated model parameters (or updated local gradients) to the server over a communication link. The server then averages the local models or gradients received from the mobile devices, updates the global model, and shares it again, repeating the process.
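
As a minimal sketch of the round described above, the following Python code runs gradient-averaging federated learning on a toy one-parameter least-squares problem. The loss function, the four simulated devices, the learning rate, and all function names are assumptions chosen for illustration and are not taken from this specification.

```python
import random

def local_gradient(w, data):
    """Gradient of the loss 0.5 * mean((w*x - y)**2) over one device's (x, y) pairs."""
    return sum((w * x - y) * x for x, y in data) / len(data)

def federated_round(w, device_datasets, lr=0.1):
    """One round: devices compute gradients locally, the server averages them and takes a step."""
    grads = [local_gradient(w, data) for data in device_datasets]  # only gradients are uploaded
    avg_grad = sum(grads) / len(grads)                             # server-side averaging
    return w - lr * avg_grad

random.seed(0)
w_true = 3.0  # hypothetical ground-truth parameter used to generate the toy data
device_datasets = []
for _ in range(4):  # four simulated devices, each with its own local data
    xs = [random.gauss(0.0, 1.0) for _ in range(50)]
    device_datasets.append([(x, w_true * x) for x in xs])

w = 0.0
for _ in range(200):
    w = federated_round(w, device_datasets)
```

In this toy setting the raw (x, y) pairs never leave the simulated devices; only one gradient value per device per round reaches the averaging step.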

Optimizing a global model using local datasets that reside on a large number of mobile devices is a difficult task. The biggest problem is the high communication cost incurred when local computations are updated between the mobile devices and the server. The communication cost is proportional to the size of the global model parameters and to the number of mobile devices connected to the server.

Lossy compression of the local gradients is a practical way to reduce the message size per communication round, and the compression can be performed through sparsification or quantization. Despite significant progress in improving communication efficiency through compression, existing studies have not considered aggregating local gradients in a way that jointly accounts for heterogeneous local data distributions, communication link reliability that varies from device to device, and the effects of quantization. Most existing studies focus on designing the communication and learning systems separately, so the local gradients are decoded first and then aggregated independently.

Over-the-air aggregation is a new paradigm that designs the communication and learning systems jointly by exploiting the superposition property of the wireless medium. This approach opens new opportunities for the joint design of wireless and learning systems, but it is limited to time-division duplex wireless systems in which channel state information is available at the mobile devices for uplink transmission.

Accordingly, the present invention considers a federated learning system in which mobile devices send local gradients to a server in every communication round over orthogonalized multiple access channels, each with a heterogeneous radio link quality.

J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, "Federated learning: Strategies for improving communication efficiency," arXiv preprint arXiv:1610.05492, 2016.

An object of the present invention is to propose a Bayesian federated learning (BFL) algorithm that optimally aggregates quantized local gradient information so as to minimize the mean-squared error (MSE).

Another object of the present invention is to perform joint detection and aggregation of local gradients using a Bayesian framework, in a federated learning system in which mobile devices send local gradients to a server.

According to an embodiment of the present invention, a method for driving Bayesian federated learning (BFL) over a wireless network composed of a server and a plurality of mobile devices includes: computing, at a mobile device, a local gradient using the mobile device's own training dataset and global model parameters received from the server; compressing, at the mobile device, the local gradient using a 1-bit quantizer and a scalar quantizer and transmitting it to the server; and obtaining, at the server, the sum of the local gradients using a Bayesian aggregation function and updating the global model parameters.

According to an embodiment of the present invention, a method for driving scalable Bayesian federated learning (scalable-BFL) over a wireless network composed of a server and a plurality of mobile devices includes: computing, at a mobile device, a local gradient using the mobile device's own training dataset and global model parameters received from the server, and parameterizing it with scalars for the mobile device; quantizing, at the mobile device, the local gradient and the scalars using a 1-bit quantizer and a scalar quantizer and transmitting them to the server; and obtaining, at the server, a sum from the quantized local gradients and the quantized scalars using a minimum mean-squared error (MMSE) aggregation function and updating the global model parameters.

According to an embodiment of the present invention, a system for driving Bayesian federated learning (BFL) over a wireless network composed of a server and a plurality of mobile devices includes: a mobile device that computes a local gradient using its own training dataset and global model parameters received from the server, compresses the local gradient using a 1-bit quantizer and a scalar quantizer, and transmits it to the server; and a server that obtains the sum of the local gradients using a Bayesian aggregation function and updates the global model parameters.

According to an embodiment of the present invention, a system for driving scalable Bayesian federated learning (scalable-BFL) over a wireless network composed of a server and a plurality of mobile devices includes: a mobile device that computes a local gradient using its own training dataset and global model parameters received from the server, quantizes the local gradient and scalars using a 1-bit quantizer and a scalar quantizer, and transmits them to the server; and a server that obtains a sum from the quantized local gradients and the quantized scalars using a minimum mean-squared error (MMSE) aggregation function and updates the global model parameters.

According to an embodiment of the present invention, performing joint detection and aggregation of local gradients with a Bayesian aggregation function makes it possible to minimize the mean-squared error per communication round.

In addition, according to an embodiment of the present invention, using the Bayesian aggregation function of scalable Bayesian federated learning (scalable-BFL) makes it possible to improve the local gradient data distribution and the communication link quality as the number of mobile devices grows.

FIG. 1 is a diagram illustrating federated learning.
FIG. 2 is a flowchart illustrating a method for driving Bayesian federated learning through a system for driving Bayesian federated learning according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method for driving scalable Bayesian federated learning through a system for driving scalable Bayesian federated learning according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the neural network structure of scalable Bayesian federated learning (scalable-BFL) according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent from the embodiments described in detail below with reference to the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention pertains, and the present invention is defined only by the scope of the claims.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless they are clearly and specifically defined.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

FIG. 1 is a diagram illustrating federated learning.

With federated learning, mobile devices can learn jointly using a common prediction model while keeping the data used for prediction on the devices themselves, so the data does not need to be stored on a separate server or in the cloud.

Referring to FIG. 1, every mobile device downloads the current common prediction model (or global model) (S101). Each mobile device can improve the prediction model by training it on the data of the terminal (or training dataset) generated through the user's usage (S102). After improving the prediction model, the mobile device can generate the changes as update data (S103). The prediction models of the many mobile devices can thus be trained to reflect diverse usage environments and user characteristics (S104). The update data of each mobile device is transmitted to a cloud server, where it can be used to improve the common prediction model (S105). The improved common prediction model can then be distributed to each mobile device again (S106). By repeating the steps of retraining the prediction model on the mobile devices and improving the common prediction model, the common prediction model can be advanced and shared.

FIG. 2 is a flowchart illustrating a method for driving Bayesian federated learning through a system for driving Bayesian federated learning according to an embodiment of the present invention.

Referring to FIG. 2, the method for driving Bayesian federated learning (BFL) over a wireless network composed of a server and a plurality of mobile devices according to an embodiment of the present invention presents a framework for Bayesian federated learning. Bayesian federated learning according to an embodiment of the present invention provides an optimal method of aggregating local gradients from a Bayesian viewpoint, where optimal means minimizing the mean-squared error (MSE).

The mobile device 200 downloads the global model (or common prediction model) from the server 300 (step S201).

In step S202, the mobile device 200 computes a local gradient using its own training dataset and the global model parameters received from the server 300. Based on the training dataset and the global model parameters, the mobile device 200 may compute the local gradient using prior or posterior information about the local gradient.

In more detail, unlike conventional federated learning, the mobile device 200 uses prior information about the local gradient in step S202 according to an embodiment of the present invention. The prior distribution of the local gradient computed by mobile device k in communication round t is very difficult to characterize, because it depends not only on the loss function but also on the underlying distribution of the data samples. To overcome this difficulty, the mobile device 200 models the distribution of the local gradient as a Gaussian prior, using an appropriate moment-matching technique. Specifically, the prior is assumed to be a multivariate normal distribution with a mean vector and a covariance matrix, as in [Equation 1] below.

[Equation 1]
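
As an illustrative sketch of moment matching, the following Python code fits the mean vector and covariance matrix of a Gaussian prior to a small set of gradient samples. Where the samples come from (for example, gradients of several mini-batches) is an assumption, and the function name `fit_gaussian_prior` and the use of the unbiased covariance estimate are hypothetical choices not stated in this specification.

```python
def fit_gaussian_prior(samples):
    """Match the first two moments of gradient samples (a list of equal-length lists)."""
    n, dim = len(samples), len(samples[0])
    # Sample mean of each gradient entry.
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    # Unbiased sample covariance between every pair of entries.
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(dim)]
           for i in range(dim)]
    return mean, cov

# Three hypothetical 2-dimensional gradient samples.
samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
mean, cov = fit_gaussian_prior(samples)
# mean -> [3.0, 4.0]; cov -> [[4.0, 2.0], [2.0, 4.0]]
```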

The prior distribution may change from one communication round to the next, depending on the underlying distribution of the data samples and the local loss function.

The local gradients of two different mobile devices are statistically correlated, because both are evaluated at the same global model parameters. To illustrate this, consider a linear regression loss function.

[Equation 2]

Here, the quantities in [Equation 2] are defined for each mobile device. The local data distributions of all mobile devices are assumed to be independent of one another. In this case, the local gradient at each mobile device is obtained by evaluating the gradient of its loss function at the global model parameters.

The correlation matrix between the local gradients of two mobile devices, conditioned on the global model parameters, is computed as follows.

[Equation 3]

If the data distributions of the mobile devices are themselves correlated, the correlation between the local gradients can be more complicated.

In step S203, the mobile device 200 compresses the local gradient using a 1-bit quantizer.

To minimize the communication cost, the mobile device 200 compresses the local gradient using a 1-bit quantizer. So that the 1-bit quantized outputs are uniformly distributed, it first performs zero-mean normalization.

[Equation 4]

The mobile device 200 then quantizes the normalized gradient with the 1-bit quantizer.

[Equation 5]
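
The zero-mean normalization followed by 1-bit quantization can be sketched in Python as follows. For brevity the sketch assumes a diagonal covariance matrix, so that normalization reduces to a per-entry division by the standard deviation; a full covariance matrix would require an inverse matrix square root instead. The function names and the convention that zero maps to +1 are assumptions made for illustration.

```python
import math

def normalize(g, mean, var):
    """Zero-mean, unit-variance normalization of each gradient entry (diagonal covariance assumed)."""
    return [(gi - mi) / math.sqrt(vi) for gi, mi, vi in zip(g, mean, var)]

def one_bit_quantize(g_norm):
    """Keep only the sign of each entry: +1 for non-negative values, -1 otherwise."""
    return [1 if x >= 0 else -1 for x in g_norm]

g = [3.0, -1.5, 0.2]       # hypothetical local gradient
mean = [1.0, -1.0, 0.5]    # hypothetical prior mean per entry
var = [4.0, 0.25, 0.01]    # hypothetical prior variance per entry
g_norm = normalize(g, mean, var)
bits = one_bit_quantize(g_norm)   # one bit per gradient entry
```

Because the normalized entries are symmetric around zero under the Gaussian prior, each of the two output values is equally likely, which is the uniformity the normalization step is meant to achieve.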

Thereafter, the mobile device 200 transmits the updated local gradient to the server 300 (step S204).

In step S204, the mobile device 200 may transmit the binary gradient information to the server 300 together with the parameters of the prior distribution, namely the mean vector and the covariance matrix.

The server 300 computes the sum of the local gradients using a Bayesian aggregation function (step S205) and updates the global model parameters (step S206).

In the t-th communication round, the server 300 updates the global model parameters using the channel outputs received from the mobile devices. Here, the server 300 is assumed to know the channel distribution, the marginal distribution of each local gradient, and the joint distribution of the local gradients; this is a genie-aided assumption made in order to derive the MSE-optimal aggregation function.

Let the aggregation function be a function that estimates the sum of the local gradients. The server 300 can then obtain the aggregation function as follows.

[Equation 6]

The MSE-optimal aggregation function for the optimization problem in [Equation 6] is given by the conditional expectation.

[Equation 7]

[Equation 8]
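
The statement that the conditional expectation is the MSE-optimal estimator can be checked numerically on a simplified scalar case. The sketch below assumes a single standard normal gradient entry observed only through its sign over a noiseless link; this setting, the sample size, and the variable names are assumptions for illustration and are much simpler than the aggregation problem of the specification.

```python
import math
import random

random.seed(2)
n = 100_000
g = [random.gauss(0.0, 1.0) for _ in range(n)]   # gradient entries with a standard normal prior
b = [1.0 if x >= 0 else -1.0 for x in g]         # 1-bit observation of each entry

# For a standard normal g, E[g | sign(g) = +1] = sqrt(2/pi) (mean of a half-normal).
c = math.sqrt(2.0 / math.pi)

mse_cond = sum((x - c * s) ** 2 for x, s in zip(g, b)) / n   # conditional-mean estimate
mse_naive = sum((x - s) ** 2 for x, s in zip(g, b)) / n      # raw sign used as the estimate
```

In this setting the conditional-mean estimate attains an MSE close to 1 - 2/pi, which is lower than the MSE obtained when the raw sign is used directly.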

The conditional distribution in [Equation 8] is factorized as follows, using the chain rule and the independence, conditioned on the normalized local gradients, among the channel outputs of the individual mobile devices.

[Equation 9]

Here, the last equality follows from the fact that the quantities involved form a Markov chain, and the marginal distribution for each mobile device can be obtained by marginalizing the joint distribution.

The server 300 then substitutes [Equation 9] into [Equation 8] to obtain the Bayesian aggregation function, which is the optimal gradient aggregation function from a Bayesian viewpoint. Using the aggregation function of [Equation 7], the server 300 updates the global model parameters based on stochastic gradient descent, as in [Equation 10] below.

[Equation 10]
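
A gradient-descent update of this kind can be sketched as follows. Dividing the aggregated sum by the number of devices and using a fixed learning rate are assumptions made for illustration; the exact form of the update in the specification is not reproduced here, and the function name is hypothetical.

```python
def update_global_model(w, aggregated_sum, num_devices, lr):
    """One descent step using the server's estimate of the summed local gradients."""
    return [wi - lr * gi / num_devices for wi, gi in zip(w, aggregated_sum)]

w = [1.0, 2.0]                    # current global model parameters
aggregated_sum = [4.0, -2.0]      # hypothetical output of the aggregation function
w_next = update_global_model(w, aggregated_sum, num_devices=4, lr=0.5)
# w_next -> [0.5, 2.25]
```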

Thereafter, in step S207, the server 300 may share the updated global model.

FIG. 3 is a flowchart illustrating a method for driving scalable Bayesian federated learning through a system for driving scalable Bayesian federated learning according to an embodiment of the present invention, and FIG. 4 illustrates the neural network structure of scalable Bayesian federated learning (scalable-BFL) according to an embodiment of the present invention.

Referring to FIG. 3, the method for driving scalable Bayesian federated learning over a wireless network composed of a server and a plurality of mobile devices according to an embodiment of the present invention presents a framework for scalable Bayesian federated learning (scalable-BFL). Scalable Bayesian federated learning according to an embodiment of the present invention scales with the number of mobile devices and is robust to heterogeneity in both the local data distributions and the communication link quality.

The mobile device 200 downloads the global model (or common prediction model) from the server 300 (step S301).

In step S302, the mobile device 200 normalizes the local gradient computed using its own training dataset and the global model parameters received from the server 300, and parameterizes it with two scalars.

More specifically, the mobile device 200 models all elements of the local gradient as independent and identically distributed (i.i.d.) Gaussian random variables with a common mean and variance.

[Equation 11]

The mobile device 200 then estimates the mean and variance by computing the sample mean and sample variance of the elements of the local gradient as follows.

[Equation 12]

[Equation 13]
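
The two scalars can be computed from a local gradient as in the following sketch. Dividing by the number of entries M (rather than M - 1) in the variance is an assumption, since the exact normalizer of the specification is not reproduced here; the function name is hypothetical.

```python
def prior_scalars(g):
    """Sample mean and sample variance taken over the M entries of one local gradient."""
    m = len(g)
    mu = sum(g) / m
    nu = sum((x - mu) ** 2 for x in g) / m   # assumed normalizer: M
    return mu, nu

g = [1.0, 2.0, 3.0, 4.0]     # hypothetical local gradient with M = 4 entries
mu, nu = prior_scalars(g)    # mu -> 2.5, nu -> 1.25
```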

With this prior, the two scalars of each mobile device 200, a mean and a variance, capture distinct statistical information about that device's prior distribution. This simplification can greatly reduce both the communication cost of conveying the prior information and the complexity of the aggregation function. In particular, each mobile device 200 only needs to send two scalars to the server 300 in addition to the 1-bit local gradient. Scalable Bayesian federated learning (scalable-BFL) can therefore reduce the communication cost compared with Bayesian federated learning (BFL), which sends a high-dimensional mean vector and covariance matrix.
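
The difference in side information can be made concrete with a short calculation. The sketch assumes that a symmetric M x M covariance matrix is sent as its M(M+1)/2 unique entries; the model size M = 1000 and the function names are hypothetical.

```python
def bfl_side_info_values(m):
    """Real values needed for an M-dimensional mean vector plus a symmetric M x M covariance matrix."""
    return m + m * (m + 1) // 2

def scalable_bfl_side_info_values():
    """Scalable-BFL sends only a mean and a variance per device."""
    return 2

M = 1000                                   # hypothetical number of model parameters
full = bfl_side_info_values(M)             # 501500 values per device per round
small = scalable_bfl_side_info_values()    # 2 values per device per round
```

The side information of BFL grows quadratically with the model size, whereas that of scalable-BFL stays constant.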

Before the 1-bit quantization, the mobile device 200 performs zero-mean normalization as follows.

[Equation 14]

In step S303, the mobile device 200 compresses the local gradient into a binary vector using a 1-bit quantizer, and quantizes the two scalars using a scalar quantizer.

After the normalization of [Equation 14], the mobile device 200 compresses the normalized local gradient into a binary vector using the 1-bit quantizer. The mobile device 200 also quantizes the two scalars using a B-bit scalar quantizer. Here, Q denotes the set of quantized outputs, and a second set holds the corresponding bin boundaries. The quantization function then maps an input to a discrete-valued output in Q as follows.

[Equation 15]
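
A B-bit scalar quantizer defined by a set of output levels and a set of bin boundaries can be sketched as follows. The uniform spacing, the use of bin midpoints as output levels, the saturation of out-of-range inputs, and the function names are assumptions for illustration; the specification does not state how the levels and boundaries are chosen.

```python
from bisect import bisect_right

def make_uniform_quantizer(lo, hi, num_bits):
    """Build 2**B output levels and 2**B - 1 interior bin boundaries on [lo, hi]."""
    num_levels = 2 ** num_bits
    step = (hi - lo) / num_levels
    levels = [lo + (i + 0.5) * step for i in range(num_levels)]   # bin midpoints
    boundaries = [lo + i * step for i in range(1, num_levels)]    # interior boundaries only
    return levels, boundaries

def quantize(x, levels, boundaries):
    """Map x to the output level of the bin it falls in; out-of-range inputs saturate."""
    return levels[bisect_right(boundaries, x)]

levels, boundaries = make_uniform_quantizer(0.0, 1.0, num_bits=2)
q = quantize(0.3, levels, boundaries)   # 0.3 lies in the bin [0.25, 0.5), so q -> 0.375
```

Only the index of the selected level needs to be transmitted, which takes B bits per scalar.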

Accordingly, the two quantizer outputs represent the quantized versions of the two scalars, respectively.

Thereafter, the mobile device 200 transmits the updated local gradient and the two quantized scalars to the server 300 (step S304).

Referring to FIG. 4, an uplink packet structure is shown. In communication round t, the mobile device 200 transmits the quantized local gradient, together with the two scalars, to the server 300.

When transmitting the two quantized scalars, the mobile device 200 encodes the 2B information bits of the two quantized scalars with a strong channel code (for example, a polar code), and the server 300 decodes them perfectly. For example, when the code rate is fixed at r<1, the coded binary symbols for the scalars are transmitted in addition to the M binary symbols that carry the quantized local gradient. Because the global model size is much larger than the number of these additional symbols, the additional communication overhead is negligible.
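
The size of this overhead can be estimated with a short calculation. By the definition of code rate, 2B information bits encoded at rate r occupy 2B/r coded binary symbols; the values B = 16, r = 0.5, and M = 1,000,000 below are hypothetical, as is the function name.

```python
import math

def coded_scalar_symbols(bits_per_scalar, code_rate):
    """Coded binary symbols needed to carry the 2B information bits at code rate r."""
    return math.ceil(2 * bits_per_scalar / code_rate)

M = 1_000_000   # hypothetical model size: one binary symbol per gradient entry
extra = coded_scalar_symbols(bits_per_scalar=16, code_rate=0.5)   # 64 coded symbols
overhead = extra / M                                              # fraction of extra uplink symbols
```

With these example values the scalars add 64 symbols to a packet of one million gradient symbols, an overhead well below 0.01 percent.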

The server 300 computes the sum from the quantized local gradients and the two quantized scalars using the Bayesian aggregation function (step S305) and updates the global model parameters (step S306). Thereafter, in step S307, the server 300 may share the updated global model.

The server 300 performs element-wise minimum mean-squared error (MMSE) estimation using both of the two quantized scalars. This yields a closed-form expression for the nonlinear Bayesian aggregation function, and the simplified prior distribution yields a scalable MMSE aggregation function.

The server 300 derives the MMSE aggregation function in closed form under the simplified prior distribution assumption.

As an example, let the elements of the local gradients be i.i.d. Gaussian random variables characterized by the two quantized scalars of each mobile device, and consider a gradient aggregation function. The aggregation function that minimizes the mean-squared error is then as follows.

[Equation 16]

Here, 1_M denotes the all-ones vector of dimension M.

The aggregation method described above is implemented as a two-step operation. The first step performs a nonlinear mapping from [formula] to [formula] using the prior distribution parameters ([formula]) and the communication channel parameters ([formula]). The second step adds [formula] to the mean [formula] to obtain an estimate of the local gradient ([formula]).
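The two steps can be sketched for one device as follows. This is a minimal illustration under assumptions that are not in the source: a real-valued additive Gaussian noise link carrying unit-amplitude sign symbols, for which the element-wise estimate is mean + sqrt(2·var/π)·tanh(y/noise_var). The function and parameter names are hypothetical, and the mapping in [Equation 16] may differ.

```python
import math


def nonlinear_map(received, noise_var):
    """Step 1: map each received symbol y to a soft sign estimate.

    Assumes y = b + n with b in {-1, +1} and Gaussian noise of variance
    noise_var, which gives E[b | y] = tanh(y / noise_var).
    """
    return [math.tanh(y / noise_var) for y in received]


def estimate_local_gradient(received, mean, var, noise_var):
    """Step 2: scale the soft signs and add the mean.

    mean and var are the two (quantified) prior scalars of the device.
    """
    gain = math.sqrt(2.0 * var / math.pi)
    return [mean + gain * s for s in nonlinear_map(received, noise_var)]
```

When the received symbol is uninformative (y = 0), the estimate falls back to the prior mean; when the link is very reliable, it approaches mean ± sqrt(2·var/π).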

To implement scalable Bayesian federated learning (Scalable-BFL), the server 300 must estimate the communication channel parameters ([formula]). The channel parameters ([formula]) can be estimated using conventional pilot signals, and the estimation accuracy is linearly proportional to the length of the pilot signal. The channel parameters ([formula]) can therefore be estimated accurately at the expense of uplink spectral efficiency.
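The trade-off between pilot length and estimation accuracy can be illustrated with a small simulation. The setup is an assumption made here: a single real channel gain is estimated by averaging known +1 pilots received in Gaussian noise, so the mean-squared error is noise_std² divided by the pilot length. Which channel parameters the patent estimates, and how, is not specified in this passage.

```python
import random


def estimate_channel(true_gain, noise_std, pilot_length, rng):
    """Least-squares estimate of a real channel gain from known +1 pilots."""
    received = [true_gain + rng.gauss(0.0, noise_std) for _ in range(pilot_length)]
    return sum(received) / pilot_length


def empirical_mse(true_gain, noise_std, pilot_length, trials=2000, seed=0):
    """Average squared estimation error over repeated pilot transmissions."""
    rng = random.Random(seed)
    errors = [
        (estimate_channel(true_gain, noise_std, pilot_length, rng) - true_gain) ** 2
        for _ in range(trials)
    ]
    return sum(errors) / trials
```

Lengthening the pilot from 4 to 64 symbols reduces the expected error by a factor of 16 in this model, but those 60 extra symbols carry no gradient information, which is the spectral-efficiency cost mentioned above.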

Referring to FIG. 4, the aggregation function described above may be implemented as a two-layer neural network.
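One way to read the two-layer structure is sketched below for a single gradient element received from K devices. The layer layout is an assumption made here, not a description of FIG. 4: the first layer scales each received symbol by the inverse noise variance and applies tanh, and the second layer is linear with weights sqrt(2·var/π) and a bias equal to the sum of the device means. It also assumes a real-valued additive Gaussian noise link with unit-amplitude sign symbols. All names are hypothetical.

```python
import math


def build_aggregator(means, variances, noise_vars):
    """Derive the layer weights from per-device prior and channel parameters."""
    first_layer = [1.0 / s for s in noise_vars]  # input scaling before tanh
    second_layer = [math.sqrt(2.0 * v / math.pi) for v in variances]  # output weights
    bias = sum(means)  # output bias: sum of the device means
    return first_layer, second_layer, bias


def aggregate_element(received, first_layer, second_layer, bias):
    """Forward pass: K received symbols for one element -> estimated gradient sum."""
    hidden = [math.tanh(w * y) for w, y in zip(first_layer, received)]
    return bias + sum(w * h for w, h in zip(second_layer, hidden))
```

Because every weight is computed directly from the quantified scalars and the channel parameters, no training is involved in this reading, and the cost grows linearly with the number of devices and the model size.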

Note that when [formula] and [formula] are correlated Gaussian for [formula], the aggregation function derived in [Equation 16] is suboptimal under the correlated prior assumption ([formula]) described in [Equation 7]. Nevertheless, even when the prior distribution of the local gradients is correlated, scalable Bayesian federated learning (Scalable-BFL) is proposed as a promising algorithm because it scales computationally in both the number of mobile devices and the model size, at the cost of some loss in MSE performance.

The system or apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but one of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those skilled in the art can make various modifications and variations based on the above description. For example, an appropriate result may be achieved even if the described techniques are performed in a different order from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

1. A Bayesian federated learning (BFL) driving method over a wireless network comprising a server and a plurality of mobile devices, the method comprising:
calculating, at a mobile device, a local gradient using the mobile device's own training dataset and global model parameters received from the server;
compressing, at the mobile device, the local gradient using a 1-bit quantifier and a scalar quantifier and transmitting the compressed local gradient to the server; and
representing, at the server, the sum of the local gradients using a Bayesian aggregation function, and updating the global model parameters.

2. The method of claim 1,
wherein calculating the local gradient comprises
calculating the local gradient based on the training dataset and the global model parameters, using prior or posterior information of the local gradient.

3. The method of claim 2,
wherein calculating the local gradient comprises
modeling the distribution of the local gradient as a Gaussian prior distribution using a moment matching technique, in order to characterize the prior or posterior distribution of the local gradient.

4. The method of claim 1,
wherein compressing the local gradient and transmitting it to the server comprises
normalizing the local gradient to zero mean and quantifying the normalized local gradient so that the 1-bit quantified output is uniformly distributed.

5. The method of claim 1,
wherein compressing the local gradient and transmitting it to the server comprises
transmitting, from the mobile device to the server, binary gradient information together with sample moment parameters of the gradient, which are information about the prior distribution.

6. The method of claim 1,
wherein updating the global model parameters comprises
representing, at the server, the sum of the local gradients using the Bayesian aggregation function, which is the optimal gradient aggregation function from a Bayesian point of view, namely the MSE-optimal aggregation function estimated by conditional expectation from the channel distribution, the marginal distribution of the local gradients, and the joint distribution of the local gradients.

7. The method of claim 6,
wherein updating the global model parameters comprises
updating the global model parameters using the MSE-optimal aggregation function.

8. A scalable Bayesian federated learning (Scalable-BFL) driving method over a wireless network comprising a server and a plurality of mobile devices, the method comprising:
calculating, at a mobile device, a local gradient using the mobile device's own training dataset and global model parameters received from the server, and parameterizing the local gradient as a scalar for the mobile device;
quantifying, at the mobile device, the local gradient and the scalar using a 1-bit quantifier and a scalar quantifier and transmitting the quantified local gradient and scalar to the server; and
representing, at the server, the sum of the quantified local gradient and the quantified scalar using a Minimum Mean-Squared Error (MMSE) aggregation function, and updating the global model parameters.

9. The method of claim 8,
wherein normalizing the local gradient and parameterizing it as a scalar comprises
modeling all elements of the local gradient as independent and identically distributed (i.i.d.) Gaussian with a common mean and variance.

10. The method of claim 9,
wherein normalizing the local gradient and parameterizing it as a scalar comprises
normalizing the local gradient to zero mean before quantifying the local gradient with one bit.

11. The method of claim 10,
wherein transmitting to the server comprises
compressing the normalized local gradient into a binary vector using the 1-bit quantifier, quantifying the two scalars, namely the common mean and variance, using a B-bit scalar quantifier, and transmitting the quantified local gradient together with the two quantified scalars to the server.

12. The method of claim 8,
wherein updating the global model parameters comprises
updating the global model parameters by summing the quantified local gradient and the quantified scalar using the Minimum Mean-Squared Error (MMSE) aggregation function.

13. A Bayesian federated learning (BFL) driving system over a wireless network comprising a server and a plurality of mobile devices, the system comprising:
a mobile device that calculates a local gradient using its own training dataset and global model parameters received from the server, compresses the local gradient using a 1-bit quantifier and a scalar quantifier, and transmits the compressed local gradient to the server; and
a server that represents the sum of the local gradients using a Bayesian aggregation function and updates the global model parameters.

14. A scalable Bayesian federated learning (Scalable-BFL) driving system over a wireless network comprising a server and a plurality of mobile devices, the system comprising:
a mobile device that calculates a local gradient using its own training dataset and global model parameters received from the server, quantifies the local gradient and a scalar using a 1-bit quantifier and a scalar quantifier, and transmits the quantified local gradient and scalar to the server; and
a server that represents the sum of the quantified local gradient and the quantified scalar using a Minimum Mean-Squared Error (MMSE) aggregation function and updates the global model parameters.