WO2023197259A1

WO2023197259A1 - Devices and methods for providing a federated learning model

Info

Publication number: WO2023197259A1
Application number: PCT/CN2022/086887
Authority: WO
Inventors: Wenxuan YE; Xueli AN; Xueqiang Yan
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2022-04-14
Filing date: 2022-04-14
Publication date: 2023-10-19

Abstract

A backend system (100) is disclosed for providing a federated learning, FL, model for a plurality of client devices (110) of a mobile communication network. The backend system (100) comprises a plurality of aggregator entities (120), each aggregator entity (120) configured to aggregate local FL model data from a plurality of selected client devices of the plurality of client devices (110) for generating global FL model update data for updating the FL model. The backend system (100) further comprises a Distributed Ledger Technology, DLT, platform (130) configured to select an aggregator entity of the plurality of aggregator entities (120). The selected aggregator entity is further configured to upload the global FL model update data to the DLT platform (130).

Description

Devices and methods for providing a federated learning model

TECHNICAL FIELD

The present disclosure relates to devices and methods for providing a federated learning model. More specifically, the present disclosure relates to devices and methods for providing a federated learning model for a plurality of client devices of a mobile communication network.

BACKGROUND

Federated learning is an emerging machine learning setting in which the learning task is solved by a loose federation of participating devices coordinated by a central server. Instead of directly uploading its local training dataset to the central server, each client computes an update to the current global model maintained by the central server. Owing to its privacy-preserving nature, federated learning has great potential in mobile communication systems and is regarded as one of the key solutions for achieving ubiquitous AI in 6G. It is particularly attractive for vertical applications such as Vehicle-to-Everything (V2X) and the Industrial Internet of Things (IIoT). However, traditional central-server-based approaches can suffer from a single point of failure or limited scalability, lack native trustworthy management of the models, and impose limitations on the scope of local data usage and on the FL service provider.

SUMMARY

It is an objective to provide improved devices and methods for providing a federated learning model, in particular providing federated learning as a native service by a mobile communication system.

The foregoing and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect, a backend system is provided for providing a federated learning, FL, model service of a plurality of FL models for a plurality of mobile client devices of a mobile communication network, in particular a 5G network. The backend system comprises a plurality of aggregator entities. Each aggregator entity is configured to aggregate local FL model data, i.e. model parameters, from a plurality of selected client devices of the plurality of client devices for generating global FL model update data for updating the FL model. The backend system further comprises a Distributed Ledger Technology, DLT, platform configured to select an aggregator entity of the plurality of aggregator entities. The selected aggregator entity is further configured to upload the global FL model update data to the DLT platform. The aggregator entities may be aggregator servers. The DLT platform may comprise or be implemented on one or more DLT servers. Thus, the backend system distributes the aggregation points of the federated learning system by leveraging key features of DLT, such as decentralization, immutability, traceability and transparency.

In a further possible implementation form, the backend system further comprises one or more data storage entities. The one or more data storage entities are configured to store the global FL model update data from the selected aggregator entity. The selected client devices are configured to download the global FL model update data from the one or more data storage entities.

In a further possible implementation form, the one or more data storage entities are further configured to store the local FL model data from the plurality of selected client devices of the plurality of client devices. The selected aggregator entity is configured to download the local FL model data from the one or more data storage entities.

In a further possible implementation form, the DLT platform is configured to select the one or more aggregator entities of the plurality of aggregator entities based on a smart contract. A smart contract may be a computer program or a transaction protocol which is intended to automatically execute, control or document legally relevant events and actions according to the terms of a contract or an agreement.

In a further possible implementation form, the selected aggregator entity is configured to select the plurality of selected client devices of the plurality of client devices.

In a further possible implementation form, the DLT platform is configured to store one or more data keys, i.e. pointers, for the local FL model data provided by the plurality of client devices.

In a further possible implementation form, the DLT platform is further configured to store one or more further data keys for the global FL model update data from the selected aggregator entity.

In a further possible implementation form, the DLT platform is configured to perform authentication, authorization and/or access control of the plurality of selected client devices and/or the selected aggregator entity.

In a further possible implementation form, the backend system further comprises an FL service lifecycle manager, FLSLM, entity. The FLSLM entity is configured to define and configure the global FL model, i.e. to provide an FL service definition.

In a further possible implementation form, the FLSLM entity is further configured to monitor the operation of the FL model deployed in the DLT platform.

In a further possible implementation form, the FLSLM entity is further configured to terminate operation of the FL model deployed in the DLT platform.

In a further possible implementation form, each of the plurality of aggregator entities is configured to periodically check the DLT platform to determine whether the respective aggregator entity has been selected as the selected aggregator entity and/or to check for the local FL model data.

In a further possible implementation form, each client device is configured to periodically check the DLT platform to determine whether the respective client device has been selected as one of the plurality of selected client devices and/or to check for the global FL model update data, in order to download the global FL model update data to the respective client device.

In a further possible implementation form, the DLT platform is further configured to receive an upload request from the selected aggregator entity and/or the plurality of selected client devices for uploading data to the one or more data storage entities. The upload request comprises a data key of the data to be uploaded.

According to a second aspect, a method is provided for operating a backend system for providing a federated learning, FL, model service of a plurality of FL models for a plurality of mobile client devices of a mobile communication network, in particular a 5G network, the backend system comprising a plurality of aggregator entities, each aggregator entity being configured to aggregate local FL model data, i.e. model parameters, from a plurality of selected client devices of the plurality of client devices for generating global FL model update data for updating the FL model. The method comprises the following steps: selecting an aggregator entity of the plurality of aggregator entities by a Distributed Ledger Technology, DLT, platform; and

uploading the global FL model update data to the DLT platform by the selected aggregator entity.

The method according to the second aspect of the present disclosure can be performed by the backend system according to the first aspect of the present disclosure. Thus, further features of the method according to the second aspect of the present disclosure result directly from the functionality of the backend system according to the first aspect of the present disclosure as well as its different implementation forms described above and below.

According to a third aspect, a computer program product is provided, comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method according to the second aspect when the program code is executed by the computer or the processor.

Details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, embodiments of the present disclosure are described in more detail with reference to the attached figures and drawings, in which:

Fig. 1 shows a schematic scheme illustrating a backend system according to an embodiment;

Fig. 2 shows a schematic scheme illustrating an exemplary federated learning system;

Fig. 3 shows a schematic scheme illustrating a backend system according to an embodiment;

Fig. 4 shows a schematic scheme illustrating a service lifecycle in a backend system according to an embodiment;

Fig. 5 shows a schematic scheme illustrating a task execution in one global epoch in a backend system according to an embodiment;

Figs. 6-16 show signaling diagrams illustrating the interaction of several components of a backend system according to embodiments;

Fig. 17a shows a schematic scheme illustrating exemplary federated learning systems;

Fig. 17b shows a schematic scheme illustrating a backend system according to an embodiment; and

Fig. 18 is a flow diagram illustrating a method of operating a backend system according to an embodiment.

In the following, identical reference signs refer to identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF THE EMBODIMENTS

In the following description, reference is made to the accompanying figures, which form part of the disclosure, and which show, by way of illustration, specific aspects of embodiments of the present disclosure or specific aspects in which embodiments of the present disclosure may be used. It is understood that embodiments of the present disclosure may be used in other aspects and comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.

For instance, it is to be understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.

Figure 1 shows a schematic scheme illustrating a backend system 100 according to an embodiment. As illustrated in figure 1, the backend system 100 may comprise a Distributed Data Storage Entity, DDSE, 140 (also referred to as data storage entity 140) and comprises a Distributed Ledger Technology, DLT, platform 130. A plurality of client devices 110a-e of a group 110 of client devices may each be connected via a communication network 101 to the DLT platform 130, which, in turn, may be connected to the DDSE 140. As will be appreciated, although five client devices 110a-e are illustrated in figure 1, the number of client devices in the group 110 is not limited to five and may be higher or lower.

The backend system 100 further comprises a plurality of aggregators 120a-c of a group 120 of aggregator entities which are connected to the DLT platform 130 and may be connected to the DDSE 140. As will be appreciated, although three aggregators 120a-c are illustrated in figure 1, the number of aggregator entities 120 is not limited to three and may be higher or lower. The aggregator entities 120 may be aggregator servers 120. The DLT platform 130 may comprise or be implemented on one or more DLT servers 130.

As illustrated in figure 1, the client devices 110a-e may be any terminal or device that has data collection, model training computation and information interaction capabilities, such as mobile phones 110a-e, smart cars 110a-e, robots 110a-e, laptops 110a-e, and the like.

Before different embodiments of the backend system 100 are described in more detail, some technical background and terminology are introduced in the following, making use of one or more of the following abbreviations:

FL Federated Learning

V2X Vehicle to Everything

IIoT Industrial Internet of Things

DLT Distributed Ledger Technology

O&M Operations and Maintenance

FLSLM Federated Learning Service Lifecycle Manager

DDSE Distributed Data Storage Entity

info Information

Task: A learning service that involves multiple clients. For example, a classification task learns how to assign data to categories, and a clustering task learns how to group data according to similarity.

Task meta information: An underlying definition or description of a task, including but not limited to the Task identifier (a unique identification number that identifies an individual task, i.e. the Task ID), Task goal, Task publisher, Task release time, and Task training data involved in the task.

Task lifecycle: The course of events through which a task passes during its lifetime, divided into three phases: task preparation, task execution and task termination.
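The three lifecycle phases can be read as a strictly forward state machine. The minimal Python sketch below assumes that a task may only move from preparation to execution and from execution to termination; the names `TaskPhase`, `ALLOWED_TRANSITIONS` and `TaskLifecycle.advance` are hypothetical and not defined by the disclosure.

```python
from enum import Enum
from typing import Dict, Set


class TaskPhase(Enum):
    PREPARATION = "task preparation"
    EXECUTION = "task execution"
    TERMINATION = "task termination"


# Assumed transition table: phases are only traversed in forward order.
ALLOWED_TRANSITIONS: Dict[TaskPhase, Set[TaskPhase]] = {
    TaskPhase.PREPARATION: {TaskPhase.EXECUTION},
    TaskPhase.EXECUTION: {TaskPhase.TERMINATION},
    TaskPhase.TERMINATION: set(),
}


class TaskLifecycle:
    """Tracks the lifecycle phase of a single FL task (hypothetical helper)."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.phase = TaskPhase.PREPARATION

    def advance(self, target: TaskPhase) -> None:
        # Reject skipped or backward transitions, e.g. termination -> execution.
        if target not in ALLOWED_TRANSITIONS[self.phase]:
            raise ValueError(f"illegal transition {self.phase.name} -> {target.name}")
        self.phase = target
```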

Task O&M information: Information collected during the entire task lifecycle. For example, during task execution, it could comprise the global training accuracy and the IDs of the participating clients and aggregators.

Task configuration information: Values set for task-execution-related parameters, including but not limited to model architecture, hyperparameters, model parameters, loss function, aggregation algorithm, global epoch, local epoch, task assignment type, data key algorithm, client selection strategy and client requirements.
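Because raw data such as the task configuration information may be kept off-chain while only a data key is recorded on the ledger, the configuration needs a deterministic serialization. The sketch below covers only a subset of the listed fields; the field names, defaults and the JSON encoding are assumptions chosen for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TaskConfiguration:
    """Hypothetical subset of the task configuration information."""

    model_architecture: Dict[str, Any]
    hyperparameters: Dict[str, float]
    loss_function: str
    aggregation_algorithm: str
    global_epochs: int
    local_epochs: int
    client_selection_strategy: str = "random"
    data_key_algorithm: str = "sha256"

    def __post_init__(self) -> None:
        if self.global_epochs < 1 or self.local_epochs < 1:
            raise ValueError("epoch counts must be positive")

    def to_bytes(self) -> bytes:
        # sort_keys (applied recursively) makes the encoding deterministic.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    def data_key(self) -> str:
        # The data key identifies the stored configuration without revealing it.
        return hashlib.new(self.data_key_algorithm, self.to_bytes()).hexdigest()
```

Two configurations that differ only in dictionary key order yield the same data key, while changing any value, such as the number of global epochs, yields a different one.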

Model architecture: A blueprint of the machine learning model, including but not limited to the number of layers, the number of hidden states, layer type, activation function.

Hyperparameter: A parameter that is external to the model, whose value cannot be estimated from the data and must be hand-tuned, such as the learning rate for training a neural network or the value of k in k-nearest neighbors.

Model parameter: Tensors of the machine learning model which aim at optimizing the output for specific tasks, normally optimized with back-propagation. In Task configuration information, model parameters refer specifically to those of the initial global model.

Loss function: A method of evaluating how well a specific algorithm models the given data.

Aggregation algorithm: A method for combining equivalent results, e.g. the training results on different clients.
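One widely used aggregation algorithm is federated averaging (FedAvg), which weights each client's parameters by the size of its local training set. The disclosure does not prescribe FedAvg; the sketch below is one possible choice, with model parameters simplified to flat lists of floats keyed by tensor name (an assumption made to keep the example dependency-free).

```python
from typing import Dict, List, Sequence

Model = Dict[str, List[float]]


def fedavg(local_models: Sequence[Model], num_samples: Sequence[int]) -> Model:
    """Average client parameters, weighting each client by its sample count."""
    if not local_models or len(local_models) != len(num_samples):
        raise ValueError("need one sample count per local model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    global_model: Model = {}
    for name, first in local_models[0].items():
        acc = [0.0] * len(first)
        for model, n in zip(local_models, num_samples):
            if len(model[name]) != len(acc):
                raise ValueError(f"shape mismatch for tensor {name!r}")
            weight = n / total
            for i, value in enumerate(model[name]):
                acc[i] += weight * value
        global_model[name] = acc
    return global_model
```

For example, two clients holding 1 and 3 samples with parameters [1.0, 2.0] and [3.0, 4.0] are combined with weights 0.25 and 0.75 into [2.5, 3.5].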

Global epoch: One complete iteration of the learning process, consisting of Aggregator selection and training initialization, Client selection, Client training initialization, Local model update and Global model aggregation.

Local epoch: One iteration of the model training algorithm over a client’s local dataset, which is used in the step named Local model update.

Task assignment type: Type of the task assignment to be taken by the aggregator or client in each global epoch, e.g., client selection or model aggregation for the aggregator, or local model training for the clients.

Data key: An output obtained from an input through a mathematical function that converts an input of arbitrary length into a fixed-length output in a one-way process, in which the same input always produces the same output. A hash, as commonly used nowadays, can be used as a data key.

Data key algorithm: A mathematical function used to generate the data key, such as Message Digest 5 (MD5) and Secure Hash Algorithm (SHA) for hash.
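Python's standard `hashlib` module provides both algorithm families named above. In this minimal sketch the helper name `data_key` is hypothetical; SHA-256 is used by default because practical collision attacks on MD5 are known, which makes MD5 a weak choice where data keys must resist tampering.

```python
import hashlib


def data_key(payload: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of payload under the chosen data key algorithm."""
    # hashlib.new accepts algorithm names such as "sha256" or "md5".
    hasher = hashlib.new(algorithm)
    hasher.update(payload)
    return hasher.hexdigest()
```

The same serialized model always yields the same key, and any change to the bytes yields a different key, so the key can act as both a pointer to and an integrity check on data stored elsewhere.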

Client selection strategy: A strategy based on which the participating clients are selected in each global epoch of model training, such as random or round-robin selection.
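The two example strategies can be sketched as follows. The function names and the rule of advancing the round-robin window by k clients per global epoch are assumptions for illustration; the seeded random generator makes a random selection reproducible, which is useful if the choice has to be audited later.

```python
import random
from typing import List, Optional, Sequence


def select_random(clients: Sequence[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Pick k distinct clients uniformly at random (seed makes it reproducible)."""
    return random.Random(seed).sample(list(clients), k)


def select_round_robin(clients: Sequence[str], k: int, global_epoch: int) -> List[str]:
    """Pick k consecutive clients, advancing the window by k every global epoch."""
    n = len(clients)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of clients")
    start = (global_epoch * k) % n
    return [clients[(start + i) % n] for i in range(k)]
```

With five clients and k = 2, epochs 0, 1 and 2 select ["a", "b"], ["c", "d"] and ["e", "a"], so over five epochs every client is selected exactly twice.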

Client requirements: Requirements for a client to participate in the task, such as requirements on client capability (e.g. the computational capability, the size of the local training data, power level) and on client connectivity (e.g. radio link quality).
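Client requirements act as an eligibility filter applied before any selection strategy runs. The sketch below uses one field per example requirement; the field names, units (GFLOPS, battery percent, link quality in dB) and the rule that every threshold must be met are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ClientProfile:
    client_id: str
    compute_gflops: float    # computational capability
    local_samples: int       # size of the local training data
    battery_percent: float   # power level
    link_quality_db: float   # radio link quality, e.g. SINR in dB


@dataclass(frozen=True)
class ClientRequirements:
    min_compute_gflops: float
    min_local_samples: int
    min_battery_percent: float
    min_link_quality_db: float


def eligible_clients(profiles: Iterable[ClientProfile], req: ClientRequirements) -> List[str]:
    """Return IDs of clients meeting every capability and connectivity requirement."""
    return [
        p.client_id
        for p in profiles
        if p.compute_gflops >= req.min_compute_gflops
        and p.local_samples >= req.min_local_samples
        and p.battery_percent >= req.min_battery_percent
        and p.link_quality_db >= req.min_link_quality_db
    ]
```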

Aggregator selection strategy: A strategy based on which the participating aggregators are selected in each global epoch.

Aggregator requirements: Requirements for an aggregator to participate in the task, such as the computational capability.

Task completion strategy: A strategy which defines the condition under which the lifecycle phase of the task changes from task execution to task termination (e.g., the maximum number of global epochs is reached), and which defines how the smart contract collects, into one transaction, the data (e.g. the selected aggregator ID) or data keys (e.g. data keys of models) that are stored on the DLT platform and were generated by consuming the FL service.
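A task completion strategy has two parts: a termination condition and a collection step. The sketch below assumes the condition "maximum number of global epochs reached, or termination requested by the FLSLM" and represents the final transaction as a plain dictionary; `TaskState` and its methods are hypothetical names, not a smart contract API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TaskState:
    """Hypothetical on-ledger state of one FL task during task execution."""

    task_id: str
    max_global_epochs: int
    epoch_records: List[Dict[str, str]] = field(default_factory=list)
    terminated: bool = False

    def record_epoch(self, aggregator_id: str, global_model_key: str) -> None:
        # Data generated by consuming the FL service in one global epoch.
        if self.terminated:
            raise RuntimeError("task already terminated")
        self.epoch_records.append(
            {"aggregator_id": aggregator_id, "global_model_key": global_model_key}
        )

    def completion_condition_met(self, termination_requested: bool = False) -> bool:
        # Either the FLSLM asked to stop, or the epoch budget is exhausted.
        return termination_requested or len(self.epoch_records) >= self.max_global_epochs

    def complete(self) -> Dict[str, Any]:
        """Collect all per-epoch records into a single completion transaction."""
        self.terminated = True
        return {
            "type": "task_completion",
            "task_id": self.task_id,
            "global_epochs": len(self.epoch_records),
            "records": list(self.epoch_records),
        }
```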

Figure 2 shows a schematic scheme illustrating an exemplary conventional federated learning system 10. Federated learning is an emerging machine learning setting in which the learning task is solved by a loose federation of participating devices 11a-d, i.e. clients 11a-d, which are coordinated by a central server 13. Instead of directly uploading its local training dataset to the server 13, each client 11a-d computes an update to the current global model maintained by the central server 13. In this way, only local model updates are shared and the raw data is never uploaded to the server 13. Owing to its privacy-preserving nature, federated learning has great potential in mobile communication systems and is regarded as one of the key solutions for achieving ubiquitous AI in 6G. It is particularly attractive for vertical applications such as Vehicle-to-Everything (V2X) and the Industrial Internet of Things (IIoT).

However, the exemplary conventional federated learning system 10 has some issues. It primarily adopts a central-server-based network topology, in which the central server 13 is required both to select the clients 11a-d and to maintain the global model based on the local models trained and uploaded by the clients 11a-d. Such a topology may suffer from a single point of failure or limited scalability.

Moreover, there is no native trustworthy management of the models, including the local models and the global model. In practice, low-quality models could be uploaded to the central server 13 by unreliable clients 11a-d, leading to a degradation or even a collapse of training; hence a trusted data management mechanism guaranteeing model trustworthiness is essential. In conventional FL, data is managed by the central server 13, which limits the scope of local data usage and of the FL service provider.

Considering the above problems, one possible solution is to introduce Distributed Ledger Technology (DLT) into the federated learning system 10. The features brought by DLT, such as immutability, traceability, transparency and decentralization, make it ideal for supporting a distributed data management system. Conventional approaches that include DLT in federated learning systems 10 for the management of models can be categorized as follows. After the clients upload their local model updates to the immutable ledger, there are two main types of state-of-the-art aggregation. According to a first conventional approach, the clients themselves download the packaged local model block for model aggregation; according to a second conventional approach, a smart contract deployed on the DLT platform performs model aggregation.

However, storing the model data (including local models and global models) directly on the DLT platform can easily lead to the “blockchain bloat” problem, given that a typical AI model requires several megabytes of storage. Moreover, all data stored on the ledger is tamper-proof and transparent, which could leak private information hidden in the model parameters and greatly discourage clients 11a-d from participating in model training. Both aggregation approaches also have pitfalls. In the first conventional approach, each client 11a-d involved in the training downloads the local model block and then acts as an aggregator, which means that each client 11a-d has access to all local models, thus increasing the burden of privacy protection. In the second conventional approach, the smart contract is used as an automatic aggregator, and all the local model parameters of the global epoch are stored in one transaction, which places a heavy burden on the DLT platform in terms of transaction size and storage. These shortcomings of conventional systems are addressed by embodiments of the invention, which are disclosed in detail in the following.

Figure 3 shows a schematic diagram illustrating a backend system 100 according to an embodiment. The backend system 100 is configured for providing a federated learning, FL, model for the plurality of client devices 110 of the mobile communication network 101. The backend system 100 comprises the plurality of aggregator entities 120. As further described in more detail below, each aggregator entity 120 is configured to aggregate local FL model data from a plurality of selected client devices 110’ of the plurality of client devices 110 for generating global FL model update data for updating the FL model. The backend system 100 further comprises the Distributed Ledger Technology, DLT, platform 130 configured to select an aggregator entity 120’ of the plurality of aggregator entities 120. The selected aggregator entity 120’ is further configured to upload the global FL model update data to the DLT platform 130.

The backend system 100 extends DLT-based data management for FL service. Unlike the conventional federated learning system 10 described above, where the client 11a-d interacts directly with the central server 13, the backend system 100 may accomplish authentication and authorization for data access through the DLT platform 130.

As will be described in more detail further below, the backend system 100 provides a native trustworthy data management mechanism for FL. In this context, the referred data comprises a first data type, namely data generated by the provision, operation and management of the FL service, including but not limited to task meta information, task configuration information, client registration information and aggregator registration information. Moreover, the referred data comprises a second data type, namely data generated by consuming the FL service, including but not limited to model parameters and the IDs of the selected aggregator and clients. To enable transparent data management, the backend system 100 eliminates the centralized trust model: the DLT platform 130 is utilized to support the authentication and authorization functions for data download and upload. To enable traceable data management, the backend system 100 may record all operations on data through the DLT platform 130, which acts as a tamper-proof system. To enable distributed, off-chain data storage, the backend system 100 may adopt an off-chain distributed model storage scheme in order to reduce the impact of “blockchain bloat”, the privacy issues caused by the transparency of the DLT platform, and the issues concerning “the right to be forgotten” caused by its immutability.
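The off-chain scheme can be illustrated with two toy components: a store that keeps the bulky model bytes keyed by their SHA-256 hash (standing in for the DDSE 140), and an append-only, hash-chained log that records only operations and data keys (standing in for the DLT platform 130). This is a minimal sketch under stated assumptions: a real DLT platform additionally replicates the ledger and runs a consensus protocol, which the hash chain alone does not model, and all class names here are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64


def _sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class OffChainStore:
    """Stand-in for the DDSE: holds the bulky model bytes, keyed by their hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, blob: bytes) -> str:
        key = _sha256_hex(blob)
        self._blobs[key] = blob
        return key

    def get(self, key: str) -> bytes:
        blob = self._blobs[key]
        # Integrity check: the stored bytes must still hash to their key.
        if _sha256_hex(blob) != key:
            raise ValueError("stored data does not match its data key")
        return blob

    def delete(self, key: str) -> None:
        # Off-chain data can be erased without rewriting the ledger.
        self._blobs.pop(key, None)


class HashChainedLedger:
    """Stand-in for the DLT platform: an append-only log of data operations."""

    FIELDS = ("prev", "operation", "actor", "data_key")

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    @staticmethod
    def _digest(body: Dict[str, str]) -> str:
        return _sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

    def append(self, operation: str, actor: str, data_key: str) -> str:
        prev = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {"prev": prev, "operation": operation, "actor": actor, "data_key": data_key}
        entry_hash = self._digest(body)
        self.entries.append({**body, "entry_hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        # Recompute every link; editing any past entry breaks the chain.
        prev = GENESIS
        for entry in self.entries:
            body = {name: entry[name] for name in self.FIELDS}
            if entry["prev"] != prev or self._digest(body) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True
```

Each ledger entry stays a few hundred bytes regardless of model size, and deleting the off-chain model leaves the recorded history verifiable, which is how such a split eases both blockchain bloat and erasure requests.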

As will be described in more detail further below, the backend system 100 distributes the aggregation points by selecting an aggregator 120’ among a number of aggregator entities 120 at each global epoch to perform the functions of the original central server. As a result, the distribution of the aggregation points removes the single point of failure of the central design. The backend system 100 may utilize a smart-contract-based approach to distributing the aggregation points. The smart contract may be deployed to select an aggregator entity 120 based on a pre-defined aggregator selection strategy, thus enabling aggregator decentralization. A smart contract can be generally defined as a computer program or a transaction protocol which is intended to automatically execute, control or document legally relevant events and actions according to the terms of a contract or an agreement.
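The disclosure leaves the aggregator selection strategy open. Because every ledger node must reach the same result when executing a smart contract, one natural choice is a deterministic function of shared on-chain values. The sketch below assumes the inputs are the task ID, the global epoch and the hash of the previous block; the function name and these inputs are illustrative assumptions, not the claimed strategy.

```python
import hashlib
from typing import Sequence


def select_aggregator(
    registered: Sequence[str], task_id: str, global_epoch: int, prev_block_hash: str
) -> str:
    """Deterministically pick one registered aggregator for this global epoch.

    Every node evaluating this with the same inputs gets the same answer,
    which is what smart-contract execution on replicated nodes requires.
    """
    if not registered:
        raise ValueError("no registered aggregator entities")
    # Sort so the result does not depend on registration order.
    candidates = sorted(set(registered))
    seed = f"{task_id}|{global_epoch}|{prev_block_hash}".encode("utf-8")
    # A 256-bit digest taken modulo a small count has negligible bias.
    index = int.from_bytes(hashlib.sha256(seed).digest(), "big") % len(candidates)
    return candidates[index]
```

Mixing in the previous block hash makes the choice hard to predict far in advance, while the epoch number ensures that the aggregator role rotates across global epochs.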

Figure 3 illustrates the system architecture of the backend system 100. As illustrated in Figure 3, the DLT platform 130 may be coupled to a Federated Learning Service Lifecycle Manager, FLSLM, entity 150 by a first interface 160a, to the client devices 110 by a second interface 160b, to the aggregator entities 120 by a third interface 160c and to the DDSE 140 by a fourth interface 160d. Moreover, the DDSE 140 may be coupled to the client devices 110 by a fifth interface 160e, to the aggregator entities 120 by a sixth interface 160f and to the FLSLM entity 150 by a seventh interface 160g.

As further illustrated in figure 3, and distinct from the architecture of the conventional federated learning system 10, the backend system 100 comprises the DLT platform 130 to decouple the data management from the model aggregator. This decentralizes the data management, making the backend system 100 more trustworthy; decentralizes the central server 13 utilized in FL into the plurality of aggregator entities 120; and stores the aggregator selection strategy, the task completion strategy and the data keys on the DLT platform 130.

As the backend system 100 may comprise the third interface 160c between the aggregator entities 120 and the DLT platform 130, unlike conventional federated learning systems 10 where the central server 13 inherently has access to the data, the aggregator entities 120 may need to request data download and upload permission from the DLT platform 130 through the third interface 160c.

Moreover, as the backend system 100 may comprise the sixth interface 160f between the aggregator entities 120 and the DDSE 140, unlike conventional federated learning systems 10 where the central server 13 interacts directly with clients 11a-d for model parameters, aggregator entities 120 may need to interact with the DDSE 140 through the sixth interface 160f after obtaining the permission from the DLT platform 130 in order to get the required data and upload the required data.

As further described below, the backend system 100 may define a transaction set based on the scenarios, and a complete set of procedures for a task, in order to store all data that needs to be auditable, as the result of a transaction, in the distributed immutable ledger of the DLT platform 130.

The Federated Learning Service Lifecycle Manager, FLSLM, entity 150 may be responsible for federated learning service lifecycle management, which contains three phases, namely service definition and configuration, service operation and maintenance, and service termination. The FLSLM entity 150 may be a native entity in the mobile communication network 101.

The client devices 110 may be responsible for downloading the global model parameters, training the local model with local data and then uploading the trained local model parameters.

The client devices 110 may be any terminal or device that has data collection, model training computation and information interaction capabilities, such as mobile phones, cars, robots, laptops, and the like.

The aggregator entities 120 may be responsible for model aggregation and client selection. Model aggregation may comprise aggregating the local models into a global model and uploading the global model parameters to the DDSE 140. Client selection may comprise selecting a specified number of client devices 110 to join the training based on the client selection strategy. The aggregator entities 120 may be located within the core network of the mobile communication system 101, e.g. in the form of a network entity with capabilities for model aggregation and message exchange with other network entities, or may belong to a third party, in which case proper registration and authentication procedures may be required.

The Distributed Data Storage Entity, DDSE, 140 may be responsible for raw data storage, in particular for storing client registration information, task meta information, task configuration information and model parameters. The DDSE 140 may be a native entity in the mobile communication network 101.

The DLT Platform 130 may use distributed, shared ledgers for recording transactions, in which transactions may be recorded with an immutable cryptographic signature. The smart contract may be a computer program or a transaction protocol implemented on the DLT platform 130 that will be activated automatically when predetermined conditions are met, and it carries out trusted transactions and agreements without need for a third-party authority.

The DLT platform 130 may implement three functions, i.e. data management, aggregator selection and task completion. Data management refers to the management of data with regard to upload and download access permissions and records. Aggregator selection means the selection of the aggregator according to the aggregator selection strategy. Task completion means collecting, into one transaction, the data (e.g. the selected aggregator ID) or data keys (e.g. data keys of models) that are stored on the DLT platform 130 and were generated by consuming the FL service, i.e. data collection. It may occur when the pre-defined conditions on task termination are met or when a task termination request is received from the FLSLM entity 150, and the end of the corresponding smart contract call is used to mark the task termination. The DLT platform 130 may be a native entity in the mobile communication network 101.

Via the first interface 160a, in the first phase of service lifecycle management, the FLSLM entity 150 may complete the initialization of the task, such as deploying the aggregator selection strategy and the task completion strategy in the form of a smart contract. In the second phase, the FLSLM entity 150 may collect the task O&M information, and at the end, the FLSLM entity 150 may send a task termination request to the DLT platform 130 and then collect the data. Through the first interface 160a, the FLSLM entity 150 may also interact with the DLT platform 130 to obtain upload and/or download permission.

Via the second interface 160b, the client devices 110 may check in to know whether they are selected for local model training in each global epoch. All data download and upload operations from client devices 110 may need to be permitted by the DLT platform 130, i.e. through the second interface 160b, the clients 110 may interact with the DLT platform 130 to obtain the upload and/or download permission.

Via the third interface 160c, the aggregator entities 120 may check in to know whether they are selected for model aggregation or client selection in each global epoch. All data download and upload operations from aggregator entities 120 may need to be permitted by the DLT platform 130, i.e. through the third interface 160c, the aggregators 120 may interact with the DLT platform 130 to obtain the upload and/or download permission.

Via the fourth interface 160d, the DDSE 140 may need to verify the validity of the authorization before sending data back or writing it down.

The fifth interface 160e may implement requests and responses of data download and upload for the client devices 110. After clients 110 obtain the upload permission or download permission from the DLT platform 130 through the second interface 160b, clients 110 may interact with DDSE 140 to upload and/or download requested data.

The sixth interface 160f may implement requests and responses of data download and upload for the aggregator entities 120. After the aggregators 120 obtain the upload permission or download permission from the DLT platform 130 through the third interface 160c, the aggregators 120 may interact with the DDSE 140 to upload and/or download requested data.

The seventh interface 160g may implement requests and responses of data download and upload for the FLSLM entity 150. After the FLSLM entity 150 obtains the upload permission or download permission from the DLT platform 130 through the first interface 160a, the FLSLM entity 150 may interact with the DDSE 140 to upload and/or download requested data.

In the original DLT application, i.e. Bitcoin, the term transaction refers to a transfer of Bitcoin value. Nowadays, to maintain traceability and auditability, the associated operations are stored as transactions in an immutable ledger. In addition to the system architecture and functional entities described above, a transaction set may be defined in the backend system 100, based on the scenarios described further below, to achieve native trustworthy data management.

In the following, a download requester refers to an entity that needs to download information from the DDSE 140, including but not limited to the FLSLM entity 150, the client devices 110, the aggregators 120, network functions of the service provider network or a third-party application provider. Moreover, an upload requester refers to an entity that needs to upload information to the DDSE 140, including but not limited to the FLSLM entity 150, the client devices 110 or the aggregator entities 120.

In the following, a data upload (Tup) may comprise an upload requester requesting access to upload data. A data download (Tdown) may comprise a download requester requesting access to download data. An aggregator selection (Tsel) may comprise a smart contract deployed on the DLT platform 130 being invoked to perform aggregator selection, implementing functions similar to those performed by the central server 13 in the conventional FL system 10.

Moreover, a transaction confirmation (Tcon) may comprise the DDSE 140 sending transaction confirmation to the DLT platform 130 after accomplishing the required operations. A task completion (Tcomp) may comprise a smart contract being invoked to collect all data or data key stored on the DLT platform 130 that are generated by consuming the FL service, the occurrence of which is used to mark the task termination.
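To make the transaction set concrete, the following minimal sketch records Tup, Tdown, Tsel, Tcon and Tcomp as ledger entries. It assumes, purely for illustration, that the immutability of the DLT platform 130 is approximated by SHA-256 hash chaining inside a single process; a real DLT would add signatures, replication and consensus. `TxType` and `Ledger` are hypothetical names, not part of the disclosure.

```python
import hashlib
import json
from dataclasses import dataclass, field
from enum import Enum


class TxType(Enum):
    """Transaction set of the backend system (names taken from the text)."""
    UP = "Tup"      # data upload access request
    DOWN = "Tdown"  # data download access request
    SEL = "Tsel"    # aggregator selection by a smart contract
    CON = "Tcon"    # confirmation sent by the DDSE after the operation
    COMP = "Tcomp"  # task completion, marks the task termination


def _digest(body: dict) -> str:
    # Hypothetical: SHA-256 over canonical JSON stands in for DLT immutability.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Ledger:
    """Toy append-only ledger: every entry stores the hash of its predecessor."""
    entries: list = field(default_factory=list)

    def append(self, tx_type: TxType, payload: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"type": tx_type.value, "payload": payload, "prev": prev_hash}
        digest = _digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; editing any past entry breaks it."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in ("type", "payload", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

Altering a recorded payload after the fact makes `verify()` return `False`, which is the traceability and auditability property the transaction set is meant to provide.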

Figure 4 shows a schematic scheme illustrating a service lifecycle in the backend system 100 according to an embodiment and the corresponding operations. As described above, federated learning service lifecycle management may comprise three main phases, namely Service definition, deployment and configuration 410, Service operation and maintenance 420, and Service termination 430, which are in charge of task preparation, task execution and task termination of the task lifecycle, respectively.

The first main phase of the federated learning service lifecycle, service definition, deployment and configuration 410, may comprise a service definition, wherein the FLSLM entity 150 may define the task, e.g. the task meta information and task configuration information. The first phase may further comprise the service deployment, wherein the FLSLM entity 150 may deploy an aggregator entity 120 if necessary. Then the FLSLM entity 150 may deploy the aggregator selection strategy and the task completion strategy in the form of smart contracts on the DLT platform 130, as illustrated in figure 4 by Smart contract deployment. The first phase may further comprise the service configuration, wherein, as illustrated by Task configuration in figure 4, the FLSLM entity 150 may store the data keys of the task meta information and of the task configuration information on the DLT platform 130 and the raw data in the DDSE 140, respectively.

The second main phase of the federated learning service lifecycle, Service Operation and Maintenance 420, may comprise, as illustrated by Task O&M information collection in figure 4, the FLSLM entity 150 collecting the task O&M information, on the basis of which further processing, such as data monitoring or data analysis, may be performed.

The third main phase of the federated learning service lifecycle, Service Termination 430, may comprise the FLSLM entity 150 terminating the service, or the service being terminated when pre-defined requirements are met. Then a smart contract may be invoked to collect all data or data keys stored on the DLT platform 130 that were generated by consuming the FL service, and the FLSLM entity 150 may collect the data (e.g. the global model), as illustrated by Task termination in figure 4. As the last step, the FLSLM entity 150 may remove the aggregator entity 120 deployed in the first main phase if necessary.
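The fixed order of the three lifecycle phases, and the FLSLM actions listed for each, can be captured in a small state tracker. This is a hypothetical sketch: `Phase`, `ACTIONS` and `ServiceLifecycle` are illustrative names, and the enum values simply reuse the reference numerals of figure 4.

```python
from enum import Enum


class Phase(Enum):
    """Main phases of the FL service lifecycle (reference numerals of figure 4)."""
    DEFINITION_DEPLOYMENT_CONFIGURATION = 410  # task preparation
    OPERATION_AND_MAINTENANCE = 420            # task execution
    TERMINATION = 430                          # task termination


# FLSLM actions per phase, in the order given in the text.
ACTIONS = {
    Phase.DEFINITION_DEPLOYMENT_CONFIGURATION: [
        "define task", "deploy aggregator (if necessary)",
        "deploy smart contracts", "configure task"],
    Phase.OPERATION_AND_MAINTENANCE: ["collect task O&M information"],
    Phase.TERMINATION: [
        "terminate task", "collect data", "remove aggregator (if necessary)"],
}


class ServiceLifecycle:
    """Hypothetical tracker that only allows the phases in their fixed order."""

    def __init__(self) -> None:
        self.phase = Phase.DEFINITION_DEPLOYMENT_CONFIGURATION

    def advance(self) -> Phase:
        order = list(Phase)  # members in definition order: 410, 420, 430
        idx = order.index(self.phase)
        if idx == len(order) - 1:
            raise RuntimeError("service already terminated")
        self.phase = order[idx + 1]
        return self.phase
```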

Figure 5 shows a schematic scheme illustrating a task execution in one global epoch in the backend system 100 according to an embodiment. The task execution may be divided into several global epochs, and each global epoch may comprise the following five steps: Aggregator selection and training initialization, Client selection, Client training initialization, Local model update and Global model aggregation.

The first step (illustrated by the symbol ① in figure 5) may comprise aggregator selection and training initialization. For each global epoch of client selection or model aggregation, the DLT platform 130 may use the smart contract to select the aggregator entity 120’. After being selected, the selected aggregator entity 120’ may need to check whether it has task meta information and task configuration information. If not, it may need to perform the task initialization.

The second step (illustrated by the symbol ② in figure 5) may comprise a client selection. After the aggregator selection for client selection and the training initialization, the selected aggregator entity 120’ may be in charge of selecting the client devices 110 that will participate in this global epoch of local model training. Then the selected aggregator entity 120’ may return the IDs of the selected clients to the DLT platform 130.

The third step (illustrated by the symbol ③ in figure 5) may comprise a client training initialization. After being selected for local model update, the client device 110 may need to check whether it has task meta information and task configuration information. If not, it may need to perform the training initialization.

The fourth step (illustrated by the symbol ④ in figure 5) may comprise a local model update. The selected client device 110 may download the latest global model parameters from the DLT platform 130 and the DDSE 140, may perform local model training based on the local data and then may upload the local model parameters.

The fifth step (illustrated by the symbol ⑤ in figure 5) may comprise a global model aggregation. After the aggregator selection for model aggregation and the training initialization, the selected aggregator entity 120 may download the related local model parameters from the DLT platform 130 and the DDSE 140, may aggregate the local model parameters based on the aggregation algorithm to obtain the global model, and then may upload its parameters.
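The five steps of one global epoch can be sketched end to end as follows. This is a toy under explicit assumptions: round-robin rotation stands in for the aggregator selection smart contract, a rotating window stands in for the client selection strategy, one selected aggregator handles both client selection and aggregation, and the "model" is a single number estimating the mean of all client data (local training is a local mean, aggregation a sample-weighted mean). `run_global_epoch` is a hypothetical name, and the DLT and DDSE interactions are omitted.

```python
def run_global_epoch(epoch, aggregators, clients, k, initialized, log):
    """One global epoch with the five steps of figure 5 (toy, single task).

    aggregators: list of aggregator IDs.
    clients: dict mapping client ID -> list of local data values.
    initialized: set of entity IDs that already hold task meta/config info.
    """
    # Step 1: aggregator selection (round-robin stands in for the smart contract),
    # followed by training initialization only if not yet done for this task.
    aggregator = aggregators[epoch % len(aggregators)]
    if aggregator not in initialized:
        initialized.add(aggregator)
        log.append(f"init {aggregator}")
    # Step 2: client selection by the selected aggregator (rotating window).
    ids = sorted(clients)
    selected = [ids[(epoch * k + i) % len(ids)] for i in range(k)]
    updates = []
    for cid in selected:
        # Step 3: client training initialization, only the first time.
        if cid not in initialized:
            initialized.add(cid)
            log.append(f"init {cid}")
        # Step 4: local model update (here: local mean plus sample count).
        data = clients[cid]
        updates.append((sum(data) / len(data), len(data)))
    # Step 5: global model aggregation (sample-weighted mean of local models).
    total = sum(n for _, n in updates)
    return sum(m * n for m, n in updates) / total
```

With all three clients selected, the aggregated value equals the mean over all client data, which is why sample weighting is used in step 5.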

Figures 6 to 16 show signaling diagrams illustrating the interaction of several components of the backend system 100 according to embodiments. As illustrated in figures 6 and 7, by the introduction of the DLT platform 130, data access no longer implies direct interactions between client devices 110 and a central server 13. To enable data management, a download requester 600 may be required to obtain authorization from the DLT platform 130 before downloading data from the DDSE 140. In other words, the download requester 600 needs to request download access from the DLT platform 130. Similarly, an upload requester 700 needs to send an upload access request, including a data key, to the DLT platform 130 before uploading data to the DDSE 140. Afterwards, the DDSE 140 may need to verify the validity of the data before returning the requested data or writing down the uploaded data.

In step 601 of figure 6, before downloading data from the DDSE 140, the download requester 600 may send a download access request by initiating transaction Tdown to the DLT platform 130.

In step 603 of figure 6, based on the permission of the download requester 600, the DLT platform 130 may return a download access response.

In step 605 of figure 6, the download requester 600 may initiate a download request to the DDSE 140 including access authorization.

In step 607 of figure 6, the DDSE 140 may initiate a data verification request to the DLT platform 130 and receive the corresponding response.

In steps 609 and 611 of figure 6, the DDSE 140 may return the requested data to the download requester 600 and may send a download response confirmation as transaction Tcon to the DLT platform 130.
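The download procedure of figure 6 can be sketched as follows, including the "access denied" outcome described for abnormal accesses. Assumptions for illustration only: permissions are a per-task set of reader IDs, the DLT returns the data key in the access response (as in figures 10 to 15), and verification checks that the requester may read the task and that the key matches a ledger record. `DLTPlatform`, `DDSE` and `download` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DLTPlatform:
    """Toy DLT side of figure 6 (all names hypothetical)."""
    readers: dict                            # task_id -> set of requester IDs allowed to read
    records: dict                            # (task_id, item) -> data key recorded on the ledger
    log: list = field(default_factory=list)  # transaction log

    def download_access(self, task_id, requester_id, item):
        self.log.append(("Tdown", task_id, requester_id))              # step 601
        if requester_id not in self.readers.get(task_id, set()):
            return None                                                 # access denied
        return {"task_id": task_id, "requester_id": requester_id,
                "data_key": self.records[(task_id, item)]}             # step 603

    def verify(self, auth):                                             # step 607
        allowed = auth["requester_id"] in self.readers.get(auth["task_id"], set())
        task_keys = {k for (t, _), k in self.records.items() if t == auth["task_id"]}
        return allowed and auth["data_key"] in task_keys

    def confirm(self, auth):                                            # step 611
        self.log.append(("Tcon", auth["task_id"], auth["requester_id"]))


@dataclass
class DDSE:
    store: dict                              # data key -> raw data

    def download(self, dlt, auth):                                      # step 605
        if auth is None or not dlt.verify(auth):
            raise PermissionError("access denied")
        data = self.store[auth["data_key"]]                             # step 609
        dlt.confirm(auth)                                               # step 611
        return data


def download(requester_id, task_id, item, dlt, ddse):
    """Requester side: obtain access from the DLT first, then fetch from the DDSE."""
    return ddse.download(dlt, dlt.download_access(task_id, requester_id, item))
```

A denied request still leaves its Tdown on the ledger but produces no Tcon, so the access record shows both successful and refused attempts.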

Likewise, in step 701 of figure 7, before uploading data to the DDSE 140, the upload requester 700 may send an upload access request by initiating transaction Tup to the DLT platform 130.

In step 703 of figure 7, based on the permission of the upload requester 700, the DLT platform 130 may grant the upload requester 700 upload authorization.

In step 705 of figure 7, the upload requester 700 may initiate an upload request to the DDSE 140 comprising the access authorization.

In step 706 of figure 7, the DDSE 140 may generate a data key of the uploaded data according to the pre-defined data key algorithm.

In step 707 of figure 7, the DDSE 140 may initiate a data verification request to the DLT platform 130 and may receive the corresponding response.

In steps 709 and 711 of figure 7, the DDSE 140 may write data down and may send an upload response confirmation in form of transaction Tcon to the DLT platform 130.
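The upload procedure of figure 7 can be sketched in the same style. The text leaves the "pre-defined data key algorithm" open; this sketch assumes SHA-256 over canonical JSON, computed both by the requester (for the data key carried in Tup) and by the DDSE in step 706, so that step 707 can detect data that differs from what was announced on the ledger. All names are hypothetical.

```python
import hashlib
import json


def data_key(data) -> str:
    """Assumed 'pre-defined data key algorithm': SHA-256 of canonical JSON."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()


class DLTPlatform:
    def __init__(self, writers):
        self.writers = writers  # task_id -> set of requester IDs allowed to write
        self.pending = {}       # (task_id, requester_id) -> data key announced in Tup
        self.log = []

    def upload_access(self, task_id, requester_id, key):              # steps 701/703
        self.log.append("Tup")
        if requester_id not in self.writers.get(task_id, set()):
            return None                                                # access denied
        self.pending[(task_id, requester_id)] = key
        return {"task_id": task_id, "requester_id": requester_id}

    def verify(self, auth, key):                                        # step 707
        return self.pending.get((auth["task_id"], auth["requester_id"])) == key

    def confirm(self):                                                  # step 711
        self.log.append("Tcon")


class DDSE:
    def __init__(self):
        self.store = {}  # data key -> raw data

    def upload(self, dlt, auth, data):                                  # step 705
        if auth is None:
            raise PermissionError("access denied")
        key = data_key(data)                                            # step 706
        if not dlt.verify(auth, key):                                   # step 707
            raise PermissionError("data inconsistent with ledger record")
        self.store[key] = data                                          # step 709
        dlt.confirm()                                                   # step 711
        return key


def upload(requester_id, task_id, data, dlt, ddse):
    auth = dlt.upload_access(task_id, requester_id, data_key(data))
    return ddse.upload(dlt, auth, data)
```

Because the DDSE recomputes the key itself, a requester that announces one model in Tup but sends different bytes to the DDSE is rejected at step 707.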

As will be appreciated, all procedures are described without exceptional cases, e.g. invalid permissions. However, when the DLT platform 130 or the DDSE 140 encounters an abnormal access, it directly returns a message indicating that the access is denied. In this context, an aggregator entity 120 belonging to the core network of the mobile communication network 101 may be regarded as honest and not curious, whereas an aggregator entity 120 belonging to a third party may be regarded as honest but curious. The client devices 110 may be regarded as malicious.

When the FLSLM entity 150 has completed the initial interaction with the DLT platform 130, each of them may have obtained the communication address of the other. Client devices 110 may have attached to the mobile communication network 101 and been authenticated by it, during which the DLT platform 130 may obtain their communication addresses; afterwards, the data key of the client registration information and the raw data may be stored on the DLT platform 130 and in the DDSE 140, respectively. Aggregator entities 120 may be part of the core network of the mobile communication network 101 or may belong to a third party; in either case, a proper registration and authentication procedure may be completed, after which the aggregator registration information is stored on the DLT platform 130.

After the FLSLM entity 150 has defined the task, the smart contract may be deployed on the DLT platform 130 using the specific procedure shown in figure 8.

In step 801 of figure 8, the FLSLM entity 150 may send a smart contract deployment request including the task ID, the FLSLM ID and the concrete smart contract to the DLT platform 130.

In step 803 of figure 8, the DLT platform 130 may deploy the smart contract.

In step 805 of figure 8, the DLT platform 130 may return a smart contract deployment response to the FLSLM entity 150.

The specific procedure of task configuration may be implemented as shown in Figure 9 in order for the FLSLM entity 150 to upload the task meta information and task configuration information to the DLT platform 130 and the DDSE 140.

In step 701 of figure 9, the FLSLM entity 150 may send an upload access request comprising task ID, FLSLM ID, data key of task meta information and that of task configuration information to the DLT platform 130 by initiating transaction Tup.

In step 703 of figure 9, the FLSLM entity 150 may obtain the upload access response, comprising the upload authorization, from the DLT platform 130.

In step 705 of figure 9, the FLSLM entity 150 may initiate an upload request to the DDSE 140 comprising task ID, FLSLM ID, task meta information and task configuration information.

In step 706 of figure 9, the DDSE 140 may generate the data key of the FLSLM entity 150 uploaded data according to the pre-defined data key algorithm, i.e. data key of task meta information and that of task configuration information.

In step 707 of figure 9, the DDSE 140 may initiate a data verification request comprising task ID, FLSLM ID and data key, to the DLT platform 130. Through data validation, the DDSE 140 may know whether the FLSLM entity 150 is authorized to write data for the specific task, and that the written data is consistent with what is recorded on the DLT platform 130.

In steps 709 and 711 of figure 9, the DDSE 140 may write the data down, return an upload response to the FLSLM entity 150, and send an upload response confirmation by initiating transaction Tcon to the DLT platform 130.

Figure 10 illustrates the specific procedure of aggregator selection and training initialization. For each global epoch of model aggregation or client selection, the DLT platform 130 may use the smart contract to select the aggregator 120. As illustrated in figure 10, one or more of the aggregator entities 120 may be chosen as the selected aggregator 120’. After being selected for the task, the selected aggregator 120’ may need to check whether it has task meta information and task configuration information. If not, then the following training initialization procedure may be performed. If it does, it may be skipped.

In step 1001 of figure 10, the DLT platform 130 may use the smart contract to select the aggregator entity 120 for this global epoch, which is recorded as transaction Tsel.

In step 1003 of figure 10, periodically, the aggregator entities 120 may check in by sending a task assignment participation request to the DLT platform 130 in order to check whether they are selected.

In step 1005 of figure 10, the DLT platform 130 may then return a task assignment participation response including task ID, task assignment type and global epoch index.

In step 601 of figure 10, the selected aggregator 120’ may send a download access request comprising task ID and aggregator ID by initiating transaction Tdown.

In step 603 of figure 10, the selected aggregator 120’ may then obtain the download access response, comprising data key of task meta information and that of task configuration information, from the DLT platform 130.

In step 605 of figure 10, the selected aggregator 120’ may initiate a download request to the DDSE 140 comprising task ID, aggregator ID, data key of task meta information and that of task configuration information.

In step 607 of figure 10, the DDSE 140 may initiate a data verification request comprising task ID, aggregator ID and data key. Through data validation, the DDSE 140 may know whether the selected aggregator 120’ is authorized to read data in the specific task, and that the requested data is consistent with what is recorded on the DLT platform 130.

In steps 609 and 611 of figure 10, the DDSE 140 may return the requested data to the selected aggregator 120’, and may send a download response confirmation as transaction Tcon to the DLT platform 130.

The steps 601 to 611 of figure 10 relate only to the selected aggregator 120’ and constitute the conditional training initialization, which each aggregator entity 120 may need to complete when executing an assignment associated with a task, and only once for the same task.
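The text does not specify the aggregator selection strategy. One property any smart-contract strategy needs is determinism, so that every ledger node re-executing Tsel obtains the same result. The sketch below assumes a hash-based ranking over registered aggregators, keyed by task ID, global epoch index and assignment type, and also models the "once per task" training initialization. `select_aggregators` and `Aggregator` are hypothetical names.

```python
import hashlib


def select_aggregators(task_id: str, epoch: int, assignment: str,
                       registered: list, k: int = 1) -> list:
    """Hypothetical deterministic strategy: rank aggregators by a hash of
    (task, epoch, assignment type, aggregator ID) and keep the first k."""
    def score(agg_id: str) -> str:
        return hashlib.sha256(f"{task_id}|{epoch}|{assignment}|{agg_id}".encode()).hexdigest()
    return sorted(registered, key=score)[:k]


class Aggregator:
    def __init__(self, agg_id: str):
        self.agg_id = agg_id
        self.task_info = {}  # task_id -> (task meta info, task configuration info)

    def on_selected(self, task_id: str, fetch) -> bool:
        """Run training initialization (steps 601 to 611) only once per task.

        `fetch` stands for the download procedure; returns True if the
        initialization was performed in this call.
        """
        if task_id in self.task_info:
            return False
        self.task_info[task_id] = fetch(task_id)
        return True
```

Including the assignment type in the hash lets the contract pick different aggregators for client selection and for model aggregation in the same global epoch.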

As illustrated in figure 11, the selected aggregator 120’ may be in charge of selecting clients 110 for local model training in this global epoch. As a result, the selected aggregator entity 120’ may return the IDs of the selected clients to the DLT platform 130.

In step 601 of figure 11, the selected aggregator 120’ may initiate transaction Tdown comprising task ID and aggregator ID as a download access request.

In step 603 of figure 11, the selected aggregator 120’ may then obtain the download access response, including the data key of the client registration information, from the DLT platform 130.

In step 605 of figure 11, the selected aggregator 120’ may initiate a download request to the DDSE 140 comprising task ID, aggregator ID and the data key of the client registration information.

In step 607 of figure 11, the DDSE 140 may send the data verification request comprising task ID, aggregator ID and data key. Through data validation, the DDSE 140 may know whether the selected aggregator 120’ is authorized to read data in the specific task, and that the requested data is consistent with what is recorded on the DLT platform 130.

In steps 609 and 611 of figure 11, the DDSE 140 may return the requested client registration information to the selected aggregator 120’, and may send a download response confirmation to the DLT platform 130 through transaction Tcon.

In step 1101 of figure 11, the selected aggregator 120’ may perform client selection based on the client selection strategy.

In step 701 of figure 11, the selected aggregator 120’ may initiate transaction Tup comprising task ID, aggregator ID, global epoch index and the IDs of the selected clients as an upload access request.

In step 703 of figure 11, the selected aggregator 120’ may finally obtain the upload access response from the DLT platform 130.
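Step 1101 leaves the client selection strategy open. As an assumed example only, the sketch below samples uniformly among eligible clients from the downloaded client registration information, with a seeded generator for reproducibility, and builds the payload of the Tup in step 701. The `available` flag, `select_clients` and `client_selection_tup` are hypothetical.

```python
import random


def select_clients(registrations: list, num_clients: int, seed: int) -> list:
    """Hypothetical strategy: uniform sampling among eligible clients.

    `registrations` stands for the client registration information read from
    the DDSE; the `available` field is an assumption, not from the source.
    """
    eligible = sorted(r["client_id"] for r in registrations if r.get("available", True))
    rng = random.Random(seed)  # seeded so the choice can be reproduced
    return sorted(rng.sample(eligible, min(num_clients, len(eligible))))


def client_selection_tup(task_id: str, aggregator_id: str, epoch: int, selected: list) -> dict:
    """Payload of the Tup upload access request in step 701 of figure 11."""
    return {"type": "Tup", "task_id": task_id, "aggregator_id": aggregator_id,
            "global_epoch": epoch, "selected_clients": selected}
```

Any other strategy (e.g. weighting by data volume or channel quality) would slot into `select_clients` without changing the surrounding Tdown/Tup exchange.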

As illustrated in figure 12, in step 1201 one or more of the client devices 110 may be chosen as the selected client device 110’. After being selected for local model update for the task, the selected client device 110’ may need to check whether it has task meta information and task configuration information. If not, then the following procedure will be performed. If it does, it skips the training initialization steps and proceeds directly to the next procedure.

Similar to step 1003 of figure 10, in step 1203 of figure 12, periodically, the client devices 110 may check in by sending a task assignment participation request to the DLT platform 130 to check whether they are selected.

Similar to step 1005 of figure 10, in step 1205 of figure 12, the DLT platform 130 may then return a task assignment participation response comprising task ID, task assignment type and global epoch index.

In step 601 of figure 12, the selected client device 110’ may initiate transaction Tdown comprising task ID and client ID as a download access request.

In step 603 of figure 12, the selected client device 110’ may then obtain the download access response, comprising data key of task meta information and that of task configuration information, from the DLT platform 130.

In step 605 of figure 12, the selected client device 110’ may initiate a download request to the DDSE 140 comprising task ID, client ID, data key of task meta information and that of task configuration information.

In step 607 of figure 12, the DDSE 140 may send a data verification request comprising task ID, client ID and data key. Through data validation, the DDSE 140 may know whether the selected client device 110’ is authorized to read data in the specific task, and that the requested data is consistent with what is recorded on the DLT platform 130.

In steps 609 and 611 of figure 12, the DDSE 140 may return the task meta information and task configuration information to the selected client device 110’, and may send a download response confirmation to the DLT platform 130 as transaction Tcon.

The steps 601 to 611 of figure 12 are only related to the selected client devices 110’, involving conditional training initialization, which each client device 110 needs to complete when executing the assignment associated with a task and only needs to execute once for the same task.
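The periodic check-in of steps 1003/1005 and 1203/1205 amounts to each entity asking the DLT platform 130 which assignments it holds. The sketch below assumes the platform keeps a simple table of (task ID, global epoch index, assignment type, entity ID) rows derived from Tsel and the client selection Tup; `AssignmentBoard` and `ParticipationResponse` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipationResponse:
    """Fields of the task assignment participation response named in the text."""
    task_id: str
    assignment_type: str  # e.g. "client_selection", "model_aggregation", "local_update"
    global_epoch: int


class AssignmentBoard:
    """Hypothetical DLT-side view of who was selected in which global epoch."""

    def __init__(self):
        self.assignments = []  # rows of (task_id, epoch, assignment_type, entity_id)

    def record(self, task_id, epoch, assignment_type, entity_ids):
        for entity_id in entity_ids:
            self.assignments.append((task_id, epoch, assignment_type, entity_id))

    def check_in(self, entity_id):
        """Answer a task assignment participation request from one entity."""
        return [ParticipationResponse(t, a, e)
                for (t, e, a, who) in self.assignments if who == entity_id]
```

An entity that receives an empty list simply checks in again later; a non-empty response tells it which task and epoch to initialize for and work on.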

As illustrated in figure 13, the selected client devices 110’ may download the latest global model parameters, perform local model training based on the local data and then upload the local model.

In step 601 of figure 13, the selected client device 110’ may send a download access request comprising task ID, client ID and global epoch index by initiating transaction Tdown.

In step 603 of figure 13, the selected client device 110’ may then obtain the download access response, comprising data key of global model parameters, from the DLT platform 130.

In step 605 of figure 13, the selected client device 110’ may afterwards initiate a download request to the DDSE 140 comprising task ID, client ID and data key of global model parameters.

In step 607 of figure 13, the DDSE 140 may send the data verification request including task ID, client ID and data key. Through data validation, the DDSE 140 may know whether the selected client device 110’ is authorized to read data in the specific task, and that the requested data is consistent with what is recorded on the DLT platform 130.

In steps 609 and 611 of figure 13, the DDSE 140 may return the requested data to the selected client device 110’, and may send a download response confirmation to the DLT platform 130 through transaction Tcon.

In step 1301 of figure 13, the selected client device 110’ may train the model with local data to get the local model.

In step 701 of figure 13, the selected client device 110’ may then initiate transaction Tup by specifying task ID, client ID, global epoch index and data key of local model parameters as upload access request.

In step 703 of figure 13, the selected client device 110’ may afterwards obtain the upload access response, including upload authorization, from the DLT platform 130.

In step 705 of figure 13, the selected client device 110’ may initiate an upload request to the DDSE 140 comprising task ID, client ID and local model parameters.

In step 706 of figure 13, the DDSE 140 may generate the data key of client uploaded data according to the pre-defined data key algorithm, i.e. the data key of local model parameters.

In step 707 of figure 13, the DDSE 140 may send the data verification request comprising task ID, client ID and data key. Through data validation, the DDSE 140 may know whether the selected client device 110’ is authorized to write data in the specific task, and that the written data is consistent with what is recorded on the DLT platform 130.

In steps 709 and 711 of figure 13, the DDSE 140 may write data down, return an upload response to the selected client device 110’, and send an upload response confirmation to the DLT platform 130 through transaction Tcon.
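Step 1301 does not fix a model or optimizer. Purely as an assumed example, the sketch below trains a one-dimensional linear model y ≈ w·x + b by full-batch gradient descent on the mean squared error, starting from the downloaded global parameters, and returns the local parameters together with the local sample count (useful for weighted aggregation). `local_update` and its defaults are hypothetical.

```python
def local_update(global_params, data, lr=0.1, local_epochs=50):
    """Hypothetical local training for step 1301.

    global_params: (w, b) downloaded in steps 601 to 609.
    data: list of (x, y) pairs that never leave the client device.
    Returns ((w, b), number of local samples).
    """
    w, b = global_params
    n = len(data)
    for _ in range(local_epochs):
        # Gradients of mean squared error (1/n) * sum((w*x + b - y)^2).
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b), n
```

Only the returned parameters (and their data key) are uploaded through steps 701 to 711; the `(x, y)` pairs stay local, which is the privacy property FL relies on.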

As illustrated in figure 14, the selected aggregators 120’ may download the related local model parameters, aggregate the local model parameters based on the aggregation algorithm to get the global model, then upload its parameters to the DLT platform 130 and DDSE 140.

In step 601 of figure 14, the selected aggregator 120’ may send a download access request comprising task ID, aggregator ID and global epoch index by initiating transaction Tdown.

In step 603 of figure 14, the selected aggregator 120’ may then obtain the download access response, including data key of local model parameters, from the DLT platform 130.

In step 605 of figure 14, the selected aggregator 120’ may afterwards initiate a download request to the DDSE 140 comprising task ID, aggregator ID and data key of local model parameters.

In step 607 of figure 14, the DDSE 140 may send a data verification request comprising task ID, aggregator ID and data key. Through data validation, the DDSE 140 may know whether the selected aggregator 120’ is authorized to read data in the specific task, and that the requested data is consistent with what is recorded on the DLT platform 130.

In steps 609 and 611 of figure 14, the DDSE 140 may return local model parameters to the selected aggregator 120’ and may send a download response confirmation to the DLT platform 130 in form of transaction Tcon.

In step 1401 of figure 14, the selected aggregator 120’ may perform model aggregation based on aggregation algorithm to get the global model.

In step 701 of figure 14, the selected aggregator 120’ may then initiate transaction Tup comprising task ID, aggregator ID, global epoch index and data key of global model parameters as upload access request.

In step 703 of figure 14, the selected aggregator 120’ may afterwards obtain the upload access response, including upload authorization, from the DLT platform 130.

In step 705 of figure 14, the selected aggregator 120’ may initiate an upload request to the DDSE 140 comprising task ID, aggregator ID and global model parameters.

In step 706 of figure 14, the DDSE 140 may generate the data key of aggregator uploaded data according to the pre-defined data key algorithm, i.e. the data key of global model parameters.

In step 707 of figure 14, the DDSE 140 may send the data verification request comprising task ID, aggregator ID and data key. Through data validation, the DDSE 140 may know whether the selected aggregator 120’ is authorized to write data in the specific task, and that the written data is consistent with what is recorded on the DLT platform 130.

In steps 709 and 711 of figure 14, the DDSE 140 may write the data down, return an upload response to the selected aggregator 120’, and send an upload response confirmation to the DLT platform 130 through transaction Tcon.
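For step 1401 the text only refers to "the aggregation algorithm". A common choice, assumed here rather than taken from the source, is federated averaging (FedAvg): the global parameters are the average of the local parameter vectors weighted by each client's number of training samples.

```python
def fedavg(local_updates):
    """Sample-weighted average of local parameter vectors (FedAvg, assumed).

    local_updates: list of (params, num_samples) pairs, as downloaded from
    the DDSE in steps 605 to 609. Returns the global parameters as a tuple.
    """
    total = sum(n for _, n in local_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(local_updates[0][0])
    return tuple(sum(p[i] * n for p, n in local_updates) / total for i in range(dim))
```

For example, `fedavg([((1.0, 0.0), 1), ((4.0, 3.0), 2)])` gives `(3.0, 2.0)`, since the second client holds twice as many samples.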

As illustrated in figure 15, the FLSLM entity 150 may obtain task O&M information by interacting with the DLT platform 130 and the DDSE 140. For example, the IDs of the participating clients and aggregators can be read from the DLT platform 130, and the global model parameters from the DDSE 140.

In step 601 of figure 15, the FLSLM entity 150 may send a download access request comprising task ID and FLSLM ID to the DLT platform 130 by initiating transaction Tdown.

In step 603 of figure 15, the FLSLM entity 150 may then obtain the download access response, comprising the task O&M information or its data key, from the DLT platform 130.

In step 605 of figure 15, the FLSLM entity 150 may initiate a download request to the DDSE 140 comprising task ID, FLSLM ID and the data key of the task O&M information.

In step 607 of figure 15, the DDSE 140 may send a data verification request comprising task ID, FLSLM ID and data key. Through data validation, the DDSE 140 may know whether the FLSLM entity 150 is authorized to read data in the specific task, and that the requested data is consistent with what is recorded on the DLT platform 130.

In steps 609 and 611 of figure 15, the DDSE 140 may return the requested data to the FLSLM entity 150 and send a download response confirmation in form of transaction Tcon to the DLT platform 130.

As illustrated in figure 16, there may be two cases of task termination. According to a first task termination case the FLSLM entity 150 may decide to stop the corresponding task, and according to a second task termination case the DLT platform 130 may decide to stop the corresponding task.

In steps 1601 to 1605 of figure 16, according to the first task termination case, the FLSLM entity 150 may send a task termination request to the DLT platform 130; the smart contract may then be invoked to initiate the transaction Tcomp, through which the data (e.g. the selected aggregator ID) or data keys (e.g. the data keys of models) stored on the DLT platform 130 and generated by consuming the FL service are collected. The DLT platform 130 may then return the data collection.

In step 1607 of figure 16, according to the second task termination case, the smart contract named Task completion may be invoked by the DLT platform 130 automatically after some pre-defined conditions are met, e.g. the maximum number of global epochs is reached. In steps 601 to 603 of figure 16, when the FLSLM entity 150 then interacts with the DLT platform 130 to download information, the FLSLM entity 150 may get the response including the data collection.

In steps 605 to 611 of figure 16, the FLSLM entity 150 may then interact with the DDSE 140 to download the raw data.

After the task termination, the DLT platform 130 may notify the client devices 110 and aggregator entities 120 of the end, for example the DLT platform 130 may inform them of the termination when they try to upload or download data related to that task. As the last step, the FLSLM entity 150 may remove the deployed aggregator entity 120 if necessary.
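The two termination cases of figure 16 can be illustrated with a single contract object. Assumptions for this sketch: the pre-defined condition is a maximum number of global epochs (the example given in the text), the Tcomp record simply bundles the selected aggregator IDs and global model data keys, and late requests are refused with an error. `TaskContract` is a hypothetical name.

```python
class TaskContract:
    """Hypothetical task completion smart contract for one task."""

    def __init__(self, task_id, max_epochs):
        self.task_id = task_id
        self.max_epochs = max_epochs
        self.selected_aggregators = []  # data stored on the ledger
        self.model_keys = []            # data keys of models stored in the DDSE
        self.tcomp = None               # the Tcomp transaction, once issued

    def record_epoch(self, aggregator_id, global_model_key):
        if self.tcomp is not None:
            # After termination, late requesters learn that the task has ended.
            raise RuntimeError(f"task {self.task_id} terminated")
        self.selected_aggregators.append(aggregator_id)
        self.model_keys.append(global_model_key)
        if len(self.model_keys) >= self.max_epochs:
            self._complete("max_epochs_reached")  # second case (step 1607)

    def terminate(self):
        """First case: task termination request from the FLSLM (steps 1601 to 1605)."""
        if self.tcomp is None:
            self._complete("flslm_request")
        return self.tcomp

    def _complete(self, reason):
        # Collect data and data keys into one transaction (data collection).
        self.tcomp = {"type": "Tcomp", "task_id": self.task_id, "reason": reason,
                      "selected_aggregators": list(self.selected_aggregators),
                      "model_keys": list(self.model_keys)}
```

Calling `terminate()` after automatic completion returns the existing Tcomp, which matches the FLSLM entity 150 later downloading the data collection in steps 601 to 603.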

In an embodiment, multiple aggregator entities 120 may be selected for aggregation in the aggregator selection instead of just one, in order to avoid excessive computational effort when a large number of local model parameters need to be combined. In this case, the aggregator entities 120 may be arranged in the form of a hierarchy.
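A hierarchy only preserves the result of flat aggregation if intermediate aggregators forward enough information. The sketch below assumes a two-level layout and sample-weighted averaging: each lower-level aggregator averages its own group and forwards its partial model together with its total sample count, so the top-level result equals the flat weighted average while each aggregator combines only a subset of local models. Function names are hypothetical.

```python
def weighted_average(updates):
    """Return (sample-weighted mean parameters, total samples) for (params, n) pairs."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return tuple(sum(p[i] * n for p, n in updates) / total for i in range(dim)), total


def hierarchical_aggregate(groups):
    """Two-level hierarchy (assumed layout): lower-level aggregators each average
    one group of clients and forward (partial model, sample count); the
    top-level aggregator then averages the partial models."""
    partials = [weighted_average(group) for group in groups]
    return weighted_average(partials)
```

For clients `[((1.0,), 1), ((2.0,), 3), ((5.0,), 2), ((0.0,), 2)]` split into two groups of two, both the flat and the hierarchical result are `2.125` over 8 samples.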

In an embodiment, besides the number of global epochs, the global training accuracy may also be used as a criterion for invoking the task completion smart contract, marking the task termination.

In the local model update, the client devices 110 may be asked to upload, in addition to the local model parameters, the local training accuracy obtained with the global model on the local dataset. Then, in the subsequent model aggregation phase, the aggregator entities 120 may aggregate the local training accuracies uploaded by the client devices 110 to obtain the global training accuracy and upload it to the DLT platform 130. Once the global training accuracy reaches the preset requirement, transaction Tcomp may be initiated to collect the data without waiting for a task termination request from the FLSLM entity 150 or for enough global epochs to have passed.
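The accuracy-based completion rule can be sketched in a few lines. The text says only that the aggregator "aggregates" the local training accuracies; weighting each client's accuracy by its sample count is an assumption here, as are the function names.

```python
def global_accuracy(local_reports):
    """Sample-weighted mean of local training accuracies.

    local_reports: list of (accuracy, num_samples) pairs uploaded by clients.
    The weighting by sample count is an assumption.
    """
    total = sum(n for _, n in local_reports)
    return sum(acc * n for acc, n in local_reports) / total


def should_complete(epochs_done, max_epochs, accuracy, target_accuracy):
    """Issue Tcomp on whichever pre-defined condition is met first:
    enough global epochs, or the preset global training accuracy."""
    return epochs_done >= max_epochs or accuracy >= target_accuracy
```

For reports `[(0.9, 100), (0.6, 50)]` the global accuracy is 0.8, so a 0.75 target ends the task early while a 0.95 target lets training continue until the epoch limit.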

An application scenario of the backend system 100 is, for example, Vehicle-to-everything (V2X) communication. V2X communication refers to the passing of information between a vehicle and any entity that may affect or may be affected by the vehicle, such as V2I (Vehicle-to-Infrastructure), V2N (Vehicle-to-Network) and V2V (Vehicle-to-Vehicle). Driven by requirements for improved road safety, traffic efficiency and energy savings, it is considered one of the indispensable technologies. Since FL can improve the performance of the learning model without requiring raw data to be uploaded or exchanged among vehicles, FL is suitable for supporting privacy-preserving cooperative driving functionalities. A simulation of an FL approach developed for joint power and resource allocation in V2V communications shows that the tail distribution of the queue can be estimated with up to a 79% reduction in the amount of exchanged data. With the adoption of the backend system 100 according to one of the embodiments described above, data management may be decoupled from the aggregation method, and the client devices 110 involved in model training are no longer limited to the tasks of a particular central server, so that the amount of data available for model training can increase greatly, as further described below.

A further example application scenario of the backend system 100 is the Industrial Internet of Things (IIoT). As an extension of the Internet of Things, the IIoT refers to interconnected smart sensors, instruments and other devices networked together with computers' industrial applications, with the aim of enhancing manufacturing and industrial processes. Taking healthcare as a specific application scenario, data in this area may, compared to other domains, be highly sensitive and subject to health regulations. In this situation, FL is viewed as a suitable solution for providing intelligent services while preserving client privacy. With the adoption of the backend system 100, the DLT platform 130 may carry out the authentication and authorization of data access, and the data interactions of federated learning may be recorded in a tamper-proof ledger. This means that, if needed, a client device 110 could later query the access records of the local model trained on its own dataset, and these records are trustworthy. At the same time, all participants in the task are registered entities in the communication network 101. Thus, clients' concerns about data privacy leakage can be greatly reduced.
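To illustrate why recorded access is trustworthy, the following sketch models the ledger as a single append-only hash chain in which each entry commits to the hash of its predecessor. This is an assumption made for illustration only: the description does not specify the ledger structure, and a real DLT platform additionally replicates the chain across nodes and reaches consensus on it. The class and method names (`AccessLedger`, `record`, `access_history`) are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AccessLedger:
    """Toy append-only, hash-chained access log standing in for the DLT ledger."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def record(self, accessor: str, model_owner: str, action: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"accessor": accessor, "owner": model_owner,
                "action": action, "prev": prev}
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        # Recompute every hash and check each link to the previous entry.
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("accessor", "owner", "action", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def access_history(self, model_owner: str) -> List[Dict[str, str]]:
        """Access records of the local model owned by `model_owner`."""
        if not self.verify():
            raise RuntimeError("ledger integrity check failed")
        return [e for e in self.entries if e["owner"] == model_owner]


ledger = AccessLedger()
ledger.record("aggregator-1", "client-A", "download_local_model")
ledger.record("aggregator-2", "client-B", "download_local_model")
for rec in ledger.access_history("client-A"):
    print(rec["accessor"], rec["action"])
```

Because every hash covers the previous one, silently editing an old access record changes its hash and breaks `verify()`, which is what lets a client device trust the history it queries.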

Figure 17a shows a schematic diagram illustrating the conventional federated learning system 10 and a further conventional federated learning system 10’, and figure 17b shows a schematic diagram illustrating the backend system 100 according to an embodiment. As illustrated in figure 17a, the further conventional federated learning system 10’ comprises further clients 11a’, 11b’ connected to a further central server 13’.

As illustrated in figure 17b, the backend system 100 decouples data management from the aggregation method by introducing the DLT platform 130 together with the data storage entity 140. In the conventional federated learning systems 10, 10’, data management and the aggregation method are both realized by the central server 13, 13’, and the clients 11a, 11b, 11a’, 11b’ communicate directly with the central server 13, 13’, so that to some extent client data is bound to the corresponding central server 13, 13’. As a result, client data collected or generated by a central server 13, 13’ can generally only be used for the relevant federated learning tasks of that server. The decoupling achieved by the backend system 100 is likely to change this existing business model.

As an example, consider learning the relationship between the time elapsed on a certain section of road and the road condition. The two types of car clients 11a, 11b, 11a’, 11b’ in figure 17a can each only take part in a federated learning task with their corresponding central server 13, 13’: a first group of clients 11a, 11b can only participate in the task illustrated in the upper half of figure 17a, and a second group of clients 11a’, 11b’ can only participate in the task illustrated in the lower half of figure 17a. In the backend system 100 illustrated in figure 17b, by contrast, the clients 101a-d interact with the aggregators 120a-b through the DLT platform 130, so cars of different client groups (illustrated by the white cars of clients 101a-b and the black cars of clients 101c-d) can participate in the same task. For example, both client groups could join the first task, which greatly expands the number of client devices 110 used for that task. This will likely change the existing business model and expand the data ecosystem.

Moreover, instead of storing data directly on the DLT platform 130, which would place a heavy burden on DLT storage and on privacy protection, the backend system 100 stores data keys on-chain and the raw data off-chain. This avoids the above-mentioned drawbacks while still providing tamper-proof and traceable data management.
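A minimal sketch of the on-chain key / off-chain data split follows. It assumes, as an illustration only, that the data key is the SHA-256 digest of the uploaded bytes (content addressing); the description does not fix how data keys are derived. `OffChainStore` stands in for a data storage entity 140 and `OnChainKeyRegistry` for the key records kept on the DLT platform 130; both interfaces are hypothetical.

```python
import hashlib
from typing import Dict, Tuple


class OffChainStore:
    """Stand-in for a data storage entity 140 holding the raw (large) data."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # assumed content-derived data key
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class OnChainKeyRegistry:
    """Stand-in for the DLT platform 130: stores only small, fixed-size keys."""

    def __init__(self) -> None:
        self._keys: Dict[Tuple[str, int, str], str] = {}

    def register(self, task_id: str, round_no: int, uploader: str, key: str) -> None:
        self._keys[(task_id, round_no, uploader)] = key

    def lookup(self, task_id: str, round_no: int, uploader: str) -> str:
        return self._keys[(task_id, round_no, uploader)]


def fetch_verified(registry: OnChainKeyRegistry, store: OffChainStore,
                   task_id: str, round_no: int, uploader: str) -> bytes:
    """Download off-chain data and check it against the on-chain key."""
    key = registry.lookup(task_id, round_no, uploader)
    data = store.get(key)
    if hashlib.sha256(data).hexdigest() != key:
        raise ValueError("off-chain data does not match the on-chain key")
    return data


store, registry = OffChainStore(), OnChainKeyRegistry()
key = store.put(b"local model parameters of client-A")
registry.register("task-1", 0, "client-A", key)
print(fetch_verified(registry, store, "task-1", 0, "client-A"))
```

Only the 64-character key touches the ledger, while the integrity check on download still makes any off-chain modification detectable.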

Moreover, the role of the central server 13, 13’ is decentralized by selecting a particular aggregator entity 120’ from among many aggregator entities 120 to perform client selection and model aggregation, which avoids a series of pitfalls of centralization, such as a single point of failure.
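One way such a selection could be made verifiable by every node is sketched below. The rule is an assumption chosen for illustration, not the smart contract of the description: each candidate is scored by hashing a shared seed (for example a recent block hash), the round number and its identifier, and the lowest score wins. `select_aggregator` and its parameters are hypothetical.

```python
import hashlib
from typing import Sequence


def select_aggregator(candidates: Sequence[str], seed: str, round_no: int) -> str:
    """Pick one aggregator entity; any node can recompute and check the result.

    `seed` is assumed to be a value all nodes agree on, such as a recent block
    hash, so no single party controls which aggregator is chosen.
    """
    if not candidates:
        raise ValueError("no registered aggregator entities")

    def score(agg_id: str) -> str:
        return hashlib.sha256(f"{seed}:{round_no}:{agg_id}".encode()).hexdigest()

    return min(candidates, key=score)


aggregators = ["agg-1", "agg-2", "agg-3", "agg-4"]
chosen = select_aggregator(aggregators, seed="block-hash-123", round_no=7)
# If the chosen aggregator fails, the selection is simply rerun without it,
# so no single aggregator is a point of failure for the task.
backup = select_aggregator([a for a in aggregators if a != chosen],
                           seed="block-hash-123", round_no=7)
print(chosen, backup)
```

The result depends only on the inputs, not on the order of the candidate list, so every participant arrives at the same choice independently.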

Figure 18 is a flow diagram illustrating a method 1800 of operating the backend system 100 for providing the federated learning, FL, model for the plurality of client devices 110 of the mobile communication network 101, the backend system 100 comprising the plurality of aggregator entities 120, each aggregator entity 120 being configured to aggregate local FL model data from a plurality of selected client devices 110’ of the plurality of client devices 110 for generating global FL model update data for updating the FL model. The method 1800 comprises a step 1801 of selecting an aggregator entity 120’ of the plurality of aggregator entities 120 by the Distributed Ledger Technology, DLT, platform 130. The method further comprises a step 1803 of uploading the global FL model update data to the DLT platform 130 by the selected aggregator entity 120’.
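The two steps of method 1800 can be placed in the context of a single training round, as in the sketch below. Several assumptions go beyond the method itself: the ledger and storage are in-memory stand-ins, data keys are SHA-256 digests of the serialized parameters, the selection rule is a seeded hash lottery, and aggregation uses FedAvg weighted by local dataset size. All names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def data_key(obj) -> str:
    """Assumed data key: SHA-256 of the canonical JSON serialization."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class Ledger:
    """Minimal in-memory stand-in for the DLT platform 130."""

    def __init__(self, aggregators: List[str]) -> None:
        self.aggregators = list(aggregators)
        self.records: List[Tuple[int, str, str, str]] = []  # (round, role, entity, key)

    def select_aggregator(self, round_no: int, seed: str) -> str:  # step 1801
        return min(self.aggregators, key=lambda a: data_key([seed, round_no, a]))

    def upload(self, round_no: int, role: str, entity: str, key: str) -> None:
        self.records.append((round_no, role, entity, key))


def fed_avg(updates: Dict[str, List[float]], sizes: Dict[str, int]) -> List[float]:
    """Sample-weighted average of parameter vectors (FedAvg, assumed here)."""
    if not updates:
        raise ValueError("no local updates")
    total = sum(sizes[c] for c in updates)
    dim = len(next(iter(updates.values())))
    return [sum(updates[c][i] * sizes[c] for c in updates) / total for i in range(dim)]


def run_round(ledger: Ledger, storage: Dict[str, List[float]], round_no: int,
              seed: str, local_updates: Dict[str, List[float]],
              sizes: Dict[str, int]) -> Tuple[str, List[float]]:
    aggregator = ledger.select_aggregator(round_no, seed)          # step 1801
    # Selected clients store their local data off-chain and register its key.
    for client, params in local_updates.items():
        key = data_key(params)
        storage[key] = params
        ledger.upload(round_no, "local", client, key)
    # The selected aggregator fetches the local data via the on-chain keys.
    keys = {e: k for r, role, e, k in ledger.records
            if r == round_no and role == "local"}
    global_params = fed_avg({c: storage[k] for c, k in keys.items()}, sizes)
    gkey = data_key(global_params)
    storage[gkey] = global_params
    ledger.upload(round_no, "global", aggregator, gkey)            # step 1803
    return aggregator, global_params


ledger, storage = Ledger(["agg-1", "agg-2", "agg-3"]), {}
agg, params = run_round(ledger, storage, 0, "block-hash-0",
                        {"car-1": [1.0, 2.0], "car-2": [3.0, 4.0]},
                        {"car-1": 1, "car-2": 3})
print(agg, params)
```

In this sketch the "upload to the DLT platform" of step 1803 registers only the data key of the global update, consistent with the on-chain key / off-chain data design described above.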

As the method 1800 can be implemented by the backend system 100, further features of the method 1800 result directly from the functionality of the backend system 100 and its different embodiments described above and below.

The person skilled in the art will understand that the "blocks" ("units") of the various figures (method and apparatus) represent or describe functionalities of embodiments of the present disclosure (rather than necessarily individual "units" in hardware or software) and thus describe equally functions or features of apparatus embodiments as well as method embodiments (unit = step).

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described embodiment of an apparatus is merely exemplary. For example, the unit division is merely logical function division and may be another division in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, functional units in the embodiments disclosed herein may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.

Claims

A backend system (100) for providing a federated learning, FL, model for a plurality of client devices (110) of a mobile communication network (101) , wherein the backend system (100) comprises:

a plurality of aggregator entities (120) , each aggregator entity (120) configured to aggregate local FL model data from a plurality of selected client devices (110’) of the plurality of client devices (110) for generating global FL model update data for updating the FL model; and

a Distributed Ledger Technology, DLT, platform (130) configured to select an aggregator entity (120’) of the plurality of aggregator entities (120) ;

wherein the selected aggregator entity (120’) is further configured to upload the global FL model update data to the DLT platform (130) .
The backend system (100) of claim 1, wherein the backend system (100) further comprises one or more data storage entities (140) , wherein the one or more data storage entities (140) are configured to store the global FL model update data from the selected aggregator entity (120’) and wherein the selected client devices (110’) are configured to download the global FL model update data from the one or more data storage entities (140) .
The backend system (100) of claim 2, wherein the one or more data storage entities (140) are further configured to store the local FL model data from the plurality of selected client devices (110’) of the plurality of client devices (110) and wherein the selected aggregator entity (120’) is configured to download the local FL model data from the one or more data storage entities (140) .
The backend system (100) of any one of the preceding claims, wherein the DLT platform (130) is configured to select the one or more aggregator entities (120’) of the plurality of aggregator entities (120) based on a smart contract.
The backend system (100) of any one of the preceding claims, wherein the selected aggregator entity (120’) is configured to select the plurality of selected client devices (110’) of the plurality of client devices (110) .
The backend system (100) of any one of the preceding claims, wherein the DLT platform (130) is configured to store one or more data keys for the local FL model data of the plurality of client devices (110) .
The backend system (100) of claim 6, wherein the DLT platform (130) is further configured to store one or more further data keys for the global FL model update data from the selected aggregator entity (120’) .
The backend system (100) of any one of the preceding claims, wherein the DLT platform (130) is configured to perform authentication, authorization and/or access control of the plurality of selected client devices (110’) and/or the selected aggregator entity (120’) .
The backend system (100) of any one of the preceding claims, wherein the backend system (100) further comprises a FL service lifecycle manager, FLSLM, entity (150) , wherein the FLSLM entity (150) is configured to define and configure the global FL model.
The backend system (100) of claim 9, wherein the FLSLM entity (150) is further configured to monitor the operation of the FL model deployed in the DLT platform (130) .
The backend system (100) of claim 9 or 10, wherein the FLSLM entity (150) is further configured to terminate operation of the FL model deployed in the DLT platform (130) .
The backend system (100) of any one of the preceding claims, wherein each of the plurality of aggregator entities (120) is configured to periodically check the DLT platform (130) whether the respective aggregator entity (120) has been selected as the selected aggregator entity (120’) and/or for the local FL model data.
The backend system (100) of any one of the preceding claims, wherein each client device (110) is configured to periodically check the DLT platform (130) whether the respective client device (110) has been selected as one of the plurality of selected client devices (110’) and/or for the global FL model update data.
The backend system (100) of any one of the preceding claims, wherein the DLT platform (130) is further configured to receive an upload request from the selected aggregator entity (120’) and/or the plurality of selected client devices (110’) , wherein the upload request comprises a data key of the upload data.
A method (1800) of operating a backend system (100) for providing a federated learning, FL, model for a plurality of client devices (110) of a mobile communication network (101) , the backend system (100) comprising a plurality of aggregator entities (120) , each aggregator entity (120) being configured to aggregate local FL model data from a plurality of selected client devices (110’) of the plurality of client devices (110) for generating global FL model update data for updating the FL model, wherein the method (1800) comprises:

selecting (1801) an aggregator entity (120’) of the plurality of aggregator entities (120) by a Distributed Ledger Technology, DLT, platform (130) ; and

uploading (1803) the global FL model update data to the DLT platform (130) by the selected aggregator entity (120’) .
A computer program product comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method (1800) of claim 15, when the program code is executed by the computer or the processor.
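The periodic check recited in the claims, in which each aggregator entity or client device polls the DLT platform to learn whether it has been selected or whether new model data is available, can be sketched as a bounded polling loop. The query callable, the interval, the attempt limit and the injectable `sleep` are illustrative assumptions; the claims do not specify the polling schedule.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_ledger(query: Callable[[], Optional[T]], interval_s: float = 1.0,
                max_attempts: int = 10,
                sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call `query` until it returns a non-None result or attempts run out.

    `query` is a hypothetical read from the DLT platform, e.g. "has this
    device been selected?" or "is the key of the global update available?".
    """
    for attempt in range(max_attempts):
        result = query()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before checking the ledger again
    return None


# Example: a fake ledger query that reports selection on the third check.
checks = {"n": 0}


def selected_yet() -> Optional[str]:
    checks["n"] += 1
    return "selected for round 5" if checks["n"] >= 3 else None


print(poll_ledger(selected_yet, interval_s=0.01, max_attempts=5))
```

Injecting `sleep` keeps the loop testable; an event subscription offered by the DLT platform, where available, would avoid polling altogether.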