WO2021008675A1

WO2021008675A1 - Dynamic network configuration

Info

Publication number: WO2021008675A1
Application number: PCT/EP2019/068908
Authority: WO
Inventors: Tony Larsson; Johan HARALDSON; Martin Isaksson
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2021-01-21

Abstract

A method (200) is proposed for dynamically configuring a network comprising a plurality of client computing devices configured to perform training of a plurality of machine learning models. The method is performed at a server computing device and comprises the steps of: (202) receiving at least one parameter from each of the plurality of client computing devices; (204) clustering the plurality of client computing devices into a plurality of groups based on the received at least one parameter, wherein each group of the plurality of groups comprises at least one client computing device sharing a machine learning model within the group; (206) evaluating the clustering of the plurality of client computing devices into the plurality of groups based on an evaluation criterion; (208) updating the clustering of the plurality of client computing devices into the plurality of groups dynamically to satisfy the evaluation criterion.

Description

DYNAMIC NETWORK CONFIGURATION

Technical Field

The present disclosure relates to a method, a server computing device, a computer program, and a computer program product for dynamically configuring a network comprising a plurality of client computing devices configured to perform training of a plurality of machine learning models.

Background

In computer science, artificial intelligence (Al) is intelligence demonstrated by machines. A typical Al system takes actions that maximise its chances of successfully achieving a certain goal by using computational methods to automatically learn and improve from data and experience without being explicitly programmed. This is known as machine learning (ML). In order to train a computational model thoroughly, it is required that the model performs many training iterations on many different sets of data. The model can then be updated based on feedback from its performance. In general, the more data that can be accessed to train a machine learning model, the more accurate that model will become.

The computational power required to perform such training is vast. Therefore, a decentralised approach to machine learning has been developed. In decentralised learning, devices in a network collaboratively train a shared model using their own respective training data, without that training data leaving the device. This has a number of advantages. Using a large number of devices in a network provides significant increases in computational power available for training. By training models based on data locally stored in the training devices, such sensitive data can be used in training without being transferred over the network. Furthermore, this approach allows limitations in uplink bandwidth and network coverage to be mitigated.

Several algorithms have been presented to enable decentralised learning. The algorithm presented by Google, named“FederatedAveraging” (McMahan et al. ,“Federated Learning of deep networks using model averaging”, arXiv: 1602.05629, 201 6), has led to the coining of the term "federated learning”. This algorithm addresses several real-world challenges; the ability to handle unbalanced and non-IID (independent and identically distributed) data, massively distributed data (where there are more devices than the average number of data samples that can be used for training per device), reductions in communication needed for training, as well as limitations in device connectivity. Empirical results show that the FederatedAveraging algorithm works for different model architectures; multi-layer perceptron, convolutional neural networks, and recurrent neural networks.

In the FederatedAveraging algorithm, a server first initialises the weights of a neural network model. For every training round, the server sends the model weights to a fraction of client devices that are available to take part of the training, and the client devices return their evaluation of the model performance. Each client being part of the training initialises a local copy of the neural network model with the received weights and runs one or more epochs (where 1 epoch = 1 forward pass + 1 backward pass for all available training samples), resulting in a set of updated weights. The client then returns some evaluation of how the model performed along with some indication of the updated weights, for example the difference between the weights received from the server and the updated weights. The server can then decide how to update the model to increase its performance.

Despite the advantages of the decentralised learning approaches discussed above, there are a number of limitations. For example, current frameworks are based on assumptions or decisions that may be not valid or suitable for all types of real-life deployments. As one example, the objective in federated learning is to jointly, in a distributed manner train a shared machine learning model among multiple participants/client computing devices. This assumes that the distribution of the data in all client computing devices participating is somewhat similar, that is, independent and identically distributed. In many real-life scenarios the behaviour is not that similar to make it possible to create a common shared model, and the resulting model will therefore not perform optimally.

The methods, devices and systems described in the present disclosure aim to mitigate at least some of these issues and provide improvements to decentralised machine learning.

Summary

It is an object of the invention to provide an improved alternative to the above techniques and prior art.

More specifically, it is an object of the invention to provide an improved solution for decentralised learning where data distributions of client computing devices may be very different from each other. These and other objects of the invention are achieved by means of different aspects of the invention, as defined by the independent claims. Embodiments of the invention are characterized by the dependent claims.

According to a first aspect of the invention, it is presented a method for dynamically configuring a network comprising a plurality of client computing devices configured to perform training of a plurality of machine learning models. The method is performed at a server computing device and comprises the steps of: receiving at least one parameter from each of the plurality of client computing devices; clustering the plurality of client computing devices into a plurality of groups based on the received at least one parameter, wherein each group of the plurality of groups comprises at least one client computing device sharing a machine learning model within the group; evaluating the clustering of the plurality of client computing devices into the plurality of groups based on an evaluation criterion; updating the clustering of the plurality of client computing devices into the plurality of groups dynamically to satisfy the evaluation criterion.

According to a second aspect of the invention, it is presented a server computing device for dynamically configuring a network comprising a plurality of client computing devices configured to perform training of a plurality of machine learning models. The server computing device comprises a processing circuitry and a memory. The memory contains instructions executable by the processing circuitry whereby the server computing device is operative to: receive at least one parameter from each of the plurality of client computing devices; cluster the plurality of client computing devices into a plurality of groups based on the received at least one parameter, wherein each group of the plurality of groups comprises at least one client computing device sharing a machine learning model within the group; evaluate the plurality of groups of the clustered plurality of client computing devices based on an evaluation criterion; update the plurality of groups of the clustered plurality of client computing devices to satisfy the evaluation criterion.

According to a third aspect of the invention, a computer program is provided. The computer program comprises instructions which, when executed on a processing circuitry, cause the processing circuitry to carry out the method according to an embodiment of the first aspect of the invention.

According to a fourth aspect of the invention, a computer program product is provided. The computer program product having stored thereon a computer program comprises instructions which , when executed on the processing circuitry, cause the processing circuitry to carry out the method according to an embodiment of the first aspect of the invention.

The methods of the present disclosure allow a decentralized learning structure that dynamically creates a plurality of groups of client computing devices where each group shares a common machine learning model. In this way the client computing devices with similar characters will be grouped together and benefit from learning from each other. The common machine learning model shared within a group can thereby achieve a good performance.

The present disclosure also allows such decentralised learning systems to be implemented in a telecommunications network, where computing resources are distributed from centrally located nodes in the core network all the way to the base stations in the very edge of the access network. A telecommunications network involves a large number of nodes, both virtual and physical, which is ideally suited to decentralised learning scenarios where a large number of client devices are needed to teach a machine learning algorithm to learn a task. Furthermore, the state-of-the-art approaches to geographically distributed data are typically centralized but in a Radio Access Network (RAN) context it will be expensive or even impossible to transfer the required data, and time-critical applications will not tolerate the extra latency. Some of the data collected may also be privacy sensitive, which further prohibits sending it to a central location. Also, if a node is added to the RAN, it is beneficial that the added node uses the collective wisdom of the network.

However, network characteristics, traffic and radio environment can be different between networks and even within networks. In a telecommunications network with e.g. more than 10000 base stations it is often not possible to create one shared model for a certain task among all these base stations. The performance of such a shared model will not be good enough. Meanwhile, a model trained on only local data will also not be good enough, since it is trained on a smaller data set. The present disclosure allows such decentralised learning systems to be implemented in a telecommunications network efficiently where a plurality of shared machine learning models are created.

Brief Description of the Drawings

These and other aspects, features and advantages will be apparent and elucidated from the following description of various embodiments, reference being made to the accompanying drawings, wherein: FIG. 1 shows a network for decentralised learning with clustering according to an embodiment;

FIG. 2 shows a flow chart depicting a method of dynamically configuring a network for training a plurality of machine learning models according to an embodiment;

FIG. 3 shows a flow chart depicting a method of dynamically configuring a network based on divide-and-conquer clustering according to an embodiment;

FIG. 4 shows a threshold crossing time example;

FIG. 5 shows an example of time-series from a radio access network;

FIG. 6 show an example with two different threshold crossing sequences that are clustered into two clusters;

FIG. 7 shows a flow chart depicting a method of Threshold Crossing Time Sequences (TCTS) calculations on each client computing device;

FIG. 8 shows a flow chart depicting a method of clustering based on TCTS on a server computing device.

FIG. 9 shows a flow chart depicting a method of updating the clustering of a plurality of client computing devices into a plurality of groups;

FIG. 10 shows a flow chart depicting a method of cluster optimization based on different cluster evaluating criteria on a server computing device;

FIG. 1 1 shows a schematic view of a communication system according to an embodiment; and

FIG. 12 shows an example implementation of an apparatus.

Like reference numbers refer to like elements throughout the description.

Detailed Description

The present invention will now be described more fully hereinafter. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those persons skilled in the relevant art.

With reference to FIG. 1 , a network 100 for decentralised learning is shown. The system comprises a server computing device 102 and a number of client computing devices 104a, 104a1 , 104a2, 104a3 and 104b, 104b1 , 104b2. All the client computing devices can be indicated as 104. Client computing devices 104a, 104a1 , 104a2 and 104a3 belong to cluster 1 wherein the client computing device 104a controls and/or orchestrates the process of training a machine learning model within cluster 1 . Client computing devices 104b, 104b1 and 104b2 belong to cluster 2 wherein the client computing device 104b controls and/or orchestrates the process of training a machine learning model within cluster 2. The client computing devices 104a, 104b can also be referred to as cluster nodes since they control and/or orchestrate their corresponding clusters. The client computing devices in each cluster share a machine learning model within the cluster. In some embodiments, the server computing device 102 can be considered as a root node of a global model that controls and/or orchestrates the machine learning model shared by each cluster. In each cluster, the client computing device participates in training the model and/or using the machine learning model in an inference stage, where learning is performed during real-world applications. It will be appreciated that for each cluster, not all client computing devices within the cluster will necessarily use the machine learning model. Similarly, not all client computing devices within the cluster will necessarily participate in training the model. In some embodiments, a client computing device within a cluster may both use the model and participate in training the model. While client computing devices 104a, 104a1 , 104a2, 104a3, 104b, 104b1 and 104b2 being grouped into two clusters are shown in FIG. 1 , it will be appreciated that any suitable number of client computing devices grouped into any suitable number of clusters may be present in the network 100. In some embodiments, the network 100 may be a telecommunications network, and the client computing devices 104 may be edge computing resources of the telecommunications network. In particular, the client computing devices 104 may be access nodes of the telecommunications network. The server computing device 102 may also be an access node. An access node may comprise a base station (BS) (e.g. , a radio base station, a Node B, an evolved Node B (eN B) an N R NodeB (gN B)), a radio access node (RAN) or any other node suitable for performing the proposed method as described herein.

In embodiments where all decisions about how the client computing devices should be clustered are performed in the server computing device 102, the network 100 is considered to be operating in a synchronous mode. It is the normal case since a server computing device 102 has the information of all the client computing devices 104.

The server computing device 102 and client computing devices 104 communicate with each other using a suitable communication protocol allowing them to send messages with information to each other. Examples of such a communication protocol include WebSocket, TCP/IP, HTTP, HTTP/2, etc., although it will be envisaged that any suitable communication protocol could be used. FIG. 2 shows a method 200 of dynamically configuring a network, such as network 100, comprising a plurality of client computing devices 1 04 configured to perform training of a plurality of machine learning models. In some embodiments the plurality of machine learning models are federated learning models. The network comprises a plurality of client computing devices 104 configured to perform training of the plurality of machine learning models. The method 200 allows the client computing devices 104 to be clustered into a suitable number of groups to achieve good performance of their respective machine learning models. The method is performed by a server computing device 102. In some embodiments, the server computing device 102 sends initial model parameters to the client computing devices 104, which then perform training of the model locally and send evaluation indicators along with updated model parameters back to the server computing device 102. The server computing device 102 can then aggregate the parameters and send out new, updated model parameters to the client computing devices 104 for further training. This is an example of using federated learning to train the machine learning model. In some embodiments, the client computing devices 104 may decide by themselves when to do training and when to send updates. In other embodiments the server computing device 102 may decide when the client computing devices 104 should do the training and send updates.

At step 202, the server computing device receives at least one parameter from each of the plurality of client computing devices. Which kind of parameters that should be used depend on use cases and the principles of clustering the client computing devices into groups. The present disclosure considers two main principles concerning how to cluster the client computing devices: the first of these is to use values of an evaluation metric from previous runs to create groups for next round of training and this is repeated regularly, such as after each round, or triggered by a certain event. The second principle is to use a similarity metric with low computational complexity that allows the server computing device to create the groups based on how similar the data distributions of the client computing devices are. In some embodiments, these two principles may be combined and more than one parameter may be used. In a real-life deployment, multiple parameters may be sent to the server computing device to get a more descriptive data for clustering. In some embodiments the network 100 may be a telecommunications network and at least one parameter which is related with a telecommunications network is used. In some embodiments the parameters related with a telecommunications network are extracted from Performance Management (PM) data and Configuration Management (CM) data. PM data represents metric measurements composed by different network elements as counters. These metrics may comprise of events, success rate, reset events, resource usage, signalling, etc. CM data is data relating to system configuration of network hardware and software elements on a system. In some embodiments the CM and PM data is analysed and information such as vendor, software versions, enabled features, important Key Performance Indicators (KPIs) may be used for clustering.

At step 204, the server computing device clusters the plurality of client computing devices into a plurality of groups based on the received at least one parameter, wherein each group of the plurality of groups comprises at least one client computing device sharing a machine learning model within the group. In some embodiments, the client computing devices in each group train a machine learning model based on an initial set of weights, share a set of updated weights within the group, and return the set of updated weights to the server computing device 102 directly or indirectly via the cluster node 104a, 104b. In some embodiments the server computing device may have the information for all groups. Examples of service communication used for this purpose can for instance be some sort of message bus with separate topics and publish/subscribe support. Publish/subscribe is a messaging pattern where the receivers of messages subscribe to a topic that they are interested in. When a new message in said topic is available then it is published so that each receiver can read it. In some embodiments a server computing device 102 may subscribe to a topic such as “Device updates”, to which topic each client computing device 104 publishes their respective updated weights. In some embodiments each client computing device 1 04 may subscribe to a topic such as“Global weights”, to which topic the server computing device 1 02 may publish new aggregated weights.

At step 206, the server computing device evaluates the clustering based on an evaluation criterion. In some embodiments, evaluation is performed at each client computing device based on evaluation data. At step 208, the server computing device updates the clustering of the plurality of client computing devices into the plurality of groups to satisfy the evaluation criterion.

In some embodiments, a divide-and-conquer approach is used for the clustering step 204. As shown in FIG. 3, at step 302, all client computing devices are initially within one single group. The server computing device collects the updates of a set of weights of the machine learning model at each client computing device, and the value of an evaluation metric at each client computing device. In some embodiments the evaluation metric may be based on the performance of the machine learning model. In some embodiments, the evaluation metric may be based on classification accuracy, mean squared error, Fi score (also F-score or F-measure), or logarithmic loss. At step 304, the server computing device splits the client computing devices into groups based on the value of the evaluation metric. For example, as shown at step 305, the client computing devices with good model performance, i.e. the performance value of the machine learning model is higher than a pre-defined threshold value, can be clustered into one group. The client computing devices with bad model performance, i.e. the performance value of the machine learning model is lower than a pre-defined threshold value, can be clustered into another group, as shown at step 306. Optionally the server computing device may split the client computing devices into groups based on some other definitions of the model performances indicating different ranges of evaluation metrics. The splitting should preferably be performed when the machine learning model running at each client computing device has reached a stable state. Each group should then try to create a shared machine learning model within the group. At step 308, the client computing devices grouped as having bad model performance are trained and split into two sub-groups where the client computing devices with good performance form a new sub-group as shown at step 309, and the client computing devices with bad performance form another sub-group as shown at step 310. The procedure to create new sub-groups is iteratively repeated until at least one of the following conditions is fulfilled: a number of client computing devices within any group of the plurality of groups is below a threshold number; a performance value of any of the respective machine learning models is below a threshold performance value. At the end of an iteration, some client computing devices with bad model performance are further split up and continue to form a sub-group with good model performance, as shown at step 31 1. The remaining client computing devices are left for further processing using some other methods since no good model performance can be achieved with the machine learning model currently used. Reasons for this can vary from case to case, for example can be faulty hardware, too few data sample, wrong type of machine learning model etc. In some embodiments the client computing devices are monitored at run-time. When the performance of the machine learning model degrades for certain client computing devices, the degradation will be detected and these client computing devices will be moved to another group. In some embodiments the client computing devices will be removed entirely for further processing when the degradation is due to, for example, faulty hardware.

In some embodiments the splitting of the client computing devices is not only based on the performance of the machine learning model. Other kind of data, such as CM and PM data in the case of a telecommunications network, can be analysed to find commonalities and differences between client computing devices within that group, and to trigger changes regarding to which group a client computing device should belong to. In some embodiments the server computing device receives at least one parameter comprising one or more features describing a data distribution for each of the plurality of client computing devices, and the clustering the plurality of client computing devices into the plurality of groups is based on a similarity of the received one or more features describing the data distribution. The assumption here is that the server computing device does not have access to all the data from each client computing device, and instead some aggregated information about the data distribution is sent to the server computing device. This can be for example some sort of density function like the Probability Density Function (PDF) that calculates the probability of observing a given value. In some embodiments the similarity of the received one or more features describing the data distribution may be used on its own for the clustering. In some embodiments this method is combined with other clustering method like the divide-and- conquer method described by FIG. 3. The details of using the similarity of the received one or more features describing data distribution for clustering the plurality of client computing devices are explained as following.

A full similarity analysis would require all client computing devices to send all the data to the server computing device, so that the server computing device can compare and analyse the similarities. Since it is preferred that the data of the client computing device should not leave the client computing device because of privacy consideration, the similarity analysis will be performed in a manner based on data aggregation with low computational complexity, in other word, a lightweight manner. A lightweight version of the data analysis should satisfy the following requirements: the data of a client computing device should not leave the client computing device; some descriptive property that can be used for similarity analysis should be sent from the client computing device to the server computing device; the descriptive property should be as small as possible in size to save bandwidth, also as little information as possible should be revealed about the underlying data.

In some embodiments the server computing device receives one or more features describing a data distribution comprising a plurality of Threshold Crossing Time Sequences (TCTS or TCT sequences) wherein at least one TCTS is received for each of the plurality of client computing devices. TCTS is a way to represent a time-series. Clustering the plurality of client computing devices into the plurality of groups may be based on a similarity of the received plurality of TCTS. The advantage of using TCTS similarity is that 1 ) it only requires the actual threshold crossing time to be sent, and 2) it fits for the use cases where the time-series data has some sort of seasonal pattern, such as in a telecommunications network where the data has daily patterns. The basic idea behind threshold crossing time is that one only keeps track of and compares the points in time when the curve crosses the threshold and is above the threshold (J. AbίbIV et al.,“Similarity Search on Time Series Based on Threshold Queries”, International Conference on Extending Database Technology, EDBT 2006, pp. 276-294, Springer, Heidelberg). FIG. 4 shows one example with a horizontal line representing the threshold, and a dashed line at the bottom represents the sequences in time that are above the threshold. In some embodiments, similarity metrics such as dynamic time warping and Euclidian distance may be used. Dynamic time warping allows to use different length vectors depending on the number of data points available and sampling intervals. Euclidean distance allows to incorporate features that are not strictly part of the time-series such as configuration parameters.

In a use case of a telecommunications network, base stations in a business/commercial area and base stations in a residential area may have different daily patterns regarding average number of users. An example can be shown in FIG. 5. The two curves (commercial and residential) in FIG. 5 may be separated into two different clusters since their threshold crossing pattern differs. In FIG. 6, the different threshold crossing sequences of these two curves are illustrated by the dotted lines. These dotted lines are the sequences in time when the curve is above the threshold value. The result of this will be that the two curves end up in two different groups after clustering based on this pattern.

Referring to FIG. 7, a method 700 of using TCTS for similarity analysis at each client computing device is shown. The method start-up is at step 702. At step 704, a client computing device waits until data is received. In a use case of a telecommunications network for example, 24 hours of data is typically collected and analysed before any TCTS can be calculated due to the periodical busy-hour pattern in a base station. At step 706, the client computing device needs to calculate a median of the data. This can also be a mean value or some other percentile, but in this case, a median is used as an example. This has to be done continuously when streaming time-series data as shown at step 705 enters the system, which means that some sort of running median algorithm may be used with sliding windows. At step 708, this median is set as threshold value T. At step 710, based on the threshold value T, the client computing device needs to calculate the threshold crossing time sequences, in other words, the starting and ending time for all sequences when the time-series is above the threshold T. The TCTS is for instance in the format: [(s1 , e1 ), (s2, e2), ...] where s1 , s2, ... are starting times and e1 , e2 are ending times. At step 712, each client computing device sends the calculated TCTS to a server computing device. The server computing device has a database for all TCTS of all client computing devices participating in the system. I n some embodiments, the one or more features describing a data distribution further comprise at least one of: a standard deviation , a variance, a mean value, a median , percentiles, a minimum value, or a maximum value. These values may be combined for the similarity analysis of the data distribution .

Referring to FIG. 8, a method 800 of clustering at a server computing device based on TCTS similarity analysis is shown. At step 802 the method starts. At step 804 the server computing device waits until new TCTS data is received (as shown from step 805). The server computing device collects all TCTS from the client computing devices and stores TCTS in a database together with an identifier for each client computing device. Initially it will only have one TCTS per client. This will however gradually increase as time passes by and more TCT sequences are collected. At step 806 there is a TCTS filter since the server computing device will either use all available sequences or limit the number of TCTS to the X latest sequences for each client computing device to ensure that the data is fresh enough. At step 808, the server computing device calculates a distance matrix for all TCTS and client computing devices. At step 810, the distance matrix is clustered using a clustering algorithm comprising at least one of: Density-based spatial clustering of applications with noise (DBSCAN), Hierarchical DBSCAN (HDBSCAN), k-means or hierarchical clustering. The result is X number of clusters. Hierarchical clustering may enable the solution to have similarity metrics on different levels of the hierarchy and prune the hierarchical cluster tree at different locations depending on the usage. The output from the clustering is a set of groups. A group in this context is a group of client computing devices (e.g. base stations) that have similar data based on the metric described above and will be suitable to share a machine learning model. At step 812, it is determined if the set of groups is the first clustering of the system. If the clustering is the first clustering performed for the plurality of client computing devices, at step 813, each group will then independently create a shared machine learning model within the group based on distributed learning principles. After step 813 the procedure will return to step 804 to wait for new TCTS data. If the clustering is not the first clustering performed for the plurality of client computing devices, at step 814 it is determined if the new clustering is different from a current clustering. If the new clustering is different, at step 816, it is further determined how the new clustering should affect the updating of the clustering, the details of which are described below by FIG.9. If the new clustering is the same as the current clustering, the procedure will return to step 804 to wait for new TCTS data.

Referring to FIG. 9, a method embodiment 900 of updating the clustering of the plurality of client computing devices into the plurality of groups at a server computing device is shown. At step 902, the server computing device receives at least one updated parameter from at least one of the plurality of client computing devices; at step 904, the server computing device determines if the received at least one updated parameter leads to a reassignment of the at least one of the plurality of client computing devices to a different group of the plurality of groups; if the received at least one updated parameter leads to a reassignment, at step 906, the new clustering with the reassignment is evaluated based on an evaluation criterion. The server computing device will perform one of the following steps after evaluating the new clustering with reassignment: reassign the at least one of the plurality of client computing devices as shown at step 908; mark the at least one of the client computing devices as a candidate for reassignment during a future clustering as shown at step 910; separately apply the machine learning model of a current group to which the at least one of the plurality of client devices is assigned and the machine learning model of the different group, and reassign said at least one of the plurality of client computing devices to the different group if the machine learning model of the different group has a better performance than the machine learning model of the current group, as shown at step 912. An embodiment showing how to choose between steps 908, 910 and 912 based on different evaluation criteria are further described below in FIG. 10.

In some embodiments, the received at least one parameter comprises a performance value and/or updates to weights of the respective machine learning models of the plurality of groups.

In some embodiments, the clusters are evaluated based on one of the following:

evaluating a stability of each group of the plurality of groups; evaluating for each of the plurality of client computing devices a performance of the respective machine learning models of the plurality of groups; and evaluating updates to weights of the respective machine learning models of the plurality of groups.

In some embodiments updating the clustering of the plurality of client computing devices into the plurality of groups is performed until at least one of the following conditions is fulfilled: a number of client computing devices within any group of the plurality of groups is below a threshold number; a performance value of any of the respective plurality of machine learning models is below a threshold performance value.

Referring to Fig. 10, a method embodiment 1000 shows how the new clustering with reassignment may be handled based on different evaluation criteria at a server computing device. The method starts at step 1002. At step 1004, it is determined if the new clustering with reassignment should be evaluated before making any changes. If no evaluation should be performed, the clustering will be updated directly at step 1005. If the new clustering with reassignment should be evaluated, it is determined at step 1006 if the evaluation of the new clustering with reassignment should be based on a stability of each group of the plurality of groups of the new clustering, or a performance value of the respective machine learning model of the plurality of groups. If the stability is chosen as the criterion, at step 1007, a counter for each affected client computing device reassigned to another group will be increased by one. The affected client computing devices will be marked as candidates for reassignment during a future clustering. At step 1009, it is determined if the counter value of any affected client computing device marked as candidate is greater than a pre-defined threshold representing maximum number of reassignments of that affected client computing device. If the counter has a greater value than the pre-defined threshold, the counter will be reset at step 101 1 . At the following step 1012, that specific affected client computing device being included in the new clustering should be moved to a different group. If the counter value of any affected client computing device is equal or lower than the pre-defined threshold, the clustering will not be updated with the new clustering as shown at step 1014.

At step 1006, if a performance of the machine learning model is chosen as the criterion, the performance of the machine learning model from current group and the performance of the machine learning model from the possible new group for each affected client computing device will be compared at step 1008. At the following step 1010, it is determined if the machine learning model performance for the new group is greater than the machine learning model performance of the current group. If the machine learning model performance for the new group is greater, for the specific affected client computing device, it will be moved to the new group as shown at step 1012. If the machine learning model performance for the new group is equal or lower than the machine learning model performance of the current group, the clustering will not be updated with the new clustering as shown at step 1014.

The methods discussed above allows a number of decentralised learning groups to be created and adjusted dynamically based on the one or more features of client computing devices. In this way, the client computing devices in each group can efficiently share a machine learning model with high model performance. Furthermore, only very limited data from each client computing device will be sent to the server computing devices without revealing much private information of each client computing device.

As discussed above, the network 100 may be a telecommunications network, where the client computing devices 104 are edge computing resources of the telecommunications network. In particular, the client computing devices 104 may be access nodes of the telecommunications network. The server computing device 102 may also be an access node. One benefit of using a decentralised learning approach in a telecommunications network is that access nodes such as base stations can communicate with each other to train the model, and transmission costs associated with sending data centrally are avoided.

An example communication system 1 100 is shown in FIG. 1 1 . The communication system 1 100 is a distributed system, such that parts of the system are implemented in a cloud 1 102, a fog 1 104, an edge 1 106 and a user equipment layer 1 108.

The cloud 1 102 comprises a host computer 1 1 10 implemented as a cloud-implemented server. In other embodiments, the host computer may be embodied in the hardware and/or software of a standalone server, a distributed server or as processing resources in a server farm. The host computer 1 1 10 may be under the ownership or control of a service provider, or may be operated by the service provider or on behalf of the service provider.

The host computer 1 1 10 comprises hardware configured to set up and maintain a wired or wireless connection with an interface of a different communication device of the communication system 1 100. The host computer 1 1 10 may further comprise processing circuitry, which may have storage and/or processing capabilities. In particular, the processing circuitry may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The host computer 1 1 10 further comprises software, which is stored in or accessible by the host computer 1 1 10 and executable by the processing circuitry. The software includes a host application. The host application may be operable to provide a service to a remote user, for example a user connecting via an over the top (OTT) connection. In providing the service to the remote user, the host application may provide user data which is transmitted using the OTT connection.

The fog 1 104 is implemented between the cloud 1 102 and the edge 1 106, and may comprise a core network 1 1 12. The core network 1 1 12 may be a 3GPP-type cellular network. The fog 1 104 may also comprise a fog computer 1 1 14. Connections between the host computer 1 1 10 and the core network 1 1 12 may extend directly from the host computer 1 1 10 to the core network 1 1 12 and/or the fog computer 1 1 14, or may go via an optional intermediate network (not shown). The intermediate network may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network, if any, may be a backbone network or the Internet; in particular, the intermediate network may comprise two or more sub- networks (not shown). The fog computer 1114 may be considered part of the core network 1112, or separate from the core network 1112 for example operated and handled by an entity different from the telecom network operator.

The edge 1106 comprises a number of base stations 1116a, 1116b. Base stations may also be called access nodes. The base stations may be implemented in an access network. The base stations 1116 comprise hardware enabling them to communicate with the core network 1112, and via the core network 1112 with the host computer 1110. The base stations 1116 also comprises hardware enabling them to communicate with the user equipment (UE) 1118 located in the user equipment layer 1108. Each base station 1116 is configured to set up and maintain a wired or wireless connection with an interface of a different communication device of the communication system 1100, for example a UE 1118 located in a coverage area (not shown in FIG.11) served by the base station. Each base station 1116 may also be configured to facilitate a connection to the host computer 1110. The connection may be direct or it may pass through the core network 1112 and/or through one or more intermediate networks outside the communication system 1100. Each base station 1116 may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. Each base station 1116 further has software stored internally or accessible via an external connection. The proposed methods may be implemented at the edge 1106.

The user equipment layer 1108 comprises a number of user equipment elements 1118. In FIG. 11 , a first UE 1118a is wirelessly connectable to, or configured to be paged by, a corresponding base station 1116a. A second UE 1118b, third UE 1118c and fourth UE 1118d are wirelessly connectable to a corresponding base station 1116b. While a plurality of UEs 1118 are illustrated in this example, the disclosed embodiments are equally applicable to a situation where a sole UE is in the coverage area or where a sole UE 1118 is connecting to a corresponding base station 1116.

Each UE 1118 may include a radio interface configured to set up and maintain a wireless connection with a base station 1116 serving a coverage area in which the UE 1118 is currently located. The hardware of the UE 1118 further includes processing circuitry, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. Each UE 1118 further comprises software, which is stored in or accessible by the UE 1118 and executable by the processing circuitry. The software may include a client application operable to provide a service to a human or non-human user via the UE 1118, with the support of the host computer 1 1 10. In the host computer 1 1 10, an executing host application may communicate with the executing client application via the OTT connection, or via other connections, terminating at the UE 1 1 18 and the host computer 1 1 10. In providing the service to the user, the client application may exchange user data (also referred to as application data, or data) with the host application. The OTT connection, or other connection, may transfer the user data. The client application may interact with the user to generate the user data that it provides. Example UEs 1 1 18 are mobile telephones, smartphones, tablets, laptops, and internet of things (loT) devices such as connected sensors, meters etc. The UEs in the present context may be, for example, permanently or temporarily mounted on equipment (containers, etc.) or a fixed structure (wall, roof, etc.,), portable, pocket-storable, hand-held, computer-comprised, wearable and/or vehicle-mounted mobile devices, just to mention a few examples. The UEs 1 1 18 are also commonly referred to as, communication devices, wireless devices, wireless terminals, mobile terminals, mobile stations, user equipment (UE), mobile telephones, cellular telephones, etc. These terms can typically be regarded as synonyms, but some of them are also in some contexts used to denote a communication device in relation to a specific telecom standard, but the latter aspect is not of importance in the present context.

FIG. 12 discloses an example implementation of an apparatus 1200, which may be configured to perform any of the methods described herein (for example, any of the methods 200, 300, 700, 800, 900, 1000). As discussed above, this may be one of the computing devices in the network 100, for example a server computing device 102 or a client computing device 104 of the network 100. The apparatus 1200 may comprise a computer program product.

The apparatus 1200 may comprise a processor, or a processing circuitry 1210, and a memory, or a memory circuitry 1220. The memory circuitry 1220 may store a computer program, comprising instructions which, when executed on the processing circuitry 1210, cause the processing circuitry to carry out any of the methods described herein.

Generally, all terms used herein are to be interpreted according to their ordinary meaning in the relevant technical field, unless a different meaning is clearly given and/or is implied from the context in which it is used. All references to a/an/the element, apparatus, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any methods disclosed herein do not have to be performed in the exact order disclosed, unless a step is explicitly described as following or preceding another step and/or where it is implicit that a step must follow or precede another step. Any feature of any of the embodiments disclosed herein may be applied to any other embodiment, wherever appropriate. Likewise, any advantage of any of the embodiments may apply to any other embodiments, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the preceding description.

Claims

1 . A method (200) for dynamically configuring a network (100) comprising a plurality of client computing devices (104) configured to perform training of a plurality of machine learning models, the method performed at a server computing device (102) and comprising:

receiving (202) at least one parameter from each of the plurality of client computing devices (104);

clustering (204) the plurality of client computing devices (104) into a plurality of groups based on the received at least one parameter, wherein each group of the plurality of groups comprises at least one client computing device sharing a machine learning model within the group;

evaluating (206) the clustering of the plurality of client computing devices (104) into the plurality of groups based on an evaluation criterion;

updating (208) the clustering of the plurality of client computing devices (104) into the plurality of groups dynamically to satisfy the evaluation criterion.

2. The method (200) of claim 1 , wherein the received (202) at least one parameter comprises one or more features describing a data distribution for each of the plurality of client computing devices (104), and the clustering (204) the plurality of client computing devices (104) into the plurality of groups is based on a similarity of the received features describing the data distribution.

3. The method (200) of claim 2, wherein the received (202) one or more features describing a data distribution comprise a plurality of threshold crossing time sequences wherein at least one threshold crossing time sequence is received for each of the plurality of client computing devices (104), and the clustering (204) the plurality of client computing devices (104) into the plurality of groups is based on a similarity of the received plurality of threshold crossing time sequences.

4. The method (200) of claim 3, wherein a running median algorithm with sliding windows is used to calculate a threshold value of each of the plurality of threshold crossing time sequences.

5. The method (200) of claim 3, wherein the received (202) one or more features describing a data distribution further comprise at least one of: a standard deviation, a variance, a mean value, a median, percentiles, a minimum value, or a maximum value.

6. The method (200) of any of claims 3 to 5, wherein the clustering (204) the plurality of client computing devices (104) into the plurality of groups comprises calculating a distance matrix for the plurality of threshold crossing time sequences, and performing a clustering algorithm comprising at least one of: Density-based spatial clustering of applications with noise (DBSCAN), Hierarchical DBSCAN (HDBSCAN), k-means or hierarchical clustering.

7. The method (200) of any of claims 1 to 6, wherein the updating (208) the clustering of the plurality of client computing devices (104) into the plurality of groups comprises: receiving at least one updated parameter from at least one of the plurality of client computing devices (104);

determining if the received at least one updated parameter leads to a reassignment of the at least one of the plurality of client computing devices (104) to a different group of the plurality of groups; and

if the received at least one updated parameter leads to a reassignment of the at least one of the plurality of computing devices to a different group, performing one of:

reassigning the at least one of the plurality of client computing devices (104);

marking the at least one of the client computing devices (104) as a candidate for reassignment during a future clustering;

separately applying the machine learning model of a current group to which the at least one of the plurality of client devices is assigned and the machine learning model of the different group, and reassigning said at least one of the plurality of client computing devices (104) to the different group if the machine learning model of the different group has a better performance than the machine learning model of the current group.

8. The method (200) of claim 1 , wherein the evaluating (206) the clustering comprises at least one of:

evaluating a stability of each group of the plurality of groups; evaluating for each of the plurality of client computing devices (104) a performance value of the respective machine learning models of the plurality of groups;

evaluating updates to weights of the respective machine learning models of the plurality of groups.

9. The method (200) of claim 1 , wherein the received (202) at least one parameter comprises a performance value and/or updates to weights of the respective machine learning models of the plurality of groups and the updating (208) the clustering of the plurality of client computing devices (104) into the plurality of groups is performed until at least one of the following conditions is fulfilled: a number of client computing devices (104) within any group of the plurality of groups is below a threshold number;

a performance value of any of the respective machine learning models is below a threshold performance value.

10. The method (200) of any of claims 1 to 9, wherein:

the network (100) comprises a telecommunications network; and the plurality of client computing devices (104) comprises a plurality of access nodes of the telecommunications network (100).

1 1 . The method (200) of claim 10, wherein the received (202) at least one parameter comprises features extracted from at least one of:

configuration management data of the telecommunications network; and performance management data of the telecommunications network.

12. The method (200) of any of claims 1 to 1 1 , wherein the plurality of machine learning models are federated learning models.

13. A server computing device (102) for dynamically configuring a network (100) comprising a plurality of client computing devices (104) configured to perform training of a plurality of machine learning models, the server computing device (102) comprising a processing circuitry (1210) and a memory (1220), the memory (1220) containing instructions executable by the processing circuitry whereby the server computing device (102) is operative to:

receive at least one parameter from each of the plurality of client computing devices (104); cluster the plurality of client computing devices (104) into a plurality of groups based on the received at least one parameter, wherein each group of the plurality of groups comprises at least one client computing device sharing a machine learning model within the group;

evaluate the plurality of groups of the clustered plurality of client computing devices (104) based on an evaluation criterion;

update the plurality of groups of the clustered plurality of client computing devices (104) to satisfy the evaluation criterion.

14. The server computing device (102) of claim 13, wherein the received at least one parameter comprises one or more features describing a data distribution for each of the plurality of client computing devices (104), and further configured to cluster the plurality of client computing devices (104) into the plurality of groups based on a similarity of the received features describing the data distribution.

15. The server computing device (102) of claim 14, wherein the received one or more features describing a data distribution comprise a plurality of threshold crossing time sequences wherein at least one threshold crossing time sequence is received for each of the plurality of client computing devices (104), and further configured to cluster the plurality of client computing devices (104) into the plurality of groups based on a similarity of the received plurality of threshold crossing time sequences.

16. The server computing device (102) of claim 15, wherein a running median algorithm with sliding windows is used to calculate a threshold value of each of the plurality of threshold crossing time sequences.

17. The server computing device (102) of claim 15, wherein the received one or more features describing a data distribution further comprise at least one of: a standard deviation, a variance, a mean value, a median, percentiles, a minimum value, or a maximum value.

18. The server computing device (102) of any of claims 15 to 17, further configured to cluster the plurality of client computing devices (104) into the plurality of groups by calculating a distance matrix for the plurality of threshold crossing time sequences, and by performing a clustering algorithm comprising at least one of: Density-based spatial clustering of applications with noise (DBSCAN), Hierarchical DBSCAN (HDBSCAN), k-means or hierarchical clustering.

19. The server computing device (102) of any of claims 13 to 18, further configured to update the plurality of groups of the clustered plurality of client computing devices (104) by: receiving at least one updated parameter from at least one of the plurality of client computing devices (104);

20. The server computing device (102) of claim 13, further configured to evaluate the plurality of groups of the clustered plurality of client computing devices (104) by:

evaluating a stability of each group of the plurality of groups;

evaluating for each of the plurality of client computing devices (104) a performance value of the respective machine learning models of the plurality of groups;

21 . The server computing device (102) of claim 13, wherein the received (202) at least one parameter comprises a performance value and/or updates to weights of the respective machine learning models of the plurality of groups and further configured to update the plurality of groups of the clustered plurality of client computing devices (104) until at least one of the following conditions is fulfilled: a number of client computing devices (104) within any group of the plurality of groups is below a threshold number;

22. The server computing device (102) of any of claims 13 to 21 , wherein:

23. The server computing device (102) of claim 22, wherein the received (202) at least one parameter comprises features extracted from at least one of:

24. The server computing device (102) of any of claims 13 to 23, wherein the server computing device (102) comprises an access node.

25. The server computing device (102) of any of claims 13 to 24, wherein the plurality of machine learning models are federated learning models.

26. A computer program, comprising instructions which, when executed on a processing circuitry (1210), cause the processing circuitry (1210) to carry out the method according to any of claims 1 to 12.

27. A computer program product having stored thereon a computer program comprising instructions which, when executed on the processing circuitry, cause the processing circuitry to carry out the method according to any of claims 1 to 12.