WO2023229501A1

WO2023229501A1 - Managing a machine learning process

Info

Publication number: WO2023229501A1
Application number: PCT/SE2022/050502
Authority: WO
Inventors: Hossein SHOKRI GHADIKOLAEI; Milad GANJALIZADEH; Johan HARALDSON
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2022-05-24
Filing date: 2022-05-24
Publication date: 2023-11-30

Abstract

The present disclosure relates to a computer-implemented method (100) for managing a machine learning (ML) process. The method comprises: obtaining (110) an indication of a performance of the ML process resulting from application of a first policy to an environment comprising wireless devices, wherein the first policy is for selecting a first set of the wireless devices to participate in the ML process; obtaining (120) an indication of one or more ultra reliable low latency communication (URLLC) requirements for URLLC of the wireless devices; and generating (130) a second policy for selecting a second set of the wireless devices to participate in the ML process, wherein the second policy is generated based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements. The present disclosure also relates to an orchestrator node, a system and a computer program product.

Description

MANAGING A MACHINE LEARNING PROCESS

Technical Field

The present disclosure relates to a computer-implemented method for managing a machine learning (ML) process. The present disclosure also relates to an orchestrator node, a system, and a computer program product.

Background

Cyber-physical systems (CPSs) are engineered systems that integrate computation and communication with physical processes. CPSs have been implemented in many applications, such as smart grid applications and factory automation.

Within CPSs, the continuous interactions with the physical world put strict reliability requirements on the underlying communication system. The communication reliability requirements for CPS control are defined in the Third Generation Partnership Project (3GPP) Technical Specification (TS) 22.104. Failing to fulfil the communication reliability requirements may result in faulty system behaviours, cause economic losses, or even put users of the CPS in danger. The fifth generation (5G) of wireless networks and its extensions have the capability to satisfy the stringent communication requirements of a CPS using ultra-reliable low-latency communication (URLLC).

Machine Learning (ML) and Artificial Intelligence (AI) processes have been deployed in many networks that use URLLC, including CPSs. Distributed ML processes, such as federated learning or split learning, are examples of such ML and AI processes. In distributed ML processes, a set of devices communicate their local parameters to an orchestrating server, or a set of orchestrating servers, on an uplink channel. The orchestrating server, or set of servers, may be referred to as an ‘orchestrator’ or as a ‘network orchestrator node’. Following the receipt of the local parameters, the orchestrator maintains a global parameter and shares this global parameter with the set of devices on a downlink channel. A problem with distributed ML processes is that a large portion of the uplink messages sent from the devices to the orchestrator are redundant. Many of these messages contain almost no new useful information about the device’s local parameters, since this information can be retrieved from previously communicated messages from a given device, as well as from previously communicated messages from other devices. This can lead to network congestion, reduced throughput, poor latency and unnecessary energy expenditure.
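The uplink and downlink exchange described above can be illustrated with a minimal sketch. It assumes that the orchestrator forms the global parameter as a plain element-wise average of the reported local parameters; the aggregation rule, the function name aggregate and the device identifiers are illustrative assumptions and are not specified in the present disclosure.

```python
from typing import Dict, List


def aggregate(local_params: Dict[str, List[float]]) -> List[float]:
    """Average the local parameter vectors reported on the uplink.

    Plain averaging is an assumption made for illustration only.
    """
    if not local_params:
        raise ValueError("no device reported parameters")
    vectors = list(local_params.values())
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all parameter vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical uplink reports from two devices.
uplink = {"dev-1": [1.0, 2.0], "dev-2": [3.0, 4.0]}
# The result would be shared with the devices on the downlink.
global_param = aggregate(uplink)
```

In this sketch every reporting device contributes one uplink message per round, which is the signalling that becomes redundant when a report carries little new information.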

There are some existing techniques that aim to eliminate redundant communications based on gradient information of the distributed ML process. However, these existing techniques are only valid for a distributed gradient descent class of algorithms. Thus, the existing techniques are not applicable in most practical scenarios, which use minibatch gradients or use other classes of algorithms, such as federated learning.

Summary

It is thus an object of the present disclosure to obviate or eliminate at least some of the above-described disadvantages associated with existing techniques.

Therefore, according to a first aspect of the present disclosure, there is provided a computer-implemented method for managing a machine learning (ML) process. The method comprises: obtaining an indication of a performance of the ML process resulting from application of a first policy to an environment comprising wireless devices, wherein the first policy is for selecting a first set of the wireless devices to participate in the ML process; obtaining an indication of one or more ultra reliable low latency communication (URLLC) requirements for URLLC of the wireless devices; and generating a second policy for selecting a second set of the wireless devices to participate in the ML process, wherein the second policy is generated based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements.

According to a second aspect of the present disclosure, there is provided an orchestrator node for managing a ML process. The orchestrator node comprises processing circuitry configured to: obtain an indication of a performance of the ML process resulting from application of a first policy to an environment comprising wireless devices, wherein the first policy is for selecting a first set of the wireless devices to participate in the ML process; obtain an indication of one or more URLLC requirements for URLLC of the wireless devices; and generate a second policy for selecting a second set of the wireless devices to participate in the ML process, wherein the second policy is generated based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements.

According to a third aspect of the present disclosure, there is provided a system for managing a ML process. The system comprises an environment comprising wireless devices. The system further comprises an orchestrator node comprising processing circuitry configured to: obtain an indication of a performance of the ML process resulting from application of a first policy to the environment, wherein the first policy is for selecting a first set of the wireless devices to participate in the ML process; obtain an indication of one or more URLLC requirements for URLLC of the wireless devices; and generate a second policy for selecting a second set of the wireless devices to participate in the ML process, wherein the second policy is generated based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements.

According to a fourth aspect of the present disclosure, there is provided a computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform a method according to the first aspect.

Controlling which wireless device(s) of a set of wireless devices in an environment are to participate in a ML process (such as by sending their local parameters to the orchestrator node) and optionally also controlling a remainder of the wireless devices to remain silent can reduce network congestion, increase throughput, improve latency and lead to energy savings.

Brief description of the drawings

For a better understanding of the technique, and to show how it may be put into effect, reference will now be made, by way of example, to the accompanying drawings, in which:

Figure 1 is a flowchart illustrating an example of a computer-implemented method;

Figure 2 is a diagram illustrating an example of a system;

Figure 3 is a diagram illustrating another example of a system;

Figure 4 is a bar chart illustrating an example of results;

Figure 5 is a graph illustrating another example of results;

Figures 6a and 6b are graphs illustrating another example of results;

Figures 7a and 7b are graphs illustrating another example of results;

Figure 8 is a block diagram illustrating an example of an orchestrator node.

Detailed Description

Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Other embodiments, however, are contained within the scope of the subject-matter disclosed herein. The disclosed subject-matter should not be construed as limited to only the embodiments set forth herein; rather, these embodiments are provided by way of example to convey the scope of the subject-matter to those skilled in the art.

As described above, many wireless network environments, such as CPSs, use URLLC to meet the strict communication reliability requirements of the wireless network environment. Due to interference, vibrations and the presence of many metallic objects in environments, such as a CPS environment, the support for real-time control via URLLC wireless communication in a CPS environment is a challenging task.

A key performance indicator (KPI) for URLLC communication availability and reliability is assessed according to the application layer availability a_ij. As defined in the 3GPP TS 22.104, the application layer availability a_ij is the percentage of the amount of time the end-to-end communication service is delivered according to an agreed quality of service (QoS) as observed by the application layer, divided by the amount of time the system is expected to deliver the end-to-end service according to the specification in a specific area. In some examples, the application can tolerate consecutive packet losses within the duration of the so-called ‘survival time’ and still be considered available. According to examples of the present disclosure, the term ‘packet loss’ refers to an event in which a protocol data unit (PDU) is not successfully delivered within a specified deadline to the target PDU layer (e.g. user plane function).
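As a minimal sketch of how the survival time affects the availability figure, the following example counts a run of consecutive packet losses as downtime only once the run lasts longer than the survival time. This is one simplified reading for illustration, not the normative procedure of 3GPP TS 22.104; the function name, the fixed cycle time and the per-PDU delivery flags are assumptions.

```python
from typing import Sequence


def application_layer_availability(delivered: Sequence[bool],
                                   cycle_time_ms: float,
                                   survival_time_ms: float) -> float:
    """Fraction of the observation window in which the service counts as available.

    delivered[k] is True if the k-th PDU arrived within its deadline.
    Losses are tolerated until a run of consecutive losses exceeds the
    survival time; only the slots beyond that point count as downtime.
    """
    if not delivered:
        raise ValueError("empty observation window")
    down_slots = 0
    run = 0  # length of the current run of consecutive losses
    for ok in delivered:
        if ok:
            run = 0
            continue
        run += 1
        if run * cycle_time_ms > survival_time_ms:
            down_slots += 1
    return 1.0 - down_slots / len(delivered)


# Hypothetical window: one isolated loss, then a run of three losses.
window = [True, False, True, False, False, False, True, True, True, True]
availability = application_layer_availability(
    window, cycle_time_ms=1.0, survival_time_ms=1.0)
```

With these illustrative numbers, the isolated loss and the first loss of the longer run fall within the survival time, so only two of the ten slots count as downtime.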

URLLC performance KPIs, for example, application layer availability a_ij and application layer reliability Ʈ_ij, are a function of the bandwidth and power allocated to a URLLC slice. The higher the bandwidth and power allocated to the URLLC slice, the more likely the underlying communication service can meet the URLLC requirements. Reducing the transmission power may reduce the Signal to Interference and Noise Ratio (SINR) for devices of the wireless network, which negatively impacts packet reception reliability. This, in turn, can reduce the likelihood of the network environment meeting URLLC requirements. Moreover, transmission power reduction forces a gNodeB (gNB) scheduler of the wireless network to use a lower code rate to compensate for the target block error rate, which results in higher latency for packet transmission. This can again reduce the likelihood of the network environment meeting URLLC requirements. A reduction in the available bandwidth also significantly affects the QoS level of the devices of a wireless network using URLLC, which again may reduce the likelihood of the network environment meeting URLLC requirements. As such, the power and bandwidth allocation for a URLLC slice affects the ability of the wireless devices of a network environment to meet the URLLC requirements.
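The dependence of the SINR on transmission power can be made concrete with a short sketch. It uses the standard single-link definition of SINR (received signal power divided by interference plus noise); the function name and all numeric values are illustrative assumptions and do not come from the present disclosure.

```python
import math


def sinr_db(tx_power_w: float, path_gain: float,
            interference_w: float, noise_w: float) -> float:
    """SINR in dB for one link: received power over interference plus noise."""
    if tx_power_w <= 0 or path_gain <= 0 or interference_w + noise_w <= 0:
        raise ValueError("power, gain and interference-plus-noise must be positive")
    received_w = tx_power_w * path_gain
    return 10.0 * math.log10(received_w / (interference_w + noise_w))


# Illustrative numbers only: halving the transmit power with everything
# else unchanged lowers the SINR by 10*log10(2), i.e. about 3 dB.
full = sinr_db(0.2, 1e-9, 1e-12, 1e-12)
half = sinr_db(0.1, 1e-9, 1e-12, 1e-12)
```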

As described above, many wireless network environments using URLLC, such as CPSs, also use distributed ML processes. In these wireless network environments, the bandwidth and power are shared between the URLLC slice and the distributed ML process. As such, enabling the wireless network environment to meet the URLLC requirements, as well as allocating URLLC power and bandwidth to support a distributed ML process, can be a challenging task with limited URLLC bandwidth and power. This task is compounded by the additional challenge that the distributed ML process can result in redundant signalling, as described above, which can negatively impact the URLLC performance of the wireless network environment in meeting URLLC requirements, as well as the performance of the ML process.

The present disclosure thus presents a solution for managing a ML process. The solution can be applied to an environment. The environment referred to herein can be any environment that comprises wireless devices. For example, the environment referred to herein can be a network, which may also be referred to as a network environment. In some of these examples, the network can be a telecommunications network. In some examples, the network can be a mobile network, such as a fourth generation (4G) mobile network, a fifth generation (5G) mobile network, a sixth generation (6G) mobile network, or any other generation mobile network. In some examples, the network can be a radio access network (RAN). In some examples, the network can be a local network, such as a local area network (LAN). In some examples, the network may be a content delivery network (CDN). In some examples, the network may be a software defined network (SDN). In some examples, the environment referred to herein can be a fog computing environment or an edge computing environment. In some examples, the environment referred to herein can be a virtual environment or an at least partially virtual environment. In some examples, the environment referred to herein can be a cyber-physical systems (CPSs) environment.

The environment referred to herein comprises wireless devices. As used herein, the term wireless device can refer to a device capable, configured, arranged and/or operable to communicate wirelessly with one or more network nodes and/or one or more other wireless devices. Communicating wirelessly may involve transmitting and/or receiving wireless signals, such as by using electromagnetic waves, radio waves, infrared waves, and/or other types of signals suitable for conveying information through air. In some examples, a wireless device may be configured to transmit and/or receive information without direct human interaction. For instance, a wireless device may be designed to transmit information to the environment (e.g. a network) on a predetermined schedule, when triggered by an internal or external event, or in response to requests from the environment (e.g. network). In some examples, a wireless device may be registered to and/or operated by a user. Thus, unless otherwise noted, the term wireless device can be used interchangeably herein with user equipment (UE).

Examples of a wireless device as referred to herein include, but are not limited to, a smart phone, a mobile phone, a cell phone, a voice over IP (VoIP) phone, a wireless local loop phone, a desktop computer, a personal digital assistant (PDA), a wireless camera, a gaming console or device, a music storage device, a playback appliance, a wearable terminal device, a wireless endpoint, a mobile station, a tablet, a laptop, a laptop-embedded equipment (LEE), a laptop-mounted equipment (LME), a smart device, a wireless customer-premise equipment (CPE), a vehicle-mounted wireless terminal device, a wireless robot, a wireless robot controller, etc. A wireless device as referred to herein may support device-to-device (D2D) communication, for example, by implementing a third generation partnership project (3GPP) standard for sidelink communication, vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), vehicle-to-everything (V2X) and may in this case be referred to as a D2D communication device.

As yet another specific example, in an Internet of Things (IoT) scenario, a wireless device as referred to herein may represent a machine or other device that performs monitoring and/or measurements, and transmits the results of such monitoring and/or measurements to another wireless device and/or a network node. A wireless device as referred to herein may in this case be a machine-to-machine (M2M) device, which may in a 3GPP context be referred to as a machine type communication (MTC) device. As one particular example, a wireless device as referred to herein may be a user equipment (UE) implementing the 3GPP narrow band internet of things (NB-IoT) standard. Particular examples of such machines or devices are sensors, metering devices such as power meters, industrial machinery, or home or personal appliances (e.g. refrigerators, televisions, etc), personal wearables (e.g. watches, fitness trackers, etc). In other scenarios, a wireless device as referred to herein may represent a vehicle or other equipment that is capable of monitoring and/or reporting on its operational status or other functions associated with its operation. A wireless device as referred to herein may represent the endpoint of a wireless connection, in which case the wireless device may be referred to as a wireless terminal. Furthermore, a wireless device as referred to herein may be mobile, in which case it may be referred to as a mobile device or a mobile terminal.

The environment referred to herein can also comprise at least one network node according to some embodiments. As used herein, network node can refer to equipment capable, configured, arranged and/or operable to communicate directly or indirectly with at least one wireless device as referred to herein and/or with one or more other network nodes or equipment in the environment, such as to enable and/or provide wireless access to the at least one wireless device and/or to perform other functions (e.g. administration) in the environment. Examples of a network node as referred to herein include, but are not limited to, an access point (AP), such as a radio access point, or a base station (BS), such as a radio base station (e.g. a Node B, an evolved Node B (eNB), or an NR NodeB (gNB)). A base station may be categorised based on the amount of coverage it provides (or, stated differently, its transmit power level) and may then also be referred to as a femto base station, a pico base station, a micro base station, or a macro base station. A base station may be a relay node or a relay donor node controlling a relay. A network node as referred to herein may include one or more (or all) parts of a distributed radio base station, such as centralised digital units and/or remote radio units (RRUs), sometimes referred to as Remote Radio Heads (RRHs). Such remote radio units may or may not be integrated with an antenna as an antenna integrated radio. Parts of a distributed radio base station may also be referred to as nodes in a distributed antenna system (DAS).

Yet further examples of a network node as referred to herein can include multi-standard radio (MSR) equipment such as MSR BSs, network controllers such as radio network controllers (RNCs) or base station controllers (BSCs), base transceiver stations (BTSs), transmission points, transmission nodes, multicell/multicast coordination entities (MCEs), core network nodes (e.g. mobile switching centers (MSCs), mobile management entities (MMEs)), operation and maintenance (O&M) nodes, operation support system (OSS) nodes, self-optimised network (SON) nodes, positioning nodes (e.g. evolved serving mobile location centers (E-SMLCs)), and/or minimization of drive tests (MDTs). A network node as referred to herein may be a virtual network node. More generally, however, a network node as referred to herein may represent any suitable device (or group of devices) capable, configured, arranged, and/or operable to enable and/or provide at least one wireless device with access to the environment (e.g. a wireless network) or to provide some service to at least one wireless device that has accessed the environment (e.g. the wireless network).

Examples according to the present disclosure present a solution for managing a ML process where an orchestrator of an environment selects a set of wireless devices of the environment to participate in a ML process based on an assessment of the performance of an ML process, and additionally based on URLLC requirements for the environment. In this way, the orchestrator can select devices to participate in the ML process, which can not only improve the performance of the ML process in terms of, for example, training delay, but can also improve the URLLC performance of the environment and enable the devices of the environment to meet the URLLC requirements, and thereby maintain a secure communication throughout the environment. Additionally, the solution can be used to control the allocated bandwidth and power of the URLLC slice for the environment. In this way, the solution can thus lead to more efficient URLLC power and bandwidth allocation for the environment to meet URLLC requirements.

As will be described in more detail below, in one example, the solution may be based on a Reinforcement Learning (RL) approach for selecting the devices of the environment to participate in the distributed ML process. The RL approach may also be capable of managing the power and bandwidth control for the URLLC slice. The RL approach aims to maximize the ML performance, for example, in terms of reduced ML training delay and reduced ML training loss, whilst maintaining a required level of availability and reliability for the URLLC slice to meet URLLC requirements.

Figure 1 is a flowchart illustrating process steps in a computer-implemented method 100 for managing a machine learning (ML) process. An orchestrator node (e.g. processing circuitry of the orchestrator node) can be configured to operate in accordance with the method 100 illustrated in Figure 1. In some examples, the orchestrator node referred to herein may also be referred to as a network orchestrator.

The method 100 comprises, in step 110, obtaining an indication of a performance of the ML process resulting from application of a first policy to an environment comprising wireless devices, wherein the first policy is for selecting a first set of the wireless devices to participate in the ML process. In some examples, the environment may comprise a CPS environment comprising wireless devices, e.g. including wireless industrial devices, or any other environment, such as any of those mentioned earlier. In some examples, the method 100 may therefore comprise obtaining the first policy and applying the first policy to the environment. For example, the first policy may be obtained from a memory.

In some examples, the indication of the performance of the ML process may comprise an indication of one or more ML performance parameters. In some examples, the one or more ML performance parameters may comprise at least one of: a ML training delay; a ML training loss; a drift of a local model of each of the wireless devices with respect to a global model; and a last time a base station of the environment received ML data from each of the wireless devices. In some examples, the method 100 may further comprise obtaining the first policy and applying the first policy to the environment. The first policy may therefore, in some examples, be used to control a first set of the wireless devices of the environment to participate in the ML process. In some examples, the first set of the wireless devices may comprise a sub-set of the wireless devices of the environment, which may be controlled to participate in the ML process and the remainder of the wireless devices of the environment may be controlled to not participate in the ML process.

The method 100 further comprises, in step 120, obtaining an indication of one or more URLLC requirements for URLLC of the wireless devices. In some examples, the indication of the one or more URLLC requirements may comprise at least one of an indication of URLLC availability of the wireless devices and an indication of URLLC reliability of the wireless devices.

The method 100 further comprises, in step 130, generating a second policy for selecting a second set of the wireless devices to participate in the ML process, wherein the second policy is generated based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements. In some examples, the method 100 may further comprise applying the second policy to the environment, wherein applying the second policy comprises instructing the second set of wireless devices to participate in the ML process. The second policy may therefore, in some examples, be used to control a second set of the wireless devices of the environment to participate in the ML process. In some examples, the second set of the wireless devices may comprise a sub-set of the wireless devices of the environment, which may be controlled to participate in the ML process and the remainder of the wireless devices of the environment may be controlled to not participate in the ML process. In other examples, the second policy and the first policy may further dictate other variables of the ML process such as training algorithm hyperparameters, diversity methods and routing from the wireless devices.

In some examples, the first set of wireless devices and the second set of wireless devices referred to herein may, at least partially, overlap. Thus, in some examples, one or more wireless devices of the environment may be comprised in the first set of the wireless devices and comprised in the second set of the wireless devices. In some examples, the method 100 may further comprise obtaining a state of each of the wireless devices and step 130 of generating the second policy may be further based on the state of each of the wireless devices. For example, the state of each of the wireless devices may comprise at least one of: an indication of a channel condition of each wireless device of the environment; and an indication of the load condition between a network node (such as a base station, e.g. gNB) of the environment and a wireless device of the environment.

In some examples, applying the second policy to the environment may comprise: outputting the second policy to a planner unit; decoding the second policy using the planner unit; and instructing the second set of wireless devices to participate in the ML process based on the decoded second policy. In some examples, instructing the second set of devices to participate in the ML process may comprise transmitting instructions to a virtual planner implemented on each of the second set of wireless devices.
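The decoding and instructing steps can be sketched as follows. The sketch assumes, purely for illustration, that the second policy is encoded as a vector of 0/1 flags, one per wireless device; the encoding, the function name decode_policy, the device identifiers and the instruction strings are hypothetical and are not defined in the present disclosure.

```python
from typing import Dict, List


def decode_policy(device_ids: List[str], encoded_policy: List[int]) -> Dict[str, str]:
    """Planner-side decoding: turn a 0/1 vector into one instruction per device."""
    if len(device_ids) != len(encoded_policy):
        raise ValueError("policy length must match the number of devices")
    return {
        dev: ("participate" if flag else "remain_silent")
        for dev, flag in zip(device_ids, encoded_policy)
    }


# Hypothetical environment with three wireless devices.
instructions = decode_policy(["ue-1", "ue-2", "ue-3"], [1, 0, 1])
# The second set is made up of the devices instructed to participate.
second_set = [dev for dev, action in instructions.items() if action == "participate"]
```

In this sketch the devices outside the second set receive an explicit instruction to remain silent, which corresponds to the optional control of the remainder of the wireless devices.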

In some examples, the first policy may comprise an initial policy for generating a learned policy for selecting a set of wireless devices to participate in the ML process and the second policy may comprise the learned policy. As described in greater detail below, in such examples, there may be an exploration phase in which a policy is learned for selecting devices to participate in the ML process. In such examples, an initial policy may first be applied to the environment to select a first set of wireless devices to participate in the ML process and, based on the resulting indication of the performance of the ML process and based on the indication of the one or more URLLC requirements, a second policy may be learned to apply to the environment in a subsequent training iteration. In such examples, the first policy may for example be obtained from a memory located either at the orchestrator node or elsewhere. In such examples, the first policy may comprise a randomly generated policy which selects a random set of wireless devices to participate in the ML process.
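A randomly generated first policy of the kind mentioned above might look like the following sketch. The uniform sampling without replacement, the fixed number of selected devices and the optional seed are assumptions made for illustration.

```python
import random
from typing import List, Optional


def random_initial_policy(device_ids: List[str], num_selected: int,
                          seed: Optional[int] = None) -> List[str]:
    """Select num_selected distinct devices uniformly at random."""
    if not 0 <= num_selected <= len(device_ids):
        raise ValueError("num_selected must be between 0 and the number of devices")
    rng = random.Random(seed)  # a seed makes the exploration reproducible
    return sorted(rng.sample(device_ids, num_selected))


# Hypothetical device identifiers.
devices = ["ue-1", "ue-2", "ue-3", "ue-4"]
first_set = random_initial_policy(devices, num_selected=2, seed=0)
```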

In other examples, the first policy may comprise a learned policy and generating the second policy may comprise updating the learned policy. As described in greater detail below, in such examples, there may be an exploitation phase in which a learned policy is applied to the environment. A learned policy can be a policy that has already been learnt from at least one previous training iteration, following application of a previous policy for selecting a set of wireless devices of the environment to participate in the ML process. For example, the learned policy may have been learnt based on the resulting rewards obtained from the environment following the application of the previous policy to the environment, as will be described in more detail below. In such examples, the first policy may thus comprise an already learned policy, which may be applied to the environment. Based on the resulting indication of the performance of the ML process and based on the indication of the one or more URLLC requirements, the learned policy may be updated.

In some examples, the method 100 may further comprise obtaining an indication of one or more URLLC performance parameters indicating an URLLC communication performance of each of the wireless devices. The step 130 of generating the second policy may thus be further based on the indication of the one or more URLLC performance parameters.

In some examples, the method 100 may further comprise comparing the indication of the one or more URLLC performance parameters to the indication of the one or more URLLC requirements and generating a URLLC assessment based on the comparison of the indication of the one or more URLLC performance parameters to the indication of the one or more URLLC requirements. As will be described in more detail below, in some examples, the URLLC assessment may comprise a value of a URLLC performance reward function, where the URLLC reward function may indicate a URLLC performance of the wireless devices of the environment. In such examples, the step 130 of generating the second policy based on the indication of the one or more URLLC requirements and based on the indication of the one or more URLLC performance parameters may comprise generating the second policy based on the URLLC assessment.
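One possible form of such a comparison is sketched below. It assumes that both the measured values and the requirements are expressed as values to be maximised (such as availability and reliability) and that the assessment is zero when every requirement is met and negative otherwise; this particular reward shape and all names are illustrative assumptions rather than the reward function of the present disclosure.

```python
from typing import Dict


def urllc_assessment(measured: Dict[str, float],
                     requirements: Dict[str, float]) -> float:
    """Return 0.0 when every requirement is met, otherwise minus the total shortfall."""
    shortfall = 0.0
    for name, required in requirements.items():
        # Only a measured value below its requirement contributes a penalty.
        shortfall += max(0.0, required - measured[name])
    return 0.0 - shortfall


# Illustrative requirement and measurement values.
requirements = {"availability": 0.999, "reliability": 0.99}
met = urllc_assessment({"availability": 0.9995, "reliability": 0.995}, requirements)
violated = urllc_assessment({"availability": 0.99, "reliability": 0.995}, requirements)
```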

In some examples, the method 100 may further comprise obtaining the one or more URLLC performance parameters and translating the one or more URLLC performance parameters into one or more URLLC key performance indicators (KPIs) of the wireless devices. As will be described in more detail below, URLLC performance parameters, such as a packet error ratio of each of the wireless devices and a mean down time of each of the wireless devices, may be translated into URLLC KPIs, such as URLLC availability and URLLC reliability, of the wireless devices of the environment as a whole. In such examples, comparing the indication of the one or more URLLC performance parameters to the indication of the one or more URLLC requirements may comprise comparing the one or more URLLC KPIs to the indication of the one or more URLLC requirements, and generating the URLLC assessment may be based on the comparison of the one or more URLLC KPIs to the indication of the one or more URLLC requirements. In such examples, the one or more URLLC requirements and the one or more URLLC KPIs comprise at least one of an indication of URLLC availability of the wireless devices and an indication of URLLC reliability of the wireless devices.

In some examples, translating the one or more URLLC performance parameters into the one or more URLLC KPIs may comprise estimating the one or more URLLC KPIs based on the one or more URLLC performance parameters. In such examples, the one or more URLLC performance parameters may comprise at least one of: a URLLC availability of each of the wireless devices; and a URLLC reliability of each of the wireless devices. In such examples, an orchestrator may thus obtain the URLLC availability and/or the URLLC reliability of each of the wireless devices and estimate one or more URLLC KPIs for the wireless devices of the environment as a whole, based on the URLLC availability and/or the URLLC reliability of each of the wireless devices. In such examples, the URLLC availability and/or the URLLC reliability of each of the wireless devices may be measured by each wireless device individually and these measurements may subsequently be obtained by the orchestrator.
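As a minimal sketch of estimating KPIs for the environment as a whole from per-device measurements, the example below takes the worst-case (minimum) value across the wireless devices. The choice of the minimum as the estimator is an assumption made for illustration; the present disclosure does not prescribe a particular estimator in this passage, and the names used are hypothetical.

```python
from typing import Dict, Tuple


def estimate_environment_kpis(
        per_device: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Estimate environment-wide KPIs from per-device measurements.

    per_device maps a device id to (availability, reliability) as measured
    by that device. The worst device determines the environment-wide value.
    """
    if not per_device:
        raise ValueError("no device measurements available")
    return {
        "availability": min(a for a, _ in per_device.values()),
        "reliability": min(r for _, r in per_device.values()),
    }


# Illustrative per-device measurements.
kpis = estimate_environment_kpis({
    "ue-1": (0.99999, 0.9999),
    "ue-2": (0.9999, 0.999),
})
```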

In other examples, translating the one or more URLLC performance parameters into the one or more URLLC KPIs may comprise approximating the one or more URLLC KPIs based on one or more URLLC performance parameters. In such examples, the one or more URLLC performance parameters comprise at least one of: a packet error ratio of each of the wireless devices and a mean down time of each of the wireless devices. In some examples, the one or more URLLC performance parameters further comprise at least one of: a signal to interference and noise ratio, SINR, of each of the wireless devices; a downlink (DL) transmission delay of each of the wireless devices; an uplink (UL) transmission delay of each of the wireless devices; and a path gain of each of the wireless devices. As will be described in more detail below, the URLLC KPIs of the wireless devices of the environment as a whole may be approximated based on the one or more URLLC performance parameters, such as the packet error ratio, of each of the wireless devices.

In some examples, the method 100 may further comprise obtaining the one or more ML performance parameters from the environment and generating an ML performance assessment based on the one or more ML performance parameters. The indication of the one or more ML performance parameters may thus comprise the ML performance assessment. In such examples, the ML performance assessment may comprise the value of a ML performance reward function obtained based on the one or more ML performance parameters.

Thus, in some examples, the step 130 of generating the second policy may comprise using a Reinforcement Learning (RL) process to generate the policy. The RL process may thus comprise: obtaining a ML performance value of a ML performance reward function based on the one or more ML performance parameters, wherein the ML performance assessment comprises the value of the ML performance reward function; and obtaining a URLLC performance value of a URLLC performance reward function based on the one or more URLLC KPIs and the indication of the one or more URLLC requirements, wherein the URLLC assessment comprises the value of the URLLC performance reward function. Generating the second policy may thus comprise generating the second policy based on the ML performance value and the URLLC performance value.

In some examples, the method 100 may further comprise generating a URLLC policy for controlling a power and a bandwidth allocation of URLLC for the wireless devices based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements. The method 100 may thus further comprise, in some examples, controlling the power and the bandwidth allocation of URLLC for the wireless devices based on the URLLC policy. As described above, the power and bandwidth allocation for a URLLC slice may be controlled based on URLLC requirements, and additionally, the performance of the ML process from selecting a first set of wireless devices to participate in the ML process.

In some examples, controlling the power and the bandwidth allocation of URLLC for the wireless devices based on the URLLC policy may comprise outputting a request to a URLLC network management function to allocate the power and the bandwidth allocation of URLLC for the wireless devices based on the URLLC policy. In such examples, the method 100 may thus further comprise receiving, from the URLLC network management function, a response accepting the request and responsive to receiving the response, controlling the power and the bandwidth allocation of URLLC for the wireless devices based on the URLLC policy.

Figure 1 discussed above provides an overview of the method 100 which may be performed according to some examples of the present disclosure. There now follows a more detailed discussion of how different process steps of the method 100 discussed above may be implemented.

Figure 2 is a diagram illustrating an example of a system 200. The system 200 comprises an orchestrator 210, which may also be referred to herein as an orchestrator node. As described above, the orchestrator 210 may comprise one or more orchestrating servers. Orchestrator 210 may thus be operable to implement the process steps of the method 100 described above with reference to Figure 1.

An environment 230 comprises a plurality of wireless devices and, in some examples, may be a CPS environment comprising a plurality of industrial wireless devices. The orchestrator 210 is configured to receive one or more URLLC requirements 201 for the environment 230 from an application function 250. As described above, the one or more URLLC requirements 201 can comprise an indication of URLLC availability of the wireless devices of the environment 230 and an indication of URLLC reliability of the wireless devices of the environment 230.

The orchestrator 210 is further configured to receive an indication of a performance of a ML process from a data processing unit (DPU) 240. The indication of the performance of the ML process may comprise an AI reward (or reward function) 202 computed by DPU 240. In examples, according to the present disclosure, the AI reward (or reward function) 202 may alternatively be referred to as an ML reward 202. As will be described in more detail below, DPU 240 may be configured to use a Reinforcement Learning (RL) process to generate the ML reward 202 indicating a performance of the ML process. The ML reward 202 may thus be obtained from application of a first policy to the environment 230 for selecting a first set of the wireless devices of the environment 230 to participate in the ML process. The ML reward 202 referred to herein can thus be indicative of a measure of a performance of the ML process resulting from application of the first policy to the environment 230, i.e. a measure of the performance of the ML process resulting from the first set of the wireless devices of the environment 230 participating in the ML process. As such, the ML reward 202 may also be referred to as a performance measure or a performance metric. In some examples, the ML reward 202 can be a value. In some of these examples, the higher the value of the ML reward 202, the better the performance of the ML process and the lower the value of the ML reward 202, the worse the performance of the ML process. The aim can be to maximise the performance of the ML process.

The orchestrator 210 may thus be configured to generate a second policy 204 for selecting a second set of the wireless devices of the environment 230 to participate in the ML process based on the ML reward 202 and the one or more URLLC requirements 201.

The orchestrator 210 can be further configured to receive a state 203 of each of the wireless devices of the environment 230. In some examples, the state 203 of each of the wireless devices of the environment 230 may comprise at least one of: an indication of a channel condition of each device of the environment 230 and an indication of the load condition between a network node (such as a base station, e.g. gNB) of the environment 230 and a wireless device of the environment 230. In some examples, the orchestrator 210 may thus be configured to generate the second policy 204 further based on the state 203 of each of the wireless devices.

The orchestrator 210 may be configured to apply the second policy 204 to the environment 230 to instruct a second set of wireless devices of the environment 230 to participate in the ML process. The orchestrator 210 may be configured to output the second policy 204 to a planner unit 220, which is configured to apply the second policy to the second set of wireless devices of the environment 230. For example, planner unit 220 may receive the second policy 204 from the orchestrator 210 and decode the second policy 204 into a format for application to a second set of wireless devices of the environment 230 and subsequently instruct the second set of wireless devices to participate in the ML process based on the decoded second policy. In some examples, instructing the second set of wireless devices of the environment 230 to participate in the ML process may comprise transmitting instructions to a virtual planner implemented on each of the second set of wireless devices of the environment 230 to control the second set of wireless devices to participate in the ML process.

Following application of the second policy 204 to the environment 230, the DPU 240 can obtain observations 205 from the environment 230 resulting from the participation of the second set of wireless devices of the environment 230 in the ML process. In some examples, the observations 205 may comprise one or more URLLC performance parameters from the wireless devices of the environment 230 such as one or more URLLC performance parameters which explicitly impact the URLLC performance of the wireless devices of the environment 230 and/or one or more URLLC performance parameters which implicitly impact the URLLC performance of the wireless devices of the environment 230.

As will be described in greater detail below, the one or more URLLC performance parameters may be translated into one or more URLLC KPIs. The one or more URLLC KPIs may be compared to the one or more URLLC requirements 201 to generate a URLLC assessment of the URLLC performance of the environment 230. Based on said URLLC assessment, the orchestrator 210 may thus be configured to generate the second policy 204.

In some examples, the observations 205 may further comprise one or more ML performance parameters resulting from the participation of the second set of wireless devices of the environment 230 in the ML process. The one or more ML performance parameters may thus result from one iteration of the ML process. In some examples, the one or more ML performance parameters may comprise at least one of: a ML training delay; a ML training loss; a drift of a local model of each of the wireless devices with respect to a global model; and a last time a base station of the environment 230 received ML data from each of the wireless devices. In some examples, the one or more ML performance parameters comprised in the observations 205 may further comprise the training iteration of the ML process.

The DPU 240 may be configured to obtain the observations 205 from the environment 230. The DPU 240 may be configured to obtain one or more ML performance parameters from the environment 230, which form part of the observations 205. The DPU 240 may further be configured to generate the ML reward 202 based on the one or more ML performance parameters comprised in the observations 205. The DPU 240 may use a RL process to obtain the ML reward 202, which may comprise obtaining a ML performance value of a ML performance reward function based on the one or more ML performance parameters, where the ML performance value comprises the ML reward 202. In some examples, the ML reward 202 may be referred to as a ML performance assessment.

The orchestrator 210 can be further configured to generate a URLLC policy 206 for controlling a power and a bandwidth allocation of URLLC for the wireless devices of the environment 230 based on the ML reward 202 and the one or more URLLC requirements 201. As will be described in greater detail below, the orchestrator 210 may further be configured to generate the URLLC policy 206 based on a URLLC assessment, which as described above, may be determined based on the one or more URLLC performance parameters comprised in the observations 205.

The orchestrator 210 may thus be configured to control the power and a bandwidth allocation of URLLC (e.g. a URLLC slice) for the wireless devices of the environment 230 based on the URLLC policy 206. For example, the orchestrator 210 may output a request to a management function (such as a network management function (NMF) or, more specifically, a network slice subnet management function (NSSMF), e.g. of an access network) 260 to allocate the power and the bandwidth allocation of URLLC for the wireless devices of the environment 230 based on the URLLC policy 206. The management function 260 may accept the request and transmit a response to the orchestrator 210 accepting the request. Responsive to receiving the response from the management function 260, the orchestrator 210 may thus control the power and bandwidth allocation of URLLC (e.g. a URLLC slice) for the environment 230 based on the URLLC policy 206.
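By way of illustration only, the request and response exchange between the orchestrator 210 and the management function 260 can be sketched as follows. The class names, the field names, the units and the acceptance rule are hypothetical and are not taken from the present disclosure. In particular, the disclosure does not state how the management function 260 decides whether to accept a request, nor what happens if a request is not accepted; the sketch simply assumes that the previous allocation is retained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrllcPolicy:
    """Hypothetical container for a URLLC policy (names are illustrative)."""
    power_dbm: float       # transmission power for the URLLC slice
    bandwidth_mhz: float   # bandwidth for the URLLC slice


class ManagementFunction:
    """Stand-in for a management function such as an NSSMF (not a real API)."""

    def __init__(self, max_power_dbm: float, max_bandwidth_mhz: float):
        self.max_power_dbm = max_power_dbm
        self.max_bandwidth_mhz = max_bandwidth_mhz

    def handle_request(self, policy: UrllcPolicy) -> bool:
        # Assumed rule: accept only requests within the configured limits.
        return (policy.power_dbm <= self.max_power_dbm
                and policy.bandwidth_mhz <= self.max_bandwidth_mhz)


def apply_urllc_policy(policy, management_function, current):
    """Return the allocation in force after the request/response exchange."""
    accepted = management_function.handle_request(policy)
    # The allocation changes only in response to an accepting response;
    # keeping the current allocation otherwise is an assumption.
    return policy if accepted else current
```

In this sketch the orchestrator acts on the URLLC policy only after an accepting response has been received, which mirrors the ordering of the steps described above.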

In some examples, actions associated with the URLLC policy 206 may comprise at least one of: setting the transmission power of resource blocks allocated to the URLLC (e.g. URLLC slice); setting the transmission power of resource blocks allocated to the ML process (e.g. a ML slice for the ML process); setting the bandwidth allocated to the URLLC (e.g. URLLC slice); and setting the bandwidth allocated to the ML process (e.g. ML slice for the ML process). Herein, a URLLC slice can refer to a slice (or segment or part) of the environment 230 that is dedicated to URLLC.

Figure 3 illustrates another example of a system 300. The system 300 comprises elements in common with system 200 described above with reference to Figure 2, where said common elements are denoted with corresponding reference numerals and may operate with the same functionality as described above.

The system 300 comprises an orchestrator 210, which further comprises a RL orchestrator (RLO) 212, which may be configured to implement a RL agent. As one skilled in the art will be familiar with, in RL applications, an RL agent is given a state of an environment 230 and produces an action for execution in that environment 230. This transitions the environment 230 to a new state and the agent receives a reward for the action dictated by a reward function. The reward is often a scalar quantity, which is provided to the agent and determined on the basis of the new state that the environment 230 has been transitioned to by execution of the selected action. A new state associated with a positive outcome may result in a higher reward whereas a new state associated with a negative outcome may result in a lower reward.

The determination of whether an outcome is positive or negative may be based on assessing key performance indicators (KPIs) relevant to the environment 230. Through trial-and-error, the RL agent strives to learn the optimal policy for every state in the environment 230, meaning the action that yields the highest reward. Formally, this means that RL is parametrized by a state, action and reward.
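By way of illustration only, the state, action and reward loop described above can be sketched with tabular Q-learning. Q-learning is chosen here purely as a familiar example of an RL algorithm; the present disclosure does not state which RL algorithm the RL agent uses. The function names, the hyperparameter values and the toy environment are all hypothetical.

```python
import random


def run_rl_loop(step, actions, initial_state, steps=200, alpha=0.5,
                gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over a user-supplied step(state, action) function."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) -> estimated long-term reward
    state = initial_state
    for _ in range(steps):
        # Explore with probability epsilon, otherwise take the best known action.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q.get((state, a), 0.0))
        # The environment transitions to a new state and returns a reward.
        next_state, reward = step(state, action)
        best_next = max(q.get((next_state, a), 0.0) for a in actions)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
    return q


def toy_step(state, action):
    """Toy single-state environment: action 1 is rewarded, action 0 is penalised."""
    return state, (1.0 if action == 1 else -1.0)
```

Calling `run_rl_loop(toy_step, [0, 1], "s")` returns a table in which the rewarded action has the higher estimated value, which is the sense in which the agent learns a policy by trial and error.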

The environment 230 may comprise wireless devices 232, network nodes (such as base stations, e.g. gNBs) 234 and an AI master node 236. The state of the environment 230 may be represented by the performance of wireless devices 232, network nodes 234 and an AI master node 236 and the quality of the communication between these entities. The second policy 204 and the URLLC policy 206 may dictate the actions that are applied to the environment 230 by the RL orchestrator 212, for example, by selecting the set of wireless devices 232 to participate in the ML process and optionally also allocating the bandwidth and power for the URLLC (e.g. URLLC slice). Based on the actions applied to the environment 230, an ML reward 202 may be obtained and, as will be described in more detail below, a URLLC reward 209 may also be obtained. Based on the ML reward 202 and the URLLC reward 209, the RL orchestrator 212 may generate a new policy for selecting a set of wireless devices of the environment 230 to participate in the ML process. Additionally, based on the ML reward 202 and the URLLC reward 209, the RL orchestrator 212 may generate a new URLLC policy for controlling a power and a bandwidth allocation of URLLC for the wireless devices 232 of the environment 230. In some examples, generating the new policies may comprise updating learned policies.

The orchestrator 210 can further comprise a URLLC manager 214 configured to generate a URLLC reward 209. The URLLC manager 214 may be configured to receive the one or more URLLC requirements 201, e.g. from the application function 250. The one or more URLLC requirements 201 may comprise a URLLC reliability 207 and/or a URLLC availability 208. The URLLC manager 214 may be further configured to receive URLLC KPIs indicating an URLLC communication performance of the wireless devices 232. The URLLC KPIs can be output by a translation entity 310, as will be described in more detail below. Said URLLC KPIs may comprise URLLC reliability 207 and/or URLLC availability 208 of the wireless devices 232. URLLC manager 214 may be configured to obtain a URLLC reward 209 as a value of a URLLC performance reward function based on the URLLC KPIs and the one or more URLLC requirements 201.

In some examples, the URLLC reward may be referred to as a ‘URLLC assessment’ and the one or more estimated URLLC KPIs may be referred to as ‘an indication of one or more URLLC performance parameters indicating an URLLC communication performance of each of the wireless devices’ 232. In such examples, the URLLC manager 214 may generate a URLLC assessment based on comparing an indication of the one or more URLLC requirements 201 to an indication of the one or more URLLC performance parameters.

The orchestrator 210 can further comprise a translation entity 310, which can be configured to receive one or more URLLC performance parameters of the wireless devices 232, e.g. from the DPU 240. As described above, the one or more URLLC performance parameters may be obtained by the DPU 240 in the observations 205 obtained from the environment 230.

As described above, the one or more URLLC performance parameters may comprise one or more URLLC performance parameters which explicitly impact the URLLC performance of the wireless devices 232 and/or one or more URLLC performance parameters which implicitly impact the URLLC performance of the wireless devices 232. In some examples, the one or more URLLC performance parameters which explicitly impact the URLLC performance of the wireless devices of the environment 230 may comprise at least one of: a packet error ratio of each of the wireless devices; and a mean down time of each of the wireless devices. In some examples, the mean down time of each of the wireless devices may comprise the mean time that the packets from a wireless device have been lost consecutively on the network layer. In some examples, the one or more URLLC performance parameters which implicitly impact the URLLC performance of the wireless devices of the environment 230 may comprise at least one of: a signal to interference and noise ratio (SINR) of each of the wireless devices; a downlink (DL) transmission delay of each of the wireless devices; an uplink (UL) transmission delay of each of the wireless devices; and a path gain of each of the wireless devices. For each of the one or more URLLC performance parameters which implicitly impact the URLLC performance of the wireless devices of the environment 230, statistics related to these one or more URLLC performance parameters may be obtained. For example, the 5th percentile, the 95th percentile, the median or the mean may be obtained for: the SINR of each of the wireless devices; the DL transmission delay of each of the wireless devices; the UL transmission delay of each of the wireless devices; and the path gain of each of the wireless devices.
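By way of illustration only, the statistics named above can be computed for one parameter of one wireless device as sketched below. The function names and the dictionary keys are hypothetical, and the linear-interpolation definition of a percentile is an assumption; the present disclosure does not specify which percentile definition is used.

```python
import statistics


def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, with q in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    # Interpolate between the two samples that bracket the requested rank.
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def summarise(samples):
    """Return the four statistics named in the text for one parameter."""
    return {
        "p5": percentile(samples, 5),
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "p95": percentile(samples, 95),
    }
```

For example, `summarise(sinr_samples)` could be applied to the SINR samples collected for a wireless device during one step period, and likewise to the DL transmission delay, UL transmission delay and path gain samples.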

In some examples, the one or more URLLC performance parameters may comprise at least one of: a URLLC availability of each of the wireless devices 232 and a URLLC reliability of each of the wireless devices 232. In such examples, each of the wireless devices may thus be configured to measure their own respective URLLC availability and/or URLLC reliability. The DPU 240 may thus be configured to obtain these measurements as part of the observations 205.

The translation entity 310 can be configured to output one or more URLLC KPIs indicating an URLLC communication performance of the wireless devices 232 based on the one or more URLLC performance parameters. Said one or more URLLC KPIs may comprise URLLC reliability 207 and/or URLLC availability 208 of the wireless devices 232. The translation entity 310 may thus comprise a URLLC reliability estimator/approximator 312 and/or a URLLC availability estimator/approximator 314. The translation entity 310 may thus be configured to estimate or approximate the one or more URLLC KPIs, which may be subsequently compared to the one or more URLLC requirements 201 by the URLLC manager 214.

In examples according to the present disclosure, the “estimation” performed by the translation entity 310 may comprise an estimate of the one or more URLLC KPIs, for example, URLLC availability and URLLC reliability. In these examples, the one or more URLLC KPIs can be estimated for a period of time, such as based on one or more predefined (e.g. short) time measurements of the URLLC KPIs on the application layer. The predefined (e.g. short) time measurements can be less than the period of time. For example, each of the wireless devices 232 may be configured to measure their own URLLC availability and/or URLLC reliability on the application layer over one or more short time measurements. These URLLC availability and/or URLLC reliability short time measurements from each wireless device 232 may be obtained by the DPU 240 as part of the observations 205. The URLLC availability and/or URLLC reliability short time measurements may then be forwarded to the translation entity 310, which is further configured to estimate the URLLC availability 208 and/or URLLC reliability 207 for the wireless devices 232 as a whole over a period of time, which is longer than the period of the short time measurement. The estimated URLLC availability 208 and/or URLLC reliability 207 for the wireless devices 232 as a whole may thus comprise the one or more URLLC KPIs, which may be subsequently compared to the one or more URLLC requirements 201 by the URLLC manager 214. In such examples, the one or more URLLC performance parameters, received by the translation entity 310 from DPU 240, may thus comprise at least one of: a URLLC availability of each of the wireless devices 232 and a URLLC reliability of each of the wireless devices 232.

In examples according to the present disclosure, the “approximation” performed by the translation entity 310 may comprise an approximation of the one or more URLLC KPIs, for example, URLLC availability 208 and URLLC reliability 207. In these examples, the one or more URLLC KPIs can be approximated based on the one or more URLLC performance parameters, which explicitly impact the URLLC performance of the wireless devices 232 and which implicitly impact the URLLC performance of the wireless devices 232, as described above. In some examples, the “estimation” of the one or more URLLC KPIs may lead to a more accurate determination of the one or more URLLC KPIs compared to the “approximation” of the one or more URLLC KPIs. However, in other examples, the “approximation” of the one or more URLLC KPIs may lead to reduced computational load on the application layer compared to the “estimation” of the one or more URLLC KPIs.

Thus, in some examples, the translation entity 310 may be configured to estimate the one or more URLLC KPIs indicating an URLLC communication performance of the wireless devices 232 based on one or more URLLC performance parameters. In such examples, the one or more URLLC performance parameters may comprise at least one of: a URLLC availability of each of the wireless devices 232 and a URLLC reliability of each of the wireless devices 232. In such examples, the translation entity 310 can comprise a URLLC reliability estimator 312 and/or a URLLC availability estimator 314. The URLLC reliability estimator 312 and the URLLC availability estimator 314 can be configured to receive the one or more URLLC performance parameters from the DPU 240 and output an estimated URLLC reliability 207 and/or an estimated URLLC availability 208, respectively. For example, the URLLC reliability 207 and URLLC availability 208 may be estimated based on finite time measurements made on the application layer, as described above. An example of such a technique that can be used is described in M. Ganjalizadeh, A. Alabbasi, A. Azari, H. S. Ghadikolaei and M. Petrova, "An RL-based Joint Diversity and Power Control Optimization for Reliable Factory Automation," 2021 IEEE Global Communications Conference (GLOBECOM), 2021.

In other examples, the translation entity 310 may be configured to approximate the one or more URLLC KPIs indicating an URLLC communication performance of the wireless devices 232 based on one or more URLLC performance parameters. In such examples, the translation entity 310 can comprise a URLLC reliability approximator 312 and/or a URLLC availability approximator 314. Thus, in a similar manner to that described above, the URLLC reliability approximator 312 and the URLLC availability approximator 314 can be configured to receive one or more URLLC performance parameters from the DPU 240, for example including the one or more URLLC performance parameters, which explicitly and/or implicitly impact URLLC performance of wireless devices 232, and output an approximated URLLC reliability 207 and/or an approximated URLLC availability 208, respectively, of the wireless devices 232. As described above, parameters such as packet error ratio and SINR have an impact on URLLC availability and URLLC reliability. Based on such one or more URLLC parameters, the URLLC reliability estimator 312 and a URLLC availability estimator 314 may estimate the URLLC reliability 207 and URLLC availability 208, respectively, of the wireless devices 232. To approximate the URLLC reliability 207 and URLLC availability 208, URLLC reliability approximator 312 and URLLC availability approximator 314 may each implement a respective mapping function to calculate the URLLC reliability and the URLLC availability respectively based on the one or more URLLC performance parameters. Said mapping functions may be based on a finite-state Markov chain (FSMC) for example, as described in M. Ganjalizadeh, A. Alabbasi, J. Sachs, and M. Petrova, “Translating cyber-physical control application requirements to network level parameters,” in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, 2020, vol. 2020-August and in International patent application WO 2021/084027.

Based on one or both of the URLLC reward 209 and the ML reward 202, the RL orchestrator 212 can be configured to generate the policy 204 for selecting a set of the wireless devices 232 to participate in a ML process and the URLLC policy 206 for controlling a power and a bandwidth allocation of URLLC for the wireless devices 232.

There now follows a discussion of a RL based example according to the present disclosure for selecting wireless devices of a CPS environment 230 to participate in a ML process.

The CPS environment 230 typically comprises sensors and actuators with associated wireless devices 232 providing communication to network nodes 234. The wireless devices 232 may be responsible for performing various functions that facilitate automated production. The underlying communication system supporting the CPS environment 230 is responsible for various functions, for example, the timely delivery of sensor data to network nodes 234 and from there to an orchestrator 210. The underlying communication system is additionally responsible for communicating computed or emergency tasks to the actuators. In some examples, the application layer is typically responsible for communicating the computed or emergency tasks to the actuators, which perform the requested action upon receiving corresponding messages. In such a CPS environment, the wireless devices 232 may comprise URLLC wireless devices, which produce URLLC traffic affecting the URLLC performance of the wireless devices 232 of the environment, and ML wireless devices, which produce ML traffic resulting from participation in the ML process. In some examples, a wireless device 232 of the environment 230 may be a URLLC wireless device and a ML wireless device. The policy for selecting wireless devices to participate in the ML process may thus comprise selecting ML wireless devices to participate in the ML process.

The RL method assesses DL and UL transmissions of a (e.g. 5G) deployment within the CPS environment 230. The CPS environment 230 may thus comprise several network nodes 234 configured in a multi-cell setting. In the present example, index i denotes a cell and j denotes a wireless device. Thus, a set of cells C := {C₁, C₂, ..., C_|C|} is included in the environment 230. Each cell C_i serves a set of wireless devices U := {1, 2, 3, ..., K_i}, where K_i is the total number of wireless devices served by the cell C_i. K thus denotes the total number of wireless devices served by all the cells.

The RL orchestrator 212 generates a URLLC policy 206 for URLLC power control and bandwidth allocation for the environment 230. Power control p_ij represents the transmission power level to a wireless device u_ij, and may be quantized into several power levels. The RL orchestrator 212 further generates a policy 204 for selecting a set of wireless devices 232 of the environment 230 to participate in a ML process to enhance ML performance while meeting the one or more URLLC requirements of the CPS environment 230.

As described above, the RL orchestrator 212 comprises an RL agent. The RL framework thus comprises the environment 230 which is described by a set of states, where, as described above, the RL agent applies actions to transition the environment 230 to a new state and receives rewards based on the actions.

The state space describes the environment 230 where the RL orchestrator 212 learns based on a sequence of action-reward pairs. The state at a time slot t may be denoted by S(t) and includes the one or more URLLC performance parameters, e.g. the one or more URLLC performance parameters which explicitly impact the URLLC reliability and/or URLLC availability and/or one or more URLLC performance parameters which implicitly impact the URLLC reliability and/or URLLC availability. In some examples, the one or more URLLC performance parameters may comprise at least one of: a URLLC availability of each of the wireless devices 232 and a URLLC reliability of each of the wireless devices 232. A step period, Δt, is defined as the time it takes to collect observations 205 from the environment in response to an action applied by the RL orchestrator 212.

As described above, the one or more URLLC performance parameters which explicitly impact the URLLC reliability and URLLC availability may comprise a packet error ratio of each of the wireless devices and a mean down time of each of the wireless devices 232. The URLLC performance parameters which implicitly impact the URLLC reliability and URLLC availability may comprise a SINR of each of the wireless devices 232, a downlink (DL) transmission delay of each of the wireless devices 232, an uplink (UL) transmission delay of each of the wireless devices 232, and a path gain of each of the wireless devices 232. The RL orchestrator 212 can consider statistics related to the one or more URLLC performance parameters, which implicitly impact the URLLC reliability and/or URLLC availability, for example, 5th percentile, median, mean, and 95th percentile.

The state at the time slot t may further include one or more ML performance parameters, which as described above may comprise a ML training delay, a ML training loss, a drift of a local model of each of the wireless devices with respect to a global model, for example, held at an AI master node 236, and a last time a network node 234 of the environment received ML data from each of the wireless devices 232.

The RL example comprises an action space, which is the set of decision parameters and variables through which the RL agent of an RL orchestrator 212 interacts with the environment 230. The decision variables for generating the URLLC policy for controlling the power and bandwidth of the URLLC or URLLC slice are quantized into M+2 power levels P := {p_min, p_1, p_2, ..., p_M, p_max} and B bandwidth levels B := {b_1, b_2, ..., b_B}.

The action space for generating the policy for selecting a set of wireless devices of the environment to participate in the ML process may be defined as A := U_1 x U_2 x ... x U_N x P x B, where U_i := {0, 1} is a binary decision variable taking the value 1 when a wireless device i should participate in the next ML training process, and x represents the Cartesian product. This decision variable accounts for the URLLC performance of the CPS environment 230 and accounts for the problem of improving ML performance, for example in terms of training delay, by selecting a set of wireless devices to participate in an ML process. In another example, instead of a binary decision variable, the decision variable for determining whether a wireless device 232 is to participate in an ML process can comprise the characteristic of a probability distribution that generates binary outputs. For example, the decision variable may be the success probability of a Bernoulli random variable and, at every round, the decision for each wireless device 232 to participate in the ML process (e.g. upload or not) may be a new realization of that random variable.
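The structure of such an action space, including the Bernoulli variant of the participation decision, can be illustrated with the following minimal sketch. The function names, the quantisation values and the uniform choice of power and bandwidth levels are hypothetical; in the disclosure these levels are chosen by the RL orchestrator rather than at random.

```python
import random


def action_space_size(num_devices, num_power_levels, num_bandwidth_levels):
    """Size of A = U_1 x ... x U_N x P x B with binary U_i."""
    return (2 ** num_devices) * num_power_levels * num_bandwidth_levels


def sample_action(participation_probs, power_levels, bandwidth_levels, rng):
    """Draw one action: a Bernoulli participation decision per device plus
    one power level and one bandwidth level (picked uniformly here, which
    is an assumption made purely for illustration)."""
    participation = tuple(
        1 if rng.random() < prob else 0 for prob in participation_probs
    )
    return participation, rng.choice(power_levels), rng.choice(bandwidth_levels)


# Hypothetical quantisation: M = 2 intermediate levels, so M + 2 = 4 power levels.
power_levels_dbm = [0.0, 8.0, 16.0, 23.0]
bandwidth_levels_mhz = [5, 10, 20]
rng = random.Random(0)
action = sample_action([1.0, 0.0, 0.5], power_levels_dbm, bandwidth_levels_mhz, rng)
```

With a success probability of 1.0 a device always participates and with 0.0 it never does, so the binary decision variable is recovered as a special case of the Bernoulli formulation.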

The objective of the reward function of the RL orchestrator 212 is to maximize the ML performance of the ML process based on the ML performance parameters, as described above, with a constraint on satisfying the one or more URLLC requirements for all wireless devices 232.

For the reward function, the ML training delay d_ML of every iteration can be estimated, as well as the training loss, J. The ML training delay d_ML may have many contributing factors including, but not limited to: the DL delay d^dl, which is the delay to send a global model from a network node (such as a base station, e.g. gNB) to a wireless device; the wireless device local processing delay d_k^pr, which is the delay to execute the local computations at a wireless device; the uplink delay d_k^ul, which is the delay to collect the wireless device data by a network node (such as a base station, e.g. gNB) and the network node subsequently being ready to run the new global update; and the network node processing delay d^pr, which is the delay to execute the global computations at the network node. The ML training delay may thus be expressed as d_ML = max_k (d^dl + d_k^pr + d_k^ul + d^pr), where the maximum is taken over the wireless devices k in the set selected to participate in the ML process.
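The expression for d_ML can be evaluated as in the following minimal sketch, which shows that the slowest selected wireless device determines the iteration delay. The function name, the device identifiers and the delay values (in milliseconds) are hypothetical.

```python
def ml_training_delay(dl_delay, local_processing, ul_delay, node_processing, selected):
    """d_ML = max over selected devices k of d^dl + d_k^pr + d_k^ul + d^pr."""
    if not selected:
        raise ValueError("at least one wireless device must be selected")
    return max(
        dl_delay + local_processing[k] + ul_delay[k] + node_processing
        for k in selected
    )


# Hypothetical per-device delays in milliseconds.
local_processing_ms = {"ue1": 2.0, "ue2": 3.5, "ue3": 1.0}
ul_delay_ms = {"ue1": 4.0, "ue2": 2.0, "ue3": 6.5}

d_ml_with_ue3 = ml_training_delay(
    1.0, local_processing_ms, ul_delay_ms, 0.5, ["ue1", "ue2", "ue3"]
)
d_ml_without_ue3 = ml_training_delay(
    1.0, local_processing_ms, ul_delay_ms, 0.5, ["ue1", "ue2"]
)
```

In this illustration, leaving out the device with the largest uplink delay lowers d_ML, which is the effect a device selection policy can exploit.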

The URLLC availability a_ij of each wireless device 232 can be defined as equation (1):

where Z_ij is the application state variable defining whether a wireless device is meeting the one or more URLLC requirements, which can be defined by:

Equation (1) above details how the URLLC availability a_ij for each wireless device 232 can be defined for a time period T. However, it may not be practical to continually measure the application layer URLLC availability a_ij for all of the wireless devices 232 over the entire time period T. Thus, as described above with respect to translation entity 310, the application layer URLLC availability a_ij may be estimated based on one or more predefined (e.g. short) time measurements of the URLLC availability a_ij, for example, as measured by each of the wireless devices 232.

The URLLC availability of each wireless device 232 can thus be estimated from the following equation:

where, as described above, the step period Δt of the RL orchestrator may define the time taken to collect observations 205 from the environment 230 in response to an action from the RL orchestrator 212. The step period Δt may thus comprise the predefined (e.g. short) time measurement over which a wireless device 232 may measure its own URLLC availability. The measured URLLC availability for each wireless device 232 may thus be used to estimate the URLLC availability a_ij over the entire time period T for the wireless devices 232 as a whole.

As described above, the application layer URLLC availability a_ij may also be approximated, for example, using the translation entity 310. The application layer URLLC availability a_ij may be approximated based on the one or more URLLC performance parameters which explicitly impact the URLLC reliability and/or URLLC availability and/or one or more URLLC performance parameters which implicitly impact the URLLC reliability and/or URLLC availability. For example, said approximation may be based on a solution to a finite-state Markov chain (FSMC), for example, as described in M. Ganjalizadeh, A. Alabbasi, J. Sachs, and M. Petrova, “Translating cyber-physical control application requirements to network level parameters,” in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, 2020, vol. 2020-August and in International patent application WO 2021/084027. As described in these references, the URLLC availability a_ij of each wireless device 232 may be approximated according to:

where is the burst error length, p is the expected packet error ratio and N_sv represents the consecutive packet failures with length of {1, 2, ..., N_sv}.

The application layer URLLC reliability Ʈ_ij of each wireless device 232 can be defined as the mean duration of time that the application is operational. The decision variable of the application being operational may be defined as Z_ij(t) = 1 and the decision variable of the application not being operational may be defined as Z_ij(t) = 0. The application layer URLLC reliability Ʈ_ij of each wireless device 232 may thus be defined as:

where N(t) is the number of times the application at a wireless device 232 switches from being operational (i.e. Z_ij(t) = 1) to not being operational (i.e. Z_ij(t) = 0).

As described above, the application layer URLLC reliability Ʈ_ij of each wireless device 232 can also be estimated, for example, by the translation entity 310. The crossing rate l_ij of a wireless device 232 transitioning between being operational (i.e. Z_ij(t) = 1) and not being operational (i.e. Z_ij(t) = 0) may be defined as:

From equations (1), (5) and (6), the application layer URLLC reliability Ʈ_ij of each wireless device 232 can be derived as:

Thus, during each step period Δt, the crossing rate l_ij can be defined by:

The crossing rate l_ij may thus be measured by each wireless device 232 over the step period Δt. From the crossing rate of each wireless device 232, the application layer URLLC reliability of the wireless devices 232 can thus be estimated, for example, as described above with respect to the translation entity 310.
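The per-step measurements described above can be illustrated with the following minimal sketch. It assumes that the availability over a window is the fraction of samples with Z = 1, that the crossing rate is the number of transitions from operational to not operational per unit time, and that the mean operational duration is the total up time divided by the number of such transitions. These are simplified working definitions chosen for illustration; they may differ in detail from the numbered equations of the disclosure, and the function name and sample values are hypothetical.

```python
def estimate_step_kpis(z_samples, sample_period_s):
    """Estimate availability, crossing rate and mean up time from binary
    application-state samples Z(t) collected over one step period."""
    if not z_samples:
        raise ValueError("z_samples must not be empty")
    window_s = len(z_samples) * sample_period_s
    up_time_s = sum(z_samples) * sample_period_s
    # Count transitions from operational (1) to not operational (0).
    failures = sum(
        1 for prev, cur in zip(z_samples, z_samples[1:]) if prev == 1 and cur == 0
    )
    availability = up_time_s / window_s
    crossing_rate = failures / window_s
    # Assumption: with no observed failure the mean up time is unbounded.
    mean_up_time_s = up_time_s / failures if failures else float("inf")
    return availability, crossing_rate, mean_up_time_s


# Hypothetical samples of Z(t), one every 100 ms over a one-second step.
z = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
availability, crossing_rate, mean_up_time_s = estimate_step_kpis(z, 0.1)
```

Here the application is up for seven of the ten samples and fails twice, so the short-window availability is about 0.7 and the mean up time is about 0.35 s.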

As described above, the application layer URLLC reliability Ʈ_ij of each wireless device 232 may also be approximated, for example, based on the one or more URLLC performance parameters which explicitly impact the URLLC reliability and/or URLLC availability and/or one or more URLLC performance parameters which implicitly impact the URLLC reliability and/or URLLC availability. For example, said approximation may be based on a solution to a finite-state Markov chain (FSMC), for example, as described in M. Ganjalizadeh, A. Alabbasi, J. Sachs, and M. Petrova, “Translating cyber-physical control application requirements to network level parameters,” in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, 2020, vol. 2020-August and in International patent application WO 2021/084027. The URLLC reliability Ʈ_ij of each wireless device 232 may thus be approximated according to:

As described above, from the estimated or approximated URLLC availability and URLLC reliability for the wireless devices 232, URLLC KPIs of the environment 230 may thus be determined, where the URLLC KPIs comprise the URLLC availability and URLLC reliability performance for the wireless devices 232 of the environment 230.

From equations (1) and (3) above or (1) and (4) above, the system level URLLC availability of the wireless devices 232 of the environment 230, which may comprise a URLLC KPI of the environment 230, can be determined according to the constraint function:

where a_min is the minimum acceptable system level URLLC availability of the environment 230, as defined by the URLLC requirements 201, and the term 1/K scales the constraint function to have an upper bound of 1. As described above, the step period of the RL orchestrator is denoted by Δt, which may define the time taken to collect observations 205 from the environment 230 in response to an action from the RL orchestrator 212. The URLLC availability of each wireless device 232 of equation (10) may thus either be estimated by equation (3) or approximated by equation (4).

From equations (5) and (8) above, the system level crossing rate for the application to transition between being operational and not operational of the wireless devices 232 of the environment 230 can be determined according to the constraint function:

where a_min is the minimum acceptable system level URLLC availability of the wireless devices 232 of the environment 230, as defined by the URLLC requirements 201, and Ʈ_min is the minimum acceptable system level URLLC reliability of the wireless devices 232 of the environment 230, as defined by the URLLC requirements 201. Δt again denotes the step period.

The system level crossing rate of equation (11) can be determined based on the estimated crossing rate of equation (8). However, as described above, the URLLC reliability of each of the wireless devices 232 may be approximated by equation (9).

Thus, in such examples, the system level URLLC reliability of the wireless devices 232 of the environment 230 may be determined by the constraint function:

where Ʈ_min is the minimum acceptable system level URLLC reliability of the environment 230, as defined by the URLLC requirements 201, and Δt again denotes the step period.

At the end of the step period Δt, the URLLC reward 209, the ML reward 202 and the constraints are calculated at time t. For example, the step period Δt may be about one second. Thus, during the one second, the observations 205 from the environment 230 are obtained, and the URLLC reward 209 and ML reward 202 are calculated at time step t+1s based on the observations 205. The URLLC reward 209 may thus be determined as a function of the system level URLLC availability and the system level crossing rate and/or the system level URLLC reliability.

Based on the ML reward 202 and the URLLC reward 209, the RL orchestrator 212 may thus generate a policy 204 for selecting a set of the wireless devices 232 to participate in the ML process and the URLLC policy 206, as described above.

Figure 4 is a graph 400 illustrating results obtained from a 3GPP-compliant industrial-level simulator comprising an environment. The environment comprises simulated wireless devices. Examples according to the present disclosure were applied to the simulator to determine an assessment of ML performance for different selections of the wireless devices to participate in an ML process, for example, a distributed ML process.

The simulation included ten URLLC wireless devices together with twenty users participating in the distributed ML process. Here, k is the number of wireless devices that were selected to participate in one iteration of the ML process, and n is the total number of wireless devices in the environment that can potentially participate in the ML process. Thus, graph 400 represents the results where 40%, 60%, 80% and 100% of all the wireless devices in the environment participated in the ML process. In all selections of graph 400, the URLLC requirements were fulfilled.

D_req is the ML delay requirement within which an AI master node of the environment is to receive all of the local models from the k wireless devices that are participating in the ML process. The results for four D_req ML delay requirements are illustrated, 8 ms, 10 ms, 12 ms and 15 ms, for the four different wireless device selections. As illustrated, with a decreasing number of wireless devices participating in the ML process, the probability of the D_req requirement being met increases or remains substantially unchanged. For example, for a D_req of 12 ms, both 60% and 40% k/n meet the D_req requirement with substantially 100% probability. Thus, for such a D_req requirement, both 60% and 40% k/n achieve the same ML delay while meeting URLLC requirements. Thus, assuming fixed transmission power per wireless device, examples according to the present disclosure can identify the smallest number of wireless devices to meet D_req and as such reduce the power consumption both at the wireless device level, due to fewer UL transmissions, and at the network node (such as base station, e.g. gNB) level, due to fewer DL packets.

Figure 5 is a graph 500 illustrating results obtained from the 3GPP-compliant industrial-level simulator comprising an environment comprising thirty wireless devices. Graph 500 illustrates an ML accuracy against a number of the wireless devices of the simulated environment participating in an ML process. As illustrated, above approximately fifteen wireless devices, there is substantially no change in the ML accuracy. Thus, not all thirty wireless devices need to participate in the ML process to achieve a high level of ML accuracy. As such, examples according to the present disclosure are able to determine a set of the wireless devices, smaller than the total number of wireless devices, to participate in the ML process, which can reduce signalling.

Figures 6a and 6b illustrate further graphs 600a, 600b illustrating results obtained from the 3GPP-compliant industrial-level simulator comprising the environment comprising thirty simulated wireless devices, as described above for Figure 5. Graphs 600a, 600b illustrate results obtained according to examples of the present disclosure, but where the orchestrator generates a policy for selecting the maximum number of wireless devices to participate in the ML process that can still achieve the URLLC requirements.

Referring to Figure 6a, graph 600a illustrates the URLLC availability of different numbers of wireless devices participating in the ML process. Graph 600a illustrates the availability for a random selection of ten, twenty and thirty wireless devices participating in the ML process. The URLLC availability requirement in graph 600a is 0.995. As illustrated, twenty wireless devices is the maximum number of wireless devices that can participate in the ML process and still meet the URLLC availability requirement. If more than twenty, for example, thirty wireless devices, participated in the ML process, the increased signalling means that the wireless devices cannot meet the URLLC requirement. Referring to Figure 6b, graph 600b illustrates the ML training delay d_ML, defined above, for the random selection of ten, twenty and thirty wireless devices participating in the ML process. As illustrated, as the number of participating wireless devices increases, the ML training delay also increases.
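The selection logic described for graph 600a, choosing the largest number of participating wireless devices that still meets the URLLC availability requirement, can be sketched as follows. The function name and the availability values are hypothetical and are not the values plotted in graph 600a; only the 0.995 requirement is taken from the description.

```python
def max_devices_meeting_requirement(availability_by_count, requirement):
    """Return the largest participant count whose measured URLLC availability
    still meets the requirement, or None if no count does."""
    feasible = [
        count
        for count, availability in availability_by_count.items()
        if availability >= requirement
    ]
    return max(feasible) if feasible else None


# Hypothetical measurements; these are NOT the values plotted in graph 600a.
measured = {10: 0.9990, 20: 0.9960, 30: 0.9900}
best_count = max_devices_meeting_requirement(measured, 0.995)
```

With these illustrative numbers the function returns twenty, mirroring the qualitative outcome described for Figure 6a.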

Figures 7a and 7b illustrate further graphs 700a, 700b illustrating results obtained from the 3GPP-compliant industrial-level simulator comprising the environment comprising thirty simulated wireless devices, as described above for Figure 5 and Figures 6a and 6b. Graphs 700a, 700b illustrate results obtained according to examples of the present disclosure where the orchestrator generates a policy to select the optimal set of twenty wireless devices of the environment to participate in the ML process.

Referring to Figure 7a, graph 700a illustrates the URLLC availability of different sets of wireless devices participating in the ML process. Graph 700a illustrates the availability for a ‘random selection’ of twenty wireless devices, a ‘poor selection’ of twenty wireless devices and a ‘best selection’ of twenty wireless devices, participating in the ML process. The URLLC availability requirement in graph 700a is again 0.995. As illustrated, the ‘best selection’ of twenty wireless devices results in an improvement in URLLC availability compared to the ‘random selection’ and ‘poor selection’.

Referring to Figure 7b, graph 700b illustrates the ML training delay d_ML, defined above, for the ‘random selection’ of twenty wireless devices, the ‘poor selection’ of twenty wireless devices and the ‘best selection’ of twenty wireless devices, participating in the ML process. As illustrated, the optimal selection results in a substantial improvement in the ML training delay compared to the ‘random selection’ and ‘poor selection’. Examples according to the present disclosure thus present a method which can identify a set of wireless devices from a total set of the environment, which can meet URLLC requirements, maintain high levels of ML accuracy and reduce ML training delay.

As described above, examples according to the present disclosure thus aim to select a set of wireless devices to participate in an ML process, with a high level of ML accuracy, an improved ML training delay and which can meet URLLC requirements for the wireless devices of an environment. In one use case, the environment may comprise a CPS environment (such as a smart factory) and the wireless devices of the environment may comprise wireless robots. The wireless robots can be configured to carry out processes. Each wireless robot may comprise one or more sensors and one or more actuators, where the wireless robot can be configured to transmit signals obtained from the one or more sensors and control the one or more actuators based on signals received from a wireless robot controller. The communication system between the wireless robots and the wireless robot controller may be enabled using URLLC technologies. The wireless devices of the smart factory may comprise URLLC wireless devices, which produce URLLC traffic affecting the URLLC performance of the wireless devices 232 of the environment and ML wireless devices, which provide ML traffic resulting from participation in the ML process. In some examples, the wireless robots may be both a URLLC wireless device and a ML wireless device.

In the use case above, the wireless robots and the controller may be controlled by an orchestrator according to the examples of the present disclosure. The orchestrator may thus control the wireless robots to participate in an ML process, for example, a federated learning process. The federated learning process may be able to detect or predict faults that may be occurring, or about to occur, within the smart factory. Detecting or predicting faults within a smart factory is of importance as the faults can greatly reduce the efficiency of the smart factory. Federated learning presents a method to quickly and accurately detect or predict faults in a smart factory.

Examples according to the present disclosure may thus be able to select a set of the ML wireless devices of the smart factory to participate in a federated learning process for detecting faults, whilst ensuring that the URLLC wireless devices of the smart factory are able to meet URLLC requirements for the smart factory. As described above, examples according to the present disclosure can greatly reduce the ML training delay for an ML process, such as federated learning, and therefore examples according to the present disclosure can lead to faster detection or prediction of a fault within a smart factory, with little to no impact on the URLLC service performance.

According to one example of the present disclosure, there is thus provided a computer-implemented method for managing a federated learning process for detecting or predicting a fault within a smart factory. The method may comprise obtaining an indication of a performance of the federated learning process resulting from application of a first policy to a smart factory environment comprising wireless devices, wherein the first policy is for selecting a first set of the wireless devices to participate in the federated learning process. The method may further comprise obtaining an indication of one or more URLLC requirements for URLLC of the wireless devices and generating a second policy for selecting a second set of the wireless devices to participate in the federated learning process, wherein the second policy is generated based on the indication of the performance of the federated learning process and the indication of the one or more URLLC requirements.

In another use case, examples according to the present disclosure may again be applied to a smart factory comprising a plurality of wireless robots and a wireless robot controller, as described above. In such a use case, examples according to the present disclosure may again be for managing a federated learning process, but for detecting failures on the wireless devices of the smart factory, for example, software and/or hardware failures. Said software and/or hardware failures may again result in inefficiencies within the smart factory. Federated learning presents a method to quickly and accurately detect failures within the wireless devices of a smart factory.

Examples according to the present disclosure may thus be able to select a set of the ML wireless devices of the smart factory to participate in a federated learning process for detecting one or more failures within the wireless devices, whilst ensuring that the URLLC wireless devices of the smart factory are able to meet URLLC requirements for the smart factory. As described above, examples according to the present disclosure can greatly reduce the ML training delay for an ML process, such as federated learning, and therefore examples according to the present disclosure can lead to faster detection of failures within the wireless devices of a smart factory, with little to no impact on the URLLC service performance.

According to one example of the present disclosure, there is thus provided a computer-implemented method for managing a federated learning process for detecting one or more failures within wireless devices of a smart factory. The method may comprise obtaining an indication of a performance of the federated learning process resulting from application of a first policy to a smart factory environment comprising wireless devices, wherein the first policy is for selecting a first set of the wireless devices to participate in the federated learning process. The method may further comprise obtaining an indication of one or more URLLC requirements for URLLC of the wireless devices and generating a second policy for selecting a second set of the wireless devices to participate in the federated learning process, wherein the second policy is generated based on the indication of the performance of the federated learning process and the indication of the one or more URLLC requirements.

In another use case, examples according to the present disclosure may again be applied to a smart factory comprising a plurality of wireless robots and a wireless robot controller, as described above. In such a use case, examples according to the present disclosure may again be for managing a federated learning process, but for detecting a cyber attack. Said federated learning process may be for detecting a cyber attack in a smart factory environment utilizing blockchain technology, for example, as described in A. Yazdinejad, A. Dehghantanha, R. M. Parizi, M. Hammoudeh, H. Karimipour and G. Srivastava, "Block Hunter: Federated Learning for Cyber Threat Hunting in Blockchain-based IIoT Networks," in IEEE Transactions on Industrial Informatics, doi: 10.1109/TII.2022.3168011.

Examples according to the present disclosure may thus be able to select a set of the ML wireless devices of the smart factory to participate in a federated learning process for detecting a cyber attack, whilst ensuring that the URLLC wireless devices of the smart factory are able to meet URLLC requirements for the smart factory. As described above, examples according to the present disclosure can greatly reduce the ML training delay for an ML process, such as federated learning, and therefore examples according to the present disclosure can lead to faster detection of a cyber attack within a smart factory, with little to no impact on the URLLC service performance.

According to one example of the present disclosure, there is thus provided a computer-implemented method for managing a federated learning process for detecting a cyber attack within a smart factory. The method may comprise obtaining an indication of a performance of the federated learning process resulting from application of a first policy to a smart factory environment comprising wireless devices, wherein the first policy is for selecting a first set of the wireless devices to participate in the federated learning process. The method may further comprise obtaining an indication of one or more URLLC requirements for URLLC of the wireless devices and generating a second policy for selecting a second set of the wireless devices to participate in the federated learning process, wherein the second policy is generated based on the indication of the performance of the federated learning process and the indication of the one or more URLLC requirements.

Although example use cases have been provided, it will be understood that the techniques described herein will be applicable to many other use cases. In particular, examples according to the present disclosure can be applied to any use case with continuous intelligence that makes use of machine learning to implement self-discovery, self-analysis, and self-correction features at a high level through distributed learning. These use cases are commonly referred to as ‘self-managed’ or ‘zero touch’ networks and achieve these features by combining real-time analytics and historical data within a particular environment.

Figure 8 is a block diagram illustrating an example orchestrator node 800 which may implement the methods described herein (e.g. the method 100 as illustrated in and described with reference to Figure 1), according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 850. Referring to Figure 8, the orchestrator node 800 comprises a processor or processing circuitry 802, and may comprise a memory 804 and/or interfaces 806. The processing circuitry 802 is operable to perform some or all of the steps of the methods described herein (e.g. the method 100 as discussed above with reference to Figure 1). The memory 804 may contain instructions executable by the processing circuitry 802 such that the orchestrator node 800 is operable to perform some or all of the steps of the methods described herein (e.g. the method 100 as discussed above with reference to Figure 1). The instructions may also include instructions for executing one or more telecommunications and/or data communications protocols. The instructions may be stored in the form of the computer program 850.

In some examples, the processor or processing circuitry 802 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc. The processor or processing circuitry 802 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) etc. The memory 804 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive etc. The orchestrator node 800 may comprise interfaces 806, which may be operable to facilitate communication with a wireless device, and/or with other communication (e.g. network) nodes over suitable communication channels.

In some embodiments, the orchestrator node functionality described herein can be performed by hardware. Thus, in some embodiments, the orchestrator node 800 described herein can be a hardware entity. However, it will also be understood that optionally at least part or all of the orchestrator node functionality described herein can be virtualised. For example, the functions performed by the orchestrator node 800 herein can be implemented in software running on generic hardware that is configured to orchestrate the orchestrator node functionality described herein. Thus, in some embodiments, the orchestrator node 800 described herein can be a virtual entity. In some embodiments, at least part or all of the orchestrator node functionality described herein may be performed in a network enabled cloud. Thus, the method described herein can be realised as a cloud implementation according to some embodiments. The orchestrator node functionality described herein may all be at the same location or at least some of the orchestrator node functionality may be distributed, e.g. the orchestrator node functionality may be performed by one or more different entities.

There is also provided a computer program comprising instructions which, when executed by processing circuitry (such as the processing circuitry 802 of the orchestrator node 800 described herein), cause the processing circuitry to perform at least part of the method described herein. There is provided a computer program product, embodied on a non-transitory machine-readable medium, comprising instructions which are executable by processing circuitry (such as the processing circuitry 802 of the orchestrator node 800 described herein) to cause the processing circuitry to perform at least part of the method described herein. There is provided a computer program product comprising a carrier containing instructions for causing processing circuitry (such as the processing circuitry 802 of the orchestrator node 800 described herein) to perform at least part of the method described herein. In some embodiments, the carrier can be any one of an electronic signal, an optical signal, an electromagnetic signal, an electrical signal, a radio signal, a microwave signal, or a computer-readable storage medium. It will be understood that at least some or all of the method steps described herein can be automated in some embodiments. That is, in some embodiments, at least some or all of the method steps described herein can be performed automatically. The method described herein can be a computer-implemented method.

Therefore, as described herein, there is provided an advantageous technique for managing an ML process.

Examples according to the present disclosure thus present a method for managing an ML process where, as well as the ML performance of the ML process, the URLLC requirements of wireless devices of the environment in which the ML process takes place are also considered. The method can improve ML performance, particularly in terms of training delay, whilst reducing signalling, which improves the URLLC performance of the wireless devices of the environment and reduces power consumption of the environment. In comparison with known approaches in the state-of-the-art, the same level of URLLC reliability and/or availability or the same level of ML performance can be achieved at a much lower cost and with reduced energy consumption.

Examples according to the present disclosure may greatly reduce the total number of transmissions, latency, and energy consumption of an environment while achieving the same ML accuracy. Examples according to the present disclosure may thus be an enabler of a green cognitive layer.

Examples according to the present disclosure can perform near real-time control of an environment, e.g. with a control period of the order of 500 ms, which is compatible with current developments in edge computing for cellular networks. This enables a closed-loop management entity to complete the control loop (observe, process, action) with non-real-time and near real-time control. Examples according to the present disclosure thus propose an RL controller cycle period which aligns with views on Open Radio Access Network (ORAN) standardization of A1 interfaces between non-real-time and near real-time (RT) RAN Intelligent Controllers (RICs).
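Purely as a non-limiting sketch, a closed control loop (observe, process, action) with a fixed cycle period may be expressed in Python as follows. The function name `run_control_loop`, its parameters and the callables passed to it are hypothetical and introduced only for illustration; the default period of 0.5 seconds reflects the order of magnitude mentioned above and is not a required value.

```python
import time
from typing import Any, Callable, List


def run_control_loop(
    observe: Callable[[], Any],
    process: Callable[[Any], Any],
    act: Callable[[Any], None],
    period_s: float = 0.5,
    cycles: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Any]:
    """Run a fixed number of observe/process/act cycles, one per period."""
    actions = []
    for _ in range(cycles):
        start = clock()
        observation = observe()        # e.g. ML and URLLC performance indications
        action = process(observation)  # e.g. generate an updated policy
        act(action)                    # e.g. apply the policy to the environment
        actions.append(action)
        # Wait out the rest of the period so that cycles stay evenly spaced.
        remaining = period_s - (clock() - start)
        if remaining > 0:
            sleep(remaining)
    return actions
```

The `sleep` and `clock` parameters are injectable so that the loop can be exercised without real delays, which is a convenience of this sketch rather than a feature of the disclosure.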

Examples according to the present disclosure present a wireless device selection method which may not only reduce the energy consumption of the ML process, but also reduce the interference to URLLC and other ML users operating in different cells. Because wireless devices are not required to train and transmit their local model at each epoch, the method also reduces the energy consumption on the wireless device side.
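As a non-limiting illustration of selecting only a subset of the wireless devices in a given epoch, the following Python sketch picks the highest-scoring devices up to a limit. The function name `select_devices`, the per-device scores and the limit `max_devices` are hypothetical; how a policy ranks devices is not specified by this sketch.

```python
from typing import Dict, List


def select_devices(scores: Dict[str, float], max_devices: int) -> List[str]:
    """Return up to max_devices device identifiers, highest score first.

    Ties are broken by device identifier so that the result is deterministic.
    """
    ranked = sorted(scores, key=lambda device: (-scores[device], device))
    return ranked[:max_devices]


# Example: only two of the three devices train and transmit in this epoch.
epoch_scores = {"ue1": 0.2, "ue2": 0.9, "ue3": 0.5}
print(select_devices(epoch_scores, max_devices=2))  # ['ue2', 'ue3']
```

Devices that are not selected in an epoch neither train nor transmit a local model in that epoch, which is the source of the reduction in transmissions discussed above.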

It should be noted that the above-mentioned embodiments illustrate rather than limit the idea, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.

Claims

1. A computer-implemented method (100) for managing a machine learning, ML, process, the method comprising: obtaining (110) an indication of a performance of the ML process resulting from application of a first policy to an environment (230) comprising wireless devices (232), wherein the first policy is for selecting a first set of the wireless devices (232) to participate in the ML process; obtaining (120) an indication of one or more ultra reliable low latency communication, URLLC, requirements (201) for URLLC of the wireless devices (232); and generating (130) a second policy (204) for selecting a second set of the wireless devices (232) to participate in the ML process, wherein the second policy is generated based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements (201).

2. A computer-implemented method according to claim 1 further comprising: obtaining an indication of one or more URLLC performance parameters indicating an URLLC communication performance of each of the wireless devices (232); wherein generating the second policy (204) is further based on the indication of the one or more URLLC performance parameters.

3. A computer-implemented method according to claim 2 further comprising: comparing the indication of the one or more URLLC performance parameters to the indication of the one or more URLLC requirements (201); and generating a URLLC assessment based on the comparison of the indication of the one or more URLLC performance parameters to the indication of the one or more URLLC requirements (201); wherein generating the second policy (204) based on the indication of the one or more URLLC requirements (201) and based on the indication of the one or more URLLC performance parameters comprises generating the second policy (204) based on the URLLC assessment.

4. A computer-implemented method according to claim 3 further comprising: obtaining the one or more URLLC performance parameters; and translating the one or more URLLC performance parameters into one or more URLLC key performance indicators, KPIs of the wireless devices (232); wherein comparing the indication of the one or more URLLC performance parameters to the indication of the one or more URLLC requirements (201) comprises comparing the one or more URLLC KPIs to the indication of the one or more URLLC requirements (201), and wherein generating the URLLC assessment is based on the comparison of the one or more URLLC KPIs to the indication of the one or more URLLC requirements (201).

5. A computer-implemented method according to claim 4 wherein the one or more URLLC requirements (201) and the one or more URLLC KPIs comprise at least one of an indication of URLLC availability (208) of the wireless devices (232) and an indication of URLLC reliability (207) of the wireless devices (232).

6. A computer-implemented method according to claim 4 or 5 wherein translating the one or more URLLC performance parameters into the one or more URLLC KPIs comprises estimating the one or more URLLC KPIs based on the one or more URLLC performance parameters.

7. A computer-implemented method according to any of claims 2 to 6 wherein the one or more URLLC performance parameters comprise at least one of: a URLLC availability of each of the wireless devices (232); and a URLLC reliability of each of the wireless devices (232).

8. A computer-implemented method according to claim 4 or 5 wherein translating the one or more URLLC performance parameters into the one or more URLLC KPIs comprises approximating the one or more URLLC KPIs based on the one or more URLLC performance parameters.

9. A computer-implemented method according to any of claims 2 to 5 or claim 8 wherein the one or more URLLC performance parameters comprise at least one of: a packet error ratio of each of the wireless devices (232); and a mean down time of each of the wireless devices (232).

10. A computer-implemented method according to claim 9 wherein the one or more URLLC performance parameters further comprise at least one of: a signal to interference and noise ratio, SINR, of each of the wireless devices (232); a downlink, DL, transmission delay of each of the wireless devices (232); an uplink, UL, transmission delay of each of the wireless devices (232); and a path gain of each of the wireless devices (232).

11. A computer-implemented method according to any preceding claim wherein the indication of a performance of the ML process comprises an indication of one or more ML performance parameters.

12. A computer-implemented method according to claim 11 further comprising: obtaining the one or more ML performance parameters from the environment (230); and generating an ML performance assessment based on the one or more ML performance parameters, wherein the indication of the one or more ML performance parameters comprises the ML performance assessment.

13. A computer-implemented method according to claim 12, when claim 11 is directly or indirectly dependent on claim 3, wherein generating the second policy (204) comprises using a Reinforcement Learning, RL, process to generate the second policy (204) and the RL process comprises: obtaining a ML performance value of a ML performance reward function based on the one or more ML performance parameters, wherein the ML performance assessment comprises the value of the ML performance reward function; obtaining a URLLC performance value of a URLLC performance reward function based on the one or more URLLC KPIs and the indication of the one or more URLLC requirements (201), wherein the URLLC assessment comprises the value of the URLLC performance reward function; and wherein generating the second policy (204) comprises generating the second policy (204) based on the ML performance value and the URLLC performance value.

14. A computer-implemented method according to any of claims 11 to 13, wherein the one or more ML performance parameters comprise at least one of: a ML training delay; a ML training loss; a drift of a local model of each of the wireless devices (232) with respect to a global model; and a last time a base station (234) of the environment (230) received ML data from each of the wireless devices (232).

15. A computer-implemented method according to any preceding claim further comprising obtaining a state of each of the wireless devices (232) and wherein generating the second policy (204) is further based on the state of each of the wireless devices (232).

16. A computer-implemented method according to any preceding claim further comprising applying the second policy (204) to the environment (230) comprising instructing the second set of the wireless devices (232) to participate in the ML process.

17. A computer-implemented method according to claim 16 wherein applying the second policy (204) to the environment (230) comprises: outputting the second policy (204) to a planner unit (220); decoding the second policy (204), using the planner unit (220); and instructing the second set of the wireless devices (232) to participate in the ML process based on the decoded second policy (204).

18. A computer-implemented method according to claim 17 wherein instructing the second set of the wireless devices to participate in the ML process comprises transmitting instructions to a virtual planner implemented on each of the second set of the wireless devices (232).

19. A computer-implemented method according to any preceding claim further comprising: obtaining the first policy; and applying the first policy to the environment (230).

20. A computer-implemented method according to claim 19 wherein the first policy comprises an initial policy for generating a learned policy for selecting a set of the wireless devices (232) to participate in the ML process; and wherein the second policy (204) comprises the learned policy.

21. A computer-implemented method according to claim 19 wherein the first policy comprises a learned policy and wherein generating the second policy (204) comprises updating the learned policy.

22. A computer-implemented method according to any preceding claim further comprising generating a URLLC policy (206) for controlling a power and a bandwidth allocation of URLLC for the wireless devices (232) based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements (201).

23. A computer-implemented method according to claim 22 further comprising controlling the power and the bandwidth allocation of URLLC for the wireless devices (232) based on the URLLC policy (206).

24. A computer-implemented method according to claim 23 wherein controlling the power and the bandwidth allocation of URLLC for the wireless devices (232) based on the URLLC policy (206) comprises outputting a request to a URLLC management function (260) to allocate the power and the bandwidth allocation of URLLC for the wireless devices (232) based on the URLLC policy (206).

25. A computer-implemented method according to claim 24 further comprising: receiving, from the URLLC management function (260), a response accepting the request; and responsive to receiving the response, controlling the power and the bandwidth allocation of URLLC for the wireless devices (232) based on the URLLC policy (206).

26. A computer-implemented method according to any preceding claim wherein the environment (230) comprises a cyber-physical system, CPS, environment.

27. An orchestrator node (210, 800) for managing a machine learning, ML, process, the orchestrator node comprising processing circuitry (802) configured to: obtain an indication of a performance of the ML process resulting from application of a first policy to an environment (230) comprising wireless devices (232), wherein the first policy is for selecting a first set of the wireless devices (232) to participate in the ML process; obtain an indication of one or more ultra reliable low latency communication, URLLC, requirements (201) for URLLC of the wireless devices (232); and generate a second policy (204) for selecting a second set of the wireless devices (232) to participate in the ML process, wherein the second policy (204) is generated based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements (201).

28. An orchestrator node according to claim 27 wherein the processing circuitry (802) is further configured to carry out any of the method steps according to claims 2 to 26.

29. A system (200, 300) for managing a machine learning, ML, process, the system (200, 300) comprising: an environment (230) comprising wireless devices (232); and an orchestrator node (210, 800) comprising processing circuitry (802) configured to: obtain an indication of a performance of the ML process resulting from application of a first policy to the environment (230), wherein the first policy is for selecting a first set of the wireless devices (232) to participate in the ML process; obtain an indication of one or more ultra reliable low latency communication, URLLC, requirements (201) for URLLC of the wireless devices (232); and generate a second policy (204) for selecting a second set of the wireless devices (232) to participate in the ML process, wherein the second policy (204) is generated based on the indication of the performance of the ML process and the indication of the one or more URLLC requirements (201).

30. A system according to claim 29 wherein the processing circuitry (802) is further configured to perform any of the steps of claims 2 to 26.

31. A computer program product (850) comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform a method as claimed in any one of claims 1 to 26.