EP4635209A1 - Managing sensor entities on a radio stripe

Managing sensor entities on a radio stripe

Info

Publication number: EP4635209A1
Application number: EP22968672.0A
Authority: EP (European Patent Office)
Prior art keywords: sensor, lwm2m, radio stripe, radio, entities
Legal status: Pending
Other languages: German (de), French (fr)
Inventors: Senthamiz Selvi ARUMUGAM, Valentin TUDOR, Aitor Hernandez Herranz
Current Assignee: Telefonaktiebolaget LM Ericsson AB
Original Assignee: Telefonaktiebolaget LM Ericsson AB
Application filed by: Telefonaktiebolaget LM Ericsson AB

Classifications

    • H04W 4/70: Services for machine-to-machine communication [M2M] or machine type communication [MTC]
    • G06N 20/00: Machine learning
    • G06N 3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • H04L 67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04W 52/02: Power saving arrangements
    • H01Q 1/38: Structural form of radiating elements formed by a conductive layer on an insulating support
    • H01Q 21/28: Combinations of substantially independent non-interacting antenna units or systems

Definitions

  • the present disclosure relates to methods for managing, and for facilitating management of, sensor entities on a radio stripe.
  • the methods are performed by a management node, a Lightweight Machine to Machine (LwM2M) server node, and a LwM2M client node.
  • the present disclosure also relates to a management node, a LwM2M server node, and a LwM2M client node, and to a computer program product configured, when run on a computer, to carry out methods for managing, and for facilitating management of, sensor entities on a radio stripe.
  • Massive MIMO is one example of Multi-user MIMO (MU-MIMO), which is a set of multiple-input and multiple-output technologies for wireless communication.
  • Distributed Massive MIMO (D-maMIMO) refers to a scenario in which base station antennas are geographically spread out over a large area, in a well-planned or random fashion.
  • WO 2018103897 A9 discloses an antenna arrangement for use in D-maMIMO in which a flexible elongated body is provided with a plurality of antenna devices along its length, and comprises a data bus and power supply line for transmitting data to and from the plurality of antenna devices.
  • This antenna arrangement is referred to as a Radio Stripe and can be used to distribute an antenna system in challenging (for example dense) outdoor and indoor areas including factories, stadiums, etc.
  • the radio stripe deployment may integrate additional sensors such as temperature sensors, microphones/speakers, vibration sensors, etc., and may provide additional features such as fire alarms, burglar alarms, earthquake warning, indoor positioning, and climate monitoring and control, as discussed in WO 2018103897 A9.
  • Radio stripes may be particularly suited to the provision of connectivity in Ultra Low Latency and other sensitive use cases.
  • the availability of power supply to radio stripe deployments may be limited or constrained, owing to the specifics of a given deployment.
  • ineffective or inefficient use of sensors mounted on the radio stripe is undesirable.
  • a method for using Reinforcement Learning to manage sensor entities on a radio stripe comprising obtaining a representation of a current state of the radio stripe, and, for sensor entities on the radio stripe, using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity, and causing the action to be carried out on the sensor entity.
  • the method further comprises obtaining an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task, and updating the selection policy using at least the obtained reward function value.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a Lightweight Machine to Machine (LwM2M) object instance.
  • a method for facilitating management of sensor entities on a radio stripe using Reinforcement Learning comprising registering a plurality of LwM2M object instances and associated resources, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe and are hosted at a LwM2M client node associated with the radio stripe.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe, and the sensor entities are managed by a management node using a method according to the preceding aspect of the present invention.
  • a method for facilitating management of sensor entities on a radio stripe using Reinforcement Learning comprising exposing, to a LwM2M server node, a plurality of LwM2M object instances and associated resources hosted at a LwM2M client node, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe, and the sensor entities are managed by a management node using a method according to a preceding aspect of the present invention.
  • a computer program product comprising a computer readable non-transitory medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform a method according to any one of the aspects or examples of the present disclosure.
  • a management node for using Reinforcement Learning to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task.
  • the management node comprises processing circuitry configured to cause the management node to obtain a representation of a current state of the radio stripe, and for sensor entities on the radio stripe, use the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity, and cause the action to be carried out on the sensor entity.
  • the processing circuitry is further configured to cause the management node to obtain an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task, and update the selection policy using at least the obtained reward function value.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a LwM2M object instance.
  • a LwM2M server node for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task.
  • the LwM2M server node comprises processing circuitry configured to cause the LwM2M server node to register a plurality of LwM2M object instances and associated resources, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe and are hosted at a LwM2M client node associated with the radio stripe.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe, and wherein the sensor entities are managed by a management node using a method according to a preceding aspect of the present disclosure.
  • a LwM2M client node for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task and the LwM2M client node is associated with the radio stripe.
  • the LwM2M client node comprises processing circuitry configured to cause the LwM2M client node to expose, to a LwM2M server node, a plurality of LwM2M object instances and associated resources hosted at a LwM2M client node, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe, and wherein the sensor entities are managed by a management node using a method according to a preceding aspect of the present disclosure.
  • aspects of the present disclosure thus provide methods and nodes that enable the management of sensor devices on a radio stripe.
  • the sensor devices are exposed as LwM2M object instances, and managed via a logical construct of sensor entities.
  • a single sensor entity on the radio stripe may comprise one or more sensor devices, and a Reinforcement Learning process is used to select actions for execution on the sensor entities in light of performance by the entities of a task.
  • An action selected for execution on a sensor entity may be executed on each of the sensor devices comprised within the logical sensor entity.
  • Figure 1 illustrates the main components in a Reinforcement Learning system
  • Figure 2 is a flow chart illustrating process steps in a method for using Reinforcement Learning to manage sensor entities on a radio stripe
  • Figures 3a and 3b show flow charts illustrating process steps in another example of a method for using Reinforcement Learning to manage sensor entities on a radio stripe
  • Figure 4 is a flow chart illustrating process steps in a method for facilitating management of sensor entities on a radio stripe using Reinforcement Learning
  • Figure 5 is a flow chart illustrating process steps in another method for facilitating management of sensor entities on a radio stripe using Reinforcement Learning
  • Figure 6 is a block diagram illustrating functional modules in an example management node
  • Figure 7 is a block diagram illustrating functional modules in an example LwM2M server node
  • Figure 8 is a block diagram illustrating functional modules in an example LwM2M client node
  • Figure 9 is a process flow illustrating an overview of implementation of the methods of Figures 3a, 3b, 4 and 5
  • Figure 10 illustrates a training phase of the process flow of Figure 9
  • Figure 11 illustrates a prediction phase of the process flow of Figure 9.
  • Figures 12 to 15 illustrate an example use case for the methods of the present disclosure.
  • Examples of the present disclosure propose methods and nodes that enable the exposure and control of sensors on a radio stripe, and, according to the nature of the actions selected, may individually control the energy consumption of sensors on the radio stripe.
  • the usage of power is often critical for sensors, which are frequently constrained devices in that they have limited access to power supply, processing and/or communication resources.
  • sensors can obtain power from the main power bus on the radio stripe, but this itself may also have a limited power supply. Ensuring that sensors are managed to maximize their energy efficiency on the radio stripe is therefore highly beneficial.
  • a LwM2M client node may be running on a modified Antenna Processing Unit (APU) of the radio stripe, and this may expose the sensor devices on the radio stripe as LwM2M object instances.
  • the sensor devices may then be reconfigured by a LwM2M server node with which the LwM2M client node is registered.
  • the reconfiguration parameters (for example control of sensor state or information exposure) are generated by a management node using a Reinforcement Learning method (such as Q-learning for example), which takes as input information from the radio stripe’s power and data bus, and the local LwM2M server.
  • the management node may seek to select actions that maximize energy efficiency while ensuring the sensor devices are adequately performing the task that is required of them.
  • Example methods according to the present disclosure make use of Reinforcement Learning (RL), and of the LwM2M management protocol. There now follows a brief discussion of each of these concepts.
  • Reinforcement Learning is a type of Machine Learning (ML), in which an agent learns to solve a problem by interacting with an environment following a policy, and obtaining a reward for every action it takes.
  • Figure 1 illustrates the main components in a RL system including: an agent, an environment, an action and a reward.
  • Q-learning is a form of Reinforcement Learning that does not require a model of the environment, but instead learns a policy that an agent can use to take actions based on the current input characteristics.
  • Rewards are calculated in successive passes to reach a future state, and are saved to a repository called a Q-table, which is used as feedback input to the agent.
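For reference, the Q-table entries described above are typically maintained with the standard Q-learning update rule. This is the textbook formulation, not something specific to this disclosure:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

where $\alpha$ is the learning rate, $\gamma$ the discount factor, $r_{t+1}$ the obtained reward, and $s_{t+1}$ the state observed after taking action $a_t$ in state $s_t$.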
  • The Lightweight Machine to Machine (LwM2M) protocol, also known as OMA Lightweight Device Management, is specified by the Open Mobile Alliance (OMA).
  • the “Internet of Things” (IoT) refers to devices enabled for communication network connectivity, so that these devices may be remotely managed, and data collected or required by the devices may be exchanged between individual devices and between devices and application servers.
  • Such devices, examples of which may include sensors and actuators, are often, although not necessarily, subject to severe limitations on processing power, storage capacity, energy supply, device complexity and/or network connectivity, imposed by their operating environment or situation, and may consequently be referred to as constrained devices.
  • the constrained nature of IoT devices has prompted the design and implementation of new protocols and mechanisms.
  • the Constrained Application Protocol (CoAP), as defined in RFC 7252, is one example of a protocol designed for IoT applications in constrained nodes and constrained networks.
  • LwM2M is designed to run on top of CoAP, and LwM2M is therefore compatible with any constrained device which supports CoAP.
  • LwM2M defines three components:
  • LwM2M Client contains several LwM2M objects with resources.
  • a LwM2M Management Server can execute commands on the resources to manage the client, including reading, deleting or updating resources.
  • LwM2M Clients are generally run in constrained devices.
  • LwM2M Management Server manages LwM2M Clients by sending management commands to them.
  • LwM2M Bootstrap Server is used to manage the initial configuration parameters of LwM2M Clients during bootstrapping of a device.
  • LwM2M defines several interfaces, including:
  • LwM2M Bootstrap Server sets the initial configuration on a LwM2M Client when the client device bootstraps.
  • LwM2M Client registers to one or more LwM2M Management Servers when bootstrapping is completed.
  • LwM2M Management Server can send management commands to LwM2M Clients to perform several management actions on LwM2M resources of the client.
  • An access control object of the client determines the set of actions the server can perform.
  • LwM2M Clients can initiate communication to a LwM2M Management Server and report information in the form of notifications.
  • a constrained device is configured during bootstrap for a specific environment and/or domain before being deployed to use that domain’s LwM2M Management Server.
  • a LwM2M Bootstrap Server updates client security information with the assigned LwM2M Management Server address and credentials for the LwM2M Client. In this manner, the assigned LwM2M Management Server is given management rights on the client.
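As an illustration of the Register interface described above, the following minimal sketch performs a LwM2M Register over CoAP using the aiocoap Python library. The server address, endpoint name, lifetime and object list are illustrative assumptions, not details taken from this disclosure.

```python
# Minimal sketch of a LwM2M Register operation over CoAP (aiocoap library).
# The payload is in CoRE Link Format and lists the hosted object instances;
# /3303/x are IPSO Temperature instances, /3304/0 an IPSO Humidity instance.
import asyncio
import aiocoap

async def register_client():
    ctx = await aiocoap.Context.create_client_context()
    payload = b"</3/0>,</3303/0>,</3303/1>,</3304/0>"
    request = aiocoap.Message(
        code=aiocoap.POST,
        uri="coap://lwm2m-server.example:5683/rd?ep=radio-stripe-apu&lt=86400",
        payload=payload,
    )
    request.opt.content_format = 40  # application/link-format
    response = await ctx.request(request).response
    print("Registration result:", response.code)

asyncio.run(register_client())
```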
  • Figure 2 is a flow chart illustrating process steps in a method 200 for using Reinforcement Learning (RL) to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task.
  • a radio stripe comprises a flexible elongated body with a plurality of antenna devices along its length.
  • the body of a radio stripe is flexible in that, under normal operating conditions for the radio stripe, the body may be deformed or bent under the influence of a force, and may return to its original shape when the force is removed. It may be assumed that the normal operating conditions for the radio stripe include normal atmospheric pressure, and an atmospheric temperature range that may reasonably be expected in an outdoor deployment.
  • the degree of flexibility of the body may vary from one end of such a temperature range to the other, but the ability of the body to deform, substantially elastically, is displayed across substantially the entire temperature range that may be expected in an outdoor deployment.
  • the body may be flexible in one or more degrees of freedom.
  • the body may be operable to be wound upon a spool for storage, and to follow corners and other contours of a building or other infrastructure installation on which the radio stripe may be deployed.
  • the body of the radio stripe may be flexible in at least two degrees of freedom, for example both along and perpendicular to its longitudinal axis.
  • the plurality of antenna devices may be mounted along the length of the body with sufficient spacing between them to allow for bending of the body between the antenna devices, and so to maintain an overall degree of flexibility of the radio stripe such that it may be wound and deployed as discussed above.
  • the body of a radio stripe is elongated in that its length substantially exceeds its width. The factor by which the length of the body exceeds its width may vary from around 50 to upwards of 1000, such that the width of the body is practically negligible with respect to its length.
  • a radio stripe for the purposes of the present disclosure also comprises a data bus and power supply line for transmitting data to and from the plurality of antenna devices.
  • the radio stripe also comprises at least one sensor device mounted on the flexible body, meaning the sensor device or devices are physically connected to the flexible body and are connected to the data and power bus of the radio stripe.
  • the radio stripe may in one embodiment comprise an adhesive tape for ease of installation.
  • the radio stripe may in another embodiment have the shape of a cable.
  • the method is performed by a management node, which may comprise a physical or virtual node, and may be implemented in a computer system, computing device or server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment.
  • a virtual node may include a piece of software or computer program, a code fragment operable to implement a computer program, a virtualised function, or any other logical entity.
  • the management node may for example be implemented in a core network of a communication network. In other examples, the management node may be implemented in a Radio Access node, which itself may comprise a physical node and/or a virtualized network function that is operable to exchange wireless signals.
  • a Radio Access node may comprise a base station node such as a NodeB, eNodeB, gNodeB, or any future implementation of this functionality.
  • the management node may encompass multiple logical entities, as discussed in greater detail below, and may for example comprise a Virtualised Network Function (VNF).
  • VNF Virtualised Network Function
  • the management node may be implemented in a device, which may be a wireless device, a wired device, a constrained device, etc.
  • the management node may in some examples be implemented in a cloud or edge cloud location near to the LwM2M server node.
  • the management node uses information from and controls a LwM2M client node (for example running on a modified APU on the radio stripe) via a LwM2M server node.
  • the method 200 comprises obtaining a representation of a current state of the radio stripe in step 210, and then performing steps 220 and 230 for sensor entities on the radio stripe, as illustrated at 260.
  • the method 200 comprises using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity, and in step 230, the method 200 comprises causing the action to be carried out on the sensor entity.
  • the method further comprises, in step 240, obtaining an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task.
  • the method 200 comprises updating the selection policy using at least the obtained reward function value.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a Lightweight Machine to Machine, LwM2M, object instance.
  • a sensor entity in the method 200 comprises a logical entity that encompasses one or more physical sensor devices. It will be appreciated that several sensor devices may operate in combination, for example sensing temperature and humidity at a specific location on the radio stripe. It may be advantageous in some circumstances to manage the two sensors together, such that they always have the same operational state. This may also facilitate the Reinforcement Learning, reducing the state action space for the RL by reducing the number of entities within the environment and consequently the size of the state representation and the number of possible actions.
  • a mapping may therefore be envisaged between physical sensor devices and logical sensor entities. In some examples the mapping may be one-to-one, with each sensor entity comprising a single sensor device. In other examples, the mapping could be one-to-many, so that a single logical sensor entity corresponds to several sensors that are grouped under the same operational state, and so considered as a single entity for the RL.
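By way of illustration, such a mapping could be as simple as the following sketch. The entity names and object instance paths are hypothetical (in the OMA registry, object /3303 is the IPSO Temperature object and /3304 the Humidity object).

```python
# Illustrative mapping from logical sensor entities to the LwM2M object
# instance paths of the sensor devices they group.
SENSOR_ENTITIES: dict[str, list[str]] = {
    "entity-0": ["/3303/0"],             # one-to-one: a single temperature sensor
    "entity-1": ["/3303/1", "/3304/0"],  # one-to-many: co-located temperature and
}                                        # humidity kept in the same operational state

def devices_for(entity_id: str) -> list[str]:
    """Object instances to which an action on this entity is applied."""
    return SENSOR_ENTITIES[entity_id]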
  • the method 200 specifies that each sensor device of a sensor entity is exposed as a LwM2M object instance.
  • Object types may include temperature sensors, humidity sensors, pressure sensors, etc. Each individual sensor device may therefore be exposed as a specific instance of an object type.
  • Object instances are exposed by a LwM2M client node at which the object instances are hosted. As discussed above, a LwM2M client node may be instantiated on an APU of the radio stripe, or may be running in a device that is otherwise in communication with or connected to the radio stripe.
  • Each object instance is associated with one or more resources, which are sources of information about the object and/or means of control of the object.
  • associated resources may include current sensed value, maximum and minimum sensed values, etc., as well as resources associated with the state of the device. Thus, some resources may enable control of the device, to power it on or off or change other aspects of its operational state. These resources can be observed and controlled via a LwM2M server node with which the LwM2M client node is registered.
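As an illustrative sketch of what one such object instance and its resources might look like in code: the resource identifiers are real OMA/IPSO registry entries (5700 Sensor Value, 5601/5602 Min/Max Measured Value, 5850 On/Off, the latter also discussed later in this disclosure), but the Python structure itself is purely a modelling convenience.

```python
from dataclasses import dataclass, field

# Illustrative model of a LwM2M Temperature object instance (Object 3303)
# and a subset of its resources.
@dataclass
class TemperatureInstance:
    instance_id: int
    resources: dict[int, object] = field(default_factory=lambda: {
        5700: 0.0,   # Sensor Value (read)
        5601: 0.0,   # Min Measured Value (read)
        5602: 0.0,   # Max Measured Value (read)
        5850: True,  # On/Off (read/write): enables control of the device state
    })
```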
  • Figures 3a and 3b show flow charts illustrating another example of a method 300 for using Reinforcement Learning to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task.
  • the method 300 is performed by a management node, which may comprise a physical or virtual node, and may be implemented in a computer system, computing device or server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a LwM2M object instance.
  • the method 300 illustrates examples of how the steps of the method 200 may be implemented and supplemented to provide the above discussed and additional functionality.
  • the management node obtains a representation of a current state of the radio stripe.
  • the representation of a current state of the radio stripe may comprise at least one of a value of a parameter characterizing operation of a power bus of the radio stripe, a value of a parameter characterizing operation of a data bus of the radio stripe, a performance requirement for the task, sensor entities present on the radio stripe, and/or operational states of sensor entities present on the radio stripe.
  • obtaining a representation of a current state of the radio stripe may comprise obtaining an indication of sensor entities present on the radio stripe and operational states of sensor entities present on the radio stripe from a LwM2M server node with which sensor devices of the sensor entities are registered.
  • Obtaining the state representation may further comprise obtaining metrics for the data and power buses from an APU on the radio stripe.
  • obtaining a representation of a current state of the radio stripe may further comprise obtaining from the LwM2M server node an indication of LwM2M sensor device object instances present on the radio stripe and operational states of the object instances, and mapping the obtained indications to indications of sensor entities present on the radio stripe and operational states of the sensor entities.
  • the LwM2M server node only runs the LwM2M management of objects and resources as exposed by the LwM2M client, and the management node consequently performs any mapping that may be appropriate between individual devices as exposed on the radio stripe and the sensor entities that the management node uses for performing RL.
  • the management node thus retains control over the granularity of the management that it performs, i.e., over whether to consider each sensor device individually in the RL, or to group multiple sensor devices under a single sensor entity for RL, so as to reduce the state action space for the RL.
  • the management node then performs steps 320 and 330 for sensor entities on the radio stripe.
  • the management node uses the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity.
  • an action may comprise placing a sensor entity in at least one of a candidate set of operational states, and the candidate set of operational states may comprise: powered on, powered off, exposure enabled, and exposure disabled.
  • Each of these states represents a different level of energy saving, although not all states may be available for all sensor devices.
  • a powered off state may enable the device to be completely switched off, and so to make no demands on the power supplied by the power bus of the radio stripe.
  • Disabling exposure of a device will not cause the device to be powered off, but will nonetheless result in energy savings, as the device is not visible to the LwM2M server node, and so there is no associated communication cost owing to the transfer of information and management commands relating to the device.
  • Additional operational states may also be envisaged in the candidate set.
  • “energy mode” states such as {high-performance, mid-performance, etc.} may be included in the candidate set. It will be appreciated that increasing the number of operational states in the candidate set will necessarily increase the state action space for the RL, and consequently the additional management flexibility offered by extra operational states may be balanced against the computational complexity of the RL process for a larger state action space.
  • placing a sensor entity in an operational state comprises placing all of the sensor devices comprised within that sensor entity into the relevant operational state.
  • using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity may comprise selecting an action that is predicted to result in a maximum value of the reward function that measures impact of the action on performance of the task.
  • using a selection policy to select an action for the current state of the radio stripe may comprise identifying in a Q table the action associated with the highest predicted reward function value, as shown at step 320c.
  • the selection policy may comprise random selection.
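A minimal sketch of such a selection policy, combining the Q-table lookup of step 320c with the random selection used for exploration, might look as follows; the state encoding, the epsilon value and the action names are assumptions for illustration only.

```python
import random
from collections import defaultdict

# Candidate operational states from the disclosure; extra "energy mode"
# states could be appended, at the cost of a larger state-action space.
ACTIONS = ["powered_on", "powered_off", "exposure_enabled", "exposure_disabled"]

# Q-table: state -> {action: predicted reward}. States are assumed to be
# hashable tuples summarising the radio stripe (bus loads, entity states...).
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def select_action(state: tuple, epsilon: float = 0.1) -> str:
    """Epsilon-greedy policy: usually the action with the highest predicted
    reward value in the Q-table, occasionally a random action (exploration)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    values = q_table[state]
    return max(values, key=values.get)
```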
  • the management node causes the action to be carried out on the sensor entity. As illustrated at 330a, this may comprise sending an instruction to a LwM2M server with which the individual sensor devices of the sensor entities are registered. The LwM2M server may then send an appropriate instruction to the relevant LwM2M client, which instruction may be carried over CoAP. In the event of anything other than one-to-one mapping between sensor entities and sensor devices as discussed above, the management node may map the selected instructions for sensor entities to instructions for individual sensor devices before sending the instruction to the LwM2M server node.
  • the management node then obtains an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task in step 340.
  • the reward function may comprise a function of at least one of a performance measure for the task, and/or a measure of energy expenditure of the radio stripe.
  • the performance measure of the task may include a measure of the number of observers requiring values from the sensor entities.
  • the reward function may be such that maximizing a value of the reward function comprises minimizing energy expenditure of the radio stripe without causing the performance measure for the task to fall below a minimum threshold.
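One possible shape for such a reward function is sketched below; the threshold and weighting values are illustrative assumptions rather than values from the disclosure.

```python
def reward(energy_mw: float, performance: float,
           min_performance: float = 0.9, weight: float = 0.001) -> float:
    """Illustrative reward: lower radio stripe energy expenditure gives a
    higher reward, but any state in which task performance drops below the
    minimum threshold is strongly penalised, so the learnt policy never
    trades task performance for energy savings."""
    if performance < min_performance:
        return -100.0               # hard penalty: task requirement violated
    return -weight * energy_mw      # otherwise reward energy frugality

# Example: 2000 mW at adequate performance scores better than 4000 mW.
print(reward(2000.0, 0.95), reward(4000.0, 0.95), reward(1000.0, 0.5))
```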
  • the management node updates the selection policy using at least the obtained reward function value.
  • this may comprise updating the selection policy to improve prediction accuracy. This may comprise for example, improving performance of a function that predicts reward values (minimizing difference between actual and predicted reward), or improving performance of a stochastic model that provides probabilities that any given available action will result in the highest reward.
  • using the selection policy comprises, respectively, selecting the action associated with the highest predicted reward, or selecting the action associated with the highest probability of resulting in the highest reward.
  • updating the selection policy using at least the obtained reward function value may comprise updating the Q table with the obtained value of the reward function.
  • using the selection policy comprises selecting the action that is associated in the Q-table with the highest predicted reward function value
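Continuing the Q-table sketch above, the update of step 350 could then be implemented along these lines; the learning rate and discount factor are again illustrative.

```python
ALPHA = 0.1   # learning rate (illustrative)
GAMMA = 0.9   # discount factor (illustrative)

def update_q(state: tuple, action: str, obtained_reward: float,
             next_state: tuple) -> None:
    """Standard Q-learning update applied with the obtained reward value."""
    best_next = max(q_table[next_state].values())
    td_target = obtained_reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])
```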
  • the methods 200, 300 may be complemented by a method 400 performed by a LwM2M server node, and a method 500 performed by a LwM2M client node.
  • Figure 4 is a flow chart illustrating process steps in a method 400 for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task.
  • the sensor entities are managed by a management node using examples of the method 200 and/or 300.
  • the method is performed by a LwM2M server node, which may comprise a physical or virtual node, and may be implemented in a computer system, computing device or server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment.
  • the LwM2M server node is running a LwM2M management server, as defined in the LwM2M specification.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe.
  • a LwM2M server node carrying out the method 400 first registers a plurality of LwM2M object instances and associated resources, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe and are hosted at a LwM2M client node associated with the radio stripe.
  • the LwM2M client node may previously have bootstrapped with the LwM2M server node.
  • the LwM2M server node may provide to the management node an indication of sensor entities present on the radio stripe and operational states of sensor entities present on the radio stripe in step 420. This may comprise providing to the management node an indication of LwM2M sensor device object instances present on the radio stripe and operational states of the object instances at step 420a.
  • the operational states indicated to the management server may comprise at least one of: powered on, powered off, exposure enabled and/or exposure disabled, and may additionally comprise energy mode states or other states encompassed within the candidate set discussed above with respect to the method 300.
  • the LwM2M server node may obtain the operational states by reading values of the appropriate resources from the LwM2M client node.
  • the LwM2M server node may receive from the management node an instruction to carry out an action on at least one LwM2M sensor device object instance registered with the LwM2M server node.
  • the action may comprise placing a sensor device into at least one of a candidate set of operational states, and wherein the candidate set of operational states comprises: powered on, powered off, exposure enabled and/or exposure disabled.
  • the candidate set may contain other operational states, as discussed above.
  • the LwM2M server node may additionally provide an update indication to the management node, following execution of the action, for example to confirm it has taken place.
  • the LwM2M server node may send a message to the LwM2M client node hosting the LwM2M sensor device object instances, the message updating a value of a resource of the LwM2M sensor device object instance to execute the action.
  • the LwM2M server node may then inform a registered observer of a LwM2M sensor device object instance registered with the LwM2M server node of an operational state of the LwM2M sensor device object instance, in step 450.
  • the server node may therefore inform consumers of data from the relevant sensors when an action is taken to change the operational state of the sensor.
  • Figure 5 is a flow chart illustrating process steps in a method 500 for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task.
  • the sensor entities are managed by a management node using examples of the method 200 and/or 300.
  • the method is performed by a LwM2M client node associated with the radio stripe.
  • the LwM2M client node may comprise a physical or virtual node, and may be implemented in a computer system, computing device or server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment.
  • the LwM2M client node may for example be instantiated on the radio stripe’s APU or on a separate (relatively small) device attached to the radio stripe and sharing the data and power bus.
  • the LwM2M client node is running a LwM2M client, as defined in the LwM2M specification.
  • a sensor entity comprises at least one sensor device mounted on the radio stripe.
  • a LwM2M client node carrying out the method 500 first exposes, to a LwM2M server node, a plurality of LwM2M object instances and associated resources hosted at a LwM2M client node, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe.
  • the LwM2M client node may also receive a message from the LwM2M server node, the message updating a value of a resource of a LwM2M sensor device object instance exposed to the LwM2M server node.
  • the LwM2M client node may then cause the sensor device represented by the LwM2M sensor device object instance to enter an operational state in accordance with the updated value of the resource in step 530.
  • the operational state may comprise at least one of: powered on, powered off, exposure enabled and/or exposure disabled, or other operational states, as discussed above.
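By way of illustration only, steps 520 and 530 inside the client might look like the following sketch, in which `PowerRail` is a hypothetical stand-in for the APU's hardware interface to the sensor power supply; the object and resource identifiers follow the IPSO Temperature object (3303) and On/Off resource (5850) mentioned elsewhere in this disclosure.

```python
# Sketch of the LwM2M client reacting to a server Write on the On/Off
# resource (5850) of a temperature object instance (object 3303).
class PowerRail:
    """Hypothetical stand-in for the APU's hardware power control."""
    def enable(self, idx: int) -> None:
        print(f"sensor {idx}: powered on")    # placeholder for real control
    def disable(self, idx: int) -> None:
        print(f"sensor {idx}: powered off")

power_rail = PowerRail()

def on_resource_write(object_id: int, instance_id: int,
                      resource_id: int, value: bool) -> None:
    """Steps 520/530: a resource value was updated by the server; move the
    represented sensor device into the corresponding operational state."""
    if object_id == 3303 and resource_id == 5850:   # Temperature, On/Off
        if value:
            power_rail.enable(instance_id)
        else:
            power_rail.disable(instance_id)

on_resource_write(3303, 0, 5850, False)   # example: power sensor 0 off
```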
  • the methods 200 and 300 may be performed by a management node, and the present disclosure provides a management node that is adapted to perform any or all of the steps of the above discussed methods.
  • the management node may comprise a physical node such as a computing device, server etc., or may comprise a virtual node.
  • a virtual node may comprise any logical entity, such as a Virtualized Network Function (VNF) which may itself be running in a cloud, edge cloud or fog deployment.
  • the management node may be operable to be instantiated in a cloud or edge cloud based deployment.
  • Figure 6 is a block diagram illustrating an example management node 600 which may implement the method 200 and/or 300, as illustrated in Figures 2, 3a and 3b, according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 650.
  • the management node 600 comprises a processor or processing circuitry 602, and may comprise a memory 604 and interfaces 606.
  • the processing circuitry 602 is operable to perform some or all of the steps of the method 200 and/or 300 as discussed above with reference to Figures 2, 3a and 3b.
  • the memory 604 may contain instructions executable by the processing circuitry 602 such that the management node 600 is operable to perform some or all of the steps of the method 200 and/or 300, as illustrated in Figures 2, 3a and 3b.
  • the instructions may also include instructions for executing one or more telecommunications and/or data communications protocols.
  • the instructions may be stored in the form of the computer program 650.
  • the processor or processing circuitry 602 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc.
  • the processor or processing circuitry 602 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) etc.
  • the memory 604 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive, etc.
  • the management node 600 may further comprise one or more interfaces suitable for communication with a LwM2M server node and/or other communication network nodes.
  • the method 400 may be performed by a LwM2M server node, and the present disclosure provides a LwM2M server node that is adapted to perform any or all of the steps of the above discussed methods.
  • the LwM2M server node may comprise a physical node such as a computing device, server etc., or may comprise a virtual node.
  • a virtual node may comprise any logical entity, such as a Virtualized Network Function (VNF) which may itself be running in a cloud, edge cloud or fog deployment.
  • the LwM2M server node may be operable to be instantiated in a cloud based deployment.
  • Figure 7 is a block diagram illustrating an example LwM2M server node 700 which may implement the method 400, as illustrated in Figure 4, according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 750.
  • the LwM2M server node 700 comprises a processor or processing circuitry 702, and may comprise a memory 704 and interfaces 706.
  • the processing circuitry 702 is operable to perform some or all of the steps of the method 400 as discussed above with reference to Figure 4.
  • the memory 704 may contain instructions executable by the processing circuitry 702 such that the LwM2M server node 700 is operable to perform some or all of the steps of the method 400, as illustrated in Figure 4.
  • the instructions may also include instructions for executing one or more telecommunications and/or data communications protocols.
  • the instructions may be stored in the form of the computer program 750.
  • the processor or processing circuitry 702 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc.
  • the processor or processing circuitry 702 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) etc.
  • the memory 704 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive, etc.
  • the LwM2M server node 700 may further comprise one or more interfaces suitable for communication with a management node, a LwM2M client node and/or other communication network nodes.
  • the method 500 may be performed by a LwM2M client node, and the present disclosure provides a LwM2M client node that is adapted to perform any or all of the steps of the above discussed methods.
  • the LwM2M client node may comprise a physical node such as a computing device, server etc., or may comprise a virtual node.
  • a virtual node may comprise any logical entity, such as a Virtualized Network Function (VNF) which may itself be running in a cloud, edge cloud or fog deployment.
  • the LwM2M client node may be operable to be instantiated in a cloud based deployment or in a device, such as an APU or other device mounted on the radio stripe.
  • Figure 8 is a block diagram illustrating an example LwM2M client node 800 which may implement the method 500, as illustrated in Figure 5, according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 850.
  • the LwM2M client node 800 comprises a processor or processing circuitry 802, and may comprise a memory 804 and interfaces 806.
  • the processing circuitry 802 is operable to perform some or all of the steps of the method 500 as discussed above with reference to Figure 5.
  • the memory 804 may contain instructions executable by the processing circuitry 802 such that the LwM2M client node 800 is operable to perform some or all of the steps of the method 500, as illustrated in Figure 5.
  • the instructions may also include instructions for executing one or more telecommunications and/or data communications protocols.
  • the instructions may be stored in the form of the computer program 850.
  • the processor or processing circuitry 802 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc.
  • the processor or processing circuitry 802 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) etc.
  • the memory 804 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive, etc.
  • the LwM2M client node 800 may further comprise one or more interfaces suitable for communication with a LwM2M server node and/or other communication network nodes.
  • Figures 2 to 5 discussed above provide an overview of methods which may be performed according to different examples of the present disclosure. These methods may be performed by a management node, LwM2M server node and LwM2M client node respectively, as illustrated in Figures 6 to 8. There now follows a detailed discussion of how different process steps illustrated in Figures 2 to 5 and discussed above may be implemented. The functionality and implementation detail described below are discussed with reference to the modules of Figures 6 to 8 performing examples of the methods 200, 300, 400 and/or 500, substantially as described above.
  • Figures 9 to 11 illustrate example implementation of the methods disclosed herein.
  • Figure 9 is a process flow illustrating an overview of implementation of the methods 300, 400 and 500.
  • the example implementation assumes a radio stripe with one or more sensors attached, and at least one APU which is modified and able to run a small footprint LwM2M Client (“APU with LwM2M client” in Figure 9).
  • Another option would be to have a small device attached to the radio stripe sharing data and power bus and running a LwM2M client (for example an NXP Kinetis KL02 with an 8MHz 32-bit processor, 4KB of RAM and 32KB of internal storage).
  • Figure 9 also illustrates a LwM2M server, a management node, labelled as “online learning agent”, and data consumers of the radio stripe sensor data.
  • the implementation process flow of Figure 9 comprises the following steps:
  • Step 1: The APU with LwM2M client can discover and access the neighboring sensors on the same Radio Stripe Power Bus and Data Bus (1).
  • Step 2: The APU with LwM2M client goes through the LwM2M bootstrap procedure (configuration of keys, LwM2M server access) and registration procedure (initial object discovery and configuration) (steps 410, 510 of methods 400, 500). The server discovers the available LwM2M objects, instances, resources and attributes for configured sensors (2).
  • Step 3: The available sensor objects are observed by the LwM2M server (3, 4). The available sensors’ values can be read by the LwM2M sensors' data consumers (5).
  • Step 4: A Reinforcement Learning (RL) model (e.g., Q-Learning) is trained (8) using inputs from the local actors (discussed in greater detail with reference to Figure 10), including: information about Radio Stripe Power Bus load (i.e. mW or percent) and Radio Stripe Data Bus load (i.e. kbps or percent) (6); and information from the LwM2M server: exposed object parameters (on/off, observe interval, etc.), connected observers, etc. (7) (steps 310, 420 of methods 300, 400).
  • Step 5: The result/output of the RL is sent to the LwM2M server (9) (e.g., turning ON/OFF exposed sensor objects, enabling/disabling exposed sensor objects, modifying the observation interval, etc.). This can be done via a REST API (discussed in greater detail with reference to Figure 11) (steps 330, 330a, 430 of methods 300, 400).
  • Step 6: Based on the received input, the LwM2M server reconfigures the APU with LwM2M client (10) (e.g., turning ON/OFF exposed sensor objects, enabling/disabling exposed sensor objects, modifying the observation interval, etc.) by sending the proper LwM2M commands (steps 440, 520 of methods 400, 500).
  • Note 1, LwM2M functionality: Enabling and disabling of LwM2M objects can be done using the Create, Read, Update and Delete (CRUD) LwM2M operations (sent from LwM2M Server to LwM2M Client). For example, turning ON/OFF can be done by adding a resource 5850 to the LwM2M Sensor Object and interacting via the LwM2M Write-Attributes operation (a sketch of such a write appears after this process flow).
  • the resource identifier 5850 is defined with read and write capabilities for type Boolean.
  • LwM2M Object 3306 (Actuation) or Object 3311 (Light control) allow for actuation (turning ON or OFF) by writing to resource 5850.
  • Note 2, energy efficiency: As noted above, there is a difference in saved energy between disabling object exposure (with Delete) and turning a sensor OFF with the help of LwM2M functionality (e.g., using Resource 5850 if supported). In the former, the energy saving comes from the reduction in communication, as the sensor value cannot be observed (but the sensor will continue to operate on the stripe). In the latter, there is an additional reduction in consumed energy, as the sensor will be turned off.
  • Step 7: Updated information on sensors’ data availability can be sent/signaled to the LwM2M sensors' data consumers (11) (step 450 of method 400).
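As referenced in Note 1 above, the following is a minimal sketch of the kind of write the LwM2M server might send to the client to turn a sensor off via resource 5850, again using aiocoap. The client address and the plain-text payload are simplifying assumptions; a real LwM2M server would use the registered client address and a proper LwM2M content format (TLV/SenML).

```python
# Sketch of the write referenced in Note 1: the LwM2M server turns a sensor
# off by writing "0" to the On/Off resource (5850) of IPSO Temperature
# object instance /3303/0 on the client.
import asyncio
import aiocoap

async def write_on_off(client_host: str, instance: int, on: bool) -> None:
    ctx = await aiocoap.Context.create_client_context()
    request = aiocoap.Message(
        code=aiocoap.PUT,
        uri=f"coap://{client_host}/3303/{instance}/5850",
        payload=b"1" if on else b"0",
    )
    response = await ctx.request(request).response
    print("Write result:", response.code)

asyncio.run(write_on_off("radio-stripe-apu.local", 0, on=False))
```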
  • the training phase of the example implementation is illustrated in Figure 10.
  • during the training phase, the management node (the “online learning agent” of Figure 10) will take inputs including, but not limited to, metrics from the radio stripe’s shared data bus (1) and power bus (2), along with the available sensors (5). Additional inputs from possible Service Level Agreement (SLA) requirements can also be considered (3).
  • Sensors are registered in the remote LwM2M server that is responsible for handling the operations of the sensors by manipulating the objects provided by the LwM2M client (5).
  • the discovery of sensors (4) can be part of the registration or discovery LwM2M procedures as described previously.
  • the example implementation uses a Q-learning method, which is a form of Reinforcement Learning that is model-free and seeks a selection policy that maximizes a reward (6).
  • the reward can be defined to prioritize, for example, some combination of energy optimization and meeting SLA requirements. All the states of the radio stripe and sensors are saved in a Q-table in the form of [state, action] entries (7).
  • the training phase involves learning a selection policy that tries to reach a state associated with a maximal reward. The agent will learn from the initial parameters representing the current state, but will keep updating as a better selection policy for the selection of actions is learnt.
  • the reward can be a function that considers the available power_bus and data_bus capacity to serve an optimal number of third-party clients which observe the sensor data objects exposed by the LwM2M client.
  • the reward parameter can be defined with a goal of achieving only energy optimization, of meeting the SLA needs of a use case, or both.
  • the prediction phase of the example implementation is illustrated in Figure 11.
  • all the inputs (1, 2, 3, 5) mentioned in the training phase are used together with the Q-table based knowledge base/repository to find the corresponding state and take an optimal decision, which is then sent to the LwM2M server for implementation (9).
  • the LwM2M server will use LwM2M functionality to manage the sensors controlled by the LwM2M client on the radio stripe.
  • the action space of the management node may vary according to the number of sensor entities being managed.
  • consider, for example, a deployment with one management node (i.e. one RL agent) and one LwM2M client, which runs for example on a modified APU on the radio stripe.
  • the client runs on a device which exposes a number of sensors (Num_sensors).
  • for S states and A actions there will be a Q-table of size S x A.
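A worked example of this sizing, with illustrative numbers not taken from the disclosure:

```python
# Illustrative Q-table sizing: 8 sensor entities, each in one of the 4
# candidate operational states, with an action that sets the state of a
# single entity per step.
num_entities = 8
states_per_entity = 4

S = states_per_entity ** num_entities     # 4**8 = 65,536 joint states
A = num_entities * states_per_entity      # 8 * 4  = 32 candidate actions
print(S, A, S * A)                        # 65536 32 2097152 table entries
```

Since S grows exponentially in the number of entities, grouping several sensor devices into fewer logical entities shrinks the table dramatically, which is precisely the motivation given above for managing sensors as sensor entities.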
  • in an example use case, temperature sensors are pre-installed (onboarded) on the radio stripe, which is deployed along a shelf area with raw materials or finished products. There may be a need to monitor the temperature and humidity to maintain the integrity of the materials or product. Depending on which material or item is to be monitored, the right set of temperature or humidity sensors can be switched on/off for a level/pallet/shelf/section etc. Sensors on the stripe that are not needed for the particular scenario need not be switched on and used, so saving power and reducing energy consumption.
  • Figures 12 and 13 illustrate an inventory area in an example factory for the present use case.
  • the area is covered for connectivity using a radio stripe on which several temperature sensors are mounted.
  • the SLA requirement for this scenario is that the area marked as shown in Figure 13 needs to be monitored with a certain level of accuracy.
  • This SLA requirement, along with metrics from the radio stripe itself, i.e., from the data and the power bus and the LwM2M objects, is provided as input to the management node.
  • the management node uses these parameters to learn (via Q-learning) a selection policy that meets a reward parameter, in this case to meet the SLA requirements, as illustrated in Figure 14.
  • the prediction phase recommends a set of LwM2M objects that meet the SLA requirements as depicted in Figure 15.
  • the management node therefore selects the optimal set of sensors to be used in order to minimize overall energy consumption of the radio stripe while fulfilling SLA requirements.
  • radio stripes may offer significant advantages in connectivity
  • sensor devices mounted on radio stripes may offer additional functionality for pollution monitoring, vibration monitoring for earthquake early warning or other purposes, air quality monitoring for management policies, etc.
  • Examples of the present invention may again be employed to ensure energy efficiency in the management of such sensors, by turning off, or disabling exposure of, sensors in areas that do not require monitoring at any given time.
  • examples of the present disclosure may be envisaged for use in the logistics domain, with radio stripes providing connectivity together with monitoring to ensure compliance with requirements for temperature or other environmental conditions.
  • Example methods of the present disclosure can ensure that such monitoring is provided in an energy efficient manner.
  • Examples of the present disclosure thus provide methods and nodes that use a Reinforcement Learning solution to control individual sensors attached to a radio stripe via a LwM2M client (running for example on a modified radio stripe APU). These methods enable fine grained control of sensor devices on a radio stripe, in order to balance energy efficiency with performance of whatever task the sensors are required to undertake. Energy demands and consumption are critical when deploying a large number of sensors on radio stripes that provide ad-hoc connectivity. Powering on and exposing only those sensors that are needed for a given use case and operational scenario can provide energy savings, as well as supporting optimal usage of power available to the radio stripe for servicing radio requirements, thus supporting high throughput performance of the radio stripe itself.
  • the methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein.
  • a computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A method (200) is disclosed for using Reinforcement Learning to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task. The method comprises obtaining a representation of a current state of the radio stripe (210), and, for sensor entities on the radio stripe (260), using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity (220) and causing the action to be carried out on the sensor entity (230). The method further comprises obtaining an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task (240), and updating the selection policy using at least the obtained reward function value (250). A sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a LwM2M object instance (270).

Description

MANAGING SENSOR ENTITIES ON A RADIO STRIPE
TECHNICAL FIELD
The present disclosure relates to methods for managing, and for facilitating management of, sensor entities on a radio stripe. The methods are performed by a management node, a Lightweight Machine to Machine (LwM2M) server node, and a LwM2M client node. The present disclosure also relates to a management node, a LwM2M server node, and a LwM2M client node, and to a computer program product configured, when run on a computer, to carry out methods for managing, and for facilitating management of, sensor entities on a radio stripe.
BACKGROUND
Massive MIMO is one example of Multi-user MIMO (MU-MIMO), which is a set of multiple-input and multiple-output technologies for wireless communication. Distributed Massive MIMO (D-maMIMO) refers to a scenario in which base station antennas are geographically spread out over a large area, in a well-planned or random fashion. WO 2018103897 A9 discloses an antenna arrangement for use in D-maMIMO in which a flexible elongated body is provided with a plurality of antenna devices along its length, and comprises a data bus and power supply line for transmitting data to and from the plurality of antenna devices. This antenna arrangement is referred to as a Radio Stripe and can be used to distribute an antenna system in challenging (for example dense) outdoor and indoor areas including factories, stadiums, etc. Besides offering 3GPP connectivity, the radio stripe deployment may integrate additional sensors such as temperature sensors, microphones/speakers, vibration sensors, etc., and may provide additional features such as fire alarms, burglar alarms, earthquake warning, indoor positioning, and climate monitoring and control, as discussed in WO 2018103897 A9.
Radio stripes may be particularly suited to the provision of connectivity in Ultra Low Latency and other sensitive use cases. In some examples, the availability of power supply to radio stripe deployments may be limited or constrained, owing to the specifics of a given deployment. As such, ineffective or inefficient use of sensors mounted on the radio stripe is undesirable. However, there is as yet no established solution for the control of such sensors for energy efficiency. SUMMARY
It is an aim of the present disclosure to provide methods, a management node, a LwM2M server node, a LwM2M client node, and a computer program product which at least partially address one or more of the challenges mentioned above. It is a further aim of the present disclosure to provide methods, a LwM2M server node, a LwM2M client node, and a computer program product which cooperate to facilitate management of sensors on a radio stripe.
According to a first aspect of the present disclosure, there is provided a method for using Reinforcement Learning to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task. The method, performed by a management node, comprises obtaining a representation of a current state of the radio stripe, and, for sensor entities on the radio stripe, using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity, and causing the action to be carried out on the sensor entity. The method further comprises obtaining an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task, and updating the selection policy using at least the obtained reward function value. For the purposes of the method, a sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a Lightweight Machine to Machine (LwM2M) object instance.
According to another aspect of the present disclosure, there is provided a method for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task. The method, performed by a LwM2M server node, comprises registering a plurality of LwM2M object instances and associated resources, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe and are hosted at a LwM2M client node associated with the radio stripe. For the purposes of the method, a sensor entity comprises at least one sensor device mounted on the radio stripe, and the sensor entities are managed by a management node using a method according to the preceding aspect of the present invention. According to another aspect of the present disclosure, there is provided a method for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task. The method, performed by a LwM2M client node associated with the radio stripe, comprises exposing, to a LwM2M server node, a plurality of LwM2M object instances and associated resources hosted at a LwM2M client node, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe. For the purposes of the method, a sensor entity comprises at least one sensor device mounted on the radio stripe, and the sensor entities are managed by a management node using a method according to a preceding aspect of the present invention.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer readable non-transitory medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform a method according to any one of the aspects or examples of the present disclosure.
According to another aspect of the present disclosure, there is provided a management node for using Reinforcement Learning to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task. The management node comprises processing circuitry configured to cause the management node to obtain a representation of a current state of the radio stripe, and for sensor entities on the radio stripe, use the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity, and cause the action to be carried out on the sensor entity. The processing circuitry is further configured to cause the management node to obtain an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task, and update the selection policy using at least the obtained reward function value. A sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a LwM2M object instance.
According to another aspect of the present disclosure, there is provided a LwM2M server node for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task. The LwM2M server node comprises processing circuitry configured to cause the LwM2M server node to register a plurality of LwM2M object instances and associated resources, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe and are hosted at a LwM2M client node associated with the radio stripe. A sensor entity comprises at least one sensor device mounted on the radio stripe, and the sensor entities are managed by a management node using a method according to a preceding aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a LwM2M client node for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task and the LwM2M client node is associated with the radio stripe. The LwM2M client node comprises processing circuitry configured to cause the LwM2M client node to expose, to a LwM2M server node, a plurality of LwM2M object instances and associated resources hosted at a LwM2M client node, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe. A sensor entity comprises at least one sensor device mounted on the radio stripe, and the sensor entities are managed by a management node using a method according to a preceding aspect of the present disclosure.
Aspects of the present disclosure thus provide methods and nodes that enable the management of sensor devices on a radio stripe. The sensor devices are exposed as LwM2M object instances, and managed via a logical construct of sensor entities. A single sensor entity on the radio stripe may comprise one or more sensor devices, and a Reinforcement Learning process is used to select actions for execution on the sensor entities in light of performance by the entities of a task. An action selected for execution on a sensor entity may be executed on each of the sensor devices comprised within the logical sensor entity.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the present disclosure, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the following drawings in which:
Figure 1 illustrates the main components in a Reinforcement Learning system;
Figure 2 is a flow chart illustrating process steps in a method for using Reinforcement Learning to manage sensor entities on a radio stripe;
Figures 3a and 3b show flow charts illustrating process steps in another example of a method for using Reinforcement Learning to manage sensor entities on a radio stripe;
Figure 4 is a flow chart illustrating process steps in a method for facilitating management of sensor entities on a radio stripe using Reinforcement Learning;
Figure 5 is a flow chart illustrating process steps in another method for facilitating management of sensor entities on a radio stripe using Reinforcement Learning;
Figure 6 is a block diagram illustrating functional modules in an example management node;
Figure 7 is a block diagram illustrating functional modules in an example LwM2M server node;
Figure 8 is a block diagram illustrating functional modules in an example LwM2M client node;
Figure 9 is a process flow illustrating an overview of implementation of the methods of Figures 3a, 3b, 4 and 5;
Figure 10 illustrates a training phase of the process flow of Figure 9;
Figure 11 illustrates a prediction phase of the process flow of Figure 9; and
Figures 12 to 15 illustrate an example use case for the methods of the present disclosure.
DETAILED DESCRIPTION
Examples of the present disclosure propose methods and nodes that enable the exposure and control of sensors on a radio stripe, and, according to the nature of the actions selected, may individually control the energy consumption of sensors on the radio stripe. The usage of power is often critical for sensors, which are frequently constrained devices in that they have limited access to at least one of power supply, processing, communication resource, etc. On a radio stripe, sensors can obtain power from the main power bus on the radio stripe, but this itself may also have a limited power supply. Ensuring that sensors are managed to maximize their energy efficiency on the radio stripe is therefore highly beneficial. In some examples of the present disclosure, a LwM2M client node may be running on a modified APU of the radio stripe, and this may expose the sensor devices on the radio stripe as LwM2M object instances. The sensor devices may then be reconfigured by a LwM2M server node with which the LwM2M client node is registered. The reconfiguration parameters (for example control of sensor state or information exposure) are generated by a management node using a Reinforcement Learning method (such as Q-learning for example), which takes as input information from the radio stripe’s power and data bus, and the local LwM2M server. The management node may seek to select actions that maximize energy efficiency while ensuring the sensor devices are adequately performing the task that is required of them.
Example methods according to the present disclosure make use of Reinforcement Learning (RL), and of the LwM2M management protocol. There now follows a brief discussion of each of these concepts.
Reinforcement Learning (RL) is a type of Machine Learning (ML), in which an agent learns to solve a problem by interacting with an environment following a policy, and obtaining a reward for every action it takes. Figure 1 illustrates the main components in a RL system including: an agent, an environment, an action and a reward. Q-learning is a form of Reinforcement Learning that does not require a model, but involves learning a policy that an agent can use to take actions based on the current input characteristics. Rewards are calculated in successive passes to reach a future state. The rewards are saved to a repository called a Q-table which is used as feedback input to the agent. As can be seen in Figure 1, based on the state observed from the environment, actions are taken and rewards calculated, which rewards are fed back to the agent to help to guide selection of the next action. Q-learning is in general low-cost and fast iterating, having a consequently smaller impact on energy consumption of the equipment on which it is running than other types of RL.
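For reference, the Q-table entries described above are typically refined with the standard Q-learning update rule, written here in the same plain-text style as the formulas later in this description; the learning rate α and discount factor γ are hyperparameters not named in the source:

Q(s, a) ← Q(s, a) + α x [r + γ x max_a' Q(s', a') - Q(s, a)]

where s is the current state, a the selected action, r the obtained reward, and s' the resulting state.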
The Open Mobile Alliance (OMA) Lightweight Device Management (DM) protocol, also known as the Lightweight Machine to Machine protocol (LwM2M), is a light and compact device management protocol that may be used for managing IoT devices and their resources. The "Internet of Things" (IoT) refers to devices enabled for communication network connectivity, so that these devices may be remotely managed, and data collected or required by the devices may be exchanged between individual devices and between devices and application servers. Such devices, examples of which may include sensors and actuators, are often, although not necessarily, subject to severe limitations on processing power, storage capacity, energy supply, device complexity and/or network connectivity, imposed by their operating environment or situation, and may consequently be referred to as constrained devices.
The constrained nature of IoT devices has prompted the design and implementation of new protocols and mechanisms. The Constrained Application Protocol (CoAP), as defined in RFC 7252, is one example of a protocol designed for IoT applications in constrained nodes and constrained networks. LwM2M is designed to run on top of CoAP, and LwM2M is therefore compatible with any constrained device which supports CoAP.
LwM2M defines three components:
LwM2M Client: contains several LwM2M objects with resources. A LwM2M Management Server can execute commands on the resources to manage the client, including reading, deleting or updating resources. LwM2M Clients are generally run in constrained devices.
LwM2M Management Server: manages LwM2M Clients by sending management commands to them.
LwM2M Bootstrap Server: is used to manage the initial configuration parameters of LwM2M Clients during bootstrapping of a device.
In order to maintain communication between the above discussed components, LwM2M defines several interfaces, including:
Bootstrapping: LwM2M Bootstrap Server sets the initial configuration on a LwM2M Client when the client device bootstraps.
Client Registration: LwM2M Client registers to one or more LwM2M Management Servers when bootstrapping is completed.
Device Management and Service Enablement: LwM2M Management Server can send management commands to LwM2M Clients to perform several management actions on LwM2M resources of the client. An access control object of the client determines the set of actions the server can perform.
Information Reporting: LwM2M Clients can initiate communication to a LwM2M Management Server and report information in the form of notifications.
A constrained device is configured during bootstrap for a specific environment and/or domain before being deployed to use that domain’s LwM2M Management Server. During bootstrapping, a LwM2M Bootstrap Server updates client security information with the assigned LwM2M Management Server address and credentials for the LwM2M Client. In this manner, the assigned LwM2M Management Server is given management rights on the client.
Figure 2 is a flow chart illustrating process steps in a method 200 for using Reinforcement Learning (RL) to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task. For the purposes of the present disclosure, a radio stripe comprises a flexible elongated body with a plurality of antenna devices along its length. The body of a radio stripe is flexible in that, under normal operating conditions for the radio stripe, the body may be deformed or bent under the influence of a force, and may return to its original shape when the force is removed. It may be assumed that the normal operating conditions for the radio stripe include normal atmospheric pressure, and an atmospheric temperature range that may reasonably be expected in an outdoor deployment. It will be appreciated that the degree of flexibility of the body may vary from one end of such a temperature range to the other, but that an ability of the body to deform, substantially elastically, is displayed across substantially the entire temperature range that may be expected in an outdoor deployment. The body may be flexible in one or more degrees of freedom. For example, the body may be operable to be wound upon a spool for storage, and to follow corners and other contours of a building or other infrastructure installation on which the radio stripe may be deployed. In some examples, the body of the radio stripe may be flexible in at least two degrees of freedom, for example both along and perpendicular to its longitudinal axis. The plurality of antenna devices may be mounted along the length of the body with sufficient spacing between them to allow for bending of the body between the antenna devices, and so to maintain an overall degree of flexibility of the radio stripe such that it may be wound and deployed as discussed above. The body of a radio stripe is elongated in that its length substantially exceeds its width. The factor by which the length of the body exceeds its width may vary from around 50 to upwards of 1000, such that the width of the body is practically negligible with respect to its length. A radio stripe for the purposes of the present disclosure also comprises a data bus and power supply line for transmitting data to and from the plurality of antenna devices. The radio stripe also comprises at least one sensor device mounted on the flexible body, meaning the sensor device or devices are physically connected to the flexible body and are connected to the data and power bus of the radio stripe. The radio stripe may in one embodiment comprise an adhesive tape for ease of installation. The radio stripe may in another embodiment have the shape of a cable.
The method is performed by a management node, which may comprise a physical or virtual node, and may be implemented in a computer system, computing device or server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. Examples of a virtual node may include a piece of software or computer program, a code fragment operable to implement a computer program, a virtualised function, or any other logical entity. The management node may for example be implemented in a core network of a communication network. In other examples, the management node may be implemented in a Radio Access node, which itself may comprise a physical node and/or a virtualized network function that is operable to exchange wireless signals. In some examples, a Radio Access node may comprise a base station node such as a NodeB, eNodeB, gNodeB, or any future implementation of this functionality. The management node may encompass multiple logical entities, as discussed in greater detail below, and may for example comprise a Virtualised Network Function (VNF). In other examples, the management node may be implemented in a device, which may be a wireless device, a wired device, a constrained device, etc.
The management node may in some examples be implemented in a cloud or edge cloud location near to the LwM2M server node. The management node uses information from and controls a LwM2M client node (for example running on a modified APU on the radio stripe) via a LwM2M server node.
Referring to Figure 2, the method 200 comprises obtaining a representation of a current state of the radio stripe in step 210, and then performing steps 220 and 230 for sensor entities on the radio stripe, as illustrated at 260. In step 220, the method 200 comprises using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity, and in step 230, the method 200 comprises causing the action to be carried out on the sensor entity. The method further comprises, in step 240, obtaining an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task. Finally, in step 250, the method 200 comprises updating the selection policy using at least the obtained reward function value. As illustrated at step 270, a sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a Lightweight Machine to Machine, LwM2M, object instance.
As discussed above, a sensor entity in the method 200 comprises a logical entity that encompasses one or more physical sensor devices. It will be appreciated that several sensor devices may operate in combination, for example sensing temperature and humidity at a specific location on the radio stripe. It may be advantageous in some circumstances to manage the two sensors together, such that they always have the same operational state. This may also facilitate the Reinforcement Learning, reducing the state action space for the RL by reducing the number of entities within the environment and consequently the size of the state representation and the number of possible actions. A mapping may therefore be envisaged between physical sensor devices and logical sensor entities. In some examples the mapping may be one-to-one, with each sensor entity comprising a single sensor device. In other examples, the mapping could be one-to-many, so that a single logical sensor entity corresponds to several sensors that are grouped under the same operational state, and so considered as a single entity for the RL.
The method 200 specifies that each sensor device of a sensor entity is exposed as a LwM2M object instance. Object types may include temperature sensors, humidity sensors, pressure sensors, etc. Each individual sensor device may therefore be exposed as a specific instance of an object type. Object instances are exposed by a LwM2M client node at which the object instances are hosted. As discussed above, a LwM2M client node may be instantiated on an APU of the radio stripe, or may be running in a device that is otherwise in communication with or connected to the radio stripe. Each object instance is associated with one or more resources, which are sources of information about the object and/or means of control of the object. In the case for example of a temperature sensor, associated resources may include current sensed value, maximum and minimum sensed values, etc., as well as resources associated with the state of the device. Thus, some resources may enable control of the device, to power it on or off or change other aspects of its operational state. These resources can be observed and controlled via a LwM2M server node with which the LwM2M client node is registered.
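To make this object/instance/resource structure concrete, the following minimal Python sketch models a temperature sensor exposed as an object instance. The object and resource identifiers follow common IPSO registry conventions (3303 for Temperature, 5700 for Sensor Value, 5601/5602 for Min/Max Measured Value, and 5850 for On/Off, as mentioned later in this description); the class names are hypothetical, and this is not the API of any particular LwM2M stack:

    # Minimal, self-contained model of the LwM2M object/instance/resource
    # hierarchy; identifiers follow IPSO conventions, names are illustrative.

    class Resource:
        def __init__(self, resource_id, name, value, writable=False):
            self.resource_id = resource_id
            self.name = name
            self.value = value
            self.writable = writable

    class ObjectInstance:
        def __init__(self, object_id, instance_id, resources):
            self.object_id = object_id
            self.instance_id = instance_id
            self.resources = {r.resource_id: r for r in resources}

        def read(self, resource_id):
            # A server-side Read maps to returning the resource value.
            return self.resources[resource_id].value

        def write(self, resource_id, value):
            # A server-side Write maps to updating a writable resource.
            resource = self.resources[resource_id]
            if not resource.writable:
                raise PermissionError(f"Resource {resource_id} is read-only")
            resource.value = value

    # A temperature sensor at stripe position 0, exposed as /3303/0.
    temperature_0 = ObjectInstance(3303, 0, [
        Resource(5700, "Sensor Value", 21.5),
        Resource(5601, "Min Measured Value", 19.0),
        Resource(5602, "Max Measured Value", 23.1),
        Resource(5850, "On/Off", True, writable=True),
    ])

    temperature_0.write(5850, False)  # place the device in a powered off state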
Figures 3a and 3b show flow charts illustrating another example of a method 300 for using Reinforcement Learning to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task. As for the method 200 discussed above, the method 300 is performed by a management node, which may comprise a physical or virtual node, and may be implemented in a computer system, computing device or server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. Also as for the method 200 discussed above, although not explicitly illustrated in Figures 3a and 3b, for the purposes of the method 300, a sensor entity comprises at least one sensor device mounted on the radio stripe, and each sensor device of the sensor entity is exposed as a LwM2M object instance. The method 300 illustrates examples of how the steps of the method 200 may be implemented and supplemented to provide the above discussed and additional functionality.
Referring initially to Figure 3a, in a first step 310, the management node obtains a representation of a current state of the radio stripe. As illustrated at 310a, the representation of a current state of the radio stripe may comprise at least one of a value of a parameter characterizing operation of a power bus of the radio stripe, a value of a parameter characterizing operation of a data bus of the radio stripe, a performance requirement for the task, sensor entities present on the radio stripe, and/or operational states of sensor entities present on the radio stripe.
In some examples, as illustrated at 310b, obtaining a representation of a current state of the radio stripe may comprise obtaining an indication of sensor entities present on the radio stripe and operational states of sensor entities present on the radio stripe from a LwM2M server node with which sensor devices of the sensor entities are registered. Obtaining the state representation may further comprise obtaining metrics for the data and power buses from an APU on the radio stripe.
In some examples, as illustrated at 310c, obtaining a representation of a current state of the radio stripe may further comprise obtaining from the LwM2M server node an indication of LwM2M sensor device object instances present on the radio stripe and operational states of the object instances, and mapping the obtained indications to indications of sensor entities present on the radio stripe and operational states of the sensor entities. In such examples, it may be envisaged that the LwM2M server node is only running the LwM2M management of objects and resources as exposed by the LwM2M client, and the management node consequently performs any mapping that may be appropriate between individual devices as exposed on the radio stripe and the sensor entities that the management node uses for performing RL. This ensures that the management node retains control over the granularity of the management that it performs, i.e., over whether to consider each sensor device individually in the RL, or to group multiple sensor devices under a single sensor entity for RL, so as to reduce the state action space for the RL.
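A minimal sketch of the device-to-entity mapping just described, with hypothetical object paths and entity names; a one-to-many entry groups several object instances under one logical entity that the RL then treats as a single unit:

    # Hypothetical mapping from LwM2M object instance paths to logical
    # sensor entities; entity_A groups two co-located devices one-to-many.
    device_to_entity = {
        "/3303/0": "entity_A",  # temperature at stripe position 0
        "/3304/0": "entity_A",  # humidity at the same position, managed jointly
        "/3303/1": "entity_B",  # temperature at position 1, one-to-one
    }

    def devices_of(entity):
        # All sensor devices that share the operational state of an entity.
        return [path for path, e in device_to_entity.items() if e == entity]

    print(devices_of("entity_A"))  # ['/3303/0', '/3304/0']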
As illustrated at 360, the management node then performs steps 320 and 330 for sensor entities on the radio stripe.
In step 320, the management node uses the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity. As illustrated at 320a, an action may comprise placing a sensor entity in at least one of a candidate set of operational states, and the candidate set of operational states may comprise: powered on, powered off, exposure enabled, and exposure disabled. Each of these states represents a different level of energy saving, although not all states may be available for all sensor devices. In devices for which a suitable resource is available, a powered off state may enable the device to be completely switched off, thus making no demands on the power supplied by the power bus of the radio stripe. Disabling exposure of a device will not cause the device to be powered off, but will nonetheless result in energy savings, since a device that is not visible to the LwM2M server node incurs no communication cost for the transfer of information and management commands relating to the device.
Additional operational states may also be envisaged in the candidate set. For example, "energy mode" states such as {high-performance, mid-performance, etc.} may be included in the candidate set. It will be appreciated that increasing the number of operational states in the candidate set will necessarily increase the state action space for the RL, and consequently the additional management flexibility offered by extra operational states may be balanced against the computational complexity of the RL process for a larger state action space. With reference to the relation between logical sensor entities and sensor devices, it will be understood that placing a sensor entity in an operational state comprises placing all of the sensor devices comprised within that sensor entity into the relevant operational state.
As illustrated at 320b, using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity may comprise selecting an action that is predicted to result in a maximum value of the reward function that measures impact of the action on performance of the task. In some examples, using a selection policy to select an action for the current state of the radio stripe may comprise identifying in a Q-table the action associated with the highest predicted reward function value, as shown at step 320c. During a training phase, for example while the Q-table is being populated and refined, the selection policy may comprise random selection.
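As a sketch of such a selection policy, the snippet below combines greedy Q-table lookup with the random exploration mentioned for the training phase; the epsilon-greedy scheme is one common choice and an assumption here, and all names are illustrative:

    import random

    def select_action(q_table, state, actions, epsilon=0.1):
        # With probability epsilon, explore randomly (training phase);
        # otherwise pick the action with the highest predicted reward.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q_table.get((state, a), 0.0))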
In step 330, the management node causes the action to be carried out on the sensor entity. As illustrated at 330a, this may comprise sending an instruction to a LwM2M server with which the individual sensor devices of the sensor entities are registered. The LwM2M server may then send an appropriate instruction to the relevant LwM2M client, which instruction may be carried over CoAP. In the event of anything other than one-to-one mapping between sensor entities and sensor devices as discussed above, the management node may map the selected instructions for sensor entities to instructions for individual sensor devices before sending the instruction to the LwM2M server node.
Referring now to Figure 3b, and following the carrying out of steps 320 and 330 for sensor entities on the radio stripe, the management node then obtains an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task in step 340. As illustrated at 340a, the reward function may comprise a function of at least one of a performance measure for the task, and/or a measure of energy expenditure of the radio stripe. In some examples the performance measure of the task may include a measure of the number of observers requiring values from the sensor entities. This may reflect the idea that if, for example, a resource of a sensor device is not observed, then exposure of that device as a LwM2M object may be disabled without adversely affecting performance of the task of the sensors. For example, if the sensor task is environment monitoring for air quality, and no individuals or articles are in the vicinity of particular sensors, then no observers will be monitoring the readings of those sensors, and exposure of the sensors may be disabled without impacting the overall environment monitoring performance.
As illustrated at 340b, the reward function may be such that maximizing a value of the reward function comprises minimizing energy expenditure of the radio stripe without causing the performance measure for the task to fall below a minimum threshold.
In step 350, the management node updates the selection policy using at least the obtained reward function value. As illustrated at 350a, this may comprise updating the selection policy to improve prediction accuracy. This may comprise, for example, improving performance of a function that predicts reward values (minimizing difference between actual and predicted reward), or improving performance of a stochastic model that provides probabilities that any given available action will result in the highest reward. In such examples, using the selection policy comprises, respectively, selecting the action associated with the highest predicted reward, or selecting the action associated with the highest probability of resulting in the highest reward. In further examples in which the RL is Q-learning, as illustrated at 350b, updating the selection policy using at least the obtained reward function value may comprise updating the Q-table with the obtained value of the reward function. In such examples, using the selection policy comprises selecting the action that is associated in the Q-table with the highest predicted reward function value.
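A sketch of the corresponding Q-table update, consistent with the update rule given earlier; the learning rate alpha and discount factor gamma are assumed hyperparameters:

    def update_q(q_table, state, action, reward, next_state, actions,
                 alpha=0.1, gamma=0.9):
        # Move Q(state, action) toward the obtained reward plus the
        # discounted best predicted value of the next state.
        best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
        old = q_table.get((state, action), 0.0)
        q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)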
The methods 200 and 300 may be complemented by a method 400 performed by a LwM2M server node, and a method 500 performed by a LwM2M client node.
Figure 4 is a flow chart illustrating process steps in a method 400 for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task. For the purposes of the method 400, the sensor entities are managed by a management node using examples of the method 200 and/or 300. The method is performed by a LwM2M server node, which may comprise a physical or virtual node, and may be implemented in a computer system, computing device or server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. The LwM2M server node is running a LwM2M management server, as defined in LwM2M. As for the methods 200 and 300 discussed above, although not explicitly illustrated in Figure 4, for the purposes of the method 400, a sensor entity comprises at least one sensor device mounted on the radio stripe. Referring to Figure 4, a LwM2M server node carrying out the method 400 first registers a plurality of LwM2M object instances and associated resources, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe and are hosted at a LwM2M client node associated with the radio stripe. In some examples, the LwM2M client node may previously have bootstrapped with the LwM2M server node.
In step 420, the LwM2M server node may provide to the management node an indication of sensor entities present on the radio stripe and of the operational states of those sensor entities. This may comprise providing to the management node an indication of LwM2M sensor device object instances present on the radio stripe and operational states of the object instances, at step 420a. It will be appreciated that the operational states indicated to the management node may comprise at least one of: powered on, powered off, exposure enabled and/or exposure disabled, and may additionally comprise energy mode states or other states encompassed within the candidate set discussed above with respect to the method 300. In some examples, the LwM2M server node may obtain the operational states by reading values of the appropriate resources from the LwM2M client node.
In step 430, the LwM2M server node may receive from the management node an instruction to carry out an action on at least one LwM2M sensor device object instance registered with the LwM2M server node. As illustrated at 430a, the action may comprise placing a sensor device into at least one of a candidate set of operational states, wherein the candidate set of operational states comprises: powered on, powered off, exposure enabled and/or exposure disabled. The candidate set may contain other operational states, as discussed above. In some examples, the LwM2M server node may additionally provide an update indication to the management node, following execution of the action, for example to confirm it has taken place.
In step 440, the LwM2M server node may send a message to the LwM2M client node hosting the LwM2M sensor device object instances, the message updating a value of a resource of the LwM2M sensor device object instance to execute the action. The LwM2M server node may then inform a registered observer of a LwM2M sensor device object instance registered with the LwM2M server node of an operational state of the LwM2M sensor device object instance, in step 450. The server node may therefore inform consumers of data from the relevant sensors when an action is taken to change the operational state of the sensor.
Figure 5 is a flow chart illustrating process steps in a method 500 for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task. For the purposes of the method 500, the sensor entities are managed by a management node using examples of the method 200 and/or 300. The method is performed by a LwM2M client node associated with the radio stripe. The LwM2M client node may comprise a physical or virtual node, and may be implemented in a computer system, computing device or server apparatus and/or in a virtualized environment, for example in a cloud, edge cloud or fog deployment. The LwM2M client node may for example be instantiated on the radio stripe’s APU or on a separate (relatively small) device attached to the radio stripe and sharing the data and power bus. The LwM2M client node is running a LwM2M client, as defined in LwM2M. As for the methods 200 and 300 discussed above, although not explicitly illustrated in Figure 5, for the purposes of the method 500, a sensor entity comprises at least one sensor device mounted on the radio stripe.
Referring to Figure 5, a LwM2M client node carrying out the method 500 first exposes, to a LwM2M server node, a plurality of LwM2M object instances and associated resources hosted at a LwM2M client node, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe. As illustrated at step 520, the LwM2M client node may also receive a message from the LwM2M server node, the message updating a value of a resource of a LwM2M sensor device object instance exposed to the LwM2M server node. The LwM2M client node may then cause the sensor device represented by the LwM2M sensor device object instance to enter an operational state in accordance with the updated value of the resource in step 530. In some examples, the operational state may comprise at least one of: powered on, powered off, exposure enabled and/or exposure disabled, or other operational states, as discussed above.
As discussed above, the methods 200 and 300 may be performed by a management node, and the present disclosure provides a management node that is adapted to perform any or all of the steps of the above discussed methods. The management node may comprise a physical node such as a computing device, server etc., or may comprise a virtual node. A virtual node may comprise any logical entity, such as a Virtualized Network Function (VNF) which may itself be running in a cloud, edge cloud or fog deployment. The management node may be operable to be instantiated in a cloud or edge cloud based deployment.
Figure 6 is a block diagram illustrating an example management node 600 which may implement the method 200 and/or 300, as illustrated in Figures 2 and 3a and 3b, according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 650. Referring to Figure 6, the management node 600 comprises a processor or processing circuitry 602, and may comprise a memory 604 and interfaces 606. The processing circuitry 602 is operable to perform some or all of the steps of the method 200 and/or 300 as discussed above with reference to Figures 2 and 3a and 3b. The memory 604 may contain instructions executable by the processing circuitry 602 such that the management node 600 is operable to perform some or all of the steps of the method 200 and/or 300, as illustrated in Figures 2 and 3a and 3b. The instructions may also include instructions for executing one or more telecommunications and/or data communications protocols. The instructions may be stored in the form of the computer program 650. In some examples, the processor or processing circuitry 602 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc. The processor or processing circuitry 602 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) etc. The memory 604 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive, etc. The management node 600 may further comprise one or more interfaces suitable for communication with a LwM2M server node and/or other communication network nodes.
As discussed above, the method 400 may be performed by a LwM2M server node, and the present disclosure provides a LwM2M server node that is adapted to perform any or all of the steps of the above discussed methods. The LwM2M server node may comprise a physical node such as a computing device, server etc., or may comprise a virtual node. A virtual node may comprise any logical entity, such as a Virtualized Network Function (VNF) which may itself be running in a cloud, edge cloud or fog deployment. The LwM2M server node may be operable to be instantiated in a cloud based deployment. Figure 7 is a block diagram illustrating an example LwM2M server node 700 which may implement the method 400, as illustrated in Figure 4, according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 750. Referring to Figure 7, the LwM2M server node 700 comprises a processor or processing circuitry 702, and may comprise a memory 704 and interfaces 706. The processing circuitry 702 is operable to perform some or all of the steps of the method 400 as discussed above with reference to Figure 4. The memory 704 may contain instructions executable by the processing circuitry 702 such that the LwM2M server node 700 is operable to perform some or all of the steps of the method 400, as illustrated in Figure 4. The instructions may also include instructions for executing one or more telecommunications and/or data communications protocols. The instructions may be stored in the form of the computer program 750. In some examples, the processor or processing circuitry 702 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc. The processor or processing circuitry 702 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) etc. The memory 704 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive, etc. The LwM2M server node 700 may further comprise one or more interfaces suitable for communication with a management node, a LwM2M client node and/or other communication network nodes.
As discussed above, the method 500 may be performed by a LwM2M client node, and the present disclosure provides a LwM2M client node that is adapted to perform any or all of the steps of the above discussed methods. The LwM2M client node may comprise a physical node such as a computing device, server etc., or may comprise a virtual node. A virtual node may comprise any logical entity, such as a Virtualized Network Function (VNF) which may itself be running in a cloud, edge cloud or fog deployment. The LwM2M client node may be operable to be instantiated in a cloud based deployment or in a device, such as an APU or other device mounted on the radio stripe.
Figure 8 is a block diagram illustrating an example LwM2M client node 800 which may implement the method 500, as illustrated in Figure 5, according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 850. Referring to Figure 8, the LwM2M client node 800 comprises a processor or processing circuitry 802, and may comprise a memory 804 and interfaces 806. The processing circuitry 802 is operable to perform some or all of the steps of the method 500 as discussed above with reference to Figure 5. The memory 804 may contain instructions executable by the processing circuitry 802 such that the LwM2M client node 800 is operable to perform some or all of the steps of the method 500, as illustrated in Figure 5. The instructions may also include instructions for executing one or more telecommunications and/or data communications protocols. The instructions may be stored in the form of the computer program 850. In some examples, the processor or processing circuitry 802 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc. The processor or processing circuitry 802 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) etc. The memory 804 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive, etc. The LwM2M client node 800 may further comprise one or more interfaces suitable for communication with a LwM2M server node and/or other communication network nodes.
Figures 2 to 5 discussed above provide an overview of methods which may be performed according to different examples of the present disclosure. These methods may be performed by a management node, LwM2M server node and LwM2M client node respectively, as illustrated in Figures 6 to 8. There now follows a detailed discussion of how different process steps illustrated in Figures 2 to 5 and discussed above may be implemented. The functionality and implementation detail described below is discussed with reference to the modules of Figures 6 to 8 performing examples of the methods 200, 300, 400 and/or 500, substantially as described above.
Figures 9 to 11 illustrate an example implementation of the methods disclosed herein. Figure 9 is a process flow illustrating an overview of implementation of the methods 300, 400 and 500. The example implementation assumes a radio stripe with one or more sensors attached, and at least one APU which is modified and able to run a small footprint LwM2M Client (“APU with LwM2M client” in Figure 9). Another option would be to have a small device attached to the radio stripe sharing data and power bus and running a LwM2M client (for example an NXP Kinetis KL02 with an 8MHz 32-bit processor, 4KB of RAM and 32KB of internal storage). Figure 9 also illustrates a LwM2M server, a management node, labelled as “online learning agent”, and data consumers of the radio stripe sensor data.
The implementation process flow of Figure 9 comprises the following steps:
Step 1: The APU with LwM2M client can discover and access the neighboring sensors on the same Radio Stripe Power Bus and Data Bus (1).
LwM2M initialization and configuration
Step 2: The APU with LwM2M client goes through LwM2M bootstrap procedure (configuration of keys, LwM2M server access) and registration procedure (initial object discovery and configuration) (Steps 410, 510 of methods 400, 500). The server discovers LwM2M available objects, instances, resources and attributes for configured sensors (2).
Step 3: The available sensor objects are observed by the LwM2M server (3, 4). (5) The available sensors' values can be read by LwM2M sensors' data consumers.
Reinforcement Learning training
Step 4: (8) A Reinforcement Learning (RL) Model (e.g., Q-Learning) is trained using inputs from the local actors (discussed in greater detail with reference to Figure 10), including:
• (6) information about Radio Stripe Power Bus load (i.e. mW or percent) and Radio Stripe Data Bus load (i.e. kbps or percent)
• (7) information from the LwM2M server: exposed object parameters (on/off, observe interval, etc.), connected observers, etc. (Steps 310, 420 of methods 300, 400)
Reinforcement Learning prediction
Step 5: (9) The result/output of the RL is sent to the LwM2M server (e.g., turning ON/OFF exposed sensor objects, enabling/disabling exposed sensor objects, modifying the observation interval, etc.). This can be done via a REST API (discussed in greater detail with reference to Figure 11) (steps 330, 330a, 430 of methods 300, 400).
Step 6: (10) Based on the received input, the LwM2M server reconfigures the APU with LwM2M client (e.g., turning ON/OFF exposed sensor objects, enabling/disabling exposed sensor objects, modifying the observation interval, etc.) by sending the proper LwM2M commands (steps 440, 520 of methods 400, 500).
Note 1 (LwM2M functionality): enabling and disabling of LwM2M objects can be done using the Create, Read, Update and Delete (CRUD) LwM2M operations (sent from LwM2M Server to LwM2M Client). For example, turning ON/OFF can be done by adding a resource 5850 to the LwM2M Sensor Object and interacting via the LwM2M Write-Attributes operation. The resource identifier 5850 is defined with read and write capabilities for type Boolean. LwM2M Object 3306 (Actuation) or Object 3311 (Light control) allow for actuation (turning ON or OFF) by writing to resource 5850.
Note 2 (energy efficiency): As noted above, there is a difference in saved energy between disabling object exposure (with Delete) and turning a sensor OFF with the help of LwM2M functionality (e.g., using Resource 5850 if supported). In the former, the energy saving comes from the reduction in communication as the sensor value cannot be observed (but the sensor will continue to operate on the stripe). In the latter, there is an additional reduction in consumed energy as the sensor will be turned off.
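The distinction drawn in Notes 1 and 2 can be sketched as follows; this toy model only contrasts deleting an object instance from exposure with writing its on/off resource, and does not reflect the API of any real LwM2M implementation:

    # exposed: object instances visible to the LwM2M server (path -> resources).
    exposed = {"/3303/0": {5700: 21.5, 5850: True}}

    def disable_exposure(path):
        # Delete the object instance: saves communication energy, but the
        # underlying sensor keeps running (and drawing power) on the stripe.
        exposed.pop(path, None)

    def power_off(path):
        # Write False to resource 5850: the sensor itself is turned off,
        # saving both communication energy and the sensor's own consumption.
        exposed[path][5850] = False

    power_off("/3303/0")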
Step 7: (11) Update information on sensors’ data availability can be sent/signaled to the LwM2M sensors' data consumers (step 450 of method 400).
Reinforcement Learning Training and Prediction
Training phase
The training phase of the example implementation is illustrated in Figure 10. The management node (online learning agent) will take inputs including, but not limited to, metrics from the radio stripe's shared data bus (1) and power bus (2), along with available sensors (5). Additional inputs from possible Service Level Agreement (SLA) requirements can also be considered (3). Sensors are registered in the remote LwM2M server that is responsible for handling the operations of the sensors by manipulating the objects provided by the LwM2M client (5). The discovery of sensors (4) can be part of the registration or discovery LwM2M procedures as described previously. The example implementation uses a Q-learning method, which is a form of Reinforcement Learning that is model-less and seeks a selection policy that maximizes a reward (6). The reward can be defined to prioritize for example some combination of energy optimization and meeting SLA requirements. All the states of the radio stripe and sensors are saved in a Q-table in the form of [state, action] entries (7). The training phase involves learning a selection policy that tries to reach a state associated with a maximal reward. The agent will learn from the initial parameters representing the current state, and will keep updating as a better policy for the selection of actions is learnt.
In one example, the reward can be a function that considers the available power_bus and data_bus capacity to serve an optimal number of third-party clients which observe the sensor data objects exposed by the LwM2M client. As an example:
Reward = w1 x (100 - power_bus_load%) + w2 x (100 - data_bus_load%) - w3 x [exposed_objects_param] - w4 x [served_3rdparty_clients_param]
Where:
• power_bus_load% is the current load of the power bus
• data_bus_load% is the current load of the data bus
• [exposed_objects_param] is a parameter that summarizes the exposed sensor objects (power/energy and data impact)
• [served_3rdparty_clients_param] is a parameter that summarizes the impact of the served 3rd party LwM2M clients (e.g., can be extracted from the SLA)
• w1, w2, w3 and w4 are the weights applied to the above parameters.
The reward can thus be defined to target energy optimization alone, to meet the SLA needs of a use case, or both.
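A sketch of this reward as code, directly transcribing the formula above; the two summary parameters and the weights are deployment-specific inputs whose exact definitions the description leaves open:

    def reward(power_bus_load, data_bus_load,
               exposed_objects_param, served_3rdparty_clients_param,
               w=(1.0, 1.0, 1.0, 1.0)):
        # Spare bus capacity is rewarded; exposure and third-party serving
        # costs are penalized. Loads are percentages in [0, 100].
        w1, w2, w3, w4 = w
        return (w1 * (100 - power_bus_load)
                + w2 * (100 - data_bus_load)
                - w3 * exposed_objects_param
                - w4 * served_3rdparty_clients_param)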
Prediction phase
The prediction phase of the example implementation is illustrated in Figure 11. In the prediction phase, all the inputs (1, 2, 3, 5) mentioned in the training phase are used together with the Q-table based knowledge base/repository to find the corresponding state and take an optimal decision, which is sent to the LwM2M server for implementation (9). Based on this information, the LwM2M server will use LwM2M functionality to manage the sensors controlled by the LwM2M client on the radio stripe.
The action space of the management node (online learning agent) may vary according to the number of sensor entities being managed. In one example, it is envisaged that one management node (i.e. one RL agent) controls one LwM2M client (which runs for example on a modified APU on the radio stripe). The client runs on a device which exposes a number of sensors (Num_sensors).
For S states and A actions, there will be a Q-table of size S x A. The number of states for a sensor device is given by the current status of its sensors: for example, whether a sensor is observed or not (has a consumer for its data readings or not). In such an example, the number of states associated with individual sensor entities would therefore be S = 2^Num_sensors. It will be appreciated that the state of the radio stripe may additionally encompass power and data metrics from the radio stripe, as discussed above. If two actions are considered for each sensor device: exposure enabled or exposure disabled (read/write or create/delete object in LwM2M via the Device Management and Service Enablement Interface), the number of actions would then be A = 2^Num_sensors. The size of the Q-table would be S x A = 2^(2 x Num_sensors). For a device with 4 sensors (e.g. Temperature, Humidity, Light, Pressure), the size of the Q-table would be S x A = 2^(2 x 4) = 256.
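The sizing above can be checked with a short sketch; the encoding of states and actions as per-sensor Boolean tuples is an illustrative choice, not mandated by the description:

    from itertools import product

    NUM_SENSORS = 4  # e.g. Temperature, Humidity, Light, Pressure

    # Each state: per-sensor "observed?" flags; each action: per-sensor
    # "exposure enabled?" flags.
    states = list(product([False, True], repeat=NUM_SENSORS))   # 2^4 = 16
    actions = list(product([False, True], repeat=NUM_SENSORS))  # 2^4 = 16

    q_table = {(s, a): 0.0 for s in states for a in actions}
    print(len(q_table))  # 256 = 2^(2 x 4)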
As discussed, it may be desirable to include additional states and actions, and this will increase the size of the Q-table. One solution to avoid very large tables is therefore to make use of the mapping between individual sensor devices and sensor entities in order to achieve state aggregation (treating many states as one). Thus if, for example, two sensors work together, they may be represented as a single sensor entity in the Q-table, and a single action selected for that entity, which action is then executed on all of the relevant sensor devices. The reward function may then be established to balance energy consumption against data collection and applicability, consistent with a goal of maximum energy efficiency while satisfying demands associated with SLA requirements (i.e., task performance).
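A minimal sketch of such state aggregation follows, assuming a hypothetical device-to-entity mapping in which two co-operating sensor devices are folded into one "climate" entity; the Q-table then shrinks from 256 to 64 entries.

entity_of = {
    "temperature_a": "climate",  # these two devices always work together,
    "humidity_a": "climate",     # so they share one entity and one action
    "light_a": "light",
    "pressure_a": "pressure",
}
num_entities = len(set(entity_of.values()))  # 3 entities instead of 4 devices
print(2 ** (2 * num_entities))               # aggregated Q-table size: 64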
It will be appreciated that there is a tradeoff between the energy savings achieved by managing sensors on the radio stripe and the energy requirements of running the management node. As part of this tradeoff, it may be beneficial to consider where and when energy is available, and at what cost. For example, keeping unused sensors online on a radio stripe with limited resources may be more expensive than training and running the RL agent in another location with more available energy and computational power (e.g., at the edge or in the cloud). Another option for saving energy in the running of the management node conducting the RL is to consider transfer learning: a trained Q-learning model can be reused for similar use cases or patterns, meaning a model trained on one factory set-up may be suitable for, or at least a useful starting point for, a similar set-up in a different factory.
Example use cases
Indoor Factory inventory system:
In this use case, temperature and humidity sensors are pre-installed (onboarded) on the radio stripe, which is deployed along a shelf area holding raw materials or finished products. There may be a need to monitor temperature and humidity to maintain the integrity of the materials or products. Depending on which material or item is to be monitored, the right set of temperature or humidity sensors can be switched on or off for a given level, pallet, shelf, section, etc. Sensors on the stripe that are not needed for the particular scenario need not be switched on and used, thus saving power and reducing energy consumption.
Figures 12 and 13 illustrate an inventory area in an example factory for the present use case. The area is covered for connectivity using a radio stripe on which several temperature sensors are mounted. The SLA requirement for this scenario is that the area marked as shown in Figure 13 needs to be monitored with a certain level of accuracy. This SLA requirement, along with metrics from the radio stripe itself, i.e., from the data and power buses and the LwM2M objects, is provided as input to the management node. The management node uses these parameters to learn (via Q-learning) a selection policy that maximizes the reward, in this case defined by meeting the SLA requirements, as illustrated in Figure 14. The prediction phase recommends a set of LwM2M objects that meet the SLA requirements, as depicted in Figure 15. The management node therefore selects the optimal set of sensors to be used in order to minimize overall energy consumption of the radio stripe while fulfilling SLA requirements.
Other use cases for the methods disclosed herein can be envisaged. For example, sports venues and stadiums are examples of locations that often require good connectivity, and the capacity to handle connectivity for large numbers of devices over specific time periods. These would therefore be excellent candidates for the connectivity offered by radio stripes as examples of a D-maMIMO solution. It may also be desirable to manage air quality for spectators in these venues, including for example temperature, humidity, the presence of contaminants, etc., using sensor devices mounted on the radio stripes. Examples of the present invention may be employed to ensure energy efficiency in the management of such sensors, for example turning off sensors in parts of the venue that are not occupied, or when the venue is not being used.
Smart cities are another example in which radio stripes may offer significant advantages in connectivity, and sensor devices mounted on radio stripes may offer additional functionality for pollution monitoring, vibration monitoring for earthquake early warning or other purposes, air quality monitoring for management policies, etc. Examples of the present invention may again be employed to ensure energy efficiency in the management of such sensors, by turning off, or disabling exposure of, sensors in areas that do not require monitoring at any given time.
Finally, examples of the present disclosure may be envisaged for use in the logistics domain, with radio stripes providing connectivity together with monitoring to ensure compliance with requirements for temperature or other environmental conditions. Example methods of the present disclosure can ensure that such monitoring is provided in an energy efficient manner.
Examples of the present disclosure thus provide methods and nodes that use a Reinforcement Learning solution to control individual sensors attached to a radio stripe via a LwM2M client (running, for example, on a modified radio stripe APU). These methods enable fine-grained control of sensor devices on a radio stripe, in order to balance energy efficiency with performance of whatever task the sensors are required to undertake. Energy demands and consumption are critical when deploying a large number of sensors on radio stripes that provide ad-hoc connectivity. Powering on and exposing only those sensors that are needed for a given use case and operational scenario can provide energy savings, as well as supporting optimal usage of the power available to the radio stripe for servicing radio requirements, thus supporting high throughput performance of the radio stripe itself.
The methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form. It should be noted that the above-mentioned examples illustrate rather than limit the disclosure, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims or numbered embodiments. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim or embodiment, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims or numbered embodiments. Any reference signs in the claims or numbered embodiments shall not be construed so as to limit their scope.

Claims

1. A method (200) for using Reinforcement Learning to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task, the method, performed by a management node, comprising: obtaining a representation of a current state of the radio stripe (210); and for sensor entities on the radio stripe (260): using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity (220); and causing the action to be carried out on the sensor entity (230); the method further comprising: obtaining an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task (240); and updating the selection policy using at least the obtained reward function value; wherein a sensor entity comprises at least one sensor device mounted on the radio stripe, and wherein each sensor device of the sensor entity is exposed as a Lightweight Machine to Machine, LwM2M, object instance (250).
2. A method as claimed in claim 1, wherein an action comprises placing a sensor entity in at least one of a candidate set of operational states, and wherein the candidate set of operational states comprises: powered on, powered off, exposure enabled, and exposure disabled (230a).
3. A method as claimed in claim 1 or 2, wherein using the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity comprises selecting an action that is predicted to result in a maximum value of the reward function that measures impact of the action on performance of the task (320b).
4. A method as claimed in claim 3, wherein updating the selection policy using at least the obtained reward function value comprises updating the selection policy to improve prediction accuracy (350a).
5. A method as claimed in any one of the preceding claims, wherein the representation of a current state of the radio stripe comprises at least one of (310a): a value of a parameter characterizing operation of a power bus of the radio stripe; a value of a parameter characterizing operation of a data bus of the radio stripe; a performance requirement for the task; sensor entities present on the radio stripe; operational states of sensor entities present on the radio stripe.
6. A method as claimed in any one of the preceding claims, wherein obtaining a representation of a current state of the radio stripe comprises obtaining an indication of sensor entities present on the radio stripe and operational states of sensor entities present on the radio stripe from a LwM2M server node with which sensor devices of the sensor entities are registered (310b).
7. A method as claimed in claim 6, wherein obtaining a representation of a current state of the radio stripe further comprises obtaining from the LwM2M server node an indication of LwM2M sensor device object instances present on the radio stripe and operational states of the object instances, and mapping the obtained indications to indications of sensor entities present on the radio stripe and operational states of the sensor entities (310c).
8. A method as claimed in any one of the preceding claims, wherein using a selection policy to select an action for the current state of the radio stripe comprises: identifying in a Q-table the action associated with the highest predicted reward function value (320c).
9. A method as claimed in claim 8, wherein updating the selection policy using at least the obtained reward function value comprises: updating the Q-table with the obtained value of the reward function (350b).
10. A method as claimed in any one of the preceding claims, wherein the reward function comprises a function of at least one of (340a): a performance measure for the task; a measure of energy expenditure of the radio stripe.
11. A method as claimed in claim 10, wherein maximizing a value of the reward function comprises minimizing energy expenditure of the radio stripe without causing the performance measure for the task to fall below a minimum threshold (340b).
12. A method as claimed in any one of the preceding claims, wherein causing the action to be carried out on the sensor entities comprises sending an instruction to a LwM2M server with which the individual sensor devices of the sensor entities are registered (330a).
13. A method (400) for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task, the method, performed by a Lightweight Machine to Machine, LwM2M, server node, comprising: registering a plurality of LwM2M object instances and associated resources, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe and are hosted at a LwM2M client node associated with the radio stripe (410); wherein a sensor entity comprises at least one sensor device mounted on the radio stripe, and wherein the sensor entities are managed by a management node using a method according to claim 1.
14. A method as claimed in claim 13, further comprising: providing to the management node an indication of sensor entities present on the radio stripe and operational states of sensor entities present on the radio stripe (420).
15. A method as claimed in claim 13 or 14, further comprising: providing to the management node an indication of LwM2M sensor device object instances present on the radio stripe and operational states of the object instances (420a).
16. A method as claimed in any one of claims 13 to 15, further comprising: receiving from the management node an instruction to carry out an action on at least one LwM2M sensor device object instance registered with the LwM2M server node (430); and sending a message to the LwM2M client node hosting the LwM2M sensor device object instances, the message updating a value of a resource of the LwM2M sensor device object instance to execute the action (440).
17. A method as claimed in claim 16, wherein the action comprises placing a sensor device into at least one of a candidate set of operational states, and wherein the candidate set of operational states comprises: powered on, powered off, exposure enabled, and exposure disabled (430a).
18. A method as claimed in any one of claims 13 to 17, further comprising: informing a registered observer of a LwM2M sensor device object instance registered with the LwM2M server node of an operational state of the LwM2M sensor device object instance (450).
19. A method (500) for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task, the method, performed by a Lightweight Machine to Machine, LwM2M, client node associated with the radio stripe, comprising: exposing, to a LwM2M server node, a plurality of LwM2M object instances and associated resources hosted at a LwM2M client node, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe (510); wherein a sensor entity comprises at least one sensor device mounted on the radio stripe, and wherein the sensor entities are managed by a management node using a method according to claim 1.
20. A method as claimed in claim 19, further comprising: receiving a message from the LwM2M server node, the message updating a value of a resource of a LwM2M sensor device object instance exposed to the LwM2M server node (520); and causing the sensor device represented by the LwM2M sensor device object instance to enter an operational state in accordance with the updated value of the resource (530).
21. A method as claimed in claim 20, wherein the operational state comprises at least one of: powered on, powered off, exposure enabled and/or exposure disabled.
22. A computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform a method of any one of claims 1 to 21.
23. A management node (600) for using Reinforcement Learning to manage sensor entities on a radio stripe, wherein the sensor entities are operable to perform a task, the management node comprising processing circuitry (602) configured to cause the management node to: obtain a representation of a current state of the radio stripe; and for sensor entities on the radio stripe: use the current state of the radio stripe and a selection policy to select an action to be carried out on the sensor entity; and cause the action to be carried out on the sensor entity; the processing circuitry further configured to cause the management node to: obtain an updated representation of a current state of the radio stripe and a value of a reward function that measures impact of the selected actions on performance of the task; and update the selection policy using at least the obtained reward function value; wherein a sensor entity comprises at least one sensor device mounted on the radio stripe, and wherein each sensor device of the sensor entity is exposed as a Lightweight Machine to Machine, LwM2M, object instance.
24. A management node as claimed in claim 23, wherein the processing circuitry is further configured to cause the management node to perform the steps of any one of claims 2 to 12.
25. A LwM2M server node (700) for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task, the LwM2M server node comprising processing circuitry (702) configured to cause the LwM2M server node to: register a plurality of LwM2M object instances and associated resources, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe and are hosted at a LwM2M client node associated with the radio stripe; wherein a sensor entity comprises at least one sensor device mounted on the radio stripe, and wherein the sensor entities are managed by a management node using a method according to claim 1.
26. A LwM2M server node as claimed in claim 25, wherein the processing circuitry is further configured to cause the LwM2M server node to perform the steps of any one of claims 14 to 18.
27. A LwM2M client node (800) for facilitating management of sensor entities on a radio stripe using Reinforcement Learning, wherein the sensor entities are operable to perform a task and the LwM2M client node is associated with the radio stripe, the LwM2M client node comprising processing circuitry (802) configured to cause the LwM2M client node to: expose, to a LwM2M server node, a plurality of LwM2M object instances and associated resources hosted at a LwM2M client node, wherein the LwM2M object instances represent sensor devices mounted on the radio stripe; wherein a sensor entity comprises at least one sensor device mounted on the radio stripe, and wherein the sensor entities are managed by a management node using a method according to claim 1.
28. A LwM2M client node as claimed in claim 27, wherein the processing circuitry is further configured to cause the LwM2M client node to perform the steps of claim 20 or 21.

Applications Claiming Priority (1)

• Application number: PCT/SE2022/051172 (published as WO2024128943A1); priority date: 2022-12-13; filing date: 2022-12-13; title: Managing sensor entities on a radio stripe

Publications (1)

• Publication number: EP4635209A1; publication date: 2025-10-22

Family

• Family ID: 91486189



Also Published As (1)

• WO2024128943A1, published 2024-06-20


Legal Events

• STAA (Information on the status of an EP patent application or granted EP patent): STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE
• PUAI (Public reference made under Article 153(3) EPC to a published international application that has entered the European phase): ORIGINAL CODE: 0009012
• STAA (Information on the status of an EP patent application or granted EP patent): STATUS: REQUEST FOR EXAMINATION WAS MADE
• 17P (Request for examination filed): Effective date: 20250708
• AK (Designated contracting states): Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR