WO2020135919A1 - Selecting beamforming options

Info

Publication number: WO2020135919A1
Application number: PCT/EP2018/097127
Authority: WO
Inventors: Véronique Capdevielle; Afef Feki; Amine BELHAJ SALAH; Boris KOUASSI; Claudiu Mihailescu; Christian Mahr
Original assignee: Nokia Technologies Oy
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2020-07-02

Abstract

A method, apparatus and computer program are described, comprising: collecting performance information, at a current iteration, of a current beamforming option at a node of a mobile communication system; updating a decision function for each of a plurality of beamforming options for the node of the mobile communication system, wherein updating the decision function uses the collected performance information of the current beamforming option at the current iteration; and selecting one of said beamforming options and operating the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function. The method may make use of machine-learning principles.

Description

Selecting Beamforming Options

Field

This specification relates to selecting beamforming options, for example for communication between a base station of a mobile communication system and one or more user devices.

Background

A base station of a communication system may comprise a number of beams that can be used for communications with user devices. There remains a need for developments relating to the selection of beamforming options for use in communication systems.

Summary

In a first aspect, this specification describes an apparatus comprising: means for collecting performance information (such as mean signal received power, signal-to-interference ratio, data throughput, spectral efficiency etc.) at a current iteration, of a current beamforming option at a node of a mobile communication system; means for updating a decision function for each of a plurality of beamforming options (e.g. a grid-of-beams) for the node of the mobile communication system, said means for updating the decision function using the collected performance information of the current beamforming option at the current iteration; and means for selecting one of said beamforming options and operating the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function (e.g. the decision function may seek to maximise a long-term reward).

The mobile communication system may comprise a plurality of nodes. At least some of the other nodes of the mobile communication system may (independently) select one of a plurality of beamforming options available at that node, such that a distributed arrangement for beamforming options selection is provided.

Some embodiments comprise means for repeating said selecting of one of said beamforming options at a plurality of next iterations (thus, an iterative process may be implemented). The selection may be updated periodically. Some embodiments may further comprise means for adjusting a period of repeating said selecting of one of said beamforming options at the plurality of next iterations. The period may, for example, be increased over time (e.g. as the selection is stabilised).

The means for collecting performance information may collect information from user devices in mobile communication with said node of said mobile communication system.

The means for selecting said one of said beamforming options may be conducted at said node of said communication system independently of other nodes in said communication system. Thus, each node may make its own decisions, without requiring data or decision information to be received from other nodes, such that signal exchange or co-ordination between nodes is not required. The node may be a base station of said mobile communication system (and similar apparatuses may be provided at other base stations of the mobile communication system). However, this is not essential to all embodiments. For example, the principles described herein could be applied to other devices or entities selecting a beamforming option from a predefined set of possible beamforming options.

The means for updating the decision function may use performance information of previous iterations for the current beamforming option and other beamforming options of the plurality. The decision function for each of the plurality of beamforming options may comprise a combination (e.g. a sum) of a first variable and a second variable for the respective beamforming option. The first variable may be an exploitation variable. The second variable may be an exploration variable. The decision function may be provided to steer the choice of the beamforming option so as to provide a trade-off between exploitation of the beamforming options to maximise a long-term reward and exploring beamforming options (e.g. beamforming options that have not been explored for a long time). The first variable may comprise a mean reward function for the respective beamforming option (such as an average of past rewards for the respective beamforming option). The second variable may comprise a usage variable relating to recent usage of the respective beamforming option, wherein the decision function is biased towards selecting beamforming options with low recent usage rates. The second variable may comprise a tuning variable, wherein the tuning variable sets the relative importance of the first and second variables in the outcome of the decision function. The second variable may be dependent on a tuning function and how reliable the data for the particular beam is (for that node). If data is unreliable, there may be a need to update the data to get the best results, but this may be limited by the desire to produce something close to the best option in many instances. The decision function may be defined such that a regret function (e.g. a regret due to exploration) is upper bounded. This may be implemented by the tuning variable referred to above, for example by setting the tuning variable based, at least in part, on an upper confidence bound of a regret function.

The means for updating the decision function (and optionally the means for selecting one of said beamforming options) may implement a machine-learning algorithm (e.g. an algorithm based on reinforcement learning).

The said means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program configured, with the at least one processor, to cause the performance of the apparatus.

In a second aspect, this specification describes a method comprising: collecting performance information, at a current iteration, of a current beamforming option at a node of a mobile communication system; updating a decision function for each of a plurality of beamforming options for the node of the mobile communication system, wherein updating the decision function uses the collected performance information of the current beamforming option at the current iteration; and selecting one of said beamforming options and operating the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function. The mobile communication system may comprise a plurality of nodes. At least some of the other nodes of the mobile communication system may (independently) select one of a plurality of beamforming options available at that node, such that a distributed arrangement for beamforming options selection is provided.

Some embodiments comprise repeating said selecting of one of said beamforming options at a plurality of next iterations (thus, an iterative process may be implemented). The selection may be updated periodically. Some embodiments may further comprise adjusting a period of repeating said selecting of one of said beamforming options at the plurality of next iterations. The period may, for example, be increased over time (e.g. as the selection is stabilised). Collecting performance information may comprise collecting information from user devices in mobile communication with said node of said mobile communication system. Selecting said one of said beamforming options may be conducted at said node of said communication system independently of other nodes in said communication system. Updating the decision function may use performance information of previous iterations for the current beamforming option and other beamforming options of the plurality.

The decision function for each of the plurality of beamforming options may comprise a combination (e.g. a sum) of a first variable and a second variable for the respective beamforming option. The first variable may be an exploitation variable. The second variable may be an exploration variable. The first variable may comprise a mean reward function for the respective beamforming option (such as an average of past rewards for the respective beamforming option). The second variable may comprise a usage variable relating to recent usage of the respective beamforming option, wherein the decision function is biased towards selecting beamforming options with low recent usage rates. The second variable may comprise a tuning variable, wherein the tuning variable sets the relative importance of the first and second variables in the outcome of the decision function. The second variable may be dependent on a tuning function and how reliable the data for the particular beam is (for that node).

Updating the decision function may implement a machine-learning algorithm (e.g. an algorithm based on reinforcement learning). Selecting one of said beamforming options may implement a machine-learning algorithm (e.g. an algorithm based on reinforcement learning).

In a third aspect, this specification describes any apparatus configured to perform any method as described with reference to the second aspect.

In a fourth aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the second aspect.

In a fifth aspect, this specification describes a computer program comprising instructions for causing an apparatus to perform at least the following: collect performance information, at a current iteration, of a current beamforming option at a node of a mobile communication system; update a decision function for each of a plurality of beamforming options for the node of the mobile communication system, wherein updating the decision function uses the collected performance information of the current beamforming option at the current iteration; and select one of said beamforming options and operate the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function. The mobile communication system may comprise a plurality of nodes. At least some of the other nodes of the mobile communication system may (independently) select one of a plurality of beamforming options available at that node, such that a distributed arrangement for beamforming options selection is provided.

In a sixth aspect, this specification describes a computer-readable medium (such as a non-transitory computer readable medium) comprising program instructions stored thereon for performing at least the following: collecting performance information, at a current iteration, of a current beamforming option at a node of a mobile communication system; updating a decision function for each of a plurality of beamforming options for the node of the mobile communication system, wherein updating the decision function uses the collected performance information of the current beamforming option at the current iteration; and selecting one of said beamforming options and operating the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function.

In a seventh aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: collect performance information, at a current iteration, of a current beamforming option at a node of a mobile communication system; update a decision function for each of a plurality of beamforming options for the node of the mobile communication system, wherein updating the decision function uses the collected performance information of the current beamforming option at the current iteration; and select one of said beamforming options and operate the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function. The mobile communication system may comprise a plurality of nodes. At least some of the other nodes of the mobile communication system may (independently) select one of a plurality of beamforming options available at that node, such that a distributed arrangement for beamforming options selection is provided.

In an eighth aspect, this specification describes an apparatus comprising: a user device measurement collection module for collecting performance information, at a current iteration, of a current beamforming option at a node of a mobile communication system; a decision function update module for updating a decision function for each of a plurality of beamforming options for the node of the mobile communication system, wherein updating the decision function uses the collected performance information of the current beamforming option at the current iteration; and a beamforming option selection module for selecting one of said beamforming options and operating the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function.

Brief description of the drawings

Example embodiments will now be described, by way of non-limiting examples, with reference to the following schematic drawings:

FIG. 1 is a block diagram of a system in accordance with an example embodiment;

FIG. 2 is a block diagram of a system in accordance with an example embodiment;

FIG. 3 shows a machine-learning module being used in accordance with an example embodiment;

FIG. 4 is a flow chart showing an algorithm in accordance with an example embodiment;

FIG. 5 shows a system demonstrating a feature of an example embodiment;

FIG. 6 is a block diagram of a system in accordance with an example embodiment;

FIG. 7 is a flow chart showing an algorithm in accordance with an example embodiment;

FIG. 8 is a flow chart showing an algorithm in accordance with an example embodiment;

FIGS. 9 to 11 are plots showing aspects of performance of example embodiments;

FIG. 12 is a block diagram of a system in accordance with an example embodiment; and

FIGS. 13A and 13B show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which when run by a computer perform operations according to example embodiments.

Detailed description

In the description, like reference numerals relate to like elements throughout.

FIG. 1 is a block diagram of a system, indicated generally by the reference numeral 10, in accordance with an example embodiment. The system 10 comprises a node 12 of a mobile communication system (such as a base station, eNB, gNodeB, gNB etc.). In use, the node 12 communicates with a number of user devices (not shown).

As shown in FIG. 1, the node 12 comprises a number of beams (such as the beam 14) that can be used for communications with one or more user devices. The system 10 can therefore be used to implement a grid-of-beams type communication between the node 12 and one or more user devices. In one example implementation, the node 12 and the user devices communicate using radio frequencies in the mm-wave band (e.g. of the order of 30 to 300 gigahertz). However, other frequency ranges (e.g. lower frequencies) may be used.

A beam selection algorithm may be provided to determine the strongest communication channel between the node 12 and a user device (e.g. according to some metric, such as received signal strength, although alternative metrics could be used). A complication is that beams from one node may interfere with beams of other nodes.

FIG. 2 is a block diagram of a system, indicated generally by the reference numeral 20, in accordance with an example embodiment. The system 20 comprises a first node 22, a second node 23 and a third node 24. The nodes may be base stations and may be similar to the node 12 described above. Thus, each of the nodes 22 to 24 may include a number of beams that could be used for communication purposes. It is not trivial to select an optimum beam selection option at each of multiple nodes of a communication system, such as the system 20.

In some example embodiments described below, beam selection option algorithms are implemented separately at each node (such as the nodes 22 to 24). Thus, a distributed solution may be provided in which a grid-of-beams selection process is embedded within each node (e.g. within each base station of a mobile communication system).

In some example embodiments described below, a machine-learning approach is taken. By way of example, FIG. 3 shows a system, indicated generally by the reference numeral 30, in accordance with an example embodiment. In the system 30, a machine-learning module 32 is used to receive first data at an input and to provide a grid-of-beams (GoB) selection at an output. The module 32 may be implemented separately at each of a plurality of nodes of a communication system (such as the nodes 22, 23 and 24 described above).

FIG. 4 is a flow chart showing an algorithm, indicated generally by the reference numeral 40, in accordance with an example embodiment. The algorithm 40 may be implemented by the module 32 described above. As discussed further below, the algorithm 40 is implemented independently for each node within a system (such as the nodes 22 to 24 of the system 20 described above).

At operation 42 of the algorithm 40, performance information is collected (for a current iteration) for a current beamforming option of the relevant node. The performance information could take many forms, such as one or more of mean signal received power, signal-to-interference ratio, data throughput, spectral efficiency etc.

At operation 44, a decision function for each of a plurality of beamforming options (such as the plurality of beams of a grid of beams described above with reference to FIG. 1) for the respective node of the mobile communication system is updated. The operation 44 may be implemented using performance information collected in the operation 42 (i.e. the performance information of the current beamforming option at the current iteration).

At operation 46, a beamforming option for the node (i.e. one of the beams of the grid of beams) is selected and the node of the communication system is operated at the next iteration accordingly. The beamforming option may be selected in order to maximise the decision function (for example, to maximise a long-term reward).

With the beamforming option for the next iteration selected, the algorithm 40 returns to operation 42, where the performance information is collected at the next iteration based on the beamforming option selected in operation 46.
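The loop formed by operations 42, 44 and 46 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `run_beam_selection`, `measure`, `decision_fn` and `mean_or_optimistic` are hypothetical, and the decision function is left as a pluggable callable because the specification allows many forms.

```python
import math
from typing import Callable, Dict, List


def mean_or_optimistic(rewards: List[float], iteration: int) -> float:
    """Example decision function (assumption): untried options score highest."""
    if not rewards:
        return math.inf
    return sum(rewards) / len(rewards)


def run_beam_selection(
    options: List[str],
    measure: Callable[[str], float],
    decision_fn: Callable[[List[float], int], float],
    iterations: int,
    initial: str,
) -> List[str]:
    """Iterate operations 42 (collect), 44 (update) and 46 (select)."""
    history: Dict[str, List[float]] = {opt: [] for opt in options}
    current = initial
    chosen = [current]
    for t in range(1, iterations + 1):
        # Operation 42: collect performance information for the current option.
        history[current].append(measure(current))
        # Operation 44: update the decision function for every option.
        scores = {opt: decision_fn(history[opt], t) for opt in options}
        # Operation 46: operate the node with the option that maximises it.
        current = max(options, key=lambda opt: scores[opt])
        chosen.append(current)
    return chosen
```

For instance, with three hypothetical options whose measured performance is fixed at 0.2, 0.9 and 0.5, the loop tries each option once and then stays on the one with the highest mean.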

The algorithm 40 therefore provides an example iterative arrangement for the selection of a beamforming option at a node of a communication system (such as the system 20 described above). At least some of the other nodes of the mobile communication system may select one of a plurality of beamforming options available at that node, such that a distributed arrangement for beamforming options selection is provided.

FIG. 5 shows a system, indicated generally by the reference numeral 50, demonstrating a feature of an example embodiment. The system 50 comprises an environment 52 and an agent 54. In accordance with the principles of reinforcement learning, the agent 54 iteratively learns the optimum behaviours in the environment 52 by taking some actions and observing the resulting performances (i.e. rewards). Thus, the agent 54 learns from past experience. As described further below, the principles of the system 50 can be applied to the algorithm 40 described above, such that beamforming options may be selected using the principles of reinforcement learning.

The environment 52 represents a system that is impacted by decisions taken by the agent 54. The agent 54 determines the best action to take, based on:

• observations of the environment 52, that consist of rewards for that action (i.e. the performance of the action chosen by the agent); and

• information of the new state.

In the context of the algorithm 40 described above, let X = {GoB_k, k = 1 ... K} be the set of beamforming options (i.e. candidate grids of beams). This is the set of possible actions.

At each time t, some or all of the following steps are executed by a respective node of a communication system:

• The agent 54 (which could be hosted by the node of the communication system, such as the system 20, or could be hosted elsewhere in a network, provided that the performance data collected by the relevant node are made available to the agent) picks the best grid-of-beams (GoB), GoB_k*, among the K possible grids-of-beams, so as to maximize a decision function, or a long-term reward, which is updated from the instantaneous reward R_t, which in turn is specified according to the targeted objective.

Different options for the reward are possible. These include: mean Received Signal Power experienced by all the user devices spread over the area to cover; mean Signal to Interference and Noise ratio experienced by all the user devices served by the node, where interference originates from the neighbouring cells (which have selected their own optimal GoB); and mean throughput of the user devices of the considered cell.

• The node of the communication system operates the optimal GoB: GoB_k*.

• The node of the communication system notifies the user devices (and optionally the users) of the updated GoB selection (if required and at least for the active user devices).

• The user devices report their experience of the GoB choice (RSRP, CQI, etc.).

• The node of the communication system collects these user device measurements and updates the decision function accordingly, for each possible GoB.

• Some or all of the operations above are then repeated.
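As an illustration of how the instantaneous reward R_t might be derived from the user device reports, the sketch below averages one reported metric over the reporting user devices. The function name `instantaneous_reward` and the report keys (`rsrp_dbm`, `sinr_db`, `throughput_mbps`) are hypothetical, and averaging values directly in dB is a simplification assumed here rather than something the specification prescribes.

```python
from statistics import mean
from typing import Dict, List


def instantaneous_reward(reports: List[Dict[str, float]], objective: str) -> float:
    """Average one reported metric over all user devices that reported it.

    `reports` holds one dict per user device; `objective` names the metric
    chosen as the targeted objective (e.g. "rsrp_dbm" or "sinr_db").
    """
    values = [report[objective] for report in reports if objective in report]
    if not values:
        raise ValueError(f"no user device reported {objective!r}")
    return mean(values)


# Hypothetical reports from three user devices served by the node.
example_reports = [
    {"rsrp_dbm": -90.0, "sinr_db": 10.0},
    {"rsrp_dbm": -100.0, "sinr_db": 20.0},
    {"rsrp_dbm": -95.0},  # this device reported no SINR
]
```

Because the SINR reports already include interference from neighbouring cells, a reward built from them reflects the neighbours' GoB choices without any explicit exchange between nodes.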

It should be noted that the process described above may be stopped when convergence is reached (e.g. if the selected GoB does not change over time). Alternatively, or in addition, when the selected GoB is stabilized to a fixed unchanged choice, the decision periods (instant for exploring alternative GoBs then selecting other GoBs) may be increased. It should also be noted that by accounting for interference from the neighbouring cells, the GoB selection made by each node of a communication system can be adapted to the GoB choices realized by its neighbours. Thus, the GoB selection at a particular node can be influenced by GoB choices of neighbouring nodes, without requiring information exchange between the respective nodes.
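One simple way to detect the convergence mentioned above is to check whether the selected GoB has stayed the same over a recent window of iterations. The helper below is a sketch only; the name `has_converged` and the `window` parameter are assumptions, and the specification does not state a particular convergence test.

```python
from typing import Sequence


def has_converged(selected: Sequence[int], window: int) -> bool:
    """Return True when the last `window` selections are all the same GoB index."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(selected) < window:
        return False  # not enough history yet
    recent = selected[-window:]
    return all(choice == recent[0] for choice in recent)
```

A node could use such a check either to stop the process or to trigger longer decision periods.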

Different possible reinforcement learning algorithms can be implemented. By way of example, a detailed embodiment based on the so-called ‘Multi Armed Bandit’ method is described further below.

FIG. 6 is a block diagram of a system, indicated generally by the reference numeral 60, in accordance with an example embodiment. The system 60 comprises a node 61 (such as a base station) of a communication system (e.g. gNBi as labelled in FIG. 6), one or more user devices 62 served by the node 61, a first beamforming option selection module 63, a user device measurement collection module 64, a decision function update module 65 and an updated beamforming option selection module 66.

The system 60 may be implemented at a node (e.g. a base station) of a mobile communication system. Similar systems may be provided at other base stations of the mobile communication system. Thus, decisions can be made locally at each base station, with neighbouring base stations making their own decisions. In this way, an arrangement for the communication system can be arrived at iteratively, with decisions being taken autonomously at each node, with those decisions potentially being influenced by decisions taken (independently) at other nodes.

An initial beamforming option is selected and applied by the first beamforming option selection module 63. A variety of methods may be used for selecting the first beamforming options (such as preset and random/pseudo-random selections).

As indicated schematically in FIG. 6, the user device measurement collection module 64 receives user device measurements of the one or more user devices 62 (in the first instance based on the initial beamforming option described above). Such user device measurements may take the form of performance information, such as mean signal received power, signal-to-interference ratio, data throughput, spectral efficiency etc. In this way, the user device measurement collection module 64 can be provided with performance information relating to a beamforming option being used in a current iteration (thereby implementing operation 42 of the algorithm 40 described above).

In the context of a mobile communication system, the user device measurement collection module 64 collects information from user devices in mobile communication with said node of said mobile communication system.

The decision function update module 65 updates a decision function for each of a plurality of beamforming options (e.g. for all options of a so-called grid-of-beams) for the node of the mobile communication system (thereby implementing operation 44 of the algorithm 40). The decision function update module 65 uses the performance information collected by the user device measurement collection module 64.

As also indicated schematically in FIG. 6, the updated beamforming option selection module 66 selects one of said beamforming options (thereby implementing operation 46 of the algorithm 40) and operates the node of the communication system at a next iteration accordingly. The selection is made in order to maximise the decision function updated by the module 65. The module 66 informs the user devices of any updates to the selected beamforming options such that those user devices can be updated accordingly. As discussed above, separate instances of the updated beamforming option selection module 66 are provided at said nodes of said communication system. Thus, each node makes its own decisions and does not require data or decision information to be received from other nodes. This local decision-making does not require signal exchange or co-ordination between nodes.

With the beamforming options updated, the user devices continue to provide performance measurements to the user device measurement collection module 64 such that the decision function (and hence the selected beamforming option) is iteratively updated.

The system 60 shows an arrangement for updating beamforming options for a single node of a communication system. At least some of the other nodes of the communication system implement a similar system in order to iteratively, and independently, update a selection of beamforming options available at those nodes, such that a distributed arrangement for beamforming options selection is provided.

Thus, for example, at initialisation, each instance of the first beamforming option selection module 63 at each node independently selects one of the possible beamforming options (e.g. one GoB for a first node, another GoB for a second node etc.).

Each instance of the user device measurement collection module 64 receives data from the user devices in communication with the respective nodes. The user device measurements account for interference (and are therefore affected by the GoB choices of neighbouring cells).

Each instance of the decision function update module 65 may then run a machine learning algorithm which outputs an estimated long-term reward per candidate GoB. Subsequently, each instance of the updated beamforming option selection module 66 takes the most suitable decision/action by picking the beam option which maximizes the decision function and switches to that newly selected GoB (e.g. GoB for the first node and GoB₂ for the second node etc.).
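The toy simulation below illustrates the distributed arrangement: two nodes each run their own learner, and the only coupling between them is through the measured reward, which drops when both pick colliding GoBs. Everything in it is an assumption made for illustration (the class `NodeAgent`, the reward model `toy_reward`, the bonus term and the per-node random tie-breaking); it is not the embodiment's algorithm and makes no claim about convergence behaviour.

```python
import math
import random
from typing import List, Tuple


class NodeAgent:
    """Per-node learner; it only ever sees rewards measured at its own node."""

    def __init__(self, num_options: int, seed: int, tuning: float = 1.0) -> None:
        self.counts = [0] * num_options
        self.means = [0.0] * num_options
        self.tuning = tuning
        self.rng = random.Random(seed)  # per-node randomness for tie-breaking

    def select(self, t: int) -> int:
        untried = [k for k, n in enumerate(self.counts) if n == 0]
        if untried:
            return self.rng.choice(untried)
        scores = [
            m + self.tuning * math.sqrt(math.log(t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        best = max(scores)
        ties = [k for k, s in enumerate(scores) if best - s < 1e-12]
        return self.rng.choice(ties)

    def update(self, k: int, reward: float) -> None:
        self.counts[k] += 1
        self.means[k] += (reward - self.means[k]) / self.counts[k]


def toy_reward(own: int, neighbour: int) -> float:
    """Hypothetical reward: lower when both nodes pick the same (colliding) GoB."""
    return 0.2 if own == neighbour else 1.0


def simulate(rounds: int, num_options: int = 3) -> Tuple[List[Tuple[int, int]], List[NodeAgent]]:
    agents = [NodeAgent(num_options, seed=1), NodeAgent(num_options, seed=2)]
    trace = []
    for t in range(1, rounds + 1):
        a, b = agents[0].select(t), agents[1].select(t)
        # No state is exchanged: each node updates from its own reward only.
        agents[0].update(a, toy_reward(a, b))
        agents[1].update(b, toy_reward(b, a))
        trace.append((a, b))
    return trace, agents
```

The per-node random seed stands in for the independent (e.g. pseudo-random) initial selections; without some asymmetry, two identical deterministic learners would keep making identical choices.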

FIG. 7 is a flow chart showing an algorithm, indicated generally by the reference numeral 70, in accordance with an example embodiment. The algorithm 70 shows an example decision function implementation and may be implemented by instances of the decision function update module 65 described above.

The algorithm 70 starts at operation 72, where a first variable for the respective beamforming option is determined. The first variable may be a so-called exploitation variable that is based, for example, on knowledge of performance information of previous iterations of the current beamforming option. By way of example, the first variable may comprise a mean reward function for the respective beamforming option (e.g. an average of past rewards for the respective beamforming option).

Next, at operation 74, a second variable for the respective beamforming option is determined. The second variable may be a so-called exploration variable that is related to how recently data has been acquired for the respective beamforming option. For example, the second variable may comprise a usage variable relating to recent usage of the respective beamforming option, wherein the decision function is biased towards selecting beamforming options with low recent usage rates.

Finally, at operation 76, a decision function is determined based, at least in part, on the first and second variables determined in the operations 72 and 74. In one embodiment, the decision function comprises a combination (such as a sum, or a weighted sum) of the first and second variables for the respective beamforming option. A means for updating the decision function may be provided that implements a machine-learning algorithm (e.g. the algorithm may be based on reinforcement learning, as discussed further below).

Thus, the algorithm 70 may be provided to steer the choice of the next beamforming option so as to provide a trade-off between exploitation of performance information related to the current beamforming options to maximise the long-term reward and exploring beamforming options that have not been explored for some time. Of course, the algorithm 70 is provided by way of example and variants are possible (such as changing the order of the operations of the algorithm).
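The three operations of the algorithm 70 can be sketched as three small functions. This is one possible form under stated assumptions: the function names are hypothetical, the exploration bonus uses the total number of times an option has been used (not a recency window), and the square-root-of-logarithm shape is borrowed from upper-confidence-bound bandit methods rather than taken from the specification.

```python
import math
from typing import Sequence


def exploitation_term(rewards: Sequence[float]) -> float:
    """Operation 72: mean of past rewards for one beamforming option."""
    return sum(rewards) / len(rewards) if rewards else 0.0


def exploration_term(times_used: int, iteration: int, tuning: float) -> float:
    """Operation 74: bonus that is larger for options that were used rarely."""
    if times_used == 0:
        return math.inf  # never tried: force at least one trial
    return tuning * math.sqrt(math.log(iteration) / times_used)


def decision_function(rewards: Sequence[float], iteration: int, tuning: float = 1.0) -> float:
    """Operation 76: sum of the exploitation and exploration terms.

    `iteration` is the current iteration number and must be at least 1.
    """
    return exploitation_term(rewards) + exploration_term(len(rewards), iteration, tuning)
```

Setting `tuning` to zero reduces the decision function to pure exploitation (for options already tried), while larger values push the node to revisit rarely used options more often.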

The second variable described above with reference to the operation 74 may comprise a tuning variable, wherein the tuning variable sets the relative importance of the first and second variables in the outcome of the decision function. The tuning variable may be dependent on a tuning function and how reliable the data for the particular beam is (e.g. for that node/base station). For example, if data is unreliable, there may be a need to update the data to get best results, but this may be limited by the desire to produce something close to the best option in as many instances as possible. Thus, there is a balance to be struck between exploitation and exploration (and the algorithm 70 seeks to strike that balance). The said tuning function may be defined such that a ‘regret’ (i.e. a loss due to exploration) is upper bounded (i.e. an upper confidence bound is provided).
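One well-known index of this kind is the UCB1 rule from the multi-armed bandit literature. It is shown here only as an illustration; the specification does not confirm that this exact form is used. The symbols are notation assumed for this sketch: $\bar{R}_k(t)$ is the mean reward of option $k$ up to iteration $t$, $n_k(t)$ is the number of iterations in which option $k$ was used, and $c$ is the tuning variable.

```latex
D_k(t) \;=\; \underbrace{\bar{R}_k(t)}_{\text{exploitation}}
       \;+\; \underbrace{c\,\sqrt{\frac{\ln t}{n_k(t)}}}_{\text{exploration}},
\qquad
k^{*}(t+1) \;=\; \arg\max_{k \in \{1,\dots,K\}} D_k(t)
```

With rewards normalised to $[0, 1]$ and $c = \sqrt{2}$, this is the classical UCB1 index, for which the expected regret after $T$ iterations is known to grow only logarithmically in $T$. A larger $c$ favours exploration; a smaller $c$ favours exploitation.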

FIG. 8 is a flow chart showing an algorithm, indicated generally by the reference numeral 80, in accordance with an example embodiment.

The algorithm 80 starts at operation 81, wherein performance information is collected, at a current iteration, of a current beamforming option at a node of a mobile communication system (such as the system 20 or the system 60 described above). By way of example, the performance information could include one or more of mean signal received power, signal-to-interference ratio, data throughput and spectral efficiency. Other parameters of performance information could also be used.

At operation 82, a decision function for each of a plurality of beamforming options for the node of the mobile communication system is updated. Updating the decision function may involve using the collected performance information of the current beamforming option at the current iteration.

At operation 83, one of said beamforming options is selected, wherein the selected beamforming option maximises the decision function. At operation 84, the node of the communication system is operated based on the beamforming option selected in operation 83. As discussed above, at least some of the other nodes of the system select one of a plurality of beamforming options available at that node, such that a distributed arrangement for beamforming options selection is provided. At operation 85, a period of repeating said selection of one of said beamforming options at the plurality of next iterations may be adjusted. For example, the period may be increased over time (e.g. as the beamforming option selection stabilises). The adjustment of the period may be based on the number of iterations made and/or a performance measurement. The algorithm 80 then returns to operation 81 after a wait operation (operation 86) is implemented, such that the selection of the beamforming option is updated periodically. Thus, the algorithm 80 implements an iterative process. The wait operation may implement the repeat period adjusted at operation 85.
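The iterative process of the algorithm 80 may be sketched as follows. This is a minimal sketch under stated assumptions: the helper names (`next_period`, `run_selection_loop`), the doubling of the period every ten iterations and the toy reward values are all hypothetical, and the wait operation is recorded rather than executed.

```python
def next_period(period, iteration, growth=2.0, every=10, max_period=64.0):
    """Operation 85: lengthen the repeat period every `every` iterations."""
    if iteration > 0 and iteration % every == 0:
        return min(period * growth, max_period)
    return period


def run_selection_loop(options, measure, update, select, iterations, period=1.0):
    """Operations 81 to 86 for one node; returns (iteration, option, period)."""
    current = options[0]
    schedule = []
    for t in range(1, iterations + 1):
        info = measure(current)            # operation 81: collect performance
        update(current, info)              # operation 82: update decision data
        current = select(options)          # operations 83/84: select and operate
        period = next_period(period, t)    # operation 85: adjust the period
        schedule.append((t, current, period))  # operation 86 would wait `period`
    return schedule


# Toy usage with made-up rewards: option "B" is better than option "A".
rewards = {"A": 1.0, "B": 3.0}
totals = {name: [0.0, 0] for name in rewards}  # [cumulated reward, use count]


def measure(option):
    return rewards[option]


def update(option, info):
    totals[option][0] += info
    totals[option][1] += 1


def select(options):
    # Untried options first, otherwise the best mean reward so far.
    def score(o):
        total, count = totals[o]
        return float("inf") if count == 0 else total / count
    return max(options, key=score)


schedule = run_selection_loop(["A", "B"], measure, update, select, iterations=20)
```

In this toy run the selection settles on option "B" and the repeat period grows as the iterations accumulate, which mirrors the idea of increasing the period as the selection stabilises.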

By way of example, a technical implementation of an example embodiment of a grid-of-beams selection algorithm is described below based on a Multi-Armed Bandit (MAB) method. The MAB algorithm described below is an example of reinforcement learning. In this example, each node (e.g. gNB) of a system should select the best grid-of-beams (GoB) pattern from a predefined set of available GoBs $\{GoB_j\}_{j=1}^{L}$. However, the generated inter-cell interference is highly correlated with the GoB pattern selected by each node. Thus, the optimal GoB pattern should seek to guarantee the coverage of the node as well as a reduced interference level to neighbouring nodes. For example, in the case of line-of-sight channels, GoB patterns with colliding nominal directions result in high inter-cell interference, whilst a shift in the nominal directions of the respective nodes reduces inter-cell interference. Thus, the idea behind the MAB approach is to steer the gNBs towards a favourable GoB selection in an autonomous manner (wherein each gNB makes its decision without information exchange and without the need for a central entity).

To decide an optimal (or near-optimal) GoB, each node follows a set of rules that steers its decision and seeks to strike a balance between:

• Exploiting the cumulated knowledge by choosing the most appropriate GoB pattern and performing transmission with it; and

• Exploring other GoB patterns among the available choices to detect other possibilities that could be interesting to exploit.

We consider a model where each node i can select one GoB among the available set $\{GoB_j\}_{j=1}^{L}$. At each iteration t, the respective node decides autonomously which GoB to use during the following period T. For this purpose, we define a decision function DF that uniquely determines the next GoB to choose.

At each t, each gNB i chooses $GoB_{g,t}$ (the index g denotes the greedy GoB), identified as the GoB maximizing the DF value $DF^i_{j,t}$ calculated at time t for each GoB identified by the index j. Note that this function, calculated for each GoB, depends on the considered node, as the nodes do not have the same knowledge of the efficiency of each GoB pattern within the available set:

$$g = \arg\max_{1 \le j \le L} DF^i_{j,t} \qquad (1)$$

The decision function is formulated as follows:

Where:

• $\mu^i_{j,t}$ is the mean reward function resulting from transmitting with $GoB_j$ at time t for the node i. This parameter considers the performance encountered by the node when transmitting on $GoB_j$ during the relevant period and may correspond, for example, to the measured Signal to Interference Ratio (SIR) or Reference Signal Received Power (RSRP).

• $n^i_{j,t}$ is the number of times $GoB_j$ has been chosen by the node i until time t. Low values of this parameter steer the cell to select $GoB_j$, thereby updating the information on the efficiency of a GoB which has not been used for a long time.

The parameters $a^i_{j,t}$ and $b^i_{j,t}$ have the role of tuning the degree of exploration. This means that for high values of the factor $a^i_{j,t} / b^i_{j,t}$, we allow the nodes to choose GoBs with low reward values, which favours the exploration task. As an example, $a^i_{j,t}$ can be set as the variance of the rewards: $\mathrm{var}(\mu^i_{j,t})$. Thus, the nodes will more frequently choose the GoBs with high variance and so obtain a knowledge of their efficiency that is closer to reality. The performance of this algorithm is generally evaluated through a regret function that corresponds to the loss compared to the optimal case where there is complete knowledge of the efficiency of the set of GoBs. The regret will increase with time. Thus, MAB allocation strategies bound the increase of the regret, at least asymptotically. For this reason, we make use of the Upper Confidence Bound (UCB) approach. This algorithm has the advantages of being computationally efficient, not requiring prior knowledge of the reward distribution and achieving a regret that grows logarithmically with time. In one implementation the reward function is divided by the maximum bound (for example for the case of throughput). The expected regret R after a number of iterations M verifies the following assumption:

Where L is the number of arms (number of alternative GoB patterns), $\mu_j$ is the expected reward when choosing the GoB j, $\mu^*$ is the maximum reward value and $\Delta_j$ is defined as $\mu^* - \mu_j$. The values $a^i_{j,t}$ and $b^i_{j,t}$ (Equation 2) in the UCB algorithm are respectively set to 2 x log
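To illustrate the gap between the maximum reward value and the expected reward of each GoB, the following sketch computes the cumulated loss of a given allocation of iterations relative to always using the best GoB. The function name `pseudo_regret` and the example numbers are assumptions made for illustration; the sketch does not reproduce the bound referred to above.

```python
def pseudo_regret(expected_rewards, pull_counts):
    """Loss relative to always using the GoB with the maximum expected reward.

    expected_rewards: expected reward of each GoB (one entry per arm).
    pull_counts:      number of iterations in which each GoB was used.
    """
    best = max(expected_rewards)
    return sum((best - mu) * n for mu, n in zip(expected_rewards, pull_counts))


# Three hypothetical GoBs used 10, 20 and 70 times respectively.
example_regret = pseudo_regret([0.2, 0.5, 0.9], [10, 20, 70])
```

Only the iterations spent on sub-optimal GoBs contribute to this quantity, which is why a strategy that limits the number of such iterations also limits the growth of the regret.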

The UCB algorithm designed for the GoB selection is highlighted hereafter:

Algorithm 1: UCB based smart GoB selection

i : the index of the gNB with 1 ≤ i ≤ N

j : the index of the GoB with 1 ≤ j ≤ L

t : the index of the time iteration

$r^i_{j,t}$ : the reward gathered when gNB i chooses GoB j at time t

$s^i_{j,t}$ : the cumulated reward when gNB i chooses GoB j until time t

$n^i_{j,t}$ : the number of iterations gNB i chooses GoB j until time t

For t do

For i = 1: N do

1- Identification of the greedy GoB indexed g

2- Parameters update: 1 ≤ j ≤ L

$s^i_{j,t+1} = s^i_{j,t} + \mathbb{1}(g = j) \times r^i_{j,t}$

$n^i_{j,t+1} = n^i_{j,t} + \mathbb{1}(g = j)$

$\mu^i_{j,t+1} = s^i_{j,t+1} / n^i_{j,t+1}$

With $\mathbb{1}(x) = 1$ if x is true and 0 otherwise.
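A per-node selector in the spirit of Algorithm 1 may be sketched as follows. This is a sketch under stated assumptions: it uses the classical UCB1 index (mean reward plus the square root of 2 log t divided by the usage count), which is a standard choice and is not confirmed here to be the exact decision function of the specification; the class name `UcbGobSelector` and the constant reward values are hypothetical.

```python
import math


class UcbGobSelector:
    """Illustrative per-gNB selector over L candidate GoB patterns."""

    def __init__(self, num_gobs):
        self.cum_reward = [0.0] * num_gobs  # cumulated reward per GoB
        self.count = [0] * num_gobs         # number of times each GoB was used
        self.t = 0                          # number of completed iterations

    def mean_reward(self, j):
        return self.cum_reward[j] / self.count[j]

    def select(self):
        """Step 1: identify the greedy GoB index g."""
        # Initialisation phase: use each GoB pattern once.
        for j, n in enumerate(self.count):
            if n == 0:
                return j

        def score(j):
            # Assumed UCB1 index: exploitation term plus exploration bonus.
            bonus = math.sqrt(2.0 * math.log(self.t) / self.count[j])
            return self.mean_reward(j) + bonus

        return max(range(len(self.count)), key=score)

    def update(self, g, reward):
        """Step 2: only the chosen GoB g has its statistics changed."""
        self.cum_reward[g] += reward
        self.count[g] += 1
        self.t += 1


# Toy run with constant, made-up rewards; GoB 1 is the best choice.
toy_rewards = [0.2, 0.8, 0.5]
selector = UcbGobSelector(num_gobs=3)
for _ in range(200):
    g = selector.select()
    selector.update(g, toy_rewards[g])
```

Each gNB would hold its own selector instance, so no information needs to be exchanged between nodes in this sketch. In the toy run, most iterations end up on the GoB with the highest reward while the other GoBs are still revisited occasionally.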

As discussed above, different possible options can be considered for the reward function $r^i_{j,t}$:

• Reference Signal Received Power (RSRP)

• Signal to Interference Ratio

• Throughput

Note that the measurements described above may be carried out based on reference signals used for beam management (e.g. CSI-RS or SSB). Note also that measurements can be weighted per user device (UE) location/pixel based on traffic-related information (number of user devices per user device position).

The performance of the proposed MAB approach described above has been assessed through simulations. A scenario was defined in a ray tracing tool with three gNBs. Different GoB patterns were set with 16 beams each (mixed narrow and coarse beams to be close to the product specification). These GoB patterns guaranteed a predefined coverage level when deployed (above a threshold of -120 dBm in 95% of the cases).

Table 1 - Simulation parameters

The MAB approach (algorithm 1 above) was operated at the level of each gNB independently. The reward function was selected as the median of SIR (Signal to Interference ratio) over the pixels within the region of interest.
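A reward of this kind may be sketched as follows. The function names and the representation of each pixel as a serving power together with a list of interfering powers (in linear units) are assumptions made for this sketch.

```python
import math
import statistics


def sir_db(serving_mw, interferers_mw):
    """SIR in dB from a serving power and interfering powers (linear units)."""
    return 10.0 * math.log10(serving_mw / sum(interferers_mw))


def median_sir_reward(pixels):
    """Median SIR (dB) over the pixels of the region of interest."""
    return statistics.median(sir_db(s, i) for s, i in pixels)


# Three hypothetical pixels: (serving power, [interfering powers]).
example_pixels = [(10.0, [1.0]), (100.0, [0.5, 0.5]), (1.0, [1.0])]
example_reward = median_sir_reward(example_pixels)
```

Using the median rather than the mean makes the reward in this sketch less sensitive to a small number of pixels with very high or very low SIR.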

FIGs. 9 to 11 are plots, indicated generally by the reference numerals 90, 100 and 110 respectively, showing aspects of performance of example embodiments.

In particular, FIG. 9 shows the GoB selected by each gNB over time following the MAB algorithm (algorithm 1 above). At first, in an initialization phase, each GoB pattern was used once. Thereafter, the selection of the GoB is made based on the decision function defined in equation (2). After a transitory period where the GoB selection oscillates among the different possible choices (around 100 iterations), the selection converges to a final optimal GoB. Note that an exploration of other GoBs could be performed from time to time in order to check whether other choices are interesting or not.

The performance of the machine-learning based GoB selection procedure is evaluated and compared to the baseline scheme where the same GoB pattern is used by all the gNBs.

Two key performance indicators (KPIs) are selected:

• RSRP: Reference signal received power. Each pixel/location in the ROI is associated with the gNB ensuring the highest RSRP level with the current GoB choice. The performance is then shown as a CDF over the RSRP at each pixel (see FIG. 10).

• SINR: Signal to Interference plus Noise Ratio (noise = -174 dBm). SINR is estimated at each pixel based on the gNB association as serving base station and the interfering gNBs, accounting for the current GoB pattern selection by each base station. The CDF over the pixels is depicted in FIG. 11 with comparison to the baseline case where the same GoB is selected by all the gNBs.
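The two indicators may be computed per pixel along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, each pixel is represented by the list of RSRP values (in dBm) received from the gNBs, and the noise figure quoted above is treated as a total noise power.

```python
import math


def dbm_to_mw(dbm):
    return 10.0 ** (dbm / 10.0)


def serving_index(rsrp_dbm):
    """The pixel is associated with the gNB providing the highest RSRP."""
    return max(range(len(rsrp_dbm)), key=lambda k: rsrp_dbm[k])


def sinr_db(rsrp_dbm, noise_dbm=-174.0):
    """SINR (dB) at one pixel; all non-serving gNBs count as interferers."""
    s = serving_index(rsrp_dbm)
    signal = dbm_to_mw(rsrp_dbm[s])
    interference = sum(dbm_to_mw(p) for k, p in enumerate(rsrp_dbm) if k != s)
    return 10.0 * math.log10(signal / (interference + dbm_to_mw(noise_dbm)))


def empirical_cdf(values):
    """Sorted (value, cumulative probability) pairs over the pixels."""
    ordered = sorted(values)
    return [(v, (k + 1) / len(ordered)) for k, v in enumerate(ordered)]


# Two hypothetical pixels, each with RSRP values from three gNBs.
pixels = [[-90.0, -100.0, -110.0], [-105.0, -95.0, -120.0]]
rsrp_cdf = empirical_cdf(max(p) for p in pixels)
sinr_cdf = empirical_cdf(sinr_db(p) for p in pixels)
```

Comparing such CDFs for the baseline and for the selected GoB patterns gives the kind of curve-to-curve comparison described for the two indicators.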

FIGs. 10 and 11 compare the performance of a scenario in which the various base stations have the same GoB selection with the performance of a scenario in which the GoB selections are made in accordance with the MAB algorithm described above. The obtained results clearly show the gains of the MAB approach, up to 15 dBm for RSRP and up to 10 dB for SINR.

For completeness, FIG. 12 is a schematic diagram of components of one or more of the example embodiments described previously, which hereafter are referred to generically as processing systems 300. A processing system 300 may have a processor 302, a memory 304 closely coupled to the processor and comprised of a RAM 314 and ROM 312, and, optionally, user input 310 and a display 318. The processing system 300 may comprise one or more network/apparatus interfaces 308 for connection to a network/apparatus, e.g. a modem which may be wired or wireless. Interface 308 may also operate as a connection to other apparatus, such as a device/apparatus which is not a network-side apparatus. Thus, direct connection between devices/apparatus without network participation is possible. The processor 302 is connected to each of the other components in order to control operation thereof.

The memory 304 may comprise a non-volatile memory, such as a hard disk drive (HDD) or a solid state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 40, 70 and 80 described above. Note that in the case of a small device/apparatus, the memory may be of a type suited to small-size usage, i.e. a hard disk drive (HDD) or solid state drive (SSD) is not always used.

The processor 302 may take any suitable form. For instance, it may be a microcontroller, a plurality of microcontrollers, a processor, or a plurality of processors.

The processing system 300 may be a standalone computer, a server, a console, or a network thereof. The processing system 300 and the needed structural parts may all be inside a device/apparatus such as an IoT device/apparatus, i.e. embedded in a very small size.

In some example embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device/apparatus and may run partly or exclusively on the remote server device/apparatus. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device/apparatus in order to utilize the software application stored there.

FIGS. 13A and 13B show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which when run by a computer may perform methods according to example embodiments described above. The removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code. The memory 366 may be accessed by a computer system via a connector 367. The CD 368 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.

Tangible media can be any device/apparatus capable of storing data/information which data/information can be exchanged between devices/apparatus/network. Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

Reference to, where relevant, "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing circuitry" etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices/apparatus and other devices/apparatus. References to computer program, instructions, code etc. should be understood to express software for a programmable processor or firmware, such as the programmable content of a hardware device/apparatus, whether as instructions for a processor or as configuration settings for a fixed function device/apparatus, gate array, programmable logic device/apparatus, etc.

As used in this application, the term "circuitry" refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams of Figures 4, 7 and 8 are examples only and that various operations depicted therein may be omitted, reordered and/or combined.

It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification. For example, although the embodiments described above relate to nodes which are base stations of a mobile communication system, the principles described herein are applicable to other scenarios where a selection of a beamforming option from a set of possible beamforming options (such as a grid-of-beams) is made.

Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.

Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described example embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.

It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

Claims

1. An apparatus comprising:

means for collecting performance information, at a current iteration, of a current beamforming option at a node of a mobile communication system;

means for updating a decision function for each of a plurality of beamforming options for the node of the mobile communication system, said means for updating the decision function using the collected performance information of the current beamforming option at the current iteration; and

means for selecting one of said beamforming options and operating the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function.

2. An apparatus as claimed in claim 1, further comprising means for repeating said selecting of one of said beamforming options at a plurality of next iterations.

3. An apparatus as claimed in claim 2, further comprising means for adjusting a period of repeating said selecting of one of said beamforming options at the plurality of next iterations.

4. An apparatus as claimed in any one of claims 1 to 3, wherein said means for collecting performance information collects information from user devices in mobile communication with said node of said mobile communication system.

5. An apparatus as claimed in any one of the preceding claims, wherein the means for selecting said one of said beamforming options is conducted at said node of said communication system independently of other nodes in said communication system.

6. An apparatus as claimed in any one of the preceding claims, wherein said node is a base station of said mobile communication system.

7. An apparatus as claimed in any one of the preceding claims, wherein said means for updating the decision function uses performance information of previous iterations for the current beamforming option and other beamforming options of the plurality.

8. An apparatus as claimed in any one of the preceding claims, wherein the decision function for each of the plurality of beamforming options comprises a combination of a first variable and a second variable for the respective beamforming option.

9. An apparatus as claimed in claim 8, wherein the first variable comprises a mean reward function for the respective beamforming option.

10. An apparatus as claimed in claim 8 or claim 9, wherein the second variable comprises a usage variable relating to recent usage of the respective beamforming option, wherein the decision function is biased towards selecting beamforming options with low recent usage rates.

11. An apparatus as claimed in any one of claims 8 to 10, wherein the second variable comprises a tuning variable, wherein the tuning variable sets the relative importance of the first and second variables in the outcome of the decision function.

12. An apparatus as claimed in any one of claims 8 to 11, wherein the decision function is defined such that a regret function due to exploration is upper bounded.

13. An apparatus as claimed in any one of the preceding claims, wherein said means for updating the decision function implements a machine-learning algorithm.

14. An apparatus as claimed in any one of the preceding claims, wherein the means comprise:

at least one processor; and

at least one memory including computer program code, the at least one memory and the computer program configured, with the at least one processor, to cause the performance of the apparatus.

15. A method comprising:

collecting performance information, at a current iteration, of a current beamforming option at a node of a mobile communication system;

updating a decision function for each of a plurality of beamforming options for the node of the mobile communication system, wherein updating the decision function uses the collected performance information of the current beamforming option at the current iteration; and

selecting one of said beamforming options and operating the node of the communication system at a next iteration accordingly, wherein the selected beamforming option maximises the decision function.

16. A method as claimed in claim 15, further comprising repeating said selecting of one of said beamforming options at a plurality of next iterations.

17. A method as claimed in claim 15 or claim 16, wherein updating the decision function uses performance information of previous iterations for the current beamforming option and other beamforming options of the plurality.

18. A method as claimed in any one of claims 15 to 17, wherein the decision function for each of the plurality of beamforming options comprises a combination of a first variable and a second variable for the respective beamforming option.

19. A method as claimed in any one of claims 15 to 18, wherein updating said decision function implements a machine-learning algorithm.

20. A computer readable medium comprising program instructions stored thereon for performing at least the following:

beamforming option maximises the decision function.