WO2018068858A1 - Method and device in a wireless communication network for downlink power control - Google Patents

Publication number
WO2018068858A1
Authority
WO
WIPO (PCT)
Prior art keywords
power control
node
radio cell
agent node
radio
Prior art date
Application number
PCT/EP2016/074618
Other languages
French (fr)
Inventor
Euhanna GHADIMI
Francesco Davide CALABRESE
Pablo SOLDATI
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2016/074618 priority Critical patent/WO2018068858A1/en
Publication of WO2018068858A1 publication Critical patent/WO2018068858A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/06 TPC algorithms
    • H04W52/14 Separate analysis of uplink or downlink
    • H04W52/143 Downlink power control
    • H04W52/18 TPC being performed according to specific parameters
    • H04W52/28 TPC being performed according to specific parameters using user profile, e.g. mobile speed, priority or network state, e.g. standby, idle or non transmission
    • H04W52/281 TPC being performed according to specific parameters using user profile, e.g. mobile speed, priority or network state, e.g. standby, idle or non transmission taking into account user or data type priority

Definitions

  • Implementations described herein generally relate to an agent node, a trainer node and methods therein.
  • Radio interference is a major cause of performance degradation in wireless radio systems.
  • state-of-the-art radio cellular systems have adopted Inter-Cell Interference Coordination (ICIC) schemes.
  • ICIC Inter-Cell Interference Coordination
  • LTE Long Term Evolution
  • two forms of ICIC are supported: frequency domain ICIC (adopted in LTE Rel. 8-9); and time domain ICIC (adopted from LTE Rel. 10 and beyond).
  • Frequency domain ICIC relates to the usage of radio resources in the frequency domain and/or power adaptation.
  • Current methods, shown in Figure 1, include:
  • Full frequency reuse (the basic operating mode of the LTE system) in which each base station uses the entire frequency spectrum with uniform power distributed across the system bandwidth, thereby creating strong interference to cell edge users.
  • Hard frequency reuse (used in the related art GSM and LTE Rel. 8-9) in which each base station operates in one of a set of non-overlapping portions of the available frequency spectrum in such a way that neighbouring base stations do not use the same set of frequencies.
  • GSM is an abbreviation for Global System for Mobile Communications (originally: Groupe Spécial Mobile). While hard frequency reuse minimises the interference at the cell edge, it reduces the overall spectral efficiency by a factor equal to the reuse factor.
  • Fractional frequency reuse in which the available frequency spectrum is divided into two portions: a portion common to all base stations used for scheduling cell-centre users; and a second portion that is further divided among base stations in a hard frequency reuse manner and used to schedule transmission to/from cell-edge users.
  • Soft frequency reuse enables base stations to transmit in the entire frequency spectrum with different power levels: higher transmission power in the portion of the spectrum where cell-edge users are scheduled; lower transmission power in the portion of spectrum where cell-centre users are scheduled.
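The soft frequency reuse scheme described above can be illustrated with a minimal sketch. The function name, the three-subband split, the rule that assigns the cell-edge subband from the cell index, and the dBm values are all illustrative assumptions and do not come from the patent text.

```python
from typing import List


def soft_reuse_power_mask(
    cell_index: int,
    n_subbands: int = 3,      # assumed number of subbands (hypothetical)
    edge_dbm: float = 46.0,   # assumed power where cell-edge users are scheduled
    centre_dbm: float = 40.0, # assumed power where cell-centre users are scheduled
) -> List[float]:
    """Return one transmit power per subband for the given cell.

    Every cell transmits in all subbands, but with higher power in the
    single subband reserved for its cell-edge users.
    """
    if n_subbands <= 0:
        raise ValueError("n_subbands must be positive")
    edge_band = cell_index % n_subbands  # assumed assignment rule
    return [edge_dbm if band == edge_band else centre_dbm for band in range(n_subbands)]


if __name__ == "__main__":
    for cell in range(3):
        print(cell, soft_reuse_power_mask(cell))
```

Neighbouring cells with different indices boost different subbands, so their high-power portions do not coincide.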
  • Time domain ICIC consists of periodically muting the transmission of a base station in certain time-frequency resources to enable a further base station to serve mobile stations suffering severe interference in the muted radio resources.
  • the related art LTE system introduced Almost Blank Subframes (ABS), i.e., downlink subframes where only the necessary signals to avoid radio link failure or to maintain backward compatibility are transmitted, including common reference signals (except in subframes configured as Multicast-Broadcast Single-Frequency Network (MBSFN)), Primary and Secondary Synchronisation Signals (PSS/SSS), Physical Broadcast Channel (PBCH), System Information Block Type 1 (SIB-1) and paging with their associated Physical Downlink Control Channel (PDCCH).
  • ABS Almost Blank Subframes
  • MBSFN Multicast-Broadcast Single-Frequency Network
  • PSS/SSS Primary and Secondary Synchronisation Signals
  • PBCH Physical Broadcast Channel
  • SIB-1 System Information Block Type 1
  • PDCCH Physical Downlink Control Channel
  • Time domain muting patterns are configured semi-statically by means of bitmaps of length 40, i.e. spanning up to four radio frames, signalled between eNodeBs over the X2 interface.
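As a rough illustration of the 40-bit muting bitmap, the sketch below looks up whether a given subframe is muted. It assumes ten subframes per radio frame (which is what makes 40 bits span four frames) and that a bit value of 1 marks an ABS subframe; the function name and the example pattern are hypothetical.

```python
from typing import List

ABS_PATTERN_LENGTH = 40   # bitmap length stated in the text
SUBFRAMES_PER_FRAME = 10  # LTE radio frame: ten 1 ms subframes


def is_muted(pattern: List[int], frame: int, subframe: int) -> bool:
    """Return True if the aggressor cell is muted in the given subframe.

    Assumption: bit value 1 marks an Almost Blank Subframe, and the
    pattern repeats every 40 subframes.
    """
    if len(pattern) != ABS_PATTERN_LENGTH:
        raise ValueError("ABS pattern must contain exactly 40 bits")
    if not 0 <= subframe < SUBFRAMES_PER_FRAME:
        raise ValueError("subframe index must be in 0..9")
    index = (frame * SUBFRAMES_PER_FRAME + subframe) % ABS_PATTERN_LENGTH
    return pattern[index] == 1


# Hypothetical pattern: subframes 1 and 6 of every radio frame are muted.
example_pattern = [1 if i % 10 in (1, 6) else 0 for i in range(ABS_PATTERN_LENGTH)]
```

A victim cell could use such a lookup to schedule interference-affected mobile stations only in subframes where `is_muted` returns True.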
  • Mobile stations in a victim cell are then categorised into two groups: mobile stations affected by interference from a cell using ABS, which shall preferably be scheduled in correspondence of a muted subframe from said cell; and mobile stations that are not affected by the interference produced by a neighbouring cell using ABS, which can be scheduled freely in any subframe.
  • the above categorisation is done by comparing Channel State Information (CSI) feedback from mobile stations/user devices in muted and non-muted subframes of a neighbouring cell.
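The categorisation into the two groups can be sketched as a simple comparison of CSI reports. The text does not state how the comparison is made, so the use of a scalar channel-quality value per report, the threshold, and all names below are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def split_by_abs_sensitivity(
    csi_reports: Dict[str, Tuple[float, float]],
    threshold: float = 3.0,  # hypothetical quality gap that counts as "affected"
) -> Tuple[List[str], List[str]]:
    """Split user devices into (affected, unaffected) groups.

    csi_reports maps a device id to (quality in muted subframes,
    quality in non-muted subframes). A device whose quality improves
    by at least `threshold` when the neighbour is muted is treated as
    affected by that neighbour's interference.
    """
    affected: List[str] = []
    unaffected: List[str] = []
    for device_id, (muted_quality, non_muted_quality) in sorted(csi_reports.items()):
        if muted_quality - non_muted_quality >= threshold:
            affected.append(device_id)
        else:
            unaffected.append(device_id)
    return affected, unaffected
```

Devices in the first group would preferably be scheduled in muted subframes, while devices in the second group can be scheduled in any subframe.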
  • CSI Channel State Information
  • ABS, adopted in LTE Rel-10 to mitigate interference for cell-edge users, comprise time-domain muting patterns of data transmission in downlink subframes.
  • the muting pattern of an aggressor cell typically a macro base station
  • a neighbouring victim cell typically pico base stations within the macro-cell coverage area
  • User devices in the coverage area of the victim cell are configured to perform CSI measurements in correspondence of ABS and non-ABS resources to enable the serving cell to determine whether the user device is affected by strong interference from the aggressor cell.
  • the time-domain muting patterns and the scheduling decisions are independently determined by the aggressor cell and the victim cell respectively.
  • an agent node for downlink power control of a radio cell of a communication system is provided.
  • the agent node is configured to obtain a power control policy. Further the agent node is configured to determine at least one feature representing a state of at least a part of the communication system, at a first time period.
  • the agent node is also configured to determine a power control action to be performed for downlink power control in the radio cell at the first time period, out of a set of available power control actions associated with the radio cell, based on the obtained power control policy and the determined at least one feature.
  • the agent node is configured to configure a downlink transmission power instruction of the radio cell based on the determined power control action.
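The agent-node steps above (obtain a policy, determine features, pick an action from the available set, configure the power instruction) can be sketched as follows. The representation of a policy as a function returning one score per action, the instruction format, and the toy policy are hypothetical choices made for illustration; the patent text does not prescribe them.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a policy maps a feature vector to one score per available action.
Policy = Callable[[Sequence[float]], List[float]]


def determine_power_action(
    policy: Policy, features: Sequence[float], actions_dbm: Sequence[float]
) -> float:
    """Pick the available power control action with the highest policy score."""
    scores = policy(features)
    if len(scores) != len(actions_dbm):
        raise ValueError("policy must return one score per available action")
    best = max(range(len(actions_dbm)), key=lambda i: scores[i])
    return actions_dbm[best]


def configure_instruction(cell_id: str, power_dbm: float) -> Dict[str, object]:
    """Build a downlink transmission power instruction (hypothetical format)."""
    return {"cell_id": cell_id, "dl_tx_power_dbm": power_dbm}


def toy_policy(features: Sequence[float]) -> List[float]:
    """Toy stand-in for a learned policy: favour high power when load is high."""
    load = features[0]
    return [1.0 - load, 0.5, load]


if __name__ == "__main__":
    action = determine_power_action(toy_policy, [0.9], [40.0, 43.0, 46.0])
    print(configure_instruction("cell-215", action))
```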
  • By autonomously learning different downlink power control strategies using measurements collected from an algorithmic interaction with the radio environment, the proposed concept enables an appropriate adjustment of the downlink transmission power.
  • downlink power control based on learning the network environmental performance is provided, which may adapt to changes in radio environment conditions without manual adjustment. Thereby downlink power control may be better managed, leading to increased network performance.
  • compared with traditional power control algorithms, the proposed concept has the advantage that reduced signalling overhead is required to learn different downlink power control strategies that can autonomously adapt to changes in the radio environment, users and traffic types.
  • the agent node may be further configured to determine a feature representing the state of the part of the communication system, at a second time period. Also, the agent node may be configured to determine a performance measurement, associated with the performance within the radio cell. Furthermore, the agent node may be configured to transmit a training data message to a trainer node, comprising one or more in the group of: the determined feature representing the state at the first time period, the determined power control action performed at the first time period, the determined feature representing the state at the second time period, and the determined performance measurement. Further the obtained power control policy is received from the trainer node.
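A training data message of the kind described above can be pictured as a small record holding the state at the first time period, the action, the state at the second time period and the performance measurement. The field names and the JSON encoding below are assumptions; the text says the message comprises "one or more" of these items, whereas this sketch requires all of them for simplicity.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class TrainingDataMessage:
    """One experience sample sent from an agent node to the trainer node."""

    cell_id: str            # hypothetical identifier of the radio cell
    state_t1: List[float]   # features at the first time period
    action_dbm: float       # power control action performed at the first time period
    state_t2: List[float]   # features at the second time period
    performance: float      # performance measurement for the radio cell


def encode(message: TrainingDataMessage) -> str:
    """Serialise the message to JSON (assumed wire format)."""
    return json.dumps(asdict(message), sort_keys=True)


def decode(payload: str) -> TrainingDataMessage:
    """Rebuild the message from its JSON form."""
    return TrainingDataMessage(**json.loads(payload))
```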
  • the agent node may be further configured to select which at least one feature to utilise for representing the state of at least a part of the communication system. Also the agent node may be configured to select which performance measurement associated with the radio cell to utilise for representing the performance of the radio cell.
  • the agent node may be further configured to transmit the configured downlink transmission power instruction of the radio cell to the radio network node for downlink power control of the radio cell of the radio network node.
  • the feature representing the state of at least a part of the communication system is determined based on any of: a measurement related to received signal quality made by and received from a user device in the radio cell; a measurement related to received signal quality made by and received from a user device in another radio cell; a measurement related to downlink transmission power of the radio cell made by and obtained from the radio network node controlling the radio cell; a measurement related to a number of active user devices in the radio cell; a measurement related to types, or distribution, of traffic within the radio cell; a measurement related to location, or distribution, of user devices in the radio cell; or a performance measurement, associated with the performance within the radio cell.
  • the agent node may be further configured to compute a performance measurement associated with at least a part of the communication system, based on the determined performance measurement and at least one other network performance measurement received from another radio network node in the communication system. Also, the agent node may be configured to transmit the training data message to the trainer node comprising the computed performance measurement.
  • the agent node may be further configured to determine application of the obtained power control policy, based on the obtained exploration-to-exploitation control parameter.
  • a radio network node controlling the radio cell is another agent node
  • the training data message transmitted to the trainer node comprises a training data message received from the other agent node
  • the agent node may be further configured to forward the power control policy to be utilised for downlink power control in the radio cell of the other agent node in the communication system, received from the trainer node, to the other agent node.
  • the agent node may be further configured to iterate the determination of the feature representing the state of at least a part of the communication system; the determination of the power control action; the configuration of the downlink transmission power instruction; the determination of the performance measurement; the transmission of the training data message, or a plurality of training data messages, to the trainer node and the obtaining of the power control policy.
  • the agent node may be further configured to adjust the set of available power control actions associated with the radio cell, based on the determined at least one feature representing the state of the part of the communication system, or based on the obtained power control policy.
  • the obtained power control policy may be represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
  • the obtained power control policy may further comprise: an adjustment of the set of available power control actions associated with the radio cell; an indicator configuring a combining method for the agent node to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis-aligned data split function for the power control policy when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy based on the decision forest; an indication of one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain criterion; at least one indication of the stopping criteria to determine the depth of the decision forest.
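When the policy comprises several neural networks, the agent node needs a voting policy or combining method to reach one power control action. The text does not specify which rules are used; the sketch below shows two common possibilities, majority voting over each network's preferred action index and averaging of per-action scores, purely as assumed examples.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(preferred_actions: Sequence[int]) -> int:
    """Return the action index chosen by most networks (ties: lowest index)."""
    if not preferred_actions:
        raise ValueError("at least one vote is required")
    counts = Counter(preferred_actions)
    top = max(counts.values())
    return min(action for action, count in counts.items() if count == top)


def average_scores(score_lists: Sequence[Sequence[float]]) -> int:
    """Average per-action scores across networks and return the best index."""
    if not score_lists:
        raise ValueError("at least one score list is required")
    n_actions = len(score_lists[0])
    if any(len(scores) != n_actions for scores in score_lists):
        raise ValueError("all networks must score the same action set")
    means: List[float] = [
        sum(scores[i] for scores in score_lists) / len(score_lists)
        for i in range(n_actions)
    ]
    return max(range(n_actions), key=lambda i: means[i])
```

The two rules can disagree: majority voting ignores how strongly each network prefers its choice, while score averaging takes it into account.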
  • the agent node may be further configured to determine the performance measurement, associated with the performance within the radio cell, by computing a weighted sum of scalars parameterised by a scalar coefficient $\alpha$, where $x_t$ represents a radio measurement or a performance indicator associated with the radio cell, $X$ is the set of all radio measurements or performance indicators associated with the radio cell and used for the definition of the performance measurement, and $w_t$ is a weight associated with $x_t$.
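The exact form of the weighted sum is not recoverable from the text above, so the following is only a sketch under a stated assumption: each measurement is passed through an alpha-fair utility (a common choice for a scalar function parameterised by a single coefficient) before being weighted and summed. The function names and the choice of utility are illustrative and are not taken from the patent.

```python
import math
from typing import Sequence


def alpha_utility(x: float, alpha: float) -> float:
    """Alpha-fair utility (assumed scalarisation, not stated in the text)."""
    if x <= 0.0:
        raise ValueError("measurements must be positive")
    if alpha < 0.0:
        raise ValueError("alpha must be non-negative")
    if math.isclose(alpha, 1.0):
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)


def performance_measurement(
    measurements: Sequence[float], weights: Sequence[float], alpha: float
) -> float:
    """Weighted sum over all measurements of the alpha-parameterised scalar."""
    if len(measurements) != len(weights):
        raise ValueError("one weight per measurement is required")
    return sum(w * alpha_utility(x, alpha) for x, w in zip(measurements, weights))
```

With a coefficient of 0 this reduces to a plain weighted sum of the measurements; a coefficient of 1 gives a weighted sum of logarithms, which rewards balanced values.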
  • a method in an agent node according to the first aspect, or any possible implementations thereof, for downlink power control of a radio cell of a communication system comprises obtaining a power control policy. Further the method may comprise determining at least one feature representing a state of at least a part of the communication system, at a first time period. In addition, the method also may comprise determining a power control action to be performed for downlink power control in the radio cell at a first time period, out of a set of available power control actions associated with the radio cell, based on the obtained power control policy. Also the method in addition comprises configuring a downlink transmission power instruction of the radio cell based on the determined power control action.
  • the method may further comprise determining a feature representing the state of the part of the communication system at a second time period. Also, the method may further comprise determining a performance measurement, associated with the performance within the radio cell. In addition, the method may also comprise transmitting a training data message to the trainer node, comprising one or more in the group of: the determined feature representing the state at the first time period, the determined power control action performed at the first time period, the determined feature representing the state at the second time period, and the determined performance measurement. The method may also comprise receiving the obtained power control policy from the trainer node.
  • the method further may comprise selecting which at least one feature to utilise for representing the state of at least a part of the communication system. Also, the method comprises selecting which performance measurement associated with the radio cell to utilise for representing the performance of the radio cell.
  • the method further may comprise transmitting the configured downlink transmission power instruction of the radio cell to the radio network node for downlink power control of the radio cell of the radio network node.
  • the feature representing the state of at least a part of the communication system is determined based on any of: a measurement related to received signal quality made by and received from a user device in the radio cell; a measurement related to received signal quality made by and received from a user device in another radio cell; a measurement related to downlink transmission power of the radio cell made by and obtained from the radio network node controlling the radio cell; a measurement related to a number of active user devices in the radio cell; a measurement related to types or distribution of traffic within the radio cell; a measurement related to location or distribution of user devices in the radio cell; or a performance measurement, associated with the performance within the radio cell.
  • the method further may comprise computing a performance measurement associated with at least a part of the communication system, based on the determined performance measurement and a network performance measurement received from another radio network node in the communication system; and wherein the training data message transmitted to the trainer node comprises the computed performance measurement.
  • the method further may comprise obtaining an exploration-to-exploitation control parameter, associated with a probability of applying the power control policy. The method further comprises determining application of the obtained power control policy, based on the obtained exploration-to-exploitation control parameter.
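One way to read the exploration-to-exploitation control parameter is as the probability with which the agent applies the policy's action rather than trying another available action. The sketch below follows that reading; the uniform random choice during exploration and the function name are assumptions.

```python
import random
from typing import Sequence


def choose_action(
    policy_action: float,
    available_actions: Sequence[float],
    apply_probability: float,
    rng: random.Random,
) -> float:
    """Apply the policy's action with the given probability, else explore.

    Exploration here means picking uniformly at random from the set of
    available power control actions (an assumed exploration rule).
    """
    if not 0.0 <= apply_probability <= 1.0:
        raise ValueError("apply_probability must lie in [0, 1]")
    if not available_actions:
        raise ValueError("the action set must not be empty")
    if rng.random() < apply_probability:
        return policy_action
    return rng.choice(list(available_actions))
```

Passing a seeded `random.Random` instance keeps the behaviour reproducible during testing.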
  • the radio network node controlling the radio cell of the downlink power control is another agent node
  • the training data message transmitted to the trainer node comprises a training data message received from the other agent node
  • the power control policy to be utilised for downlink power control in the radio cell of the other agent node in the communication system, obtained from the trainer node, is forwarded to the other agent node.
  • the method may comprise adjusting the set of available power control actions associated with the radio cell, based on the determined at least one feature representing the state of the part of the communication system, or based on the power control policy received from the trainer node.
  • the method may comprise iterating the method according to the second aspect, or any previously described implementation thereof.
  • the method may comprise representing the obtained power control policy by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
  • the obtained power control policy further comprises: an adjustment of the set of available power control actions associated with the radio cell; an indicator configuring a combining method for the agent node to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis-aligned data split function for the power control policy when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy based on the decision forest; an indication of one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain criterion; at least one indication of the stopping criteria to determine the depth of the decision forest.
  • $X$ is the set of all radio measurements or performance indicators associated with the radio cell
  • $w_t$ is a weight associated with $x_t$, and $\mathbf{x}$ is a vector comprising all $x_t \in X$.
  • the trainer node is configured to: receive a training data message, associated with the radio cell, from the agent node, wherein the training data message comprises one or more in the group of: a feature representing a state of at least a part of the communication system at a first time period, a power control action performed by the agent node in the radio cell at the first time period, a feature representing the state at the second time period, and a performance measurement. Furthermore, the trainer node is configured to store the received training data message in a database associated with the radio cell. Also, the trainer node is configured to determine a power control policy for the radio cell, based on at least one training data message, stored in the database. Furthermore, the trainer node is configured to transmit the determined power control policy to the agent node.
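The trainer-node steps (receive, store per cell, determine a policy) can be sketched with a deliberately simple learner. The patent text describes policies based on neural networks or decision forests; the sketch below swaps in tabular Q-learning over discrete state labels purely to keep the example short, and every name, the learning rate and the discount factor are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# (state at first time period, action index, state at second time period, performance)
Transition = Tuple[Hashable, int, Hashable, float]


class Trainer:
    """Toy trainer node: per-cell database plus tabular Q-learning."""

    def __init__(self, n_actions: int, learning_rate: float = 0.5, discount: float = 0.9):
        self.n_actions = n_actions
        self.learning_rate = learning_rate
        self.discount = discount
        self.database: Dict[str, List[Transition]] = defaultdict(list)

    def store(self, cell_id: str, transition: Transition) -> None:
        """Store a received training sample in the database for its radio cell."""
        self.database[cell_id].append(transition)

    def determine_policy(self, cell_id: str, sweeps: int = 50) -> Dict[Hashable, int]:
        """Replay stored samples and return the best action index per state."""
        q: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * self.n_actions)
        for _ in range(sweeps):
            for state, action, next_state, reward in self.database[cell_id]:
                target = reward + self.discount * max(q[next_state])
                q[state][action] += self.learning_rate * (target - q[state][action])
        return {
            state: max(range(self.n_actions), key=lambda i: values[i])
            for state, values in q.items()
        }
```

The returned mapping is what this toy trainer would transmit to the agent node as the power control policy.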
  • the trainer node may be further configured to determine an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy; and wherein the determined exploration-to-exploitation control parameter is transmitted to the agent node together with the determined power control policy.
  • the trainer node may be configured to determine the exploration-to-exploitation control parameter so that the probability of applying the determined power control policy is increased over time.
  • the trainer node may be configured to select which at least one feature the agent node is to utilise for representing the state of at least a part of the communication system and provide the made selection to the agent node.
  • the trainer node may be configured to select which performance measurement associated with the radio cell, the agent node is to utilise for representing the performance of the radio cell and provide the made selection to the agent node.
  • the defined power control policy may be represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of nonlinear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
  • a method in a trainer node for determining a power control policy to be utilised by an agent node for downlink power control of a radio cell of a communication system.
  • the method comprises receiving a training data message, associated with the radio cell, from the agent node, wherein the training data message comprises one or more in the group of: a feature representing a state of at least a part of the communication system at a first time period, a power control action performed by the agent node in the radio cell at the first time period, a feature representing the state at the second time period, and a performance measurement.
  • the method also comprises storing the received training data message in a database, associated with the radio cell.
  • the method comprises determining a power control policy for the radio cell, based on at least one training data message, stored in the database.
  • the method also comprises transmitting the determined power control policy to the agent node.
  • the method may comprise determining an exploration-to-exploitation control parameter, associated with the power control policy; and wherein the determined exploration-to-exploitation control parameter is transmitted to the agent node together with the determined power control policy.
  • the method may also comprise computing a performance measurement associated with at least a part of the communication system, based on the performance measurement received in the training data message received from the agent node and at least one other performance measurement received from another radio network node in the communication system.
  • the method may also comprise selecting which at least one feature the agent node is to utilise for representing the state of at least a part of the communication system wherein the radio cell is comprised and sending the made selection to the agent node.
  • the method may also comprise selecting which performance measurement associated with the radio cell the agent node is to utilise for representing the performance of the radio cell and sending the made selection to the agent node.
  • the defined power control policy may be represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of nonlinear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
  • a computer program is provided, with a program code for performing a method according to the fifth aspect, or any possible implementation thereof, when the computer program runs on a computer. Thanks to the provided agent node, trainer node and methods therein, inter-cell interference management is simplified, also in a dense communication network. Thereby spectral efficiency of the system is enhanced. By appropriately adjusting transmission power, interference is reduced. Further, energy savings are made by not transmitting with higher power than required.
  • Another advantage of the disclosed aspects is that an adaptation may be made to changes concerning e.g. traffic load in the cell, traffic type in the cell, user device activity within the cell, etc., over time. Thereby downlink power control in the cell is improved.
  • Figure 1 is a block diagram illustrating different ICIC approaches.
  • Figure 2 is a block diagram illustrating a wireless communication network with an agent node controlling the downlink transmission power budget for a co-located radio cell.
  • Figure 4 is a block diagram illustrating a wireless communication network illustrating interaction between agent node and trainer node.
  • Figure 5 is a block diagram illustrating a wireless communication network with multiple agent nodes sharing a single trainer node.
  • Figure 6 is a flow chart illustrating a method in an agent node according to an embodiment of the invention.
  • Figure 7 is a block diagram illustrating an agent node architecture according to an embodiment of the invention.
  • Figure 8 is a flow chart illustrating a method in a trainer node according to an embodiment of the invention.
  • Figure 9 is a block diagram illustrating a trainer node architecture according to an embodiment of the invention.
  • Embodiments of the invention described herein are defined as an agent node, a trainer node and methods therein, which may be put into practice in the embodiments described below. These embodiments may, however, be exemplified and realised in many different forms and are not to be limited to the examples set forth herein; rather, these illustrative examples of embodiments are provided so that this disclosure will be thorough and complete.
  • FIG. 2 is a schematic illustration of a radio communication system 200 wherein the agent node 210 resides in a radio network node, such as an eNodeB of an LTE system, and controls the downlink transmission power budget of at least one radio cell 215 co-located with the radio access node.
  • the downlink power budget is determined based on radio environmental measurements received from a user device 220 camping within the controlled radio cells 215, as well as based on information received from other radio network nodes 230 representing a performance measure associated with the communication system.
  • the neighbour radio network node 230 in the illustrated example may control three radio cells 235-1, 235-2, 235-3.
  • a massive densification of radio access nodes in future radio communication systems 200 makes inter-cell interference management particularly difficult due to the potentially large number of interferers affecting the transmission to/from a user device 220, and therefore comes with a number of new challenges related to spectral efficiency and energy savings.
  • the agent node 210 may be configured to interact with the radio environment and determine an action for optimising/improving the downlink transmission power budget of one or more radio network nodes 230, based on a power control policy.
  • the agent node 210 may be co-located with the radio network node 230 in some embodiments. However, in other embodiments, the agent node 210 may be an entity separate from the radio network node 230. One agent node 210 may further control a plurality of radio cells either co-located 215 or not co-located 235-1, 235-2, 235-3. Thus the expression "radio network node", as utilised in this disclosure, may indicate either an agent node 210 co-located with a radio network node 230, or a separate radio network node 230.
  • the other logical entity is a trainer node (e.g., an eNodeB or a remote server in different embodiments), configured to learn a power control policy based on observation/s of the state of the radio communication system 200 received from one or more agent nodes 210 or from one or more radio network nodes 230.
  • the trainer node may be co-located with the agent node 210. In other embodiments however, the agent node 210 and the trainer node may be separate entities.
  • the trainer may be kept in a centralised server room or similar, where it is sheltered from wind and weather while being appropriately protected from theft and damage. Further, appropriate maintenance and software updates may conveniently be performed by skilled personnel.
  • the agent node 210 may be configured to receive, from at least one radio network node 230, a message comprising at least one network performance measurement, which may also be referred to as a local reward, associated with the communication system 200, or a subset thereof, such as e.g. a subset of the communication system 200 wherein the cell 215, 235-1, 235-2, 235-3 is situated.
  • the agent node 210 may additionally be configured to determine at least one feature representing, partly or entirely, the state of the communication system 200 based on the received radio environmental measurement/s or the at least one network performance measurement. Further, in some embodiments, the agent node 210 may also be configured to determine a power control action associated with the radio cell 215, 235-1, 235-2, 235-3 in the communication system 200 based on the power control policy, a set of available power control actions, and at least one feature representing the state of the communication system 200, or a subset thereof. The agent node 210 may further be configured to configure the downlink transmission power of the radio cell 215, 235-1, 235-2, 235-3 based on the determined power control action.
  • Radio Base Station which in some networks may be referred to as transmitter, "eNB”, “eNodeB”, “NodeB” or “B node”, depending on the technology and terminology used.
  • the radio network nodes 230 may be of different classes such as e.g. macro eNodeB, home eNodeB or pico base station, based on transmission power and thereby also cell size.
  • One or more radio cells 235-1, 235-2, 235-3 can be controlled by one radio network node 230, or possibly agent node 210, such as e.g. a tri-sectorial radio site.
  • the communication system 200 may at least partly be based on radio access technologies such as, e.g., 3GPP LTE, LTE-Advanced, Evolved Universal Terrestrial Radio Access Network (E-UTRAN), Universal Mobile Telecommunications System (UMTS), Global System for Mobile Communications (originally: Groupe Special Mobile) (GSM)/Enhanced Data rate for GSM Evolution (GSM/EDGE), Wideband Code Division Multiple Access (WCDMA), Time Division Multiple Access (TDMA) networks, Frequency Division Multiple Access (FDMA) networks, Orthogonal FDMA (OFDMA) networks, Single-Carrier FDMA (SC-FDMA) networks, Worldwide Interoperability for Microwave Access (WiMax), or Ultra Mobile Broadband (UMB), High Speed Packet Access (HSPA), Evolved Universal Terrestrial Radio Access (E-UTRA), Universal Terrestrial Radio Access (UTRA), GSM EDGE Radio Access Network (GERAN), 3GPP2 CDMA technologies, e.g., CDMA2000 1x R
  • radio network node 230 and/ or agent node 210 are referred to in the present context, a plurality of user devices 220, radio network nodes 230 and/ or agent nodes 210 may be involved, according to some embodiments.
  • the set of available power control actions may comprise positive or negative power offset values to be applied to the current transmission power value of the radio cell 215, 235-1, 235-2, 235-3. Each value may therefore correspond to increase (positive offset), decrease (negative offset) or hold (zero offset) the current transmission power value.
  • the power control values can equivalently be expressed in binary, linear, logarithmic (decibel), or other suitable scales.
  • the feasible range of a power control action may further depend on the current state of the communication system 200.
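  • As an illustration only, the offset-based action set and its state-dependent feasible range could be sketched as follows. The offset values, the dBm/dB units, the power limits and the function names are assumptions made for this sketch and are not taken from the disclosure.

```python
# Hypothetical action set: power offsets in dB (illustrative values only).
POWER_OFFSETS_DB = (-3.0, -1.0, 0.0, 1.0, 3.0)


def feasible_actions(current_dbm, p_min_dbm, p_max_dbm, offsets=POWER_OFFSETS_DB):
    """Return the offsets that keep the resulting power within [p_min, p_max].

    The dependence on the system state is modelled here (as an assumption)
    only through the current transmission power and fixed power limits.
    """
    return [d for d in offsets if p_min_dbm <= current_dbm + d <= p_max_dbm]


def apply_action(current_dbm, offset_db):
    """Apply a positive (increase), negative (decrease) or zero (hold) offset."""
    return current_dbm + offset_db
```

  • With a current power of 45 dBm and limits of 30 dBm and 46 dBm, the +3 dB offset is excluded from the feasible set while the hold action (zero offset) remains available.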
  • Figure 3 illustrates an embodiment wherein the agent node 210 is not co-located with the radio network node 230.
  • the agent node 210 controls the downlink power budget associated with at least one radio cell 235-1, 235-2, 235-3 not co-located with the agent node 210.
  • the advantage of this embodiment is to enable centralised control of the downlink transmission power emitted by a plurality of radio network nodes 230, thereby mitigating interference and improving the spectral efficiency of the system 200.
  • the agent node 210 may further be configured to configure the downlink transmission power of the radio cell 235-1, 235-2, 235-3 of the other radio network node 230, based on the determined power control action and transmit a control message to the radio network node 230 comprising said downlink transmission power adjustment.
  • control message of the agent node 210 may further comprise an indication of time associated with the power control action, such as a starting time indicating when to apply the power control action and a power control window indicating the validity of the power control action.
  • control message may additionally comprise a plurality of power control actions and an associated indication of time.
  • the power control actions may further be associated with one or more radio cells 235-1, 235-2, 235-3 controlled by the radio network node 230, or a plurality of radio network nodes 230.
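  • A minimal sketch of how such a control message might be represented is given below. The class names, field names and millisecond time base are hypothetical; the disclosure only states that the message may carry one or more power control actions together with a starting time and a power control window.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PowerControlAction:
    cell_id: str          # hypothetical identifier of the controlled radio cell
    offset_db: float      # power offset to apply
    start_time_ms: int    # when to start applying the action
    window_ms: int        # validity period of the action

    def is_valid_at(self, now_ms: int) -> bool:
        """True while now_ms lies inside [start, start + window)."""
        return self.start_time_ms <= now_ms < self.start_time_ms + self.window_ms


@dataclass
class PowerControlMessage:
    actions: List[PowerControlAction] = field(default_factory=list)

    def active_actions(self, now_ms: int) -> List[PowerControlAction]:
        """Actions whose power control window covers now_ms."""
        return [a for a in self.actions if a.is_valid_at(now_ms)]
```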
  • Each feature $f_j \in \mathcal{F}$ can be determined based on radio environmental measurements received from e.g. user devices 220 within one or more radio cells 215, 235-1, 235-2, 235-3 controlled by the agent node 210 or based on measurements associated with the communication system 200 received from at least a radio network node
  • the set of features can therefore comprise e.g. an indicator of the
  • the interference may be measured by the user devices 220 within the at least one radio cell 215 controlled by the agent node 210 and reported to the agent node 210.
  • the set of features may in addition comprise an indicator of the average, minimum or maximum Signal to Noise Ratio (SNR) associated with the user devices 220 within at least one cell 215 controlled by the agent node 210.
  • the set of features may further comprise an indicator of the average, minimum or maximum Signal to Interference plus Noise Ratio (SINR) associated with the user devices 220 within at least one cell 215 controlled by the agent node 210.
  • the set of features may also comprise an indicator of the reward function associated with the radio cell 235-1, 235-2, 235-3 controlled by the network node 230.
  • RSSI Received Signal Strength Indication
  • RCPI Received Channel Power Indicator
  • SIR Signal-to-interference ratio
  • PSNR Peak signal-to-noise ratio
  • SINAD Signal-to-noise and distortion ratio
  • Another possible feature in the set of features for determining the state of the communication system 200, or a subset thereof, may comprise a measurement related to a number of active user devices 220 in the radio cell 215, 235-1, 235-2, 235-3.
  • the feature in the set of features may comprise a measurement related to types of traffic within the radio cell 215, 235-1, 235-2, 235-3.
  • Yet another feature in the set of features may comprise a measurement related to location of user devices 220 in the radio cell 215, 235-1, 235-2, 235-3 or type of traffic of user devices 220.
  • the radio environmental measurement received from user devices 220 within the radio cell 215 controlled by the agent node 210 may comprise one or more in the group of: a measurement of the RSRP associated with at least one cell 215 controlled by the agent node 210; a measurement of RSRP associated with at least one neighbouring cell 235-1, 235-2, 235-3, i.e., interference; a measurement of the SNR associated with at least one cell 215 controlled by the agent node 210; and/or a measurement of the SINR associated with at least one cell 215 controlled by the agent node 210.
  • the agent node 210 can further determine/characterise the state $s_t$ associated with a part or the entire communication system 200 at a given time $t$ by selecting a subset of features $f_j \in \mathcal{F}$
  • the state of the communication system 200 can be represented by different combinations of features and different number of features, such as the above enumerated.
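  • The construction of a state from a selected subset of features can be sketched as follows. The feature names and helper functions are hypothetical, and averaging values directly in dB is a simplification adopted only for this example.

```python
from statistics import mean


def sinr_features(sinr_reports_db):
    """Average, minimum and maximum SINR over the reports of the served user devices."""
    if not sinr_reports_db:
        raise ValueError("at least one SINR report is required")
    return {
        "sinr_avg_db": mean(sinr_reports_db),
        "sinr_min_db": min(sinr_reports_db),
        "sinr_max_db": max(sinr_reports_db),
    }


def build_state(features, selected):
    """Form the state s_t as a tuple of the selected features, in the given order."""
    return tuple(features[name] for name in selected)
```

  • For instance, a state could combine the SINR indicators with the number of active user devices by adding an entry such as `features["active_users"]` before calling `build_state`.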
  • the agent node 210 is further configured to determine a performance measurement $r_t$, which may be referred to as a reward, associated with one or more radio cells 215, 235-1, 235-2, 235-3 in the communication system 200, given the power control action $a_t \in \mathcal{A}$ taken at time $t$ and the state $s_t$ of the system 200 at time $t$.
  • the agent node 210 can estimate the cell performance measurement $r_t$ based on the radio environmental measurements $x_t \in \mathcal{X}$ reported by the user devices 220 served by the radio cell 215, 235-1, 235-2, 235-3.
  • the user measurements may provide observations of the cell state at time $t$ resulting from the application of the power control action $a_t$ in some embodiments.
  • the agent node 210 determines the performance measurement associated with the controlled radio cell 215, 235-1, 235-2, 235-3 as a weighted sum of scalars parameterised by a scalar coefficient $\alpha \in [0, \infty)$ and transformed by a function (with domain $\mathcal{X}$ and range of real scalars),
  • $x_t$ represents a radio measurement or a performance indicator associated with the radio cell
  • $\mathcal{X}$ is the set of all radio measurements or performance indicators associated with the radio cell and used for the definition of the performance measurement, is a weight
  • the function and represents the average data throughput of user i in the radio cell and the reward in equation [1] can be
  • the agent node 210 determines a performance measurement $r_s$ associated with at least one part or the whole communication system 200 (e.g., a group of more than one radio cell 215, 235-1, 235-2, 235-3) based on the at least one network performance measurement, which may be referred to as a local reward, received from the radio network node 230 and the performance measurement $r_t$ associated with at least one radio cell 215 controlled by the agent node 210.
  • the power control policy is a function, mapping the state of the communication system 200 to the set of the available actions.
  • the power control policy may be e.g. deterministic, stochastic, probabilistic or a combination thereof.
  • the power control policy may be represented by one or more neural networks, wherein each neural network comprises an input layer consisting of a set of input units, a set of hidden layers each consisting of a set of hidden units, and one output layer consisting of one or more output units.
  • Each neural network is represented by a set of weights denoting a real-valued
  • the agent node 210 can be configured with a codebook of neural networks $\mathcal{N}$ indexed by $n$, possibly of different sizes (i.e. with different numbers of hidden layers, and
  • the agent node 210 determines a power control action by feeding at least one neural network associated with a power control policy with at least one action and with at least one feature. The output layer of each neural network $n$ determines
  • the agent node 210 may be configured with the power control policy represented by a single neural network and determines the power control action $a_t^*$ with the maximum likelihood coefficient as:
  • the agent node 210 may be configured with a power control policy represented by a set comprising a number of neural networks and determines a power control action based on the following steps: determine, for each neural network, the power control action with the maximum likelihood coefficient based on equation [1]; choose the power control action as the one that has been selected by the largest number of neural networks (e.g., majority vote). If two or more power control actions $a_t^*$ have been selected by an equal number of neural networks with majority vote, one of said power control actions $a_t^*$ may be chosen at random with equal probability, in some embodiments.
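  • The majority-vote selection with random tie-breaking described above can be sketched as follows. The outputs of each neural network are assumed, for this example only, to be available as a mapping from candidate action to likelihood coefficient; the function names are hypothetical.

```python
import random
from collections import Counter


def argmax_action(scores):
    """Action with the largest coefficient in one network's output.

    `scores` maps each candidate action to its likelihood coefficient.
    """
    return max(scores, key=scores.get)


def majority_vote(per_network_scores, rng=random):
    """Pick the action selected by the largest number of networks.

    Ties between equally voted actions are broken uniformly at random.
    """
    votes = Counter(argmax_action(scores) for scores in per_network_scores)
    top = max(votes.values())
    tied = [action for action, count in votes.items() if count == top]
    return tied[0] if len(tied) == 1 else rng.choice(tied)
```

  • Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible, which may be convenient when testing.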
  • An advantage of these described embodiments may be to reduce the variance of the inter-cell interference experienced in the communication system 200, thereby enabling higher spectral efficiency within the system 200.
  • the power control policy can be learned by the agent node 210 based on the available radio environmental measurements received from user devices 220 in the radio cells 215 controlled by the agent node 210.
  • the agent node 210 may receive a control message comprising the power control policy from the trainer node.
  • the agent node 210 may be further configured to: receive, from the trainer node, the control message comprising the power control policy for downlink power control.
  • the received power control policy can be associated with one or more radio cells 215 controlled by the agent node 210, or with one or more radio network nodes 230 controlled by the agent node 210.
  • the embodiment is illustrated in Figure 4 wherein the control message comprising the power control policy 430 is transmitted by the trainer node 400 to the agent node 210.
  • the agent node 210 may be further configured to receive, from the trainer node 400, the first control message further comprising an exploration-to-exploitation control parameter $\epsilon$ associated with the power control policy 430. Further, the agent node 210 may be configured to determine whether to apply the power control policy 430 based on the exploration-to-exploitation control parameter $\epsilon$.
  • the optional exploration-to-exploitation control parameter $\epsilon$ may indicate how often on average the power control policy 430 should be used compared to an alternative power control policy, such as, for instance, selecting a random action in the set $\mathcal{A}$.
  • the advantage of this method is to allow the exploration of new states of the communication system 200 that would otherwise not be observed by the agent node 210 if it always used the received power control policy 430.
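  • One common way to realise such a trade-off is an epsilon-greedy rule, sketched below. The sketch assumes that the parameter is the probability of taking a random action (exploration); the text could also be read the other way round, with the parameter being the probability of using the received policy. The function name and arguments are hypothetical.

```python
import random


def select_action(policy_action, available_actions, epsilon, rng=random):
    """Epsilon-greedy choice between the policy's action and a random action.

    Assumption: `epsilon` is the exploration probability, i.e. the chance of
    ignoring the policy and drawing uniformly from the available actions.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        return rng.choice(list(available_actions))
    return policy_action
```

  • With `epsilon=0.0` the received policy is always followed, and with `epsilon=1.0` every action is drawn at random.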
  • the agent node 210 is further configured to determine a Training Data Message (TDM) 420 comprising at least an indication of: the state $s_t$ of the communication system 200 measured at a certain time $t$; the power control action $a_t$ taken by the agent node 210 at time $t$; the state $s_{t+1}$ of the communication system 200 measured after the power control action; and a measurement of the system performance $r_{t+1}$, i.e. the reward associated with the new state of the communication system 200 and the power control action.
  • TDM Training Data Message
  • the training data message 420 may furthermore be associated with one or more radio cells 215, 235-1, 235-2, 235-3 or with one or more radio network nodes 210, 230, in the communication system 200. Thereby, the training data message 420 may provide observations of the state of the communication system 200 associated with the power control action taken by one or more radio cells 215, 235-1, 235-2, 235-3 or radio network nodes 210, 230.
  • This embodiment, illustrated in Figure 4, enables the trainer node 400 to efficiently adapt the power control policy 430 to changes in the communication system 200 so as to optimise the system spectral efficiency.
  • the training data message 420 may further carry a batch of training data, i.e. a set of quadruplets associated with $T \geq 1$ observations of the state
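  • A training data message carrying one or more quadruplets might be represented as in the following sketch. The class and field names, and the use of a cell identifier, are assumptions for illustration; the disclosure only specifies the four indicated quantities.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    """One quadruplet (s_t, a_t, s_{t+1}, r_{t+1})."""
    state: Tuple[float, ...]
    action: float
    next_state: Tuple[float, ...]
    reward: float


@dataclass
class TrainingDataMessage:
    cell_id: str                   # hypothetical identifier of the observed cell
    transitions: List[Transition]  # batch of T >= 1 observations

    def __post_init__(self):
        if len(self.transitions) < 1:
            raise ValueError("a training data message needs at least one quadruplet")
```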
  • the radio communication system 200 may in certain implementations comprise a plurality of agent nodes 210, 500 wherein some agent nodes 210 have a communication interface with the trainer node 400 (e.g., the X2 interface of an LTE system, or the S1 interface), whilst some agent nodes 210, 500 have a communication interface with other agent nodes 210, 500, see e.g. Figure 5.
  • Figure 5 illustrates a communication system 200 with multiple agent nodes 210, 500 sharing a single trainer node 400.
  • the agent node 210 when connected to the trainer node 400 with a communication interface may be further configured to: receive from at least one other agent node 500 at least one training data message 420. Further the agent node 210 may also be configured to forward the received at least one training data message 420 associated with the other agent node 500, to the trainer node 400. The agent node 210 may further be configured to store multiple training data messages 420 from one or more second agent nodes 500. Thereafter, a batch of training data messages 420 associated with one or more second agent nodes 500 can be forwarded to the trainer node 400 over the communication interface. Thereby overhead signalling is decreased, leading to enhanced spectral efficiency of the communication system 200.
  • the agent node 210 may further be configured to elaborate its own training data message 420 with one or more training data messages 420 received from one or more second agent nodes 500.
  • the information comprised in multiple training data messages 420 can be compressed and signalled with reduced signalling overhead, in some embodiments.
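  • The store-and-forward behaviour of an agent node that relays training data messages from second agent nodes can be sketched as follows. The class name, the batch-size trigger and the callback used to reach the trainer node are hypothetical choices for this example.

```python
class TdmRelay:
    """Buffers training data messages and forwards them to the trainer in batches."""

    def __init__(self, send_to_trainer, batch_size=4):
        # send_to_trainer: callable taking a list of messages (assumed interface).
        self.send_to_trainer = send_to_trainer
        self.batch_size = batch_size
        self._buffer = []

    def on_tdm(self, tdm):
        """Store a message received from another agent node; flush when the batch is full."""
        self._buffer.append(tdm)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Forward all buffered messages as a single batch, if any."""
        if self._buffer:
            self.send_to_trainer(list(self._buffer))
            self._buffer.clear()
```

  • Forwarding one batch instead of several individual messages is one way in which the signalling overhead mentioned above could be reduced.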
  • the agent node 210, when connected to the trainer node 400 with a communication interface, may further be configured to forward, to at least one second agent node 500, a power control policy 430 received from the trainer node 400.
  • the agent node 210 when not connected to the trainer node 400 with the communication interface, may optionally be further configured to transmit to at least one other agent node 500 at least one training data message 420.
  • the second agent node 500 may or may not have a communication interface with the trainer node 400.
  • the power control policy 430 for the agent node 210, 500 may be determined by the trainer node 400 upon receiving at least one training data message 420.
  • the trainer node 400 may in some embodiments be configured to: receive from at least one other agent node 500 at least one training data message 420; determine the power control policy 430 associated with the at least one radio cell 215, 235-1, 235-2, 235-3 in the communication system 200 based on the at least one training data message 420.
  • the trainer node 400 may in some embodiments be configured to transmit, to the agent node 210, the control message comprising the power control policy 430 for downlink power control.
  • the one or more radio cells 215, 235-1, 235-2, 235-3 associated with the power control policy 430 can either be controlled by the agent node 210 or by another radio network node 230, in turn controlled by the agent node 210 in different embodiments. In the latter case, the agent node 210 may transmit, to the radio network node 230, power control commands associated with the radio cell 235-1, 235-2, 235-3 computed based on the power control policy 430 according to any of the previously discussed embodiments.
  • the power control policy 430 may additionally be associated with a group of more than one radio cell 215, 235-1, 235-2, 235-3 in some embodiments. In one example, the power control policy 430 may be associated with a group of three radio cells 235-1, 235-2, 235-3 co-located in a tri-sectorial radio network node 230.
  • the agent node 210 can either reside in said radio network node 230, or control said radio network node 230, in different embodiments.
  • the trainer node 400 may determine a new power control policy 430 based on the new training data comprised in the received training data message 420 and based on formerly received training data, which may be stored in the database 410 by the trainer node 400, to be available for future evaluations of the power control policy 430.
  • the training data stored by the trainer node 400 may be received from one or more agent nodes 210, 500 and thereby be associated with different radio cells 215, 235-1, 235-2, 235-3 or different radio network nodes 210, 230, 500 in the communication system 200.
  • the trainer node 400 may additionally be configured to determine an exploration-to-exploitation control parameter $\epsilon$ associated with the power control policy 430.
  • the exploration-to-exploitation control parameter $\epsilon$ may set a probability, i.e. a value between 0 and 1, of utilising the provided power control policy 430 or of using e.g. a random power control policy.
  • the trainer node 400 may be configured to transmit, to the agent node 210, a control message further comprising the exploration-to-exploitation control parameter $\epsilon$, in some embodiments.
  • the exploration-to-exploitation control parameter $\epsilon$ regulates the utilisation of the power control policy 430 at the agent node 210.
  • exploration-to-exploitation control parameter $\epsilon$ takes values in the interval $\epsilon \in [0, 1]$ (with zero and one included in the interval).
  • a given value of the exploration-to-exploitation control parameter $\epsilon$ may indicate how often on average the power control policy 430 should be used compared to an alternative power control policy, such as, for instance, selecting a random action in the set $\mathcal{A}$
  • a value $\epsilon = 0.2$ may indicate to the agent node 210 to
  • the advantage of this method is to efficiently control the exploration of states of the communication system 200 that would otherwise not be observed by the agent node 210 and the trainer node 400.
  • the explored states of the communication system 200 are reported by the agent node 210 to the trainer node 400 via the Training Data Message (TDM) 420 according to previously discussed embodiments.
  • An efficient control of the exploration versus exploitation trade-off may be an advantage as exploring the states of the communication system 200 at random may lead to degradation of the system spectral efficiency.
  • the exploration-to-exploitation control parameter $\epsilon$ may gradually be reduced over time according to a predefined algorithm so as to gradually reduce exploration and increase exploitation of the power control policy 430.
  • the exploration-to-exploitation control parameter updated every time an action is selected may be computed as:
  • the trainer node 400 can determine the optimal power control policy 430 based on a Reinforcement Learning (RL) algorithm.
  • the reinforcement learning algorithm solves the problem of associating the experienced reward to the control actions that, taken in a given state of the system 200, lead to that reward.
  • the power control policy 430 resulting from a reinforcement learning algorithm, maps a given system state to the action to be taken (among the available set of actions) in order to maximise the cumulative reward.
  • Some of the most popular methods in RL are critic-only methods. They are based on the idea of finding an optimal value function and then deriving a policy from it. Possibly the most well-known of the critic-only algorithms is that of Q-learning.
  • a Q-value function is a prediction of future reward; more precisely, the Q-value function tries to learn "how much total reward can I expect from taking action $a$ in state $s$ and following the policy $\pi$".
  • the difference may be calculated as the discrepancy between the Q-value predicted by the Q-value function at time step $t$, that is $Q(s_t, a_t)$, and the actual reward plus the discounted Q-value at time step $t+1$.
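  • For illustration, the standard tabular Q-learning update built on this difference is sketched below. This is the textbook form of the algorithm, not necessarily the variant used by the trainer node 400; the table layout, the learning rate and the discount factor values are assumptions of the sketch.

```python
from collections import defaultdict


def td_error(q, s, a, r, s_next, actions, gamma=0.9):
    """Temporal difference: (reward + discounted best next Q-value) - predicted Q-value."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    return target - q[(s, a)]


def q_update(q, s, a, r, s_next, actions, learning_rate=0.1, gamma=0.9):
    """Move Q(s, a) a fraction `learning_rate` of the way towards the target."""
    delta = td_error(q, s, a, r, s_next, actions, gamma)
    q[(s, a)] += learning_rate * delta
    return delta


# Q-table with all entries initialised to zero (hypothetical initialisation).
q_table = defaultdict(float)
```

  • Each quadruplet reported in a training data message supplies exactly the arguments `s`, `a`, `s_next` and `r` needed for one such update.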
  • Figure 6 is a flow chart illustrating embodiments of a method 600 in an agent node 210 for downlink power control of at least one radio cell 215, 235-1, 235-2, 235-3 of a communication system 200.
  • the radio cell 215, 235-1, 235-2, 235-3 is controlled by the agent node 210 or in some embodiments by another agent node 500, or alternatively by a network node 230.
  • the agent node 210 may be co-located with a trainer node 400 and/ or the network node 230 controlling the at least one radio cell 215, 235-1, 235-2, 235-3 in some embodiments.
  • the agent node 210 may be situated at a distance from the trainer node 400 and/ or the network node 230.
  • any, some or all of the described steps 601-612 may be performed in a somewhat different chronological order than the enumeration indicates, be performed simultaneously or even be performed in a completely reversed order according to different embodiments.
  • Some actions, such as e.g. steps 602-603 and 607-612, may be performed within some, but not necessarily all embodiments.
  • some actions may be performed in a plurality of alternative manners according to different embodiments, and that some such alternative manners may be performed only within some, but not necessarily all embodiments.
  • the agent node 210 may in some embodiments periodically re-perform any, some or all of steps 601-612, thereby enabling application of a new power control policy 430 according to some embodiments.
  • the method 600 may comprise the following steps:
  • Step 601 comprises obtaining a power control policy 430.
  • the power control policy 430 may be received from the trainer node 400.
  • the radio network node 500 controlling the radio cell 235-1, 235-2, 235-3 of the downlink power control is another agent node 500
  • the power control policy 430 to be utilised for downlink power control in the radio cell 235-1, 235-2, 235-3 of the other agent node 500 in the communication system 200, obtained from the trainer node 400, may be forwarded to the other agent node 500.
  • the power control policy 430 may be obtained iteratively in some embodiments.
  • the obtained power control policy 430 may be represented by one or more of: an indication of a neural network architecture, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks, size of a neural network configured for power control, in some embodiments.
  • the obtained power control policy 430 may in some embodiments be represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
  • Step 602 which only may be comprised in some embodiments, may comprise selecting which at least one feature to utilise for representing the state of at least a part of the communication system 200.
  • the feature representing the state of at least a part of the communication system 200 may be selected based on any, some or a combination of: a measurement related to received signal quality made by and received from a user device 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to received signal quality made by and received from a user device 220 in another radio cell 215, 235-1, 235-2, 235-3; a measurement related to downlink transmission power of the radio cell 215, 235-1, 235-2, 235-3 made by and obtained from the radio network node 210, 230, 500 controlling the radio cell 215, 235-1, 235-2, 235-3; a measurement related to a number of active user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to types or distribution of traffic within the radio cell 215, 235-1, 235-2, 235-3; a measurement related to location or distribution of user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; or
  • Step 604 comprises determining at least one feature representing a state of at least a part of the communication system 200, at a first time period.
  • the part of the communication system 200 may be the part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 for which downlink power control is to be performed, is situated.
  • the determination of said feature may in some embodiments comprise iterating the determination of the feature representing the state of at least the part of the communication system 200.
  • the feature representing the state of at least a part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 may be situated, may be determined based on any, some or a combination of: a measurement related to received signal quality made by and received from a user device 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to received signal quality made by and received from a user device 220 in another radio cell 215, 235-1, 235-2, 235-3; a measurement related to downlink transmission power of the radio cell 215, 235-1, 235-2, 235-3 made by and obtained from the radio network node 210, 230, 500 controlling the radio cell 215, 235-1, 235-2, 235-3; a measurement related to a number of active user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to types or distribution of traffic within the radio cell 215, 235-1, 235-2, 235-3; a measurement related to location or distribution
  • Step 605 comprises determining a power control action to be performed for downlink power control in the radio cell 215, 235-1, 235-2, 235-3 at the first time period, out of a set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3, based on the obtained 601 power control policy 430.
  • the determination of the power control action may be iterated according to some embodiments.
  • the set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3 may be adjusted according to some embodiments, based on the determined 604 at least one feature representing the state of the part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 may be comprised, or based on the power control policy 430 received from the trainer node 400.
  • Step 606 comprises configuring a downlink transmission power instruction of the radio cell 215, 235-1, 235-2, 235-3 based on the determined 605 power control action.
  • the configuration of the downlink transmission power instruction may be iterated according to some embodiments.
  • Step 607 which only may be comprised in some embodiments wherein the radio network node 230 controlling the radio cell 235-1, 235-2, 235-3 is not co-located with the agent node 210, may comprise transmitting the configured 606 downlink transmission power instruction of the radio cell 235-1, 235-2, 235-3 to the radio network node 230 for downlink power control of the radio cell 235-1, 235-2, 235-3 of the radio network node 230.
  • Step 608 which only may be comprised in some embodiments, may comprise determining the feature representing the state of the part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 may be situated, at a second time period.
  • the feature representing the state of the part of the communication system 200 may typically be the same as previously determined 604 at the first time period, enumerated in step 604.
  • Step 609 which only may be comprised in some embodiments, may comprise determining a performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3, or within the communication system 200 or a subset thereof.
  • the performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3 may in some embodiments be determined, given the feature representing the state at the first time period t and the power control action a_t taken at the first time period t.
  • the performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3 may in some embodiments be determined by computing a weighted sum of scalars x_t ∈ X, parameterised by a scalar coefficient and transformed by a function (with domain X and range of real scalars),
  • where x_t represents a radio measurement or a performance indicator associated with the radio cell, and X is the set of all radio measurements or performance indicators associated with the radio cell and used for the definition of the performance measurement.
  • Step 610 which only may be comprised in some embodiments wherein step 609 has been performed, may comprise computing a performance measurement associated with at least a part of the communication system 200 wherein the radio cell 215, 235-1 , 235-2, 235-3 may be situated, based on the determined 609 performance measurement and a network performance measurement received from another radio network node 230, 500 in the communication system 200.
  • Step 611 which only may be comprised in some embodiments, may comprise transmitting a training data message 420 to the trainer node 400, comprising one or more in the group of: the determined 604 feature representing the state at the first time period, the determined 605 power control action performed at the first time period, the determined 608 feature representing the state at the second time period, and the determined 609 performance measurement.
  • Step 612 which only may be comprised in some embodiments wherein the obtained 601 power control policy 430 further comprises an exploration-to-exploitation control parameter, associated with a probability of applying the power control policy 430, may comprise determining application of the obtained 601 power control policy 430, based on the obtained 601 exploration-to-exploitation control parameter.
  • any, some or all method steps 601-612 may be iterated infinitely, for a limited period of time, or until a threshold limit is achieved.
  • Figure 7 illustrates an embodiment of an agent node 210, 500 for downlink power control of a radio cell 215, 235-1, 235-2, 235-3 of a communication system 200.
  • the agent node 210, 500 is configured to perform the method 600 according to any, some, all, or at least one of the enumerated method steps 601-612, according to some embodiments.
  • the agent node 210 is thus configured to obtain a power control policy 430. Further, the agent node 210 is configured to determine at least one feature representing a state of at least a part of the communication system 200, e.g. wherein the radio cell 215, 235-1, 235-2, 235-3 is situated, at a first time period. In addition, the agent node 210 is configured to determine a power control action to be performed for downlink power control in the radio cell 215, 235-1, 235-2, 235-3 at the first time period, out of a set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3, based on the obtained power control policy 430 and the determined at least one feature.
  • the agent node 210 is also configured to configure a downlink transmission power instruction of the radio cell 215, 235-1, 235-2, 235-3 based on the determined power control action. Furthermore, in some embodiments, the agent node 210, 500 may be further configured to determine a feature representing the state of the part of the communication system 200, at a second time period. The part of the communication system 200 may be the part wherein the radio cell 215, 235-1, 235-2, 235-3 is situated. The agent node 210, 500 may also be configured to determine a performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3.
  • the agent node 210, 500 may be configured to select which at least one feature to utilise for representing the state of at least a part of the communication system 200. Further, the agent node 210, 500 may also be configured to select which performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3 to utilise for representing the performance of the radio cell 215, 235-1 , 235-2, 235-3.
  • the agent node 210, 500 may be configured to transmit the configured downlink transmission power instruction of the radio cell 235-1, 235-2, 235-3 to the radio network node 230 for downlink power control of the radio cell 235-1, 235-2, 235-3 of the radio network node 230, when the radio network node 230 controlling the radio cell 235-1, 235-2, 235-3 is not co-located with the agent node 210.
  • the agent node 210, 500 may also be configured to determine the feature representing the state of at least a part of the communication system 200, e.g. where the radio cell 215, 235-1 , 235-2, 235-3 is situated, based on any of: a measurement related to received signal quality made by and received from a user device 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to received signal quality made by and received from a user device 220 in another radio cell 215, 235-1, 235-2, 235-3; a measurement related to downlink transmission power of the radio cell 215, 235-1, 235-2, 235-3 made by and obtained from the radio network node 210, 230, 500 controlling the radio cell 215, 235- 1, 235-2, 235-3 ; a measurement related to a number of active user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to types, or distribution, of traffic within the radio cell 215, 235-1, 235-2, 235-3
  • the agent node 210, 500 may be configured to compute a performance measurement associated with at least a part of the communication system 200, e.g. wherein the radio cell 215, 235-1, 235-2, 235-3 is comprised, based on the determined performance measurement and at least one other network performance measurement received from another radio network node 230, 500 in the communication system 200. Further, the agent node 210, 500 may also be configured to transmit the training data message 420 to the trainer node 400 comprising the computed performance measurement, in some embodiments. In addition, the agent node 210, 500 may be configured to obtain an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy 430.
  • the agent node 210 may be configured to transmit the training data message 420 to the trainer node 400 comprising a training data message 420 received from the other agent node 500.
  • the agent node 210, 500 may also be further configured to forward the power control policy 430 to be utilised for downlink power control in the radio cell 235-1, 235-2, 235-3 of the other agent node 500 in the communication system 200, received from the trainer node 400, to the other agent node 500.
  • the agent node 210, 500 may be configured to iterate the determination of the feature representing the state of at least a part of the communication system 200; the determination of the power control action; the configuration of the downlink transmission power instruction; the determination of the performance measurement; the transmission of the training data message 420, or a plurality of training data messages 420, to the trainer node 400 and/ or the obtaining of the power control policy 430.
  • the agent node 210, 500 may in addition be configured to adjust the set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3, based on the determined at least one feature representing the state of the part of the communication system 200, or based on the obtained power control policy 430.
  • the agent node 210, 500 may be configured to obtain the power control policy 430, represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which may comprise an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks.
  • the power control policy 430 may be represented by an indication of a neural network architecture, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks, size of a neural network configured for power control.
  • the agent node 210, 500 may be configured, in some embodiments, to obtain the power control policy 430, comprising an adjustment of the set of available power control actions associated with the radio cell 215, 235-1 , 235-2, 235-3; an indicator configuring a combining method for the agent node 210, 500 to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis aligned data split function for the power control policy 430 when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy 430 based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy 430 based on the decision forest; an indication one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain cri
  • the agent node 210, 500 may in some embodiments be configured to determine the performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3, given the feature representing the state of the communication system 200, or a subset thereof, at the first time period t and the power control action a_t taken at the first time period t.
  • the agent node 210, 500 may be additionally configured to determine the performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3 by computing a weighted sum of scalars x_t ∈ X, parameterised by a scalar coefficient and transformed by a function with domain X and
  • x_t represents a radio measurement or a performance indicator associated with the radio cell 215, 235-1, 235-2, 235-3
  • X is the set of all radio measurements or performance indicators associated with the radio cell 215, 235-1, 235-2, 235-3 and used for the definition of the performance measurement, is a weight associated with and x is a
  • the agent node 210 comprises a receiver 710, configured for receiving e.g. signal strength/ quality measurements from one or more user devices 220, for receiving e.g. signal strength/ quality measurements or other information from one or more radio network nodes 230; or for receiving e.g. the power control policy 430 from the trainer node 400.
  • the agent node 210 comprises a processor 720, configured for downlink power control of the radio cell 215, 235-1 , 235-2, 235-3 in the communication system 200, by performing at least some steps 601-612 of the described method 600.
  • Such processor 720 may comprise one or more instances of a processing circuit, i.e. a Central Processing Unit (CPU), a processing unit, a processing circuit, a processor, an Application Specific Integrated Circuit (ASIC), a microprocessor, or other processing logic that may interpret and execute instructions.
  • the herein utilised expression “processor” may thus represent a processing circuitry comprising a plurality of processing circuits, such as, e.g., any, some or all of the ones enumerated above.
  • the agent node 210 may in some embodiments comprise a transmitter 730, configured for transmitting various signals, to be received by the user device 220, radio network node 230, other agent node 500 and/ or trainer node 400.
  • the agent node 210 may comprise at least one memory 725, according to some embodiments.
  • the optional memory 725 may comprise a physical device utilised to store data or programs, i.e., sequences of instructions, on a temporary or permanent basis.
  • the memory 725 may comprise integrated circuits comprising silicon-based transistors. Further, the memory 725 may be volatile or nonvolatile.
  • At least a sub-set of the previously described method steps 601-612 to be performed in the agent node 210 may be implemented through the one or more processing circuits 720 in the agent node 210, together with a computer program product for performing the functions of at least some of the method steps 601-612.
  • a computer program product comprising instructions for performing the method steps 601-612 may perform downlink power control of the radio cell 215, 235-1, 235-2, 235-3 in the communication system 200, when the computer program is loaded into the processor 720 of the agent node 210.
  • the computer program mentioned above may be provided for instance in the form of a data carrier carrying computer program code for performing at least some of the method steps 601-612 according to some embodiments when being loaded into the processor 720.
  • the data carrier may be, e.g., a hard disk, a CD ROM disc, a memory stick, an optical storage device, a magnetic storage device or any other appropriate medium such as a disk or tape that may hold machine readable data in a non-transitory manner.
  • the computer program product may furthermore be provided as computer program code on a server and downloaded to the agent node 210 remotely, e.g., over an Internet or an intranet connection.
  • Figure 8 is a flow chart illustrating embodiments of a method 800 in a trainer node 400 for determining a power control policy 430 to be utilised by an agent node 210, 500 for downlink power control of a radio cell 215, 235-1, 235-2, 235-3 of a communication system 200.
  • the radio cell 215, 235-1, 235-2, 235-3 is controlled by the agent node 210 or in some embodiments by another agent node 500, or alternatively by a network node 230, which in turn is controlled by the agent node 210, 500.
  • the agent node 210 may be co-located with a trainer node 400 and/ or the network node 230 controlling the radio cell 215, 235-1, 235-2, 235-3 in some embodiments. Alternatively, the agent node 210 may be situated at a distance from the trainer node 400 and/ or the network node 230.
  • the power control policy 430 may be represented by one or more of: an indication of a neural network architecture, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks, size of a neural network configured for power control, in some embodiments.
  • the obtained power control policy 430 may further comprise: an adjustment of the set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3 ; an indicator configuring a combining method for the agent node 210, 500 to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis aligned data split function for the power control policy 430 when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy 430 based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy 430 based on the decision forest; an indication one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain criterion; at least one indication of the stopping criteria to determine
  • the method 800 may comprise a number of steps 801-805.
  • any, some or all of the described steps 801-805 may be performed in a somewhat different chronological order than the enumeration indicates, be performed simultaneously or even be performed in a completely reversed order according to different embodiments.
  • Some actions such as e.g. step 804 may be performed within some, but not necessarily all embodiments.
  • some actions may be performed in a plurality of alternative manners according to different embodiments, and some such alternative manners may be performed only within some, but not necessarily all embodiments.
  • the trainer node 400 may in some embodiments periodically re-perform any, some or all of steps 801-805, thereby providing a continuously updated power control policy 430.
  • the method 800 may comprise the following steps:
  • Step 801 comprises receiving a training data message 420, associated with the radio cell 215, 235-1, 235-2, 235-3, from the agent node 210, 500, wherein the training data message 420 comprises one or more in the group of: a feature representing a state of at least a part of the communication system 200 at a first time period, a power control action performed by the agent node 210, 500 in the radio cell 215, 235-1, 235-2, 235-3 at the first time period, a feature representing the state at the second time period, and a performance measurement.
  • the performance measurement may have been made at the first time period in some embodiments and at the second time period in some embodiments.
  • a conjunct performance measurement may be computed, associated with at least a part of the communication system 200, based on the performance measurement received in the training data message 420 received from the agent node 210, 500 and another performance measurement received from another radio network node 230, 500 in the communication system 200.
  • Step 802 comprises storing the received 801 training data message 420 in a database 410, associated with the radio cell 215, 235-1 , 235-2, 235-3.
  • Step 803 comprises determining the power control policy 430 for the radio cell 215, 235-1, 235-2, 235-3, based on at least one training data message 420, stored 802 in the database 410.
  • Step 804 which only may be comprised in some embodiments, may comprise determining an exploration-to-exploitation control parameter, associated with the determined 803 power control policy 430.
  • Step 805 comprises transmitting the determined 803 power control policy 430 to the agent node 210, 500.
  • step 805 may comprise transmitting the determined 804 exploration-to-exploitation control parameter to the agent node 210, 500 together with the determined 803 power control policy 430.
  • the method 800 further may comprise selecting which at least one feature the agent node 210, 500 is to utilise for representing the state of at least a part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 is comprised and sending the made selection to the agent node 210, 500.
  • the method 800 may further comprise selecting which performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3 the agent node 210, 500 is to utilise for representing the performance of the radio cell 215, 235-1, 235-2, 235-3 and sending the made selection to the agent node 210, 500, in some embodiments.
  • any, some or all method steps 801-805 may be iterated infinitely, for a limited period of time, or until a threshold limit is achieved.
  • Figure 9 illustrates an embodiment of a trainer node 400 for determining a power control policy 430 to be utilised by an agent node 210, 500 for downlink power control of a radio cell 215, 235-1, 235-2, 235-3 of a communication system 200.
  • the trainer node 400 is configured to perform the method 800 according to any, some, all, or at least one of the enumerated method steps 801-805, according to some embodiments.
  • the trainer node 400 is thus configured to receive a training data message 420, associated with the radio cell 215, 235-1, 235-2, 235-3, from the agent node 210, 500, wherein the training data message 420 comprises one or more in the group of: a feature representing a state of at least a part of the communication system 200 at a first time period, a power control action performed by the agent node 210, 500 in the radio cell 215, 235-1, 235-2, 235-3 at the first time period, a feature representing the state at the second time period, and a performance measurement.
  • the trainer node 400 is also configured to store the received training data message 420 in a database 410 associated with the radio cell 215, 235-1, 235-2, 235-3.
  • the trainer node 400 is configured to determine the power control policy 430 for the radio cell 215, 235-1, 235-2, 235-3, based on at least one training data message 420, stored in the database 410. In addition, the trainer node 400 is further configured to transmit the determined power control policy 430 to the agent node 210, 500. In some embodiments, the trainer node 400 may be configured to determine an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy 430, e.g. based on a time period parameter in some embodiments; and wherein the determined exploration-to-exploitation control parameter is transmitted to the agent node 210, 500 together with the determined power control policy 430.
  • the trainer node 400 may be configured to determine the exploration-to-exploitation control parameter so that the probability of applying the determined power control policy 430 is increased over time. Also, the trainer node 400 may be further configured to compute a performance measurement associated with at least a part of the communication system 200, based on the performance measurement received in the training data message 420 received from the agent node 210, 500 and another performance measurement received from another radio network node 230, 500 in the communication system 200.
  • the trainer node 400 may be configured to select which at least one feature the agent node 210, 500 is to utilise for representing the state of at least a part of the communication system 200 and provide the made selection to the agent node 210, 500.
  • the trainer node 400 comprises a receiver 910, configured for receiving e.g. the training data message 420, from one or more agent nodes 210, 500.
  • the trainer node 400 comprises a processor 920, configured for determining a power control policy 430 to be utilised by an agent node 210, 500 for downlink power control of a radio cell 215, 235-1, 235-2, 235-3 of a communication system 200, by performing at least some steps 801-805 of the described method 800.
  • Such processor 920 may comprise one or more instances of a processing circuit, i.e. a Central Processing Unit (CPU), a processing unit, a processing circuit, a processor, an Application Specific Integrated Circuit (ASIC), a microprocessor, or other processing logic that may interpret and execute instructions.
  • the herein utilised expression “processor” may thus represent a processing circuitry comprising a plurality of processing circuits, such as, e.g., any, some or all of the ones enumerated above.
  • the trainer node 400 may in some embodiments comprise a transmitter 930, configured for transmitting various signals and instructions, e.g. comprising the determined power control policy 430, to be received by the agent node 210, 500; or possibly another trainer node.
  • the trainer node 400 may comprise at least one memory 925, according to some embodiments.
  • the optional memory 925 may comprise a physical device utilised to store data or programs, i.e., sequences of instructions, on a temporary or permanent basis.
  • the memory 925 may comprise integrated circuits comprising silicon-based transistors. Further, the memory 925 may be volatile or nonvolatile.
  • At least a sub-set of the previously described method steps 801-805 to be performed in the trainer node 400 may be implemented through the one or more processing circuits 920 in the trainer node 400, together with a computer program product for performing the functions of at least some of the method steps 801-805.
  • a computer program product comprising instructions for performing the method steps 801-805 may determine the power control policy 430 to be utilised by the agent node 210, 500 for downlink power control of the radio cell 215, 235-1, 235-2, 235-3 in the communication system 200.
  • the computer program mentioned above may be provided for instance in the form of a data carrier carrying computer program code for performing at least some of the method steps 801-805 according to some embodiments when being loaded into the processor 920.
  • the data carrier may be, e.g., a hard disk, a CD ROM disc, a memory stick, an optical storage device, a magnetic storage device or any other appropriate medium such as a disk or tape that may hold machine readable data in a non-transitory manner.
  • the computer program product may furthermore be provided as computer program code on a server and downloaded to the trainer node 400 remotely, e.g., over an Internet or an intranet connection.
  • the term “and/ or” comprises any and all combinations of one or more of the associated listed items.
  • the term “or” as used herein, is to be interpreted as a mathematical OR, i.e., as an inclusive disjunction; not as a mathematical exclusive OR (XOR), unless expressly stated otherwise.
  • the singular forms “a”, “an” and “the” are to be interpreted as “at least one”, thus also possibly comprising a plurality of entities of the same kind, unless expressly stated otherwise.
  • a computer program may be stored/ distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms such as via Internet or other wired or wireless communication system.
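
The interplay between the agent node method 600 and the trainer node method 800 described above can be illustrated with a minimal Python sketch. It is not the implementation of the described embodiments: every class, function and parameter name is hypothetical, the power control action is assumed to be a power offset in dB, the scalar coefficients of the performance measurement are assumed to be folded into the transform functions, and the trained policy is replaced by a toy rule that ignores the state and prefers the action with the best mean stored performance.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TrainingDataMessage:
    state: tuple        # feature(s) at the first time period
    action: float       # power control action taken (assumed: a power offset in dB)
    next_state: tuple   # feature(s) at the second time period
    performance: float  # performance measurement


def performance_measurement(measurements, weights, transforms):
    """Weighted sum of transformed measurements; the dict keys are hypothetical."""
    return sum(weights[k] * transforms[k](x) for k, x in measurements.items())


def select_action(policy, state, actions, apply_prob, rng):
    """Apply the policy with probability apply_prob, otherwise pick a random action."""
    if rng.random() < apply_prob:
        return policy(state, actions)
    return rng.choice(actions)


@dataclass
class Trainer:
    database: list = field(default_factory=list)

    def receive(self, message):
        # Receive and store a training data message.
        self.database.append(message)

    def determine_policy(self):
        # Toy stand-in for policy training: prefer the action with the best
        # mean stored performance, ignoring the state entirely.
        totals = {}
        for m in self.database:
            total, count = totals.get(m.action, (0.0, 0))
            totals[m.action] = (total + m.performance, count + 1)
        means = {a: total / count for a, (total, count) in totals.items()}

        def policy(state, actions):
            return max(actions, key=lambda a: means.get(a, float("-inf")))

        return policy
```

In this sketch, `apply_prob` plays the role of the exploration-to-exploitation control parameter: a trainer could raise it over time so that the agent applies the received policy more often and explores less.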

Abstract

Agent node (210, 500) and method (600) therein, for downlink power control of a radio cell (215) of a communication system (200). The agent node (210, 500) is configured to obtain a power control policy (430); determine at least one feature representing a state of at least a part of the communication system (200), at a first time period; determine a power control action to be performed for downlink power control in the radio cell (215) at the first time period, out of a set of available power control actions associated with the radio cell (215), based on the obtained power control policy (430) and the determined at least one feature; and configure a downlink transmission power instruction of the radio cell (215) based on the determined power control action. Also, a corresponding trainer node (400) and method (800) for providing the agent node (210, 500) with the power control policy (430), based on received parameters, are disclosed.

Description

METHOD AND DEVICE IN A WIRELESS COMMUNICATION NETWORK FOR DOWNLINK POWER CONTROL
TECHNICAL FIELD
Implementations described herein generally relate to an agent node, a trainer node and methods therein. In particular, a mechanism is described herein for downlink power control, by the agent node, of a radio cell of a radio network node of a communication system, based on an obtained power control policy.
BACKGROUND
Radio interference is a major cause of performance degradation in wireless radio systems. To mitigate radio interference and aid performance, state-of-the-art radio cellular systems have adopted Inter-Cell Interference Coordination (ICIC) schemes. In the related art Long Term Evolution (LTE) system, for instance, two forms of ICIC are supported: frequency domain ICIC (adopted in LTE Rel. 8-9); and time domain ICIC (adopted from LTE Rel. 10 and beyond).
Frequency domain ICIC relates to the usage of radio resources in the frequency domain and/ or power adaptation. Current methods, shown in Figure 1, include:
Full frequency reuse (the basic operating mode of the LTE system), in which each base station uses the entire frequency spectrum with uniform power distributed across the system bandwidth, thereby creating strong interference to cell-edge users.
Hard frequency reuse (used in the related art GSM and LTE Rel. 8-9), in which each base station operates in one out of a set of non-overlapping portions of the available frequency spectrum in such a way that neighbouring base stations do not use the same set of frequencies. While this minimises the interference at the cell edge, the overall spectral efficiency is reduced by a factor equal to the reuse factor. GSM is an abbreviation for Global System for Mobile Communications (originally: Groupe Spécial Mobile).
Fractional frequency reuse in which the available frequency spectrum is divided into two portions: a portion common to all base stations used for scheduling cell-centre users; and a second portion that is further divided among base stations in a hard frequency reuse manner and used to schedule transmission to/ from cell-edge users.
Soft frequency reuse enables base stations to transmit in the entire frequency spectrum with different power levels: higher transmission power in the portion of the spectrum where cell- edge users are scheduled; lower transmission power in the portion of spectrum where cell- centre users are scheduled.
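As an illustration of the soft frequency reuse scheme listed above, the following minimal sketch builds a per-sub-band power mask for three neighbouring base stations. The function name, the number of sub-bands and the power levels (46 dBm and 40 dBm) are assumptions chosen for the example and are not taken from the described system.

```python
def soft_reuse_power_mask(num_subbands, edge_subbands, high_dbm, low_dbm):
    """Return the transmit power per sub-band for one base station.

    Sub-bands reserved for cell-edge users get the higher power level,
    all remaining sub-bands (cell-centre users) get the lower one.
    """
    edge = set(edge_subbands)
    return [high_dbm if band in edge else low_dbm for band in range(num_subbands)]


# Hypothetical layout: three neighbouring cells, six sub-bands, and each
# cell boosts a different third of the spectrum for its cell-edge users.
edge_allocation = {"A": [0, 1], "B": [2, 3], "C": [4, 5]}
masks = {
    cell: soft_reuse_power_mask(6, edge, high_dbm=46.0, low_dbm=40.0)
    for cell, edge in edge_allocation.items()
}
```

In this layout every sub-band is boosted by exactly one of the three cells, which is the property that distinguishes soft reuse from full reuse with uniform power.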
Time domain ICIC consists of periodically muting the transmission of a base station in certain time-frequency resources to enable a further base station to serve mobile stations suffering severe interference in the muted radio resources. The related art LTE system introduced Almost Blank Subframes (ABS), i.e., downlink subframes where only the necessary signals to avoid radio link failure or to maintain backward compatibility are transmitted, including common reference signals (except in subframes configured as Multicast-Broadcast Single-Frequency Network (MBSFN) subframes), Primary and Secondary Synchronisation Signals (PSS/SSS), Physical Broadcast Channel (PBCH), System Information Block Type 1 (SIB-1) and paging with their associated Physical Downlink Control Channel (PDCCH).
Time domain muting patterns are configured semi-statically by means of bitmaps of length 40, i.e. spanning up to four radio frames, signalled between eNodeBs over the X2 interface. Mobile stations in a victim cell are then categorised into two groups: mobile stations affected by interference from a cell using ABS, which shall preferably be scheduled in a muted subframe of said cell; and mobile stations that are not affected by the interference produced by a neighbouring cell using ABS, which can be scheduled freely in any subframe.
The above categorisation is done by comparing Channel State Information (CSI) feedback from mobile stations/user devices in muted and non-muted subframes of a neighbouring cell.
In the examples of frequency-domain ICIC schemes in Figure 1, the frequency spectrum is on the horizontal axis while a representation of transmission power is shown on the vertical axis. ABS, adopted in LTE Rel-10 to mitigate interference for cell-edge users, comprise time-domain muting patterns of data transmission in downlink subframes. The muting pattern of an aggressor cell (typically a macro base station) is signalled over the X2 interface to a neighbouring victim cell (typically pico base stations within the macro-cell coverage area), so that the latter can schedule users suffering strong interference from the aggressor cell in ABS subframes of the aggressor cell. User devices in the coverage area of the victim cell are configured to perform CSI measurements in ABS and non-ABS resources to enable the serving cell to determine whether the user device is affected by strong interference from the aggressor cell. The time-domain muting patterns and the scheduling decisions, however, are independently determined by the aggressor cell and the victim cell respectively.
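The muting pattern lookup and the categorisation of mobile stations described above can be sketched as follows. This is an illustrative sketch only: the convention that a bit value of 1 marks a muted subframe, the use of a CQI index as the compared CSI quantity, and the threshold of 3 are assumptions made for the example.

```python
ABS_PATTERN_LENGTH = 40  # one bit per 1 ms subframe, i.e. four 10-subframe radio frames


def is_muted(abs_bitmap, subframe_index):
    """Return True if the aggressor cell is muted in the given subframe.

    The pattern repeats every 40 subframes; a bit value of 1 is assumed
    to mark an almost blank subframe.
    """
    if len(abs_bitmap) != ABS_PATTERN_LENGTH:
        raise ValueError("ABS bitmap must contain exactly 40 bits")
    return abs_bitmap[subframe_index % ABS_PATTERN_LENGTH] == 1


def is_affected_by_aggressor(cqi_in_abs, cqi_in_non_abs, threshold=3):
    """Categorise a mobile station by comparing CSI in muted and non-muted subframes.

    A station whose reported channel quality is markedly better while the
    neighbour is muted is treated as affected by that neighbour.
    """
    return (cqi_in_abs - cqi_in_non_abs) >= threshold


# Hypothetical pattern: every eighth subframe of the aggressor cell is muted.
pattern = [1 if i % 8 == 0 else 0 for i in range(ABS_PATTERN_LENGTH)]
```

A scheduler in the victim cell could then restrict the affected stations to subframes for which `is_muted` returns True and schedule the remaining stations in any subframe.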
It can be seen that there is room for improvement for downlink power control in a network.
SUMMARY
It is therefore an objective to obviate at least some of the above mentioned disadvantages and to improve downlink power control of a radio cell.
This and other objectives are achieved by the features of the appended independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, an agent node for downlink power control of a radio cell of a communication system is provided. The agent node is configured to obtain a power control policy. Further the agent node is configured to determine at least one feature representing a state of at least a part of the communication system, at a first time period. The agent node is also configured to determine a power control action to be performed for downlink power control in the radio cell at the first time period, out of a set of available power control actions associated with the radio cell, based on the obtained power control policy and the determined at least one feature. Also the agent node is configured to configure a downlink transmission power instruction of the radio cell based on the determined power control action.
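The sequence of the first aspect (obtain a policy, determine a feature, determine an action out of the available set, configure the transmission power) can be sketched as below. This is a minimal sketch under stated assumptions: the policy is modelled as a callable returning one score per available action, the actions are modelled as power offsets in dB, and the resulting power is clipped to hypothetical minimum and maximum levels. None of these choices are specified by the text above.

```python
from typing import Callable, Sequence


def select_power_action(
    policy: Callable[[Sequence[float]], Sequence[float]],
    features: Sequence[float],
    available_actions_db: Sequence[float],
) -> float:
    """Pick the action with the highest policy score for the given features."""
    scores = policy(features)
    if len(scores) != len(available_actions_db):
        raise ValueError("policy must return one score per available action")
    best_index = max(range(len(available_actions_db)), key=lambda i: scores[i])
    return available_actions_db[best_index]


def configure_power_instruction(
    current_dbm: float, action_db: float, min_dbm: float, max_dbm: float
) -> float:
    """Apply the power offset and clip the result to the allowed range."""
    return min(max_dbm, max(min_dbm, current_dbm + action_db))


def toy_policy(features: Sequence[float]) -> Sequence[float]:
    """Hypothetical policy: the higher the load feature, the stronger the preference for reducing power."""
    load = features[0]
    return [load, 0.5, 1.0 - load]


action = select_power_action(toy_policy, [0.9], [-3.0, 0.0, 3.0])
new_power_dbm = configure_power_instruction(43.0, action, min_dbm=30.0, max_dbm=46.0)
```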
By autonomously learning different downlink power control strategies using measurements collected from an algorithmic interaction with the radio environment, an appropriate adjustment of the downlink transmission power may be made. Thus, downlink power control based on learning the network environmental performance is provided, which may adapt to changes in radio environment conditions without manual adjustment. Thereby downlink power control may be better managed, leading to increased network performance. Additionally, compared with traditional power control algorithms, the proposed concept has the advantage that reduced signalling overhead is required to learn different downlink power control strategies that can autonomously adapt to changes in the radio environment, users and traffic types.
According to a first possible implementation of the first aspect, the agent node may be further configured to determine a feature representing the state of the part of the communication system, at a second time period. Also, the agent node may be configured to determine a performance measurement, associated with the performance within the radio cell. Furthermore, the agent node may be configured to transmit a training data message to a trainer node, comprising one or more in the group of: the determined feature representing the state at the first time period, the determined power control action performed at the first time period, the determined feature representing the state at the second time period, and the determined performance measurement. Further the obtained power control policy is received from the trainer node.
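One possible in-memory shape for the training data message is sketched below. The class name, the field names and the `cell_id` field are hypothetical. The fields are optional because the message is described as comprising one or more of the listed elements.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrainingDataMessage:
    """Experience reported by the agent node to the trainer node."""

    cell_id: str
    state_first: Optional[Tuple[float, ...]] = None   # feature(s) at the first time period
    action_first: Optional[float] = None              # power control action performed
    state_second: Optional[Tuple[float, ...]] = None  # feature(s) at the second time period
    performance: Optional[float] = None               # performance measurement

    def to_payload(self) -> dict:
        """Return only the fields that are actually present in this message."""
        return {key: value for key, value in asdict(self).items() if value is not None}


message = TrainingDataMessage(cell_id="cell-1", state_first=(0.4, 12.0), action_first=-3.0)
payload = message.to_payload()
```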
According to a second possible implementation of the first aspect, or the first possible implementation thereof, the agent node may be further configured to select which at least one feature to utilise for representing the state of at least a part of the communication system. Also the agent node may be configured to select which performance measurement associated with the radio cell to utilise for representing the performance of the radio cell.
According to a third possible implementation of the first aspect, or any previously described implementation thereof, wherein a radio network node controlling the radio cell is not co-located with the agent node, the agent node may be further configured to transmit the configured downlink transmission power instruction of the radio cell to the radio network node for downlink power control of the radio cell of the radio network node.
According to a fourth possible implementation of the first aspect, or any previously described implementation thereof, wherein the feature representing the state of at least a part of the communication system, is determined based on any of: a measurement related to received signal quality made by and received from a user device in the radio cell; a measurement related to received signal quality made by and received from a user device in another radio cell; a measurement related to downlink transmission power of the radio cell made by and obtained from the radio network node controlling the radio cell; a measurement related to a number of active user devices in the radio cell; a measurement related to types, or distribution, of traffic within the radio cell; a measurement related to location, or distribution, of user devices in the radio cell; or a performance measurement, associated with the performance within the radio cell.
According to a fifth possible implementation of the first aspect, or any previously described implementation thereof, the agent node may be further configured to compute a performance measurement associated with at least a part of the communication system, based on the determined performance measurement and at least one other network performance measurement received from another radio network node in the communication system. Also, the agent node may be configured to transmit the training data message to the trainer node comprising the computed performance measurement.
According to a sixth possible implementation of the first aspect, or any previously described implementation thereof, wherein the obtained power control policy further comprises an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy, the agent node may be further configured to determine application of the obtained power control policy, based on the obtained exploration-to-exploitation control parameter.
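The use of the exploration-to-exploitation control parameter can be illustrated with the sketch below. It assumes that the parameter is given directly as the probability of applying the policy, and that exploration means drawing uniformly from the set of available actions. Both are assumptions made for the example.

```python
import random


def choose_action(policy_action, available_actions, apply_probability, rng=random):
    """Apply the policy's action with the given probability, otherwise explore.

    Exploration is modelled here as a uniform draw from the available actions.
    """
    if not 0.0 <= apply_probability <= 1.0:
        raise ValueError("apply_probability must lie in [0, 1]")
    if rng.random() < apply_probability:
        return policy_action
    return rng.choice(list(available_actions))


rng = random.Random(0)  # seeded generator so the example is repeatable
exploit = choose_action(-3.0, [-3.0, 0.0, 3.0], apply_probability=1.0, rng=rng)
explore = choose_action(-3.0, [-3.0, 0.0, 3.0], apply_probability=0.0, rng=rng)
```

With a probability of 1.0 the policy is always applied. With 0.0 the agent node always explores, which may still return the policy's action by chance.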
According to a seventh possible implementation of the first aspect, or any previously described implementation thereof, wherein a radio network node controlling the radio cell is another agent node, the training data message transmitted to the trainer node comprises a training data message received from the other agent node; the agent node may be further configured to forward the power control policy to be utilised for downlink power control in the radio cell of the other agent node in the communication system, received from the trainer node, to the other agent node.
According to an eighth possible implementation of the first aspect, or any previously described implementation thereof, the agent node may be further configured to iterate the determination of the feature representing the state of at least a part of the communication system; the determination of the power control action; the configuration of the downlink transmission power instruction; the determination of the performance measurement; the transmission of the training data message, or a plurality of training data messages, to the trainer node and the obtaining of the power control policy.
According to a ninth possible implementation of the first aspect, or any previously described implementation thereof, the agent node may be further configured to adjust the set of available power control actions associated with the radio cell, based on the determined at least one feature representing the state of the part of the communication system, or based on the obtained power control policy.
According to a tenth possible implementation of the first aspect, or any previously described implementation thereof, the obtained power control policy may be represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
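To make the neural network representation above more concrete, the following sketch evaluates a policy consisting of several small fully connected networks and combines their outputs. The dictionary layout, the bias terms, the identity ("linear") activation and the majority vote are assumptions introduced for the example. The text above only lists which kinds of parameters may be signalled.

```python
import math
from collections import Counter

# Activation functions selectable per layer.
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
    "linear": lambda z: z,
}


def forward(network, features):
    """Evaluate a fully connected network given as a list of layer dictionaries.

    Each layer holds "weights" (one row per neuron), "biases" and the name
    of its "activation" function.
    """
    values = list(features)
    for layer in network:
        activation = ACTIVATIONS[layer["activation"]]
        values = [
            activation(sum(w * v for w, v in zip(row, values)) + bias)
            for row, bias in zip(layer["weights"], layer["biases"])
        ]
    return values


def majority_vote(networks, features):
    """Return the action index preferred by most networks (one vote per network)."""
    votes = []
    for network in networks:
        outputs = forward(network, features)
        votes.append(max(range(len(outputs)), key=outputs.__getitem__))
    return Counter(votes).most_common(1)[0][0]


# Two hypothetical single-layer networks with two outputs (two actions) each.
identity_net = [{"weights": [[1.0, 0.0], [0.0, 1.0]], "biases": [0.0, 0.0], "activation": "linear"}]
swapped_net = [{"weights": [[0.0, 1.0], [1.0, 0.0]], "biases": [0.0, 0.0], "activation": "linear"}]
chosen_index = majority_vote([identity_net, swapped_net, identity_net], [0.2, 0.9])
```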
According to an eleventh possible implementation of the first aspect, or any previously described implementation thereof, the obtained power control policy may further comprise: an adjustment of the set of available power control actions associated with the radio cell; an indicator configuring a combining method for the agent node to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis-aligned data split function for the power control policy when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy based on the decision forest; an indication of one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain criterion; at least one indication of the stopping criteria to determine the depth of the decision forest.
According to a twelfth possible implementation of the first aspect, or any previously described implementation thereof, the agent node may be further configured to determine the performance measurement associated with the radio cell, given the feature representing the state at the first time period $t$ and the power control action $a_t$ taken at the first time period $t$.
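The three kinds of data split function mentioned for a decision-forest-based policy can be sketched as simple predicates over a feature vector. The parameterisations below (a dimension and a threshold; a weight vector and a bias; a matrix, a weight vector and a bias) are common textbook forms and are assumed here, since the text above does not define them.

```python
def axis_aligned_split(x, dimension, threshold):
    """Send the sample right if a single feature exceeds a threshold."""
    return x[dimension] > threshold


def linear_split(x, weights, bias):
    """Send the sample right if it lies on the positive side of a hyperplane."""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0.0


def quadratic_split(x, matrix, weights, bias):
    """Send the sample right if a quadratic form of the features is positive."""
    n = len(x)
    quadratic_term = sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))
    linear_term = sum(w * v for w, v in zip(weights, x))
    return quadratic_term + linear_term + bias > 0.0


sample = [1.0, 2.0]  # hypothetical two-feature state
goes_right_axis = axis_aligned_split(sample, dimension=1, threshold=1.5)
goes_right_linear = linear_split(sample, weights=[1.0, -1.0], bias=0.5)
goes_right_quadratic = quadratic_split(
    sample, matrix=[[1.0, 0.0], [0.0, 1.0]], weights=[0.0, 0.0], bias=-4.0
)
```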
According to a thirteenth possible implementation of the first aspect, or any previously described implementation thereof, the agent node may be further configured to determine the performance measurement, associated with the performance within the radio cell, by computing a weighted sum of scalars $x_i$ parameterised by a scalar coefficient $\alpha \in [0, \infty)$ and transformed by a function $f_\alpha$ (with domain $X$ and range of real scalars),
$$r(\mathbf{x}) = \sum_{x_i \in X} w_i \, f_\alpha(x_i),$$
where $x_i$ represents a radio measurement or a performance indicator associated with the radio cell, $X$ is the set of all radio measurements or performance indicators associated with the radio cell and used for the definition of the performance measurement, $w_i$ is a weight associated with $x_i$, and $\mathbf{x}$ is a vector comprising all $x_i \in X$.
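The weighted sum described above can be sketched numerically as follows. The exact transforming function is not recoverable from the text, so the sketch assumes the widely used alpha-fair utility (logarithm for a coefficient of 1, a power function otherwise) as one example of a family parameterised by a coefficient in [0, ∞). The function names and the sample values are likewise assumptions.

```python
import math


def alpha_fair(x, alpha):
    """Alpha-fair utility, assumed here as the transforming function."""
    if x <= 0.0:
        raise ValueError("measurements must be positive")
    if alpha < 0.0:
        raise ValueError("alpha must lie in [0, inf)")
    if alpha == 1.0:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)


def performance_measurement(measurements, weights, alpha):
    """Weighted sum of the transformed radio measurements or performance indicators."""
    if len(measurements) != len(weights):
        raise ValueError("one weight is required per measurement")
    return sum(w * alpha_fair(x, alpha) for x, w in zip(measurements, weights))


# Hypothetical per-user throughputs (Mbit/s) with equal weights.
throughputs = [2.0, 4.0]
equal_weights = [0.5, 0.5]
mean_rate = performance_measurement(throughputs, equal_weights, alpha=0.0)
log_rate = performance_measurement(throughputs, equal_weights, alpha=1.0)
```

Under this assumed transform, a coefficient of 0 reduces the measurement to a plain weighted sum, while larger coefficients give progressively more weight to the smallest measurements.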
According to a second aspect, a method in an agent node according to the first aspect, or any possible implementations thereof, for downlink power control of a radio cell of a communication system, is provided. The method comprises obtaining a power control policy. Further the method may comprise determining at least one feature representing a state of at least a part of the communication system, at a first time period. In addition, the method also may comprise determining a power control action to be performed for downlink power control in the radio cell at a first time period, out of a set of available power control actions associated with the radio cell, based on the obtained power control policy. Also the method in addition comprises configuring a downlink transmission power instruction of the radio cell based on the determined power control action.
According to a first possible implementation of the second aspect, the method may further comprise determining a feature representing the state of the part of the communication system at a second time period. Also, the method may further comprise determining a performance measurement, associated with the performance within the radio cell. In addition, the method may also comprise transmitting a training data message to the trainer node, comprising one or more in the group of: the determined feature representing the state at the first time period, the determined power control action performed at the first time period, the determined feature representing the state at the second time period, and the determined performance measurement. The method may also comprise receiving the obtained power control policy from the trainer node.
According to a second possible implementation of the second aspect, or the first possible implementation thereof, the method further may comprise selecting which at least one feature to utilise for representing the state of at least a part of the communication system. Also, the method comprises selecting which performance measurement associated with the radio cell to utilise for representing the performance of the radio cell.
According to a third possible implementation of the second aspect, or the first possible implementation thereof, wherein the radio network node controlling the radio cell is not co-located with the agent node, the method further may comprise transmitting the configured downlink transmission power instruction of the radio cell to the radio network node for downlink power control of the radio cell of the radio network node.
According to a fourth possible implementation of the second aspect, or any previously described implementation thereof, wherein the feature representing the state of at least a part of the communication system, is determined based on any of: a measurement related to received signal quality made by and received from a user device in the radio cell; a measurement related to received signal quality made by and received from a user device in another radio cell; a measurement related to downlink transmission power of the radio cell made by and obtained from the radio network node controlling the radio cell; a measurement related to a number of active user devices in the radio cell; a measurement related to types or distribution of traffic within the radio cell; a measurement related to location or distribution of user devices in the radio cell; or a performance measurement, associated with the performance within the radio cell.
According to a fifth possible implementation of the second aspect, or any previously described implementation thereof, the method further may comprise computing a performance measurement associated with at least a part of the communication system, based on the determined performance measurement and a network performance measurement received from another radio network node in the communication system; and wherein the training data message transmitted to the trainer node comprises the computed performance measurement.
According to a sixth possible implementation of the second aspect, or any previously described implementation thereof, the method further may comprise obtaining an exploration-to-exploitation control parameter, associated with a probability of applying the power control policy. The method further comprises determining application of the obtained power control policy, based on the obtained exploration-to-exploitation control parameter.
According to a seventh possible implementation of the second aspect, or any previously described implementation thereof, wherein the radio network node controlling the radio cell of the downlink power control is another agent node, the training data message transmitted to the trainer node comprises a training data message received from the other agent node; and wherein the power control policy to be utilised for downlink power control in the radio cell of the other agent node in the communication system, obtained from the trainer node, is forwarded to the other agent node.
According to an eighth possible implementation of the second aspect, or any previously described implementation thereof, the method comprises iterating the determination of the feature representing the state of at least a part of the communication system; the determination of the power control action; the configuration of the downlink transmission power instruction; the determination of the performance measurement; the transmission of the training data message, or a plurality of training data messages, to the trainer node and the obtaining of the power control policy.
According to a ninth possible implementation of the second aspect, or any previously described implementation thereof, the method may comprise adjusting the set of available power control actions associated with the radio cell, based on the determined at least one feature representing the state of the part of the communication system, or based on the power control policy received from the trainer node.
According to a tenth possible implementation of the second aspect, or any previously described implementation thereof, the method may comprise iterating the method according to the second aspect, or any previously described implementation thereof.
According to an eleventh possible implementation of the second aspect, or any previously described implementation thereof, the method may comprise representing the obtained power control policy by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
According to a twelfth possible implementation of the second aspect, or any previously described implementation thereof, the obtained power control policy may further comprise: an adjustment of the set of available power control actions associated with the radio cell; an indicator configuring a combining method for the agent node to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis-aligned data split function for the power control policy when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy based on the decision forest; an indication of one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain criterion; at least one indication of the stopping criteria to determine the depth of the decision forest.
According to a thirteenth possible implementation of the second aspect, or any previously described implementation thereof, the method may further comprise determining the performance measurement associated with the radio cell, given the feature representing the state at the first time period $t$ and the power control action $a_t$ taken at the first time period $t$.
According to a fourteenth possible implementation of the second aspect, or any previously described implementation thereof, the method may further comprise determining the performance measurement, associated with the performance within the radio cell, by computing a weighted sum of scalars $x_i$ parameterised by a scalar coefficient $\alpha \in [0, \infty)$ and transformed by a function $f_\alpha$ (with domain $X$ and range of real scalars),
$$r(\mathbf{x}) = \sum_{x_i \in X} w_i \, f_\alpha(x_i),$$
where $x_i$ represents a radio measurement or a performance indicator associated with the radio cell, $X$ is the set of all radio measurements or performance indicators associated with the radio cell and used for the definition of the performance measurement, $w_i$ is a weight associated with $x_i$, and $\mathbf{x}$ is a vector comprising all $x_i \in X$.
According to a third aspect, a computer program is provided, with a program code for performing a method according to the second aspect, or any possible implementation thereof, when the computer program runs on a computer.
According to a fourth aspect, a trainer node is provided for determining a power control policy to be utilised by an agent node for downlink power control of a radio cell of a communication system. The trainer node is configured to: receive a training data message, associated with the radio cell, from the agent node, wherein the training data message comprises one or more in the group of: a feature representing a state of at least a part of the communication system at a first time period, a power control action performed by the agent node in the radio cell at the first time period, a feature representing the state at the second time period, and a performance measurement. Furthermore, the trainer node is configured to store the received training data message in a database associated with the radio cell. Also, the trainer node is configured to determine a power control policy for the radio cell, based on at least one training data message, stored in the database. Furthermore, the trainer node is configured to transmit the determined power control policy to the agent node.
According to a first possible implementation of the fourth aspect, the trainer node may be further configured to determine an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy; and wherein the determined exploration-to-exploitation control parameter is transmitted to the agent node together with the determined power control policy.
According to a second possible implementation of the fourth aspect, or the first possible implementation thereof, the trainer node may be configured to determine the exploration-to-exploitation control parameter so that the probability of applying the determined power control policy is increased over time.
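The trainer node behaviour of the fourth aspect (receive, store per radio cell, determine a policy) can be illustrated with the sketch below. The text above does not state how the policy is determined from the stored messages. Tabular Q-learning over discrete state labels is used here purely as a stand-in, and the class name, learning rate, discount factor and state labels are all assumptions.

```python
from collections import defaultdict


class TrainerNode:
    """Stores training data per radio cell and derives a state-to-action policy."""

    def __init__(self, actions, learning_rate=0.5, discount=0.9):
        self.actions = tuple(actions)
        self.learning_rate = learning_rate
        self.discount = discount
        self.database = defaultdict(list)  # cell_id -> list of (s, a, s_next, r)

    def receive(self, cell_id, state, action, next_state, performance):
        """Store one training data message in the database of its radio cell."""
        self.database[cell_id].append((state, action, next_state, performance))

    def determine_policy(self, cell_id, sweeps=50):
        """Replay the stored messages with Q-learning and return the greedy policy."""
        q = defaultdict(float)
        for _ in range(sweeps):
            for s, a, s_next, r in self.database[cell_id]:
                best_next = max(q[(s_next, b)] for b in self.actions)
                target = r + self.discount * best_next
                q[(s, a)] += self.learning_rate * (target - q[(s, a)])
        states = {s for s, _, _, _ in self.database[cell_id]}
        return {s: max(self.actions, key=lambda b: q[(s, b)]) for s in states}


trainer = TrainerNode(actions=[-3, 0, 3])
trainer.receive("cell-1", "congested", -3, "ok", 1.0)
trainer.receive("cell-1", "congested", 3, "congested", -1.0)
policy = trainer.determine_policy("cell-1")
```

The returned dictionary plays the role of the power control policy transmitted to the agent node. A neural network or decision forest representation, as listed elsewhere in this summary, would replace the table in a less trivial setting.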
According to a third possible implementation of the fourth aspect, or any previously described implementation thereof, the trainer node may be configured to compute a performance measurement associated with at least a part of the communication system, based on the performance measurement received in the training data message received from the agent node and at least one other performance measurement received from another radio network node in the communication system.
According to a fourth possible implementation of the fourth aspect, or any previously described implementation thereof, the trainer node may be configured to select which at least one feature the agent node is to utilise for representing the state of at least a part of the communication system and provide the made selection to the agent node.
According to a fifth possible implementation of the fourth aspect, or any previously described implementation thereof, the trainer node may be configured to select which performance measurement associated with the radio cell, the agent node is to utilise for representing the performance of the radio cell and provide the made selection to the agent node.
According to a sixth possible implementation of the fourth aspect, or any previously described implementation thereof, the defined power control policy may be represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
According to a fifth aspect, a method in a trainer node is provided for determining a power control policy to be utilised by an agent node for downlink power control of a radio cell of a communication system. The method comprises receiving a training data message, associated with the radio cell, from the agent node, wherein the training data message comprises one or more in the group of: a feature representing a state of at least a part of the communication system at a first time period, a power control action performed by the agent node in the radio cell at the first time period, a feature representing the state at the second time period, and a performance measurement. The method also comprises storing the received training data message in a database, associated with the radio cell. In addition, the method comprises determining a power control policy for the radio cell, based on at least one training data message, stored in the database. The method also comprises transmitting the determined power control policy to the agent node.
According to a first possible implementation of the fifth aspect, the method may comprise determining an exploration-to-exploitation control parameter, associated with the power control policy; and wherein the determined exploration-to-exploitation control parameter is transmitted to the agent node together with the determined power control policy.
According to a second possible implementation of the fifth aspect, or the first possible implementation thereof, the exploration-to-exploitation control parameter may be determined so that it is reduced over time.
According to a third possible implementation of the fifth aspect, or any previously described implementation thereof, the method may also comprise computing a performance measurement associated with at least a part of the communication system, based on the performance measurement received in the training data message received from the agent node and at least one other performance measurement received from another radio network node in the communication system.
According to a fourth possible implementation of the fifth aspect, or any previously described implementation thereof, the method may also comprise selecting which at least one feature the agent node is to utilise for representing the state of at least a part of the communication system wherein the radio cell is comprised and sending the made selection to the agent node.
According to a fifth possible implementation of the fifth aspect, or any previously described implementation thereof, the method may also comprise selecting which performance measurement associated with the radio cell the agent node is to utilise for representing the performance of the radio cell and sending the made selection to the agent node.
According to a sixth possible implementation of the fifth aspect, or any previously described implementation thereof, the defined power control policy may be represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
According to a sixth aspect, a computer program is provided, with a program code for performing a method according to the fifth aspect, or any possible implementation thereof, when the computer program runs on a computer.
Thanks to the provided agent node, trainer node and methods therein, inter-cell interference management is simplified, also in a dense communication network. Thereby spectral efficiency of the system is enhanced. By appropriately adjusting transmission power, interference is reduced. Further, energy savings are made by not transmitting with higher power than required.
Another advantage of the disclosed aspects is that an adaptation may be made to changes concerning e.g. traffic load in the cell, traffic type in the cell, user device activity within the cell, etc., over time. Thereby downlink power control in the cell is improved.
Other objects, advantages and novel features of the aspects of the invention will become apparent from the following detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
Various embodiments are described in more detail with reference to attached drawings, illustrating examples of embodiments of the invention in which:
Figure 1 is a block diagram illustrating different IOC approaches.
Figure 2 is a block diagram illustrating a wireless communication network with an agent node controlling the downlink transmission power budget for a co-located radio cell.
Figure 3 is a block diagram illustrating a wireless communication network with an agent node controlling the downlink transmission power budget of a non co-located radio cell.
Figure 4 is a block diagram illustrating a wireless communication network illustrating interaction between agent node and trainer node.
Figure 5 is a block diagram illustrating a wireless communication network with multiple agent nodes sharing a single trainer node.
Figure 6 is a flow chart illustrating a method in an agent node according to an embodiment of the invention.
Figure 7 is a block diagram illustrating an agent node architecture according to an embodiment of the invention.
Figure 8 is a flow chart illustrating a method in a trainer node according to an embodiment of the invention.
Figure 9 is a block diagram illustrating a trainer node architecture according to an embodiment of the invention.
DETAILED DESCRIPTION
Embodiments of the invention described herein are defined as an agent node, a trainer node and methods therein, which may be put into practice in the embodiments described below. These embodiments may, however, be exemplified and realised in many different forms and are not to be limited to the examples set forth herein; rather, these illustrative examples of embodiments are provided so that this disclosure will be thorough and complete.
Still other objects and features may become apparent from the following detailed description, considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the herein disclosed embodiments, for which reference is to be made to the appended claims. Further, the drawings are not necessarily drawn to scale and, unless otherwise indicated, they are merely intended to conceptually illustrate the structures and procedures described herein.
Figure 2 is a schematic illustration of a radio communication system 200 wherein the agent node 210 resides in a radio network node, such as an eNodeB of an LTE system, and controls the downlink transmission power budget of at least one radio cell 215 co-located with the radio access node. The downlink power budget is determined based on radio environmental measurements received from a user device 220 camping within the controlled radio cell 215, as well as on information received from other radio network nodes 230 representing a performance measure associated with the communication system. The neighbour radio network node 230 in the illustrated example may control three radio cells 235-1, 235-2, 235-3.
A massive densification of radio access nodes in future radio communication systems 200 makes inter-cell interference management particularly difficult due to the potentially large number of interferers affecting the transmission to/from a user device 220, and therefore comes with a number of new challenges related to spectral efficiency and energy savings.
Solving this problem to optimality requires extensive channel state measurements correlating the channel quality experienced by each user device 220 from multiple radio network nodes 230 (i.e., radio cells) to fully characterise the state of the system. Such measurements should be collected by a central node so as to jointly optimise the downlink power budget used by multiple radio network nodes 230. The signalling overhead of implementing such a solution makes it prohibitive even with slowly varying radio channels.
Methods and apparatus are therefore provided for downlink power control in the radio communication system 200 based on learning features characterising the state of the radio environment with limited signalling overhead and exploiting these features to learn an optimal, or at least improved, downlink power control policy for radio cells 215, 235-1, 235-2, 235-3 in the radio communication system 200.
In some embodiments, two logical entities may be involved: the agent node 210 and a trainer node.
The agent node 210 (e.g., an eNodeB), may be configured to interact with the radio environment and determine an action for optimising/ improving the downlink transmission power budget of one or more radio network nodes 230, based on a power control policy.
The agent node 210 may be co-located with the radio network node 230 in some embodiments. However, in other embodiments, the agent node 210 may be an entity separate from the radio network node 230. One agent node 210 may further control a plurality of radio cells, either co-located 215 or not co-located 235-1, 235-2, 235-3. Thus the expression "radio network node", as utilised in this disclosure, may indicate either an agent node 210 co-located with a radio network node 230, or a separate radio network node 230.
The other logical entity is a trainer node (e.g., an eNodeB or a remote server in different embodiments), configured to learn a power control policy based on observation/s of the state of the radio communication system 200 received from one or more agent nodes 210 or from one or more radio network nodes 230. In some embodiments, the trainer node may be co-located with the agent node 210. In other embodiments however, the agent node 210 and the trainer node may be separate entities.
In some embodiments, the trainer node may be kept in a centralised server room or similar, where it is sheltered from wind and weather while being appropriately protected from theft and damage. Further, appropriate maintenance and software updates may conveniently be performed by skilled personnel.
Further, in some embodiments, information associated with the state of the radio communication system 200, or a subset thereof, may be communicated or exchanged either among agent nodes 210 or between the agent node 210 and the trainer node. Additionally, a method in the trainer node is disclosed, to efficiently reconstruct, store, and exploit knowledge of at least one part of the state of the communication system 200 to create a new downlink power control policy for downlink power control of the radio cells 215, 235-1, 235-2, 235-3.
In some embodiments, the agent node 210 is configured for controlling the downlink power control budget of the agent node 210, of another agent node or of at least one radio network node 230. The agent node 210 is configured to receive a message from a user device 220 in the radio cell 215 controlled by the agent node 210, the message comprising at least one radio environmental measurement, e.g. concerning received signal strength/quality, or the location of the user device 220 in the cell 215.
Further, the agent node 210 may be configured to receive, from at least one radio network node 230, a message comprising at least one network performance measurement, which may also be referred to as a local reward, associated with the communication system 200, or a subset thereof, such as e.g. a subset of the communication system 200 wherein the cell 215, 235-1, 235-2, 235-3 is situated.
The agent node 210 may additionally be configured to determine at least one feature representing, partly or entirely, the state of the communication system 200 based on the received radio environmental measurement/s or the at least one network performance measurement. Further, in some embodiments, the agent node 210 may also be configured to determine a power control action associated with the radio cell 215, 235-1, 235-2, 235-3 in the communication system 200 based on the power control policy, a set of available power control actions, and at least one feature representing the state of the communication system 200, or a subset thereof. The agent node 210 may further be configured to configure the downlink transmission power of the radio cell 215, 235-1, 235-2, 235-3 based on the determined power control action.
One advantage of the herein described method embodiments is to enable efficient inter-cell interference mitigation in the communication system 200 with reduced control signalling overhead. While traditional interference mitigation methods via downlink power control would require full knowledge of the radio environment (such as channel gain/strength measurements from every user device 220 to every radio network node 230 in the coordinated system 200), the proposed solution enables the agent node 210 to learn the state of the communication system 200 from a limited number of measurements. Thereby the communication/reporting of the measurements is also reduced, leading to enhanced spectral efficiency of the communication system 200.
The radio network node 230 can be designated as a base station, e.g. a Radio Base Station (RBS), which in some networks may be referred to as transmitter, "eNB", "eNodeB", "NodeB" or "B node", depending on the technology and terminology used. The radio network nodes 230 may be of different classes such as e.g. macro eNodeB, home eNodeB or pico base station, based on transmission power and thereby also cell size.
One or more radio cells 235-1, 235-2, 235-3 can be controlled by one radio network node 230, or possibly agent node 210, such as e.g. a tri-sectorial radio site.
The communication system 200 may at least partly be based on radio access technologies such as, e.g., 3GPP LTE, LTE-Advanced, Evolved Universal Terrestrial Radio Access Network (E-UTRAN), Universal Mobile Telecommunications System (UMTS), Global System for Mobile Communications (originally: Groupe Special Mobile) (GSM)/Enhanced Data rate for GSM Evolution (GSM/EDGE), Wideband Code Division Multiple Access (WCDMA), Time Division Multiple Access (TDMA) networks, Frequency Division Multiple Access (FDMA) networks, Orthogonal FDMA (OFDMA) networks, Single-Carrier FDMA (SC-FDMA) networks, Worldwide Interoperability for Microwave Access (WiMax), or Ultra Mobile Broadband (UMB), High Speed Packet Access (HSPA), Evolved Universal Terrestrial Radio Access (E-UTRA), Universal Terrestrial Radio Access (UTRA), GSM EDGE Radio Access Network (GERAN), 3GPP2 CDMA technologies, e.g., CDMA2000 1x RTT and High Rate Packet Data (HRPD), just to mention a few options.
The expressions "wireless communication network", "wireless communication system" and/or "cellular telecommunication system" may within the technological context of this disclosure sometimes be utilised interchangeably.
It is to be noted that the illustrated network setting of one agent node 210, one radio network node 230 and one user device 220 in Figure 2 is to be regarded as a non-limiting example of an embodiment only. The radio communication system 200 may comprise any other number and/or combination of agent nodes 210, radio network nodes 230 and/or user devices 220. A plurality of user devices 220 and another configuration of radio network nodes 230 and/or agent node 210 may thus be involved in some embodiments of the disclosed invention.
Thus whenever "one" or "a/ an" user device 220, radio network node 230 and/ or agent node 210 is referred to in the present context, a plurality of user devices 220, radio network nodes 230 and/ or agent nodes 210 may be involved, according to some embodiments.
The set of available power control actions Λ may comprise positive or negative power offset values to be applied to the current transmission power value of the radio cell 215, 235-1, 235-2, 235-3. Each value may therefore correspond to increasing (positive offset), decreasing (negative offset) or holding (zero offset) the current transmission power value. The power control values can equivalently be expressed in binary, linear, logarithmic (decibel), or other suitable scales. In one arbitrary non-limiting example, the set of available power control actions comprises the values Λ = {0, ±1, ±3} dB. Furthermore, the feasible range of a power control action may depend on the current state of the communication system 200. In one example, the transmit power budget of radio cells 235-1, 235-2, 235-3 may be used as the control parameter and the current value of the downlink transmit power may be a part of the state at a time t. Then the set of available power control actions to be taken at time t may further depend on the current value of the power budget. For instance, if the current value of the power budget has reached a maximum value, e.g., 46 dBm for an LTE macro eNodeB, then the set of available power control actions (according to the previous example) may be limited to Λ = {0, -1, -3} dB.
Figure 3 illustrates an embodiment wherein the agent node 210 is not co-located with the radio network node 230. In this case the agent node 210 controls the downlink power budget associated with at least one radio cell 235-1, 235-2, 235-3 not co-located with the agent node 210. The advantage of this embodiment is to enable centralised control of the downlink transmission power emitted by a plurality of radio network nodes 230, thereby mitigating interference and improving the spectral efficiency of the system 200.
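The state-dependent restriction of the action set described above can be sketched as follows. This is a minimal illustration only: it assumes the example offsets 0, ±1 and ±3 dB, a maximum budget of 46 dBm, and a hypothetical lower bound `min_dbm` that the text does not specify. The function names are illustrative.

```python
from typing import List

# Example offsets in dB, following the non-limiting example in the text.
ACTIONS_DB: List[float] = [-3.0, -1.0, 0.0, 1.0, 3.0]


def feasible_actions(current_dbm: float,
                     max_dbm: float = 46.0,
                     min_dbm: float = 0.0) -> List[float]:
    """Return the offsets that keep the power budget within [min_dbm, max_dbm].

    min_dbm is a hypothetical lower bound introduced for this sketch.
    """
    return [a for a in ACTIONS_DB if min_dbm <= current_dbm + a <= max_dbm]


def apply_action(current_dbm: float, offset_db: float,
                 max_dbm: float = 46.0, min_dbm: float = 0.0) -> float:
    """Apply a feasible offset to the current power budget."""
    if offset_db not in feasible_actions(current_dbm, max_dbm, min_dbm):
        raise ValueError("offset is not feasible in the current state")
    return current_dbm + offset_db
```

At the maximum budget only the hold and decrease offsets remain available, which matches the restricted set given in the example.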
In one embodiment of the solution, the agent node 210 may be configured to determine the power control action for at least one radio cell 235-1, 235-2, 235-3 controlled by the radio network node 230, not co-located with the agent node 210. Further, the agent node 210 may then be configured to determine a power control action associated with the radio cell 235-1, 235-2, 235-3 based on the power control policy, the set of available power control actions, and at least one feature representing the state of the communication system 200, or a subset thereof. The agent node 210 may further be configured to configure the downlink transmission power of the radio cell 235-1, 235-2, 235-3 of the other radio network node 230, based on the determined power control action and transmit a control message to the radio network node 230 comprising said downlink transmission power adjustment.
Additionally, the control message of the agent node 210 may further comprise an indication of time associated with the power control action, such as a starting time indicating when to apply the power control action and a power control window indicating the validity of the power control action. Furthermore, the control message may additionally comprise a plurality of power control actions and an associated indication of time. The power control actions may further be associated with one or more radio cells 235-1, 235-2, 235-3 controlled by the radio network node 230, or a plurality of radio network nodes 230.
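A control message carrying several timed power control actions could be modelled as in the following sketch. The field names (`cell_id`, `start_time`, `window`) and the half-open validity interval are assumptions made for illustration; the text only states that a starting time and a power control window may be indicated.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimedPowerAction:
    cell_id: str        # hypothetical identifier of the controlled radio cell
    offset_db: float    # power offset to apply
    start_time: float   # when to start applying the action (seconds)
    window: float       # validity duration of the action (seconds)

    def is_valid_at(self, t: float) -> bool:
        """True while t lies in [start_time, start_time + window)."""
        return self.start_time <= t < self.start_time + self.window


@dataclass
class PowerControlMessage:
    actions: List[TimedPowerAction]

    def active_actions(self, t: float) -> List[TimedPowerAction]:
        """Actions whose validity window covers time t."""
        return [a for a in self.actions if a.is_valid_at(t)]
```

A receiving radio network node would then apply, at any time t, only the actions returned by `active_actions(t)`.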
The agent node 210 may determine a set of features $\mathcal{F} = \{f_j\}_{j=1}^{F}$ in some embodiments, wherein each feature $f_j$ may be associated with a part of the state $s_t$ of the communication system 200 at a given time $t$. Each feature $f_j \in \mathcal{F}$ can be determined based on radio environmental measurements received from e.g. user devices 220 within one or more radio cells 215, 235-1, 235-2, 235-3 controlled by the agent node 210 or based on measurements associated to the communication system 200 received from at least a radio network node 210. The set of features can therefore comprise e.g. an indicator of the downlink transmission power associated with at least one radio cell 215 controlled by the agent node 210.
Further, the set of features may comprise an indicator of the downlink transmission power associated with at least one radio cell 235-1, 235-2, 235-3 controlled by the radio network node 230, not co-located with the agent node 210. The set of features may in addition comprise an indicator of the average, minimum, or maximum Reference Signal Received Power (RSRP) associated with the user devices 220 within at least one cell 215 controlled by the agent node 210. The set of features may also comprise an indicator of the average, minimum or maximum interference associated to the radio cell 215 controlled by the agent node 210 and at least one neighbouring radio cell 235-1, 235-2, 235-3.
The interference may be measured by the user devices 220 within the at least one radio cell 215 controlled by the agent node 210 and reported to the agent node 210. Furthermore, the set of features may in addition comprise an indicator of the average, minimum or maximum Signal to Noise Ratio (SNR) associated with the user devices 220 within at least one cell 215 controlled by the agent node 210. Also, the set of features may further comprise an indicator of the average, minimum or maximum Signal to Interference plus Noise Ratio (SINR) associated with the user devices 220 within at least one cell 215 controlled by the agent node 210. In further addition, the set of features may also comprise an indicator of the reward function associated with the radio cell 235-1, 235-2, 235-3 controlled by the network node 230.
Instead of RSRP, other similar measurements related to signal strength/ quality may be used such as e.g. Received Signal Strength Indication (RSSI) and/ or Received Channel Power Indicator (RCPI).
Further, instead of SNR or SINR, other similar signal to interference ratios may be utilised such as e.g. signal-to-interference ratio (SIR), Peak signal-to-noise ratio (PSNR), Signal-to-noise and distortion ratio (SINAD), or similar. Another possible feature in the set of features for determining the state of the communication system 200, or a subset thereof, may comprise a measurement related to a number of active user devices 220 in the radio cell 215, 235-1, 235-2, 235-3. In yet some embodiments, the feature in the set of features may comprise a measurement related to types of traffic within the radio cell 215, 235-1, 235-2, 235-3. Yet another feature in the set of features according to some embodiments may comprise a measurement related to location of user devices 220 in the radio cell 215, 235-1, 235-2, 235-3 or type of traffic of user devices 220.
Thereby, the radio environmental measurement received from user devices 220 within the radio cell 215 controlled by the agent node 210 may comprise one or more in the group of: a measurement of the RSRP associated with at least one cell 215 controlled by the agent node 210; a measurement of RSRP associated with at least one neighbouring cell 235-1, 235-2, 235-3, i.e., interference; a measurement of the SNR associated to at least one cell 215 controlled by the agent node 210; and/or a measurement of the SINR associated with at least one cell 215 controlled by the agent node 210.
The agent node 210 can further determine/characterise the state $s_t$ associated with a part or the entire communication system 200 at a given time $t$ by selecting a subset of features $\mathcal{S} \subseteq \mathcal{F}$, so that $s_t = \{f_j(t) : f_j \in \mathcal{S}\}$, with $f_j(t)$ indicating the value of feature $f_j \in \mathcal{S}$ at time $t$.
Thereby, the state of the communication system 200 can be represented by different combinations of features and different number of features, such as the above enumerated.
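As a rough sketch of how a state could be assembled from a selected subset of features, the example below computes a few of the indicators listed above (average, minimum and maximum RSRP and SINR, plus the current downlink power) and picks a subset by name. The feature names, and the choice to average directly in the dB domain, are assumptions of this sketch rather than details from the text.

```python
from statistics import mean
from typing import Dict, List, Sequence


def cell_features(rsrp_dbm: Sequence[float],
                  sinr_db: Sequence[float],
                  tx_power_dbm: float) -> Dict[str, float]:
    """Compute candidate features from user device reports of one cell.

    Averages are taken directly on dB values, which is a simplification.
    """
    return {
        "tx_power_dbm": tx_power_dbm,
        "rsrp_avg": mean(rsrp_dbm),
        "rsrp_min": min(rsrp_dbm),
        "rsrp_max": max(rsrp_dbm),
        "sinr_avg": mean(sinr_db),
        "sinr_min": min(sinr_db),
        "sinr_max": max(sinr_db),
    }


def state_vector(features: Dict[str, float],
                 selected: Sequence[str]) -> List[float]:
    """Build the state from the selected subset of features, in order."""
    return [features[name] for name in selected]
```

Different selections passed to `state_vector` give different state representations over the same measurements.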
The agent node 210 is further configured to determine a performance measurement $r_t$, which may be referred to as a reward, associated with one or more radio cells 215, 235-1, 235-2, 235-3 in the communication system 200 given the power control action $a_t \in \Lambda$ taken at time $t$ and the state $s_t$ of the system 200 at time $t$. Rather than explicitly modelling the dependency of $r_t$ on the state and power control action, the agent node 210 can estimate the cell performance measurement $r_t$ based on the radio environmental measurements $x_i \in X$ reported by the user devices 220 served by the radio cell 215, 235-1, 235-2, 235-3. In other words, the user measurements may provide observations of the cell state at time $t$ resulting from the application of the power control action $a_t$ in some embodiments.
In one embodiment, the agent node 210 determines the performance measurement $r_t(\mathbf{x})$ associated with the controlled radio cell 215, 235-1, 235-2, 235-3 as a weighted sum of scalars $x_i \in X$, parameterised by a scalar coefficient $\alpha \in [0, \infty)$ and transformed by a function $f_\alpha$ (with domain $X$ and range of real scalars),
$$r_t(\mathbf{x}) = \sum_{x_i \in X} w_i \, f_\alpha(x_i) \qquad [1]$$
where $x_i$ represents a radio measurement or a performance indicator associated with the radio cell, $X$ is the set of all radio measurements or performance indicators associated with the radio cell and used for the definition of the performance measurement, $w_i$ is a weight associated with $x_i$, and $\mathbf{x}$ is a vector comprising all $x_i \in X$.
In one exemplifying case, $x_i$ represents the average data throughput of user $i$ in the radio cell 215, 235-1, 235-2, 235-3, and the reward in equation [1] can be approximated for different values of $\alpha$ and weights $w_i$ with e.g. any, some or all of the following expressions: the average data throughput associated to the user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; the average data throughput associated to the radio cell; the average log-transformed data throughput associated to the user devices 220 in the radio cell; the average sum of log-transformed data throughput associated to the radio cell; the average harmonic data throughput associated to the user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; and the average harmonic data throughput associated with the radio cell.
[Equation images imgf000025_0001 and imgf000025_0006 to imgf000025_0019: the definition of the function $f_\alpha$ and, for each listed case, the closed-form expression with its value of $\alpha$ and weights $w_i$ for all $i$; not recoverable from the extracted text.]
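The listed cases (plain, log-transformed and harmonic throughput) resemble the well-known α-fair utility family evaluated at α = 0, 1 and 2. The sketch below uses that family as a stand-in for the transformation in equation [1]; this is an assumption made for illustration, since the original definition of the function is not available in the text, and the function names are hypothetical.

```python
import math
from typing import Sequence


def alpha_fair(x: float, alpha: float) -> float:
    """Alpha-fair utility of a positive scalar x (assumed transformation)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if math.isclose(alpha, 1.0):
        return math.log(x)                       # log-transformed throughput
    return x ** (1.0 - alpha) / (1.0 - alpha)    # alpha=0: x, alpha=2: -1/x


def reward(throughputs: Sequence[float],
           weights: Sequence[float],
           alpha: float) -> float:
    """Weighted sum of transformed measurements, in the form of equation [1]."""
    if len(throughputs) != len(weights):
        raise ValueError("one weight per measurement is required")
    return sum(w * alpha_fair(x, alpha) for x, w in zip(throughputs, weights))
```

With α = 0 and weights 1/N the function returns the mean user throughput, while unit weights give the cell sum; α = 1 and α = 2 give the log-transformed and harmonic-type variants respectively.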
Each reward expression enables the agent node 210 to optimise a different performance metric that can be associated with individual user devices 220, with radio cells 215, 235-1, 235-2, 235-3, or with the communication system 200 as a whole.
In one embodiment, the agent node 210 determines a performance measurement rs associated with at least one part or the whole communication system 200 (e.g., a group of more than one radio cell 215, 235-1, 235-2, 235-3) based on the at least one network performance measurement, which may be referred to as a local reward, received from the radio network node 230 and the performance measurement rt associated with at least one radio cell 215 controlled by the agent node 210.
The power control policy is a function mapping the state of the communication system 200 to the set of available actions. The power control policy may be e.g. deterministic, stochastic, probabilistic or a combination thereof. In one embodiment of the invention, the power control policy may be represented by one or more neural networks, wherein each neural network comprises an input layer consisting of a set of input units, a set of hidden layers each consisting of a set of hidden units, and one output layer consisting of one or more output units. Each neural network is represented by a set of weights $\mathbf{w} = \{\mathbf{w}_l\}$, with $\mathbf{w}_l$ denoting a real-valued vector whose components represent the weights between units of layer $l$ and units of layer $l+1$. The agent node 210 can be configured with a codebook of neural networks $\mathcal{N}$ indexed by $n$, possibly of different sizes (i.e. with different numbers of hidden layers, and with different numbers of units in the input, output and hidden layers).
In one embodiment, the agent node 210 determines a power control action by feeding at least one neural network associated with a power control policy with at least one action $a \in \Lambda$ and with at least one feature $f_j \in \mathcal{S}$. The output layer of each neural network $n$ determines a real value $q_n(s_t, a)$ associated with the corresponding power control action $a$ inputted at the input layer. The value $q_n(s_t, a)$ represents the likelihood that the power control action $a$ may be configured, based on the power control policy represented by the neural network. Therefore, in one embodiment, the agent node 210 may be configured with the power control policy represented by a single neural network and determines the power control action $a_t^*$ with the maximum likelihood coefficient as:
$$a_t^* = \arg\max_{a \in \Lambda} q_n(s_t, a)$$
In one alternative embodiment, the agent node 210 may be configured with a power control policy represented by a set comprising a number $N$ of neural networks and determines a power control action based on the following steps: determine, for each neural network $n$ representing the power control policy, the power control action with the maximum likelihood coefficient, as in the single neural network embodiment; choose the power control action $a_t^*$ as the one that has been selected by the largest number of neural networks (e.g., majority vote). If two or more power control actions have been selected by an equal number of neural networks in the majority vote, one of said power control actions may be chosen at random with equal probability, in some embodiments.
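The two steps above, per-network selection of the highest-valued action followed by a majority vote with random tie-breaking, can be sketched as follows. The neural networks themselves are abstracted away: each network is represented only by the list of output values it assigns to the candidate actions, which is an assumption of this sketch.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def best_action(scores: Sequence[float], actions: Sequence[float]) -> float:
    """Action with the highest output value according to one network."""
    best_index = max(range(len(actions)), key=lambda i: scores[i])
    return actions[best_index]


def majority_vote(per_network_scores: Sequence[Sequence[float]],
                  actions: Sequence[float],
                  rng: Optional[random.Random] = None) -> float:
    """Pick the action chosen by most networks; break ties uniformly at random."""
    rng = rng or random.Random()
    votes = Counter(best_action(s, actions) for s in per_network_scores)
    top = max(votes.values())
    tied = sorted(a for a, count in votes.items() if count == top)
    return rng.choice(tied)
```

Passing a seeded `random.Random` instance makes the tie-breaking reproducible, which can be convenient when testing.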
An advantage of these described embodiments may be to reduce the variance of the inter-cell interference experienced in the communication system 200, thereby enabling higher spectral efficiency within the system 200.
The power control policy can be learned by the agent node 210 based on the available radio environmental measurements received from user devices 220 in the radio cells 215 controlled by the agent node 210. In an alternative embodiment, the agent node 210 may receive a control message comprising the power control policy from the trainer node.
Therefore, according to some embodiments, the agent node 210 may be further configured to receive, from the trainer node, the control message comprising the power control policy for downlink power control. The received power control policy can be associated with one or more radio cells 215 controlled by the agent node 210, or with one or more radio network nodes 230 controlled by the agent node 210. The embodiment is illustrated in Figure 4, wherein the control message comprising the power control policy 430 is transmitted by the trainer node 400 to the agent node 210.
Figure 4 illustrates an example of interaction between the agent node 210 and the trainer node 400 in the communication system 200.
Furthermore, the power control policy 430 may be represented by one or more in the group of: at least one neural network indicator indicating one neural network in the codebook of neural networks $\mathcal{N}$ available for the agent node 210, an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which may comprise an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of neural network weights $\mathbf{w}$ to be configured for at least one neural network associated with the power control policy 430, the number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks. The advantage of this embodiment is to enable the configuration of the power control policy 430 through centralised training, so as to adapt the power control policy 430 to effectively track changes in the radio environment.
In some embodiments, the agent node 210 may be further configured to receive, from the trainer node 400, the first control message further comprising an exploration-to-exploitation control parameter ε associated with the power control policy 430. Further, the agent node 210 may be configured to determine whether to apply the power control policy 430 based on the exploration-to-exploitation control parameter ε. The optional exploration-to-exploitation control parameter ε may indicate how often on average the power control policy 430 should be used compared to an alternative power control policy, such as, for instance, selecting a random action in the set Λ. The advantage of this method is to allow the exploration of new states of the communication system 200 that would otherwise not be observed by the agent node 210 if it always used the received power control policy 430.
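One common way to use such a parameter is an ε-greedy rule, sketched below. The interpretation of ε as the probability of exploring (choosing a random action instead of the policy's action) is an assumption of this sketch; the text leaves the exact semantics of the parameter open, and the function name is illustrative.

```python
import random
from typing import Optional, Sequence


def select_action(policy_action: float,
                  actions: Sequence[float],
                  epsilon: float,
                  rng: Optional[random.Random] = None) -> float:
    """Return a random action with probability epsilon, else the policy's action."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(list(actions))   # explore: uniform over the action set
    return policy_action                   # exploit: follow the received policy
```

Setting ε to zero makes the agent always follow the received policy, while larger values trade short-term performance for observations of states the policy alone would not visit.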
In some embodiments, the agent node 210 is further configured to determine a Training Data Message (TDM) 420 comprising at least an indication of: the state $s_t$ of the communication system 200 measured at a certain time $t$; the power control action $a_t$ taken by the agent node 210 at time $t$; the state $s_{t+1}$ of the communication system 200 measured after the power control action; and a measurement of the system performance $r_{t+1}$, i.e. the reward associated with the new state of the communication system 200 and the power control action. The determined training data message 420 is then transmitted to the trainer node 400.
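The content of a training data message can be pictured as a container of state-transition records, as in the sketch below. The class and field names, including the `cell_id` tag, are hypothetical and chosen only for illustration; the text specifies the four indicated quantities but no message encoding.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]        # state measured at time t
    action: float                   # power control action taken at time t
    next_state: Tuple[float, ...]   # state measured after the action
    reward: float                   # performance measurement after the action


@dataclass
class TrainingDataMessage:
    cell_id: str                    # hypothetical tag for the reporting cell
    transitions: List[Transition] = field(default_factory=list)

    def add(self, state, action, next_state, reward) -> None:
        """Append one observed state transition to the message."""
        self.transitions.append(
            Transition(tuple(state), action, tuple(next_state), reward))
```

A message with a single `Transition` corresponds to one observation; appending several transitions before sending yields a batch.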
The training data message 420 may furthermore be associated with one or more radio cells 215, 235-1, 235-2, 235-3 or with one or more radio network nodes 210, 230, in the communication system 200. Thereby, the training data message 420 may provide observations of the state of the communication system 200 associated with the power control action taken by one or more radio cells 215, 235-1, 235-2, 235-3 or radio network nodes 210, 230. This embodiment, illustrated in Figure 4, enables the trainer node 400 to efficiently adapt the power control policy 430 to changes in the communication system 200 so as to optimise the system spectral efficiency.
The training data message 420 may further carry a batch of training data, i.e. a set of quadruplets $\{(s_t, a_t, s_{t+1}, r_{t+1})\}_{t=1}^{T}$ associated with $T \geq 1$ observations of the state transition due to the power control action.
The radio communication system 200 may in certain implementations comprise a plurality of agent nodes 210, 500, wherein some agent nodes 210 have a communication interface with the trainer node 400 (e.g., the X2 interface of an LTE system, or the S1 interface), whilst some agent nodes 210, 500 have a communication interface with other agent nodes 210, 500, see e.g. Figure 5.
Figure 5 illustrates a communication system 200 with multiple agent nodes 210, 500 sharing a single trainer node 400.
Thereby, in some embodiments, the agent node 210 when connected to the trainer node 400 with a communication interface may be further configured to: receive from at least one other agent node 500 at least one training data message 420. Further the agent node 210 may also be configured to forward the received at least one training data message 420 associated with the other agent node 500, to the trainer node 400. The agent node 210 may further be configured to store multiple training data messages 420 from one or more second agent nodes 500. Thereafter, a batch of training data messages 420 associated with one or more second agent nodes 500 can be forwarded to the trainer node 400 over the communication interface. Thereby overhead signalling is decreased, leading to enhanced spectral efficiency of the communication system 200.
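The storing and batch-wise forwarding described above may, purely as an example, be sketched as follows. The class `RelayAgent`, the `batch_size` threshold and the dictionary layout of a message are hypothetical; this description does not prescribe when a stored batch is forwarded.

```python
from typing import Callable, List


class RelayAgent:
    """Hypothetical agent node 210 buffering TDMs from second agent nodes 500."""

    def __init__(self, send_to_trainer: Callable[[List[dict]], None],
                 batch_size: int = 3) -> None:
        self._send = send_to_trainer
        self._batch_size = batch_size   # assumed forwarding threshold
        self._buffer: List[dict] = []

    def on_tdm_received(self, tdm: dict) -> None:
        """Store a received TDM and forward once a full batch is available."""
        self._buffer.append(tdm)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Forward whatever is stored as one batch over the trainer interface."""
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()


sent_batches: List[List[dict]] = []
relay = RelayAgent(sent_batches.append, batch_size=3)
for seq in range(7):
    relay.on_tdm_received({"agent": 500, "seq": seq})
relay.flush()  # forward the remaining partial batch
```

In this example seven messages are forwarded in three transmissions rather than seven, which is the kind of signalling reduction referred to above.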
In one alternative implementation, the agent node 210 may further be configured to elaborate its own training data message 420 with one or more training data messages 420 received from one or more second agent nodes 500. Thereby, the information comprised in multiple training data messages 420 can be compressed and signalled with reduced signalling overhead, in some embodiments.
In some embodiments, the agent node 210, when connected to the trainer node 400 with a communication interface, may further be configured to forward, to at least one second agent node 500, a power control policy 430 received from the trainer node 400.
In some embodiments, the agent node 210 when not connected to the trainer node 400 with the communication interface, may optionally be further configured to transmit to at least one other agent node 500 at least one training data message 420. The second agent node 500 may or may not have a communication interface with the trainer node 400.
In some embodiments, the power control policy 430 for the agent node 210, 500 may be determined by the trainer node 400 upon receiving at least one training data message 420. The trainer node 400 may in some embodiments be configured to: receive from at least one other agent node 500 at least one training data message 420; determine the power control policy 430 associated with the at least one radio cell 215, 235-1, 235-2, 235-3 in the communication system 200 based on the at least one training data message 420. Furthermore, the trainer node 400 may in some embodiments be configured to transmit, to the agent node 210, the control message comprising the power control policy 430 for downlink power control.
Returning to Figure 4, this embodiment is illustrated. The one or more radio cells 215, 235-1, 235-2, 235-3 associated with the power control policy 430 can either be controlled by the agent node 210 or by another radio network node 230, in turn controlled by the agent node 210 in different embodiments. In the latter case, the agent node 210 may transmit, to the radio network node 230, power control commands associated with the radio cell 235-1, 235-2, 235-3 computed based on the power control policy 430 according to any of the previously discussed embodiments.
The power control policy 430 may additionally be associated with a group of more than one radio cell 215, 235-1, 235-2, 235-3 in some embodiments. In one example, the power control policy 430 may be associated with a group of three radio cells 235-1, 235-2, 235-3 co-located in a tri-sectorial radio network node 230. The agent node 210 can either reside in said radio network node 230, or control said radio network node 230, in different embodiments.
In one implementation of the embodiment, the trainer node 400 may determine a new power control policy 430 based on the new training data comprised in the received training data message 420 and based on formerly received training data, which may be stored in the database 410 by the trainer node 400, to be available for future evaluations of the power control policy 430. The training data stored by the trainer node 400 may be received from one or more agent nodes 210, 500 and thereby be associated with different radio cells 215, 235-1, 235-2, 235-3 or different radio network nodes 210, 230, 500 in the communication system 200.
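As a hedged illustration of a trainer node 400 that stores received training data and re-evaluates the policy from both new and formerly received data, the sketch below uses a simple table of Q-values. The class `TrainerNode`, the replay of the whole stored data set on every reception, and the values of `alpha` and `gamma` are assumptions chosen for brevity, not features stated in this description.

```python
from typing import Dict, Hashable, List, Tuple

Transition = Tuple[Hashable, int, Hashable, float]  # (s_t, a_t, s_{t+1}, r_{t+1})


class TrainerNode:
    """Hypothetical trainer keeping all received training data (database 410)."""

    def __init__(self, actions: List[int], alpha: float = 0.5, gamma: float = 0.9):
        self.actions = actions
        self.alpha = alpha
        self.gamma = gamma
        self.database: List[Transition] = []
        self.q: Dict[Tuple[Hashable, int], float] = {}

    def on_tdm(self, batch: List[Transition]) -> Dict[Hashable, int]:
        """Store the new data, replay everything stored, return a new policy."""
        self.database.extend(batch)
        for s, a, s_next, r in self.database:
            best_next = max(self.q.get((s_next, b), 0.0) for b in self.actions)
            old = self.q.get((s, a), 0.0)
            self.q[(s, a)] = old + self.alpha * (r + self.gamma * best_next - old)
        return self.policy()

    def policy(self) -> Dict[Hashable, int]:
        """Map every state seen so far to its currently best-valued action."""
        states = {s for s, _ in self.q}
        return {s: max(self.actions, key=lambda b: self.q.get((s, b), 0.0))
                for s in states}


trainer = TrainerNode(actions=[0, 1])
new_policy = trainer.on_tdm([("low", 1, "ok", 1.0), ("low", 0, "low", -1.0)])
```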
In some embodiments, the trainer node 400 may additionally be configured to determine an exploration-to-exploitation control parameter ε associated with the power control policy 430. The exploration-to-exploitation control parameter ε may set a probability, i.e. a value between 0 and 1, to utilise the provided power control policy 430 or to use e.g. a random power control policy 430. Further, the trainer node 400 may be configured to transmit, to the agent node 210, a control message further comprising the exploration-to-exploitation control parameter ε, in some embodiments. The exploration-to-exploitation control parameter ε regulates the utilisation of the power control policy 430 at the agent node 210. In one exemplifying case, the exploration-to-exploitation control parameter ε takes values in the interval ε ∈ [0, 1] (with zero and one included in the interval). Thereby, a given value of the exploration-to-exploitation control parameter ε may indicate how often, on average, the power control policy 430 should be used compared to an alternative power control policy 430, such as, for instance, selecting a random action in the set A of available power control actions. Thereby, a value ε = 0.2 may indicate to the agent node 210 to select a power control action based on the received power control policy 430 in 80% of the times, whilst selecting a power control action a_t ∈ A at random in 20% of the times (or vice versa).
The advantage of this method is to efficiently control the exploration of states of the communication system 200 that would otherwise not be observed by the agent node 210 and the trainer node 400. The explored states of the communication system 200 are reported by the agent node 210 to the trainer node 400 via the Training Data Message (TDM) 420 according to previously discussed embodiments.
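The use of the exploration-to-exploitation control parameter at the agent node 210 may, as one possible reading, be sketched as follows. The sketch assumes that the parameter is the probability of selecting a random action (the opposite convention is equally possible, as noted above); the function name `select_action` and the example action values are hypothetical.

```python
import random
from typing import Callable, Sequence


def select_action(state, policy: Callable, actions: Sequence[float],
                  epsilon: float, rng: random.Random) -> float:
    """Return a random action with probability epsilon, else the policy's action.

    Assumption: epsilon is the exploration probability.
    """
    if rng.random() < epsilon:
        return rng.choice(list(actions))  # explore: random action from the set
    return policy(state)                  # exploit: apply the received policy


actions = [-1.0, 0.0, 1.0]      # assumed power steps in dB


def keep_power(state) -> float:
    """Stand-in for the received power control policy 430."""
    return 0.0


rng = random.Random(0)
chosen = [select_action(None, keep_power, actions, 0.2, rng) for _ in range(10)]
```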
An efficient control of the exploration versus exploitation trade-off may be an advantage as exploring the states of the communication system 200 at random may lead to degradation of the system spectral efficiency. Thereby, in some embodiments, the exploration-to-exploitation control parameter ε may gradually be reduced over time according to a predefined algorithm so as to gradually reduce exploration and increase exploitation of the power control policy 430. In one example, the exploration-to-exploitation control parameter ε, updated every time an action is selected, may be computed as:
Figure imgf000032_0001
wherein ε_min represents a minimum value of the exploration-to-exploitation control parameter that the agent node 210 will preserve once the main exploration phase is completed, and ε_max is the largest value of ε with which the agent node 210 will start. Here k ≥ 0 is a discrete counter representing the number of actions executed up until the current time, and N is the total number of actions during which the annealing from ε_max to ε_min should take place.
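One schedule consistent with the quantities defined above is a linear annealing from the largest to the minimum value over N actions. The sketch below assumes such a linear form; the exact expression of the embodiment is not reproduced here, and the function name and default values are illustrative only.

```python
def annealed_epsilon(k: int, n_total: int,
                     eps_max: float = 1.0, eps_min: float = 0.05) -> float:
    """Linearly anneal epsilon from eps_max (k = 0) to eps_min (k >= n_total).

    Assumption: linear decay; other monotone schedules fit the same definitions.
    """
    if n_total <= 0 or k >= n_total:
        return eps_min                    # exploration phase completed
    fraction = max(k, 0) / n_total        # share of the annealing already done
    return eps_max - (eps_max - eps_min) * fraction


schedule = [annealed_epsilon(k, 100) for k in (0, 50, 100, 500)]
```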
The trainer node 400 can determine the optimal power control policy 430 based on a Reinforcement Learning (RL) algorithm. The reinforcement learning algorithm solves the problem of associating the experienced reward to the control actions that, taken in a given state of the system 200, lead to that reward. The power control policy 430, resulting from a reinforcement learning algorithm, maps a given system state to the action to be taken (among the available set of actions) in order to maximise the cumulative reward. Some of the most popular methods in RL are critic-only methods. They are based on the idea of finding an optimal value function and then deriving a policy from it. Possibly the most well-known of the critic-only algorithms is that of Q-learning. A Q-value function is a prediction of future reward, more precisely the Q-value function tries to learn "how much total reward can I expect from taking action a in state s and following the policy π". By extracting a Q-function, rather than learning directly what action to take in a given state, it may be learned how valuable it is to take an action in a given state and then derive a policy from it.
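The Q-value function associated with taking action a in state s, following the policy π and with discount factor γ is written as:
$$Q^{\pi}(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \dots \mid s_t = s,\ a_t = a \,\right]$$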
Such a Q-value function can be learned by the method of temporal differences:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[\, r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \,\right]$$
The difference may be calculated as the discrepancy between the Q-value predicted by the Q-value function at time step t, that is Q(s_t, a_t), and the actual reward plus discounted Q-value at time step t+1, that is r_{t+1} + γ max_a Q(s_{t+1}, a).
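The temporal-difference correction may be illustrated with a table-based Q-function. The sketch below assumes the Q-learning form of the target, i.e. the maximum over the actions available in the next state, with a learning rate `alpha` and discount factor `gamma` whose values are arbitrary examples; the function name `td_update` is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def td_update(q, state: Hashable, action: int, reward: float,
              next_state: Hashable, actions: Sequence[int],
              alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one temporal-difference correction to the table q; return the error."""
    # Actual reward plus discounted best Q-value at the next time step.
    target = reward + gamma * max(q[(next_state, a)] for a in actions)
    # Discrepancy between what was experienced and what was predicted.
    td_error = target - q[(state, action)]
    # Move the estimate towards the target, scaled by the learning rate.
    q[(state, action)] += alpha * td_error
    return td_error


q_table = defaultdict(float)   # all Q-values start at zero
error = td_update(q_table, "s0", 1, reward=2.0, next_state="s1", actions=[0, 1, 2])
```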
Such discrepancy between what is predicted and what is actually experienced is used to correct (after multiplication with a certain learning rate α) the estimated Q-function and bring it closer to the true Q-function which is to be learned. The algorithms described above assumed that the value function may somehow be represented in an appropriate way, e.g., by storing it in a table. However, in practice state spaces might become very large or even infinite so that a table based representation is not possible. Moreover, filling these large tables would require a huge amount of observed transitions. To overcome this problem, value functions are typically represented in terms of parameterised function approximators, e.g., linear functions or, as mentioned in this description, neural networks. Instead of updating individual entries of the value functions, the parameters of the function approximator are changed using gradient descent to minimise the error:
$$L = \tfrac{1}{2}\left[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\right]^{2}$$
where r + γ max_{a'} Q(s', a') is the target and Q(s, a) is the prediction.
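For a parameterised Q-function, the same correction becomes a gradient step on the parameters. The following sketch assumes a linear approximator over a feature vector of the state-action pair, a squared-error loss with the target held fixed during the step, and example values for `alpha` and `gamma`; all names are illustrative, and a neural network would replace `q_value` in practice.

```python
from typing import List, Sequence, Tuple


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear approximator: Q(s, a) is the dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def gradient_step(weights: Sequence[float], features: Sequence[float],
                  reward: float, next_features: Sequence[Sequence[float]],
                  alpha: float = 0.05, gamma: float = 0.9) -> Tuple[List[float], float]:
    """One gradient-descent step on 0.5 * (target - prediction)^2.

    next_features holds one feature vector per action available in the next state.
    """
    target = reward + gamma * max(q_value(weights, f) for f in next_features)
    error = target - q_value(weights, features)
    # With the target held fixed, the gradient of the loss is -error * features.
    new_weights = [w + alpha * error * x for w, x in zip(weights, features)]
    return new_weights, 0.5 * error ** 2


w0 = [0.0, 0.0]
w1, loss1 = gradient_step(w0, [1.0, 0.5], reward=1.0, next_features=[[0.0, 0.0]])
w2, loss2 = gradient_step(w1, [1.0, 0.5], reward=1.0, next_features=[[0.0, 0.0]])
```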
Figure 6 is a flow chart illustrating embodiments of a method 600 in an agent node 210 for downlink power control of at least one radio cell 215, 235-1, 235-2, 235-3 of a communication system 200. The radio cell 215, 235-1, 235-2, 235-3 is controlled by the agent node 210 or in some embodiments by another agent node 500, or alternatively by a network node 230. The agent node 210 may be co-located with a trainer node 400 and/or the network node 230 controlling the at least one radio cell 215, 235-1, 235-2, 235-3 in some embodiments. Alternatively, the agent node 210 may be situated at a distance from the trainer node 400 and/or the network node 230.
The communication system 200 may be based on 3GPP LTE. Further, the wireless communication system 200 may be based on Frequency-Division Duplex (FDD) or Time-Division Duplex (TDD) in different embodiments.
To appropriately configure the downlink transmission power instruction of the radio cell 215, 235-1, 235-2, 235-3, the method 600 may comprise a number of steps 601-612.
It is however to be noted that any, some or all of the described steps 601-612 may be performed in a somewhat different chronological order than the enumeration indicates, be performed simultaneously or even be performed in a completely reversed order according to different embodiments. Some actions, such as e.g. steps 602-603 and 607-612, may be performed within some, but not necessarily all embodiments. Further, it is to be noted that some actions may be performed in a plurality of alternative manners according to different embodiments, and that some such alternative manners may be performed only within some, but not necessarily all embodiments. The agent node 210 may in some embodiments periodically re-perform any, some or all of steps 601-612, thereby enabling application of a new power control policy 430 according to some embodiments. The method 600 may comprise the following steps:
Step 601 comprises obtaining a power control policy 430. The power control policy 430 may be received from the trainer node 400.
In some embodiments wherein the radio network node 500 controlling the radio cell 235-1, 235-2, 235-3 of the downlink power control is another agent node 500, the power control policy 430 to be utilised for downlink power control in the radio cell 235-1, 235-2, 235-3 of the other agent node 500 in the communication system 200, obtained from the trainer node 400, may be forwarded to the other agent node 500.
The power control policy 430 may be obtained iteratively in some embodiments.
The obtained power control policy 430 may be represented by one or more of: an indication of a neural network architecture, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks, size of a neural network configured for power control, in some embodiments.
The obtained power control policy 430 may in some embodiments be represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which could include an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy comprises multiple neural networks.
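To make the listed parameters concrete, the sketch below evaluates small fully connected networks described by their weights and an activation type, and combines several such networks by majority voting. The dictionary layout, the linear output layer and the choice of majority voting are assumptions for illustration; the description leaves the actual encoding and voting policy open.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}


def forward(network: Dict, inputs: Sequence[float]) -> List[float]:
    """Evaluate a network given as {"activation": name, "layers": [(rows, biases)]}.

    The last layer is kept linear and yields one score per power control action.
    """
    act = ACTIVATIONS[network["activation"]]
    values = list(inputs)
    last = len(network["layers"]) - 1
    for idx, (rows, biases) in enumerate(network["layers"]):
        z = [sum(w * v for w, v in zip(row, values)) + b
             for row, b in zip(rows, biases)]
        values = z if idx == last else [act(x) for x in z]
    return values


def vote(networks: Sequence[Dict], inputs: Sequence[float]) -> int:
    """Assumed voting policy: each network votes for its best-scored action."""
    choices = []
    for net in networks:
        scores = forward(net, inputs)
        choices.append(max(range(len(scores)), key=scores.__getitem__))
    return Counter(choices).most_common(1)[0][0]


net_a = {"activation": "relu", "layers": [([[1, 0], [0, 1], [0, 0]], [0, 0, 0])]}
net_b = {"activation": "relu", "layers": [([[0, 1], [1, 0], [0, 0]], [0, 0, 0])]}
net_c = {"activation": "tanh", "layers": [([[0, 0], [1, 1], [0, 0]], [0, 0, 0])]}
winning_action = vote([net_a, net_b, net_c], (0.2, 0.9))
```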
According to various embodiments, the obtained power control policy 430 may further comprise: an adjustment of the set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3; an indicator configuring a combining method for the agent node 210, 500 to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis-aligned data split function for the power control policy 430 when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy 430 based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy 430 based on the decision forest; an indication of one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain criterion; at least one indication of the stopping criteria to determine the depth of the decision forest.
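The three kinds of data split function named above may, for instance, take the following forms at a node of a decision tree. The parameterisations shown (a feature index with a threshold, a weight vector with a bias, and a matrix with a threshold) are common choices assumed here for illustration and are not specified in this description.

```python
from typing import Sequence


def axis_aligned_split(x: Sequence[float], feature_index: int,
                       threshold: float) -> bool:
    """Compare a single feature against a threshold."""
    return x[feature_index] > threshold


def linear_split(x: Sequence[float], weights: Sequence[float],
                 bias: float) -> bool:
    """Test on which side of a hyperplane the feature vector lies."""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0.0


def quadratic_split(x: Sequence[float], matrix: Sequence[Sequence[float]],
                    threshold: float) -> bool:
    """Test the quadratic form x^T M x against a threshold."""
    n = len(x)
    form = sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))
    return form > threshold


features = (1.0, 2.0)
go_right = axis_aligned_split(features, feature_index=1, threshold=1.5)
```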
Step 602, which only may be comprised in some embodiments, may comprise selecting which at least one feature to utilise for representing the state of at least a part of the communication system 200.
The feature representing the state of at least a part of the communication system 200 may be selected based on any, some or a combination of: a measurement related to received signal quality made by and received from a user device 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to received signal quality made by and received from a user device 220 in another radio cell 215, 235-1, 235-2, 235-3; a measurement related to downlink transmission power of the radio cell 215, 235-1, 235-2, 235-3 made by and obtained from the radio network node 210, 230, 500 controlling the radio cell 215, 235-1, 235-2, 235-3; a measurement related to a number of active user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to types or distribution of traffic within the radio cell 215, 235-1, 235-2, 235-3; a measurement related to location or distribution of user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; or a performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3.
Step 603, which only may be comprised in some embodiments, may comprise selecting which performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3 to utilise for representing the performance of the radio cell 215, 235-1, 235-2, 235-3.
Step 604 comprises determining at least one feature representing a state of at least a part of the communication system 200, at a first time period. The part of the communication system 200 may be the part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 for which downlink power control is to be performed, is situated. The determination of said feature may in some embodiments comprise iterating the determination of the feature representing the state of at least the part of the communication system 200.
The feature representing the state of at least a part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 may be situated, may be determined based on any, some or a combination of: a measurement related to received signal quality made by and received from a user device 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to received signal quality made by and received from a user device 220 in another radio cell 215, 235-1, 235-2, 235-3; a measurement related to downlink transmission power of the radio cell 215, 235-1, 235-2, 235-3 made by and obtained from the radio network node 210, 230, 500 controlling the radio cell 215, 235-1, 235-2, 235-3; a measurement related to a number of active user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to types or distribution of traffic within the radio cell 215, 235-1, 235-2, 235-3; a measurement related to location or distribution of user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; or a performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3.
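As a simple example of step 604, several of the listed measurements may be condensed into a fixed-length feature vector. The choice of features, the use of the mean over user device reports and the function name `build_state` are assumptions made only for this sketch.

```python
from statistics import mean
from typing import Sequence, Tuple


def build_state(own_cell_quality: Sequence[float],
                neighbour_cell_quality: Sequence[float],
                tx_power_dbm: float,
                active_users: int) -> Tuple[float, float, float, float]:
    """Assemble a state vector from a hypothetical selection of measurements."""
    return (
        mean(own_cell_quality) if own_cell_quality else 0.0,              # own cell
        mean(neighbour_cell_quality) if neighbour_cell_quality else 0.0,  # other cell
        float(tx_power_dbm),                                              # downlink power
        float(active_users),                                              # cell load
    )


state = build_state([10.0, 14.0], [6.0], tx_power_dbm=43, active_users=2)
```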
Step 605 comprises determining a power control action to be performed for downlink power control in the radio cell 215, 235-1, 235-2, 235-3 at the first time period, out of a set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3, based on the obtained 601 power control policy 430.
The determination of the power control action may be iterated according to some embodiments.
Further, the set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3 may be adjusted according to some embodiments, based on the determined 604 at least one feature representing the state of the part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 may be comprised, or based on the power control policy 430 received from the trainer node 400.
Step 606 comprises configuring a downlink transmission power instruction of the radio cell 215, 235-1, 235-2, 235-3 based on the determined 605 power control action.
The configuration of the downlink transmission power instruction may be iterated according to some embodiments.
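Steps 605 and 606, together with the optional adjustment of the action set, may be sketched as below. The sketch assumes that power control actions are relative power steps in dB, that the policy yields one score per action, and that the action set is adjusted against assumed minimum and maximum transmit power limits; none of these specifics is mandated by the description.

```python
from typing import Dict, List


def adjust_actions(actions: List[float], current_dbm: float,
                   min_dbm: float, max_dbm: float) -> List[float]:
    """Keep only the power steps that stay within the assumed power limits."""
    return [a for a in actions if min_dbm <= current_dbm + a <= max_dbm]


def determine_action(scores: Dict[float, float], allowed: List[float]) -> float:
    """Step 605: choose the allowed action with the highest policy score."""
    return max(allowed, key=lambda a: scores[a])


def configure_instruction(current_dbm: float, action_db: float) -> float:
    """Step 606: turn the chosen step into a downlink transmit power instruction."""
    return current_dbm + action_db


actions = [-1.0, 0.0, 1.0]
scores = {-1.0: 0.2, 0.0: 0.5, 1.0: 0.9}   # stand-in for the policy output
allowed = adjust_actions(actions, current_dbm=46.0, min_dbm=20.0, max_dbm=46.0)
chosen = determine_action(scores, allowed)
new_power_dbm = configure_instruction(46.0, chosen)
```

Here the highest-scored step would exceed the assumed maximum power, so it is removed from the set before the action is determined.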
Step 607, which only may be comprised in some embodiments wherein the radio network node 230 controlling the radio cell 235-1, 235-2, 235-3 is not co-located with the agent node 210, may comprise transmitting the configured 606 downlink transmission power instruction of the radio cell 235-1, 235-2, 235-3 to the radio network node 230 for downlink power control of the radio cell 235-1, 235-2, 235-3 of the radio network node 230.
Step 608, which only may be comprised in some embodiments, may comprise determining the feature representing the state of the part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 may be situated, at a second time period.
The feature representing the state of the part of the communication system 200 may typically be the same as previously determined 604 at the first time period, enumerated in step 604.
Step 609, which only may be comprised in some embodiments, may comprise determining a performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3, or within the communication system 200 or a subset thereof.
The performance measurement may be made at the first time period in some embodiments and at the second time period in some embodiments. The determined performance measurement may be iterated in some embodiments.
The performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3 may in some embodiments be determined, given the feature representing the state at the first time period t and the power control action a_t taken at the first time period t. The performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3 may in some embodiments be determined by computing a weighted sum of scalars x_i ∈ X, parameterised by a scalar coefficient w_i and transformed by a function f_i (with domain X and range of real scalars),
$$r(\mathbf{x}) = \sum_{x_i \in X} w_i\, f_i(x_i)$$
where x_i represents a radio measurement or a performance indicator associated with the radio cell, X is the set of all radio measurements or performance indicators associated with the radio cell and used for the definition of the performance measurement, w_i is a weight associated with x_i, and x is a vector comprising all x_i ∈ X.
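The weighted sum of transformed measurements may be computed as in the following sketch. The measurement names, the weights and the use of a base-10 logarithm as the transforming function are examples chosen for illustration only.

```python
import math
from typing import Callable, Dict


def performance_measurement(measurements: Dict[str, float],
                            weights: Dict[str, float],
                            transforms: Dict[str, Callable[[float], float]]) -> float:
    """Sum, over all measurements, of weight times transformed measurement."""
    return sum(weights[name] * transforms[name](value)
               for name, value in measurements.items())


# Hypothetical measurements and parameters.
measurements = {"cell_throughput_mbps": 100.0, "edge_throughput_mbps": 10.0}
weights = {"cell_throughput_mbps": 1.0, "edge_throughput_mbps": 2.0}
transforms = {"cell_throughput_mbps": math.log10, "edge_throughput_mbps": math.log10}

reward = performance_measurement(measurements, weights, transforms)
```

A larger weight on the cell-edge term, as in this example, would make the reward more sensitive to the performance of poorly served user devices.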
Step 610, which only may be comprised in some embodiments wherein step 609 has been performed, may comprise computing a performance measurement associated with at least a part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 may be situated, based on the determined 609 performance measurement and a network performance measurement received from another radio network node 230, 500 in the communication system 200.
Step 611, which only may be comprised in some embodiments, may comprise transmitting a training data message 420 to the trainer node 400, comprising one or more in the group of: the determined 604 feature representing the state at the first time period, the determined 605 power control action performed at the first time period, the determined 608 feature representing the state at the second time period, and the determined 609 performance measurement.
In some embodiments wherein step 610 has been performed, the training data message 420 transmitted to the trainer node 400 comprises the computed 610 performance measurement. In some embodiments wherein the radio network node 500 controlling the radio cell 235-1, 235-2, 235-3 of the downlink power control is another agent node 500, the training data message 420 transmitted to the trainer node 400 may comprise a training data message 420 which has previously been received from the other agent node 500. The transmission of the training data message 420, or of a plurality of training data messages 420 in a collected batch of training data messages 420 to the trainer node 400, may be iterated in some embodiments.
Step 612, which only may be comprised in some embodiments wherein the obtained 601 power control policy 430 further comprises an exploration-to-exploitation control parameter, associated with a probability of applying the power control policy 430, may comprise determining application of the obtained 601 power control policy 430, based on the obtained 601 exploration-to-exploitation control parameter.
In some embodiments, any, some or all method steps 601-612 may be iterated infinitely, for a limited period of time, or until a threshold limit is achieved.
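The iteration of the method steps may be summarised, in a strongly simplified form, by the loop below. The callables passed to `run_agent`, the toy cell model and the target power of 40 dBm are all hypothetical and serve only to make the sketch executable; step numbers in the comments refer to the method 600.

```python
import random
from typing import Callable, List, Sequence, Tuple


def run_agent(policy: Callable, actions: Sequence[float], observe: Callable,
              apply_action: Callable, reward_fn: Callable, send_tdm: Callable,
              epsilon: float, steps: int, rng: random.Random) -> None:
    """Iterate the agent-side steps for a given number of time periods."""
    state = observe()                                  # step 604
    for _ in range(steps):
        if rng.random() < epsilon:                     # step 612 (assumed convention)
            action = rng.choice(list(actions))
        else:
            action = policy(state)                     # step 605
        apply_action(action)                           # steps 606-607
        next_state = observe()                         # step 608
        reward = reward_fn(next_state)                 # step 609
        send_tdm((state, action, next_state, reward))  # step 611
        state = next_state


# Toy cell model: the state is the transmit power, the reward peaks at 40 dBm.
cell = {"power": 37.0}
log: List[Tuple] = []


def step_towards_40(state) -> float:
    """Stand-in for the obtained power control policy 430 (step 601)."""
    return 1.0 if state[0] < 40.0 else (-1.0 if state[0] > 40.0 else 0.0)


def apply_step(action_db: float) -> None:
    cell["power"] += action_db


run_agent(step_towards_40, [-1.0, 0.0, 1.0], lambda: (cell["power"],),
          apply_step, lambda s: -abs(s[0] - 40.0), log.append,
          epsilon=0.0, steps=5, rng=random.Random(0))
```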
Figure 7 illustrates an embodiment of an agent node 210, 500 for downlink power control of a radio cell 215, 235-1, 235-2, 235-3 of a communication system 200. The agent node 210, 500 is configured to perform the method 600 according to any, some, all, or at least one of the enumerated method steps 601-612, according to some embodiments.
The agent node 210 is thus configured to obtain a power control policy 430. Further, the agent node 210 is configured to determine at least one feature representing a state of at least a part of the communication system 200, e.g. wherein the radio cell 215, 235-1, 235-2, 235-3 is situated, at a first time period. In addition, the agent node 210 is configured to determine a power control action to be performed for downlink power control in the radio cell 215, 235-1, 235-2, 235-3 at the first time period, out of a set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3, based on the obtained power control policy 430 and the determined at least one feature. Furthermore, the agent node 210 is also configured to configure a downlink transmission power instruction of the radio cell 215, 235-1, 235-2, 235-3 based on the determined power control action. Furthermore, in some embodiments, the agent node 210, 500 may be further configured to determine a feature representing the state of the part of the communication system 200, at a second time period. The part of the communication system 200 may be the part wherein the radio cell 215, 235-1, 235-2, 235-3 is situated. The agent node 210, 500 may also be configured to determine a performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3. Also, the agent node 210, 500 may be additionally configured to transmit a training data message 420 to a trainer node 400, comprising one or more in the group of: the determined feature representing the state at the first time period, the determined power control action performed at the first time period, the determined feature representing the state at the second time period, and/or the determined performance measurement. The agent node 210, 500 may also be configured to obtain the power control policy 430 by receiving the power control policy 430 from the trainer node 400.
In some optional embodiments, the agent node 210, 500 may be configured to select which at least one feature to utilise for representing the state of at least a part of the communication system 200. Further, the agent node 210, 500 may also be configured to select which performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3 to utilise for representing the performance of the radio cell 215, 235-1, 235-2, 235-3. According to some embodiments, the agent node 210, 500 may be configured to transmit the configured downlink transmission power instruction of the radio cell 235-1, 235-2, 235-3 to the radio network node 230 for downlink power control of the radio cell 235-1, 235-2, 235-3 of the radio network node 230, when the radio network node 230 controlling the radio cell 235-1, 235-2, 235-3 is not co-located with the agent node 210.
Furthermore, the agent node 210, 500 may also be configured to determine the feature representing the state of at least a part of the communication system 200, e.g. where the radio cell 215, 235-1, 235-2, 235-3 is situated, based on any of: a measurement related to received signal quality made by and received from a user device 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to received signal quality made by and received from a user device 220 in another radio cell 215, 235-1, 235-2, 235-3; a measurement related to downlink transmission power of the radio cell 215, 235-1, 235-2, 235-3 made by and obtained from the radio network node 210, 230, 500 controlling the radio cell 215, 235-1, 235-2, 235-3; a measurement related to a number of active user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; a measurement related to types, or distribution, of traffic within the radio cell 215, 235-1, 235-2, 235-3; a measurement related to location, or distribution, of user devices 220 in the radio cell 215, 235-1, 235-2, 235-3; or a performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3. The agent node 210, 500 may be configured to compute a performance measurement associated with at least a part of the communication system 200, e.g. wherein the radio cell 215, 235-1, 235-2, 235-3 is comprised, based on the determined performance measurement and at least one other network performance measurement received from another radio network node 230, 500 in the communication system 200. Further, the agent node 210, 500 may also be configured to transmit the training data message 420 to the trainer node 400 comprising the computed performance measurement, in some embodiments. In addition, the agent node 210, 500 may be configured to obtain an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy 430.
The agent node 210, 500 may further be configured to determine application of the obtained power control policy 430, based on the obtained exploration-to-exploitation control parameter. The exploration-to-exploitation control parameter may e.g. be received from the trainer node 400 together with the power control policy 430 in some embodiments.
In some embodiments, wherein a radio network node 500 controlling the radio cell 235-1, 235-2, 235-3 is another agent node 500, the agent node 210 may be configured to transmit the training data message 420 to the trainer node 400 comprising a training data message 420 received from the other agent node 500. The agent node 210, 500 may also be further configured to forward the power control policy 430 to be utilised for downlink power control in the radio cell 235-1, 235-2, 235-3 of the other agent node 500 in the communication system 200, received from the trainer node 400, to the other agent node 500.
According to some optional embodiments, the agent node 210, 500 may be configured to iterate the determination of the feature representing the state of at least a part of the communication system 200; the determination of the power control action; the configuration of the downlink transmission power instruction; the determination of the performance measurement; the transmission of the training data message 420, or a plurality of training data messages 420, to the trainer node 400; and/or the obtaining of the power control policy 430.
The agent node 210, 500 may in addition be configured to adjust the set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3, based on the determined at least one feature representing the state of the part of the communication system 200, or based on the obtained power control policy 430. Also, in some embodiments, the agent node 210, 500 may be configured to obtain the power control policy 430, represented by one or more of: an indication of a neural network architecture, a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer (which may comprise an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit, etc.), a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks.
In some embodiments, the power control policy 430 may be represented by an indication of a neural network architecture, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks, size of a neural network configured for power control.
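As an illustration of a power control policy 430 comprising multiple neural networks and a voting policy, the following Python sketch applies a majority vote over small fully connected networks. The layer shapes, the omission of bias terms, the tanh activation, the action set `ACTIONS` and the weight values are all assumptions made for the example; the source does not prescribe them.

```python
import math
from collections import Counter
from typing import List, Sequence

Matrix = List[List[float]]
Network = List[Matrix]  # one weight matrix per layer (biases omitted for brevity)

# Hypothetical set of available power control actions, as power steps in dB.
ACTIONS = [-1.0, 0.0, 1.0]


def forward(network: Network, x: Sequence[float]) -> List[float]:
    """Feed x through the network; tanh on hidden layers, linear output layer."""
    h = list(x)
    for i, weights in enumerate(network):
        z = [sum(w * v for w, v in zip(row, h)) for row in weights]
        h = z if i == len(network) - 1 else [math.tanh(v) for v in z]
    return h


def vote(networks: Sequence[Network], x: Sequence[float],
         actions: Sequence[float]) -> float:
    """Majority vote: each network picks its highest-scoring action."""
    picks = []
    for network in networks:
        scores = forward(network, x)
        picks.append(actions[scores.index(max(scores))])
    return Counter(picks).most_common(1)[0][0]


net1: Network = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]]
net2: Network = [[[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
net3: Network = [[[1.0, 0.0], [0.0, 1.0]],               # hidden layer
                 [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]   # output layer
chosen = vote([net1, net2, net3], [1.0, 0.5], ACTIONS)
```

Other combining methods, such as averaging the output scores before selecting an action, would fit the same interface.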
Additionally, the agent node 210, 500 may be configured, in some embodiments, to obtain the power control policy 430, comprising an adjustment of the set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3; an indicator configuring a combining method for the agent node 210, 500 to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis aligned data split function for the power control policy 430 when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy 430 based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy 430 based on the decision forest; an indication of one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain criterion; and/or at least one indication of the stopping criteria to determine the depth of the decision forest.
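The three kinds of data split function named above for a decision forest can be sketched as follows. This is a generic illustration of axis-aligned, linear and quadratic splits at a tree node; the function names and the exact parameterisation (feature index, threshold, weight vector, matrix `q`) are assumptions and are not specified in the source.

```python
from typing import List, Sequence


def axis_aligned_split(x: Sequence[float], feature_index: int,
                       threshold: float) -> bool:
    """Route right when one selected feature exceeds a threshold."""
    return x[feature_index] > threshold


def linear_split(x: Sequence[float], weights: Sequence[float],
                 bias: float) -> bool:
    """Route right when w . x + b is positive (a hyperplane test)."""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0.0


def quadratic_split(x: Sequence[float], q: List[List[float]],
                    weights: Sequence[float], bias: float) -> bool:
    """Route right when x^T Q x + w . x + b is positive (a conic test)."""
    quad = sum(x[i] * q[i][j] * x[j]
               for i in range(len(x)) for j in range(len(x)))
    return quad + sum(w * v for w, v in zip(weights, x)) + bias > 0.0


features = [1.0, 2.0]  # hypothetical state features of the radio cell
right_axis = axis_aligned_split(features, 1, 1.5)
right_linear = linear_split(features, [1.0, -1.0], 0.5)
right_quadratic = quadratic_split(features, [[1.0, 0.0], [0.0, 1.0]],
                                  [0.0, 0.0], -4.0)
```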
The agent node 210, 500 may in some embodiments be configured to determine the performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3, given the feature representing the state of the communication system 200, or a subset thereof, at the first time period t and the power control action a_t taken at the first time period t.
In some embodiments, the agent node 210, 500 may be additionally configured to determine the performance measurement, associated with the performance within the radio cell 215, 235-1, 235-2, 235-3, by computing a weighted sum of scalars $x_i \in X$, each parameterised by a scalar coefficient $\alpha_i$ and transformed by a function $f_i$ (with domain $X$ and range of real scalars),

$$r(\mathbf{x}) = \sum_{x_i \in X} w_i \, f_i(x_i; \alpha_i),$$

where $x_i$ represents a radio measurement or a performance indicator associated with the radio cell 215, 235-1, 235-2, 235-3, $X$ is the set of all radio measurements or performance indicators associated with the radio cell 215, 235-1, 235-2, 235-3 and used for the definition of the performance measurement, $w_i$ is a weight associated with $x_i$, and $\mathbf{x}$ is a vector comprising all $x_i \in X$.
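The weighted-sum performance measurement described above can be sketched in Python as follows. The argument names, the two example transforms and the numeric values are hypothetical; the source does not state which transforms or weights are used.

```python
from typing import Callable, Sequence

# A transform takes (measurement, scalar coefficient) and returns a real scalar.
Transform = Callable[[float, float], float]


def performance_measurement(measurements: Sequence[float],
                            weights: Sequence[float],
                            coefficients: Sequence[float],
                            transforms: Sequence[Transform]) -> float:
    """Weighted sum of transformed measurements for one radio cell."""
    return sum(w * f(x, a)
               for x, w, a, f in zip(measurements, weights,
                                     coefficients, transforms))


# Assumed example: the first indicator is scaled, the second is raised to a power.
value = performance_measurement(
    measurements=[2.0, 3.0],
    weights=[0.5, 2.0],
    coefficients=[1.0, 2.0],
    transforms=[lambda x, a: a * x, lambda x, a: x ** a],
)
```

Here the result is 0.5 * (1.0 * 2.0) + 2.0 * (3.0 ** 2.0).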
For enhanced clarity, any internal electronics or other components of the agent node 210, not completely indispensable for understanding the herein described embodiments have been omitted from Figure 7. The agent node 210 comprises a receiver 710, configured for receiving e.g. signal strength/quality measurements from one or more user devices 220, for receiving e.g. signal strength/quality measurements or other information from one or more radio network nodes 230; or for receiving e.g. the power control policy 430 from the trainer node 400. Further, the agent node 210 comprises a processor 720, configured for downlink power control of the radio cell 215, 235-1, 235-2, 235-3 in the communication system 200, by performing at least some steps 601-612 of the described method 600. Such processor 720 may comprise one or more instances of a processing circuit, i.e. a Central Processing Unit (CPU), a processing unit, a processing circuit, a processor, an Application Specific Integrated Circuit (ASIC), a microprocessor, or other processing logic that may interpret and execute instructions. The herein utilised expression "processor" may thus represent a processing circuitry comprising a plurality of processing circuits, such as, e.g., any, some or all of the ones enumerated above.
Further, the agent node 210 may in some embodiments comprise a transmitter 730, configured for transmitting various signals, to be received by the user device 220, radio network node 230, other agent node 500 and/or trainer node 400.
In further addition, the agent node 210 may comprise at least one memory 725, according to some embodiments. The optional memory 725 may comprise a physical device utilised to store data or programs, i.e., sequences of instructions, on a temporary or permanent basis. According to some embodiments, the memory 725 may comprise integrated circuits comprising silicon-based transistors. Further, the memory 725 may be volatile or nonvolatile.
At least a sub-set of the previously described method steps 601-612 to be performed in the agent node 210 may be implemented through the one or more processing circuits 720 in the agent node 210, together with a computer program product for performing the functions of at least some of the method steps 601-612. Thus a computer program product, comprising instructions for performing the method steps 601-612 may perform downlink power control of the radio cell 215, 235-1, 235-2, 235-3 in the communication system 200, when the computer program is loaded into the processor 720 of the agent node 210.
The computer program mentioned above may be provided for instance in the form of a data carrier carrying computer program code for performing at least some of the method steps 601-612 according to some embodiments when being loaded into the processor 720. The data carrier may be, e.g., a hard disk, a CD ROM disc, a memory stick, an optical storage device, a magnetic storage device or any other appropriate medium such as a disk or tape that may hold machine readable data in a non-transitory manner. The computer program product may furthermore be provided as computer program code on a server and downloaded to the agent node 210 remotely, e.g., over an Internet or an intranet connection.
Figure 8 is a flow chart illustrating embodiments of a method 800 in a trainer node 400 for determining a power control policy 430 to be utilised by an agent node 210, 500 for downlink power control of a radio cell 215, 235-1, 235-2, 235-3 of a communication system 200.
The radio cell 215, 235-1, 235-2, 235-3 is controlled by the agent node 210 or in some embodiments by another agent node 500, or alternatively by a network node 230, which in turn is controlled by the agent node 210, 500. The agent node 210 may be co-located with a trainer node 400 and/or the network node 230 controlling the radio cell 215, 235-1, 235-2, 235-3 in some embodiments. Alternatively, the agent node 210 may be situated at a distance from the trainer node 400 and/or the network node 230.
The power control policy 430 may be represented by one or more of: an indication of a neural network architecture, a set of indicators to at least one neural network index, a set of neural network weights, a number of neural networks to be configured for power control, a voting policy to be configured for determining the power control action when the power control policy 430 comprises multiple neural networks, size of a neural network configured for power control, in some embodiments. According to various embodiments, the obtained power control policy 430 may further comprise: an adjustment of the set of available power control actions associated with the radio cell 215, 235-1, 235-2, 235-3; an indicator configuring a combining method for the agent node 210, 500 to aggregate results produced by individual neural networks for determining the power control action; an indicator of the activation function for neurons in at least one neural network; an indicator of a parameter representing an axis aligned data split function for the power control policy 430 when based on a decision forest; an indicator of at least two parameters representing a linear data split function for the power control policy 430 based on the decision forest; an indicator of a parameter representing a quadratic data split function for the power control policy 430 based on the decision forest; an indication of one or more hyper-parameters characterising the decision forest in the group of: maximum or minimum depth of the decision forest; depth of the decision forest; maximum or minimum number of decision trees; number of decision trees in the decision forest; information gain criterion; at least one indication of the stopping criteria to determine the depth of the decision forest.
To appropriately determine the power control policy 430 to be utilised by the agent node 210, 500 for downlink power control of the radio cell 215, 235-1, 235-2, 235-3 the method 800 may comprise a number of steps 801-805.
It is however to be noted that any, some or all of the described steps 801-805, may be performed in a somewhat different chronological order than the enumeration indicates, be performed simultaneously or even be performed in a completely reversed order according to different embodiments. Some actions such as e.g. step 804 may be performed within some, but not necessarily all embodiments. Further, it is to be noted that some actions may be performed in a plurality of alternative manners according to different embodiments, and that some such alternative manners may be performed only within some, but not necessarily all embodiments. The trainer node 400 may in some embodiments periodically re-perform any, some or all of steps 801-805, thereby providing a continuously updated power control policy 430 according to some embodiments. The method 800 may comprise the following steps:
Step 801 comprises receiving a training data message 420, associated with the radio cell 215, 235-1, 235-2, 235-3, from the agent node 210, 500, wherein the training data message 420 comprises one or more in the group of: a feature representing a state of at least a part of the communication system 200 at a first time period, a power control action performed by the agent node 210, 500 in the radio cell 215, 235-1, 235-2, 235-3 at the first time period, a feature representing the state at the second time period, and a performance measurement.
The performance measurement may have been made at the first time period in some embodiments and at the second time period in some embodiments.
In some embodiments, a conjunct performance measurement may be computed, associated with at least a part of the communication system 200, based on the performance measurement received in the training data message 420 received from the agent node 210, 500 and another performance measurement received from another radio network node 230, 500 in the communication system 200.
Step 802 comprises storing the received 801 training data message 420 in a database 410, associated with the radio cell 215, 235-1, 235-2, 235-3.
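Steps 801 and 802 can be pictured with a small Python sketch of a training data message and a per-cell store. The class names, the `cell_id` field and the in-memory dictionary are assumptions for illustration; the source does not describe how the database 410 is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrainingDataMessage:
    cell_id: str                   # hypothetical identifier of the radio cell
    state: Tuple[float, ...]       # feature(s) at the first time period
    action: float                  # power control action at the first time period
    next_state: Tuple[float, ...]  # feature(s) at the second time period
    performance: float             # performance measurement


class TrainingDatabase:
    """In-memory stand-in for a database keyed by radio cell."""

    def __init__(self) -> None:
        self._by_cell: Dict[str, List[TrainingDataMessage]] = defaultdict(list)

    def store(self, message: TrainingDataMessage) -> None:
        self._by_cell[message.cell_id].append(message)

    def messages_for(self, cell_id: str) -> List[TrainingDataMessage]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._by_cell.get(cell_id, []))


db = TrainingDatabase()
db.store(TrainingDataMessage("235-1", (0.2,), 1.0, (0.3,), 0.7))
db.store(TrainingDataMessage("235-1", (0.3,), -1.0, (0.2,), 0.4))
db.store(TrainingDataMessage("235-2", (0.9,), 0.0, (0.9,), 0.1))
```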
Step 803 comprises determining the power control policy 430 for the radio cell 215, 235-1, 235-2, 235-3, based on at least one training data message 420, stored 802 in the database 410.
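One way to picture step 803 is batch Q-learning over the stored tuples. This is only a sketch under stated assumptions: the passage above does not fix the learning algorithm, so tabular Q-learning is swapped in here for illustration, and the discount factor, learning rate, state labels and samples are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Sequence, Tuple

Sample = Tuple[Hashable, Hashable, Hashable, float]  # (s, a, s_next, reward)
QTable = Dict[Tuple[Hashable, Hashable], float]


def fit_q_table(samples: Sequence[Sample], actions: Sequence[Hashable],
                gamma: float = 0.9, learning_rate: float = 0.5,
                sweeps: int = 200) -> QTable:
    """Repeatedly sweep the stored samples, applying the Q-learning update."""
    q: QTable = defaultdict(float)
    for _ in range(sweeps):
        for s, a, s_next, reward in samples:
            target = reward + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += learning_rate * (target - q[(s, a)])
    return q


def greedy_policy(q: QTable,
                  actions: Sequence[Hashable]) -> Callable[[Hashable], Hashable]:
    """Policy that picks the action with the highest learned value."""
    def policy(state: Hashable) -> Hashable:
        return max(actions, key=lambda a: q[(state, a)])
    return policy


ACTIONS = ["up", "down"]  # hypothetical power control actions
stored = [
    ("low", "up", "high", 1.0),
    ("low", "down", "low", -1.0),
    ("high", "up", "high", -1.0),
    ("high", "down", "low", 0.0),
]
policy = greedy_policy(fit_q_table(stored, ACTIONS), ACTIONS)
```

With these toy samples the learned policy raises power in the "low" state and lowers it in the "high" state. A neural network or decision forest could replace the table when the state space is large.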
Step 804, which may be comprised only in some embodiments, may comprise determining an exploration-to-exploitation control parameter, associated with the determined 803 power control policy 430.
The exploration-to-exploitation control parameter may be determined to be reduced over time, in some embodiments.
Step 805 comprises transmitting the determined 803 power control policy 430 to the agent node 210, 500.
In some embodiments wherein step 804 has been performed, step 805 may comprise transmitting the determined 804 exploration-to-exploitation control parameter to the agent node 210, 500 together with the determined 803 power control policy 430.
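A common way to realise an exploration-to-exploitation control parameter that is reduced over time is an epsilon-greedy rule with a decaying epsilon. The sketch below assumes that interpretation; the exponential decay schedule, the floor value and the function names are hypothetical and are not stated in the source.

```python
import random
from typing import Sequence


def exploration_parameter(step: int, start: float = 1.0,
                          floor: float = 0.05, decay: float = 0.99) -> float:
    """Exploration probability that shrinks with the step count (trainer side)."""
    return max(floor, start * decay ** step)


def select_action(policy_action: float, actions: Sequence[float],
                  epsilon: float, rng: random.Random) -> float:
    """Apply the policy with probability 1 - epsilon, else explore (agent side)."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return policy_action


rng = random.Random(0)
actions = [-1.0, 0.0, 1.0]  # hypothetical power steps in dB
early = select_action(1.0, actions, exploration_parameter(0), rng)
late = select_action(1.0, actions, 0.0, rng)
```

As epsilon decreases, the probability of applying the determined power control policy 430 increases correspondingly.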
In some alternative embodiments, the method 800 further may comprise selecting which at least one feature the agent node 210, 500 is to utilise for representing the state of at least a part of the communication system 200 wherein the radio cell 215, 235-1, 235-2, 235-3 is comprised and sending the made selection to the agent node 210, 500.
The method 800 may further comprise selecting which performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3 the agent node 210, 500 is to utilise for representing the performance of the radio cell 215, 235-1, 235-2, 235-3 and sending the made selection to the agent node 210, 500, in some embodiments.
In some embodiments, any, some or all method steps 801-805 may be iterated infinitely, for a limited period of time, or until a threshold limit is achieved.
Figure 9 illustrates an embodiment of a trainer node 400 for determining a power control policy 430 to be utilised by an agent node 210, 500 for downlink power control of a radio cell 215, 235-1, 235-2, 235-3 of a communication system 200. The trainer node 400 is configured to perform the method 800 according to any, some, all, or at least one of the enumerated method steps 801-805, according to some embodiments.
The trainer node 400 is thus configured to receive a training data message 420, associated with the radio cell 215, 235-1, 235-2, 235-3, from the agent node 210, 500, wherein the training data message 420 comprises one or more in the group of: a feature representing a state of at least a part of the communication system 200 at a first time period, a power control action performed by the agent node 210, 500 in the radio cell 215, 235-1, 235-2, 235-3 at the first time period, a feature representing the state at the second time period, and a performance measurement. The trainer node 400 is also configured to store the received training data message 420 in a database 410 associated with the radio cell 215, 235-1, 235-2, 235-3. Further, the trainer node 400 is configured to determine the power control policy 430 for the radio cell 215, 235-1, 235-2, 235-3, based on at least one training data message 420, stored in the database 410. In addition, the trainer node 400 is further configured to transmit the determined power control policy 430 to the agent node 210, 500. In some embodiments, the trainer node 400 may be configured to determine an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy 430, e.g. based on a time period parameter in some embodiments; and wherein the determined exploration-to-exploitation control parameter is transmitted to the agent node 210, 500 together with the determined power control policy 430.
In addition, the trainer node 400 may be configured to determine the exploration-to-exploitation control parameter so that the probability of applying the determined power control policy 430 is increased over time. Also, the trainer node 400 may be further configured to compute a performance measurement associated with at least a part of the communication system 200, based on the performance measurement received in the training data message 420 received from the agent node 210, 500 and another performance measurement received from another radio network node 230, 500 in the communication system 200.
Furthermore, the trainer node 400 may be configured to select which at least one feature the agent node 210, 500 is to utilise for representing the state of at least a part of the communication system 200 and provide the made selection to the agent node 210, 500.
The trainer node 400 may in addition be configured to select which performance measurement associated with the radio cell 215, 235-1, 235-2, 235-3, the agent node 210, 500 is to utilise for representing the performance of the radio cell 215, 235-1, 235-2, 235-3 and provide the made selection to the agent node 210, 500.
For enhanced clarity, any internal electronics or other components of the trainer node 400, not completely indispensable for understanding the herein described embodiments have been omitted from Figure 9. The trainer node 400 comprises a receiver 910, configured for receiving e.g. the training data message 420, from one or more agent nodes 210, 500.
Further, the trainer node 400 comprises a processor 920, configured for determining a power control policy 430 to be utilised by an agent node 210, 500 for downlink power control of a radio cell 215, 235-1, 235-2, 235-3 of a communication system 200, by performing at least some steps 801-805 of the described method 800.
Such processor 920 may comprise one or more instances of a processing circuit, i.e. a Central Processing Unit (CPU), a processing unit, a processing circuit, a processor, an Application Specific Integrated Circuit (ASIC), a microprocessor, or other processing logic that may interpret and execute instructions. The herein utilised expression "processor" may thus represent a processing circuitry comprising a plurality of processing circuits, such as, e.g., any, some or all of the ones enumerated above. Further, the trainer node 400 may in some embodiments comprise a transmitter 930, configured for transmitting various signals and instructions, e.g. comprising the determined power control policy 430, to be received by the agent node 210, 500; or possibly another training node.
In further addition, the trainer node 400 may comprise at least one memory 925, according to some embodiments. The optional memory 925 may comprise a physical device utilised to store data or programs, i.e., sequences of instructions, on a temporary or permanent basis. According to some embodiments, the memory 925 may comprise integrated circuits comprising silicon-based transistors. Further, the memory 925 may be volatile or nonvolatile.
At least a sub-set of the previously described method steps 801-805 to be performed in the trainer node 400 may be implemented through the one or more processing circuits 920 in the trainer node 400, together with a computer program product for performing the functions of at least some of the method steps 801-805. Thus a computer program product, comprising instructions for performing the method steps 801-805 may determine the power control policy 430 to be utilised by the agent node 210, 500 for downlink power control of the radio cell 215, 235-1, 235-2, 235-3 in the communication system 200.
The computer program mentioned above may be provided for instance in the form of a data carrier carrying computer program code for performing at least some of the method steps 801-805 according to some embodiments when being loaded into the processor 920. The data carrier may be, e.g., a hard disk, a CD ROM disc, a memory stick, an optical storage device, a magnetic storage device or any other appropriate medium such as a disk or tape that may hold machine readable data in a non-transitory manner. The computer program product may furthermore be provided as computer program code on a server and downloaded to the trainer node 400 remotely, e.g., over an Internet or an intranet connection.
The terminology used in the description of the embodiments as illustrated in the accompanying drawings is not intended to be limiting of the described agent node 210, method 600 therein, trainer node 400, or method 800 therein. Various changes, substitutions and/or alterations may be made, without departing from the invention as defined by the appended claims.
As used herein, the term "and/or" comprises any and all combinations of one or more of the associated listed items. The term "or" as used herein, is to be interpreted as a mathematical OR, i.e., as an inclusive disjunction; not as a mathematical exclusive OR (XOR), unless expressly stated otherwise. In addition, the singular forms "a", "an" and "the" are to be interpreted as "at least one", thus also possibly comprising a plurality of entities of the same kind, unless expressly stated otherwise. It will be further understood that the terms "includes", "comprises", "including" and/or "comprising", specify the presence of stated features, actions, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, actions, integers, steps, operations, elements, components, and/or groups thereof. A single unit such as e.g. a processor may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms such as via Internet or other wired or wireless communication system. Finally, it should be understood that the present invention is not limited to the embodiments described above, but also relates to and incorporates all embodiments within the scope of the appended independent claims.

Claims

1. An agent node (210, 500) for downlink power control of a radio cell (215, 235-1, 235-2, 235-3) of a communication system (200); wherein the agent node (210, 500) is configured to:
obtain a power control policy (430);
determine at least one feature representing a state of at least a part of the communication system (200), at a first time period;
determine a power control action to be performed for downlink power control in the radio cell (215, 235-1, 235-2, 235-3) at the first time period, out of a set of available power control actions associated with the radio cell (215, 235-1, 235-2, 235-3), based on the obtained power control policy (430) and the determined at least one feature; and
configure a downlink transmission power instruction of the radio cell (215, 235-1, 235-2, 235-3) based on the determined power control action.
2. The agent node (210, 500) according to claim 1, further configured to:
determine a feature representing the state of the part of the communication system (200), at a second time period;
determine a performance measurement, associated with the performance within the radio cell (215, 235-1, 235-2, 235-3); and
transmit a training data message (420) to a trainer node (400), comprising one or more in the group of:
the determined feature representing the state at the first time period, the determined power control action performed at the first time period, the determined feature representing the state at the second time period, and the determined performance measurement;
and wherein the obtained power control policy (430) is received from the trainer node (400).
3. The agent node (210, 500) according to any of claim 1 or claim 2, wherein a radio network node (230) controlling the radio cell (235-1, 235-2, 235-3) is not co-located with the agent node (210); and wherein the agent node (210, 500) is further configured to:
transmit the configured downlink transmission power instruction of the radio cell (235-1, 235-2, 235-3) to the radio network node (230) for downlink power control of the radio cell (235-1, 235-2, 235-3) of the radio network node (230).
4. The agent node (210, 500) according to any of claims 1-3, wherein the feature representing the state of at least a part of the communication system (200), is determined based on any of:
a measurement related to received signal quality made by and received from a user device (220) in the radio cell (215, 235-1, 235-2, 235-3);
a measurement related to received signal quality made by and received from a user device (220) in another radio cell (215, 235-1, 235-2, 235-3);
a measurement related to downlink transmission power of the radio cell (215, 235-1, 235-2, 235-3) made by and obtained from the radio network node (210, 230, 500) controlling the radio cell (215, 235-1, 235-2, 235-3);
a measurement related to a number of active user devices (220) in the radio cell (215, 235-1, 235-2, 235-3);
a measurement related to types, or distribution, of traffic within the radio cell (215, 235-1, 235-2, 235-3);
a measurement related to location, or distribution, of user devices (220) in the radio cell (215, 235-1, 235-2, 235-3); or
a performance measurement, associated with the performance within the radio cell (215, 235-1, 235-2, 235-3).
5. The agent node (210, 500) according to any of claims 2-4, further configured to: compute a performance measurement associated with at least a part of the communication system (200), based on the determined performance measurement and at least one other network performance measurement received from another radio network node (230, 500) in the communication system (200);
and wherein the training data message (420) transmitted to the trainer node (400) comprises the computed performance measurement.
6. The agent node (210, 500) according to any of claims 1-5, wherein the obtained power control policy (430) further comprises an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy (430); and wherein the agent node (210, 500) is further configured to:
determine application of the obtained power control policy (430), based on the obtained exploration-to-exploitation control parameter.
7. The agent node (210, 500) according to any of claims 2-6, wherein a radio network node (500) controlling the radio cell (235-1, 235-2, 235-3) is another agent node (500), the training data message (420) transmitted to the trainer node (400) comprises a training data message (420) received from the other agent node (500); and wherein the agent node (210, 500) is further configured to forward the power control policy (430) to be utilised for downlink power control in the radio cell (235-1, 235-2, 235-3) of the other agent node (500) in the communication system (200), received from the trainer node (400), to the other agent node (500).
8. The agent node (210, 500) according to any of claims 2-7, further configured to iterate the determination of the feature representing the state of at least a part of the communication system (200); the determination of the power control action; the configuration of the downlink transmission power instruction; the determination of the performance measurement; the transmission of the training data message (420), or a plurality of training data messages (420), to the trainer node (400) and the obtaining of the power control policy (430).
9. A method (600) in an agent node (210, 500) for downlink power control of a radio cell (215, 235-1, 235-2, 235-3) of a communication system (200); which method (600) comprises:
obtaining (601) a power control policy (430);
determining (604) at least one feature representing a state of at least a part of the communication system (200), at a first time period;
determining (605) a power control action to be performed for downlink power control in the radio cell (215, 235-1, 235-2, 235-3) at a first time period, out of a set of available power control actions associated with the radio cell (215, 235-1, 235-2, 235-3), based on the obtained (601) power control policy (430); and
configuring (606) a downlink transmission power instruction of the radio cell (215, 235-1, 235-2, 235-3) based on the determined (605) power control action.
10. A computer program with a program code for performing a method (600) according to claim 9, when the computer program runs on a computer.
11. A trainer node (400), for determining a power control policy (430) to be utilised by an agent node (210, 500) for downlink power control of a radio cell (215, 235-1, 235-2, 235-3) of a communication system (200); which trainer node (400) is configured to:
receive a training data message (420), associated with the radio cell (215, 235-1, 235-2, 235-3), from the agent node (210, 500), wherein the training data message (420) comprises one or more in the group of:
a feature representing a state of at least a part of the communication system (200) at a first time period,
a power control action performed by the agent node (210, 500) in the radio cell (215, 235-1, 235-2, 235-3) at the first time period,
a feature representing the state at the second time period, and a performance measurement;
store the received training data message (420) in a database (410) associated with the radio cell (215, 235-1, 235-2, 235-3);
determine a power control policy (430) for the radio cell (215, 235-1, 235-2, 235-3), based on at least one training data message (420), stored in the database (410); and
transmit the determined power control policy (430) to the agent node (210, 500).
12. The trainer node (400) according to claim 11, further configured to:
determine an exploration-to-exploitation control parameter, associated with a probability of applying the determined power control policy (430); and wherein the determined exploration-to-exploitation control parameter is transmitted to the agent node (210, 500) together with the determined power control policy (430).
13. The trainer node (400) according to any of claim 11 or claim 12, further configured to determine the power control policy (430) for the radio cell (215, 235-1, 235-2, 235-3), represented by one or more of:
an indication of a neural network architecture,
a parameter describing the number of inputs to the first layer of the neural network, a set of parameters describing the number of layers of the neural network, a set of parameters describing the number of neurons in different layers of the neural network, a parameter indicating the type of non-linear activation function used at each layer such as an s-shaped function like the sigmoid or the hyperbolic tangent, or a rectified linear unit, or an exponential linear unit,
a set of parameters describing the connections between units of consecutive or non-consecutive layers in the neural network,
a set of indicators to at least one neural network index,
a set of neural network weights, a number of neural networks to be configured for power control,
a voting policy to be configured for determining the power control action when the power control policy (430) comprises multiple neural networks.
14. A method (800) in a trainer node (400), for determining a power control policy (430) to be utilised by an agent node (210, 500) for downlink power control of a radio cell (215, 235-1, 235-2, 235-3) of a communication system (200); which method (800) comprises:
receiving (801) a training data message (420), associated with the radio cell (215, 235-1, 235-2, 235-3), from the agent node (210, 500), wherein the training data message (420) comprises one or more in the group of:
a feature representing a state of at least a part of the communication system (200) at a first time period,
a power control action performed by the agent node (210, 500) in the radio cell (215, 235-1, 235-2, 235-3) at the first time period,
a feature representing the state at the second time period, and a performance measurement;
storing (802) the received (801) training data message (420) in a database (410), associated with the radio cell (215, 235-1, 235-2, 235-3);
determining (803) a power control policy (430) for the radio cell (215, 235-1, 235-2, 235-3), based on at least one training data message (420), stored (802) in the database (410); and
transmitting (805) the determined (803) power control policy (430) to the agent node (210, 500).
15. A computer program with a program code for performing a method (800) according to claim 14, when the computer program runs on a computer.
PCT/EP2016/074618 2016-10-13 2016-10-13 Method and device in a wireless communication network for downlink power control WO2018068858A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/074618 WO2018068858A1 (en) 2016-10-13 2016-10-13 Method and device in a wireless communication network for downlink power control

Publications (1)

Publication Number Publication Date
WO2018068858A1 (en) 2018-04-19

Family

ID=57136880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2016/074618 WO2018068858A1 (en) 2016-10-13 2016-10-13 Method and device in a wireless communication network for downlink power control

Country Status (1)

Country Link
WO (1) WO2018068858A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1992002866A1 (en) * 1990-08-03 1992-02-20 E.I. Du Pont De Nemours And Company Computer neural network process measurement and control system and method
US20090042593A1 (en) * 2007-08-10 2009-02-12 Qualcomm Incorporated Adaptation of transmit power for neighboring nodes
US20140215242A1 (en) * 2013-01-29 2014-07-31 Broadcom Corporation Wearable Device-Aware Supervised Power Management for Mobile Platforms
WO2015100159A1 (en) * 2013-12-26 2015-07-02 Qualcomm Incorporated Methods and apparatus for joint power and resource management

Similar Documents

Publication Title
US11071122B2 (en) Method and unit for radio resource management using reinforcement learning
US9648569B2 (en) System and method to facilitate small cell uplink power control in a network environment
CN105898849B (en) Transmission power management design and implementation mode
US10159048B2 (en) System and method to facilitate small cell uplink power control in a network environment
CN107196682B (en) Method, user equipment and base station for performing measurement in communication system
Lee et al. Intercell interference coordination for LTE systems
WO2020181533A1 (en) Device, method and computer readable medium for adjusting beamforming profiles
CN104519002A (en) Methods and devices for determining effective mutual information
EP2870712B1 (en) Method and access point for assigning sounding resources
WO2013102485A1 (en) Power control in a wireless communication system for uplink transmissions with coordinated reception
JP2016524424A (en) Timely activation of relays in cloud radio access networks
EP3117669A1 (en) Method and apparatus for uplink power control in a radio communication network
WO2021048594A1 (en) Methods for block error rate target selection for a communication session and related apparatus
WO2012093057A1 (en) Method and a device for adjusting the transmission power of signals transferred by plural mobile terminals
WO2018068858A1 (en) Method and device in a wireless communication network for downlink power control
WO2016133438A1 (en) First network node, second network node, first wireless device and methods therein, for determining a prediction of an interference
Kuang Joint user association and reuse pattern selection in heterogeneous networks
CN107615675A (en) Reference signal power measurement filtering
Daeinabi et al. A fuzzy Q-learning approach for enhanced intercell interference coordination in LTE-Advanced heterogeneous networks
Adeel et al. Performance analysis of random neural networks in LTE-UL of a cognitive radio system
Gao et al. A fairness guaranteed dynamic PF scheduler in LTE-A networks
Parruca et al. On semi-static interference coordination under proportional fair scheduling in LTE systems
EP2836004A1 (en) Methods and apparatuses for determining a policy for reporting a cell performance indicator
EP4327249A1 (en) Dynamic pucch format configuration using machine learning
Chaves et al. Distributed power control for QoS-flexible services in wireless communication networks

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16781775; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 16781775; Country of ref document: EP; Kind code of ref document: A1)