EP4316117A1

EP4316117A1 - Devices and methods for efficient distributed multiple access in wireless networks

Info

Publication number: EP4316117A1
Application number: EP21716692.5A
Authority: EP
Inventors: Apostolos Destounis; Dimitrios TSILIMANTOS; Merouane Debbah
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2021-04-01
Filing date: 2021-04-01
Publication date: 2024-02-07
Also published as: WO2022207109A1; CN117121610A

Abstract

A wireless access point (110) comprises a communication interface (113) configured to receive a plurality of data packets from a plurality of wireless terminals (120) operating an adjustable transmission policy for transmitting the plurality of data packets. The wireless access point (110) further comprises a processing circuitry (111) configured to determine, based on the plurality of data packets, a first transmission policy performance metric of the plurality of wireless terminals (120). The communication interface (113) is further configured to send the first transmission policy performance metric to the plurality of wireless terminals (120) to adjust the adjustable transmission policy operated by a respective wireless terminal (120). Moreover, a corresponding wireless terminal (120) is disclosed.

Description

Devices and methods for efficient distributed multiple access in wireless networks

TECHNICAL FIELD

The present disclosure relates to wireless communication networks in general. More specifically, the present disclosure relates to devices and methods for efficient distributed multiple access in wireless communication networks.

BACKGROUND

In the upcoming next generation wireless networks, an immense number of devices is expected to be connected to each wireless access point (AP). For example, for Internet-of-Things (IoT) wireless networks, forecasts predict one million wireless devices per cellular base station. In this type of setting, connectivity is primarily aimed at establishing central authentication, security, and management of those devices. However, fine-tuned coordination functionalities and in particular medium access, a key aspect for wireless networks, are considered very expensive to be handled centrally, since the access point would need to collect bulky state information for each device and solve large-scale optimization problems. For these reasons, it is anticipated that communications will rely on uncoordinated access, i.e., a channel will be dedicated to access and each device will decide individually, without schedule grants from the access point, which transmission policy to use.

Traditional protocols that can operate in the setting of distributed medium access, where multiple devices contend for uplink transmissions, are based on random access. Historically, pure ALOHA (Additive Links On-line Hawaii Area) was the first such protocol, in which a user transmits with a probability p. This was later extended to slotted-ALOHA, which used synchronization to double user throughput. A more mature random access protocol is the Carrier Sense Multiple Access (CSMA), in which the transmitter checks whether the medium is idle before sending. Also, in the enhanced version with collision avoidance (CSMA/CA), the transmitter “backs-off” (selects a smaller probability of access) every time there is a collision. All these random access protocols suffer from collisions and idle time, and therefore achieve lower throughput than the maximum possible. In an effort to improve the throughput achievable by uncoordinated access, many algorithmic ideas have been proposed as extensions of CSMA protocols, but their convergence to good solutions is very slow in practice and their delay performance is poor.

Distributed medium access becomes even more attractive with the recent success of Deep Neural Networks (DNNs), which have the potential to provide better performance than the existing protocols, similarly to many other applications that require solutions to very complex optimization problems. The need to have a decision-making framework for distributed medium access that adapts to the dynamic environment has naturally led to solutions that combine the representation power of DNNs with ideas from Reinforcement Learning (RL). RL is an area of machine learning where devices/agents learn an optimal behavior based on their trial-and-error interaction with the environment, and eventually learn to maximize a long term objective. The original RL methods suffer from the "curse of dimensionality", which limits their application to models with a small number of states.

Even if a large state space is no longer a prohibitive factor, there is another challenge in RL-based distributed medium access that needs to be addressed. Besides the dynamics of the environment, each agent must also take into account the behavior of other agents, where intuitively, a device must learn to predict the decisions of other devices and take strategic decisions in response. Multi-agent RL (MARL) is the category of RL that addresses the decision-making problem while having at the same time multiple independent agents that interact with the environment and with each other. Convergence can be a problem in this setting, and often it is assumed that there is centralized training, which however is impractical for medium access since it works only for a fixed number of agents. Another approach is to have distributed training on each device, as if each device was the only device in the network running an RL algorithm, but this usually leads to unfair medium utilization, which is a poor performance indicator for wireless networks.

In light of the above, there is a need for improved devices and methods for efficient distributed multiple access in wireless communication networks.

SUMMARY

It is an objective of the present disclosure to provide improved devices and methods for efficient distributed multiple access in wireless communication networks.

The foregoing and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect, a wireless access point is provided. The wireless access point comprises a communication interface configured to receive a plurality of data packets from a plurality of wireless terminals (also referred to as non-AP stations), wherein each wireless terminal operates an adjustable transmission, i.e. medium access policy, for transmitting the plurality of data packets using the adjustable medium access policy to the wireless access point. Moreover, the wireless access point comprises a processing circuitry configured to determine, based on the plurality of data packets, a first (global) transmission, i.e. medium access policy performance metric of the plurality of wireless terminals. For instance, the processing circuitry may be configured to determine, based on the plurality of data packets, a first transmission, i.e. medium access policy performance metric value associated with the plurality of wireless terminals.

The communication interface of the wireless access point is further configured to send, i.e. broadcast, the first (global) transmission, i.e. medium access policy performance metric value to the plurality of wireless terminals for adjusting the adjustable transmission, i.e. medium access policy operated by a respective wireless terminal.

Thus, advantageously, a wireless access point for efficient distributed multiple access in wireless communication networks is provided.

In a further possible implementation form of the first aspect, the first (global) transmission, i.e. medium access policy performance metric value is a delay value of the plurality of data packets.

In a further possible implementation form of the first aspect, the processing circuitry is configured to determine based on the plurality of data packets the delay value as an average of the delays of the plurality of data packets from the plurality of wireless terminals.

In a further possible implementation form of the first aspect, the communication interface is further configured to receive one or more further data packets from one or more legacy wireless terminals. The one or more legacy wireless terminals are configured to transmit the one or more further data packets using a fixed transmission policy, wherein the processing circuitry of the wireless access point is configured to determine, based on the plurality of data packets and the one or more further data packets, the delay value d using the following equation:

$$d = \frac{1}{N_{AI}} \sum_{n=1}^{N_{AI}} d_n + w \cdot d_{legacy}$$

where N_AI denotes the number of the plurality of wireless terminals, d_n denotes the delay for each of the plurality of wireless terminals, d_legacy denotes the delay since the last time at least one of the one or more legacy wireless terminals has transmitted and w denotes a predefined weight. As will be appreciated, the weight w allows controlling the balance of access time utilization between the two classes of AI-based and legacy wireless terminals.

In a further possible implementation form of the first aspect, the fixed transmission policy used by the one or more legacy wireless terminals comprises a CSMA (carrier sense multiple access) or ALOHA medium access policy. ALOHA is a medium access policy for transmission of data via a shared network channel. It operates in the medium access control sublayer (MAC sublayer) of the open systems interconnection (OSI) model. Using this protocol, several data streams originating from multiple nodes are transferred through a multi-point transmission channel. In ALOHA, each node or station transmits a packet without trying to detect whether the transmission channel is idle or busy. If the channel is idle, then the packets will be successfully transmitted. If two packets attempt to occupy the channel simultaneously, collisions of packets will occur and the packets will be discarded. These stations may choose to retransmit the corrupted packets repeatedly until successful transmission occurs.
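The ALOHA behavior described above can be illustrated with a minimal sketch. It assumes slotted time and a per-slot transmit probability p for every station, neither of which is prescribed by the present disclosure; the function names are hypothetical and chosen for illustration only.

```python
import random


def slot_outcome(transmit_flags):
    """Classify one contention slot from the stations' transmit decisions."""
    senders = sum(1 for flag in transmit_flags if flag)
    if senders == 0:
        return "idle"
    if senders == 1:
        return "success"
    # Two or more simultaneous transmissions destroy each other.
    return "collision"


def simulate_slotted_aloha(num_stations, p, num_slots, seed=0):
    """Return the fraction of slots that carried exactly one transmission.

    Assumption: every station transmits independently with probability p
    in each slot, without sensing the channel first.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(num_slots):
        flags = [rng.random() < p for _ in range(num_stations)]
        if slot_outcome(flags) == "success":
            successes += 1
    return successes / num_slots
```

The three outcomes returned by `slot_outcome` (idle, success, collision) are the same channel states that the terminals described below observe in each contention slot.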

According to a second aspect, a method for operating a wireless access point is provided. The method according to the second aspect comprises the following steps performed by the wireless access point: receiving a plurality of data packets from a plurality of wireless terminals, each wireless terminal operating an adjustable transmission, i.e. medium access policy for transmitting the plurality of data packets; determining based on the plurality of data packets a first (global) transmission, i.e. medium access policy performance metric value of the plurality of wireless terminals; and sending the first (global) transmission, i.e. medium access policy performance metric value to the plurality of wireless terminals for adjusting the adjustable transmission, i.e. medium access policy operated by a respective wireless terminal.

According to a third aspect, a wireless terminal is provided. The wireless terminal according to the third aspect comprises a communication interface configured to send one or more data packets to a wireless access point using an adjustable transmission, i.e. medium access policy. The communication interface is further configured to receive from the wireless access point a first (global) transmission, i.e. medium access policy performance metric value of the wireless terminal and a plurality of further wireless terminals based on the one or more data packets and a plurality of further data packets. The wireless terminal according to the third aspect further comprises a processing circuitry configured to determine based on the one or more data packets a terminal-specific second (local) transmission, i.e. medium access policy performance metric value and to adjust the adjustable transmission, i.e. medium access policy based on the first transmission, i.e. medium access policy performance metric value and the terminal-specific second transmission, i.e. medium access policy performance metric value.

In a further possible implementation form of the third aspect, the processing circuitry is further configured to operate a neural network, wherein the neural network is configured to adjust the adjustable transmission, i.e. medium access policy based on the first transmission, i.e. medium access policy performance metric value and the terminal-specific second transmission, i.e. medium access policy performance metric value.

In a further possible implementation form of the third aspect, the processing circuitry of the wireless terminal is further configured to implement a reinforcement learning algorithm for adjusting the adjustable transmission, i.e. medium access policy based on the first transmission, i.e. medium access policy performance metric value and the terminal-specific second transmission, i.e. medium access policy performance metric value, wherein a reward used by the reinforcement learning algorithm is based on, i.e. depends on the first transmission, i.e. medium access policy performance metric value and the terminal-specific second transmission, i.e. medium access policy performance metric value.

In a further possible implementation form of the third aspect, the first (global) transmission, i.e. medium access policy performance metric value is a delay value of the one or more data packets and the plurality of further data packets from the further plurality of wireless terminals.

In a further possible implementation form of the third aspect, the delay value is an average of the delays of the one or more data packets and the plurality of further data packets from the further plurality of wireless terminals.

In a further possible implementation form of the third aspect, the terminal-specific second transmission policy performance metric has a first value, if the one or more data packets have been successfully transmitted by the wireless terminal, and a second value, if the one or more data packets have not been successfully transmitted by the wireless terminal.

According to a fourth aspect, a method of operating a wireless terminal is provided. The method according to the fourth aspect comprises the following steps performed by the wireless terminal: sending one or more data packets to a wireless access point using an adjustable transmission, i.e. medium access policy; receiving from the wireless access point a first (global) transmission, i.e. medium access policy performance metric value of the wireless terminal and of a plurality of further wireless terminals based on the one or more data packets and a plurality of further data packets; determining, based on the one or more data packets, a terminal-specific second (local) transmission, i.e. medium access policy performance metric value; and adjusting the adjustable transmission, i.e. medium access policy based on the first transmission, i.e. medium access policy performance metric value and the terminal-specific second transmission, i.e. medium access policy performance metric value.

According to a fifth aspect, a computer program product is provided, comprising program code which causes a computer or a processor to perform the method according to the second aspect or the method according to the fourth aspect, when the program code is executed by the computer or the processor.

Details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, embodiments of the present disclosure are described in more detail with reference to the attached figures and drawings, in which:

Fig. 1 shows a schematic diagram illustrating a wireless communication network including a wireless access point according to an embodiment and a plurality of wireless terminals according to an embodiment;

Fig. 2 shows a schematic diagram illustrating further details of the wireless access point and a wireless terminal of figure 1;

Fig. 3 shows a schematic diagram illustrating processing stages implemented by a wireless terminal according to an embodiment;

Fig. 4 shows a flow diagram illustrating a method implemented by a wireless access point according to an embodiment;

Fig. 5 shows a flow diagram illustrating a method implemented by a wireless terminal according to an embodiment;

Fig. 6 shows graphs illustrating the performance of a wireless access point according to an embodiment and a plurality of wireless terminals according to an embodiment; and

Fig. 7 shows graphs illustrating the performance of a wireless access point according to an embodiment and a plurality of wireless terminals according to an embodiment.

In the following, identical reference signs refer to identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF THE EMBODIMENTS

In the following description, reference is made to the accompanying figures, which form part of the disclosure, and which show, by way of illustration, specific aspects of embodiments of the present disclosure or specific aspects in which embodiments of the present disclosure may be used. It is understood that embodiments of the present disclosure may be used in other aspects and comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.

For instance, it is to be understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.

Figure 1 shows a schematic diagram illustrating a wireless communication network 100 including a wireless access point 110 according to an embodiment and a plurality of wireless terminals 120 according to an embodiment, which may include, for instance, wireless mobile phones 120, wireless tablet computers 120, wireless laptop computers 120, wireless IoT devices 120 and the like. The wireless access point 110 may be connected, for instance, to the Internet for providing Internet access to the plurality of wireless terminals 120.

As illustrated in figure 1, the wireless access point 110 comprises a communication interface 113 that may include one or more antennas configured to communicate with the plurality of wireless terminals 120. Each wireless terminal 120 is configured to operate an adjustable transmission, i.e. medium access policy for transmitting a plurality of data packets to the wireless access point 110 using the adjustable medium access policy. As will be described in more detail below, the processing circuitry 111 of the wireless access point 110 is configured to determine based on the plurality of data packets a first (global) transmission policy performance metric value of the plurality of wireless terminals 120. In an embodiment, the first transmission, i.e. medium access policy performance metric value may be a global, e.g. an averaged delay value of the plurality of data packets.

The processing circuitry 111 of the wireless access point 110 may be implemented in hardware and/or software. The hardware may comprise digital circuitry, or both analog and digital circuitry. Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors. As illustrated in figure 1, the wireless access point 110 may further comprise a non-transitory memory 115 configured to store data and executable program code which, when executed by the processing circuitry 111, causes the wireless access point 110 to perform the functions, operations and methods described herein.

The communication interface 113 of the wireless access point 110 is further configured to send, i.e. broadcast the first transmission, i.e. medium access policy performance metric value, e.g. the delay value of the plurality of data packets, to the plurality of wireless terminals 120 for adjusting the adjustable transmission, i.e. medium access policy operated by a respective wireless terminal 120. The memory 115 of the wireless access point 110 may be configured to cache the data packets.

As illustrated in figure 1, an exemplary wireless terminal 120 according to an embodiment comprises a communication interface 123 configured to send one or more data packets to the wireless access point 110 using the adjustable transmission, i.e. medium access policy. The communication interface 123 is further configured to receive from the wireless access point 110 the first transmission, i.e. medium access policy performance metric value indicative of the transmission performance of the exemplary wireless terminal 120 and the plurality of further wireless terminals 120 based on the one or more data packets transmitted by the exemplary wireless terminal 120 and a plurality of further data packets transmitted by the further wireless terminals 120. The exemplary wireless terminal 120 further comprises a processing circuitry 121 configured to determine based on the one or more data packets a terminal-specific second (local) transmission, i.e. medium access policy performance metric value and to adjust the adjustable transmission, i.e. medium access policy based on the first transmission, i.e. medium access policy performance metric value and the terminal-specific second transmission, i.e. medium access policy performance metric value. As used herein "terminal-specific" second transmission, i.e. medium access policy performance metric value means that each of the plurality of wireless terminals 120 may determine different, i.e. local or the same terminal-specific medium access policy performance metric values. The first transmission, i.e. medium access policy performance metric value is the same for all of the plurality of wireless terminals 120 and is, therefore, also referred to as first global transmission, i.e. medium access policy performance metric value.

In an embodiment, adjusting the adjustable transmission, i.e. medium access policy, may comprise adjusting a transmission parameter of the adjustable transmission policy or selecting a different transmission policy. As illustrated in figure 1, the wireless terminal 120 may further comprise a memory 125 for storing the one or more data packets.

The processing circuitry 121 of the wireless terminal 120 may be implemented in hardware and/or software. The hardware may comprise digital circuitry, or both analog and digital circuitry. Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors. The non-transitory memory 125 of the wireless terminal 120 may be configured to store data and executable program code which, when executed by the processing circuitry 121, causes the wireless terminal 120 to perform the functions, operations and methods described herein.

As will be described in more detail below in the context of figures 2 and 3, in an embodiment, the processing circuitry 121 of the wireless terminal 120 may be further configured to operate a neural network, wherein the neural network is configured to adjust the adjustable transmission, i.e. medium access policy based on the first and the terminal-specific second transmission, i.e. medium access policy performance metric value. In an embodiment, the processing circuitry 121 of the wireless terminal 120 is further configured to implement a reinforcement learning algorithm for adjusting the adjustable transmission, i.e. medium access policy based on the first and the terminal-specific second transmission, i.e. medium access policy performance metric value, wherein a reward used by the reinforcement learning algorithm is a weighted sum of the first and the terminal-specific second transmission policy performance metric value.

Thus, in an embodiment, artificial intelligence (AI) may be employed at a respective wireless terminal 120 for efficient decentralized medium access in the wireless communication network 100. As illustrated in figure 1, the wireless access point 110 may be further configured to communicate with one or more wireless legacy terminals 130 that operate using a substantially fixed medium access policy. Thus, embodiments disclosed herein promote fairness when both legacy wireless terminals 130 that use a fixed medium access policy, like CSMA or ALOHA, and the AI-based wireless terminals 120 contend for access at the same wireless access point 110.

Figure 2 shows a schematic diagram illustrating further details of the wireless access point 110 and the exemplary wireless terminal 120 of figure 1, while figure 3 illustrates processing stages implemented by the exemplary wireless terminal 120 according to an embodiment.

As can be taken from figure 2, in an embodiment, the processing circuitry 111 of the wireless access point 110 may implement a delay estimation module 111a configured to determine the first transmission, i.e. medium access policy performance metric value and provide this information for the plurality of wireless terminals 120 for allowing the wireless terminals 120 to properly update, i.e. adjust their transmission policies. In an embodiment, the delay estimation module 111a observes the outcome of the transmissions of the wireless terminals 120 (see processing block 113a in figure 2) and determines the first metric based on information from the transmissions of all wireless terminals 120 that contend for access at this specific access point 110, whose value is then broadcast to the plurality of wireless terminals 120. In an embodiment, this first delay-inspired metric d may take into account the delay of delivered data packets of the wireless terminals 120 (and the legacy wireless terminals 130, as illustrated by processing block 131 in figure 2), i.e. the head-of-line waiting time of successfully delivered packets in the transmission queue of each terminal, on the basis of the following equation:

$$d = \frac{1}{N_{AI}} \sum_{n=1}^{N_{AI}} d_n + w \cdot d_{legacy}$$

where N_AI is the number of AI-based wireless terminals 120, d_n is the delay of each AI-based wireless terminal 120 (i.e. the local transmission metric), d_legacy is the delay since the last time a legacy wireless terminal 130 transmitted and w is a predefined weight that does not depend on the number of devices, which controls the balance between the utilization of access time between the two classes of AI-based wireless terminals 120 and the legacy wireless terminals 130. As will be appreciated, d expresses the weighted sum between the mean of delays of all AI-based wireless terminals 120 in the access point 110 and the delay since a legacy wireless terminal 130 transmitted to the access point 110. If the exact values of the delays cannot be obtained, estimates of them from the access point 110 can be used instead. The value of the metric d may be updated every time slot and transmitted to all AI-based wireless terminals 120 via a broadcast control signal generated by the delay estimation module 111a.
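As a minimal sketch of how the delay estimation module 111a might compute the metric d in each time slot, the following function adds the mean delay of the AI-based wireless terminals to the weighted legacy delay. It assumes that the weight w applies to the legacy term only, as suggested by the description of d as a weighted sum of the two quantities; the function and parameter names are hypothetical.

```python
def global_delay_metric(ai_delays, legacy_delay, w):
    """Compute the global delay-inspired metric d for one time slot.

    ai_delays    -- delays d_n of the AI-based terminals (exact or estimated)
    legacy_delay -- delay since a legacy terminal last transmitted
    w            -- predefined weight balancing the two terminal classes

    Assumption: d = mean(d_n) + w * d_legacy.
    """
    if not ai_delays:
        raise ValueError("at least one AI-based terminal delay is required")
    mean_ai_delay = sum(ai_delays) / len(ai_delays)
    return mean_ai_delay + w * legacy_delay
```

For example, with delays of 2, 4 and 6 slots for three AI-based terminals, a legacy delay of 10 slots and w = 0.5, the sketch yields d = 4 + 5 = 9. A larger w makes long legacy waiting times more costly for the AI-based terminals, which shifts access time toward the legacy terminals.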

There are several possible options for the access point 110 to obtain the delay information required for the calculation of the first metric d. In one possible embodiment, the access point 110 keeps an estimate N_AI of the number of AI-based wireless terminals 120 within its memory 115, and a specific field in the packets of the MAC sublayer may then be used to show when each packet was generated at the device or to directly show its delay. In an alternative embodiment, a specific field in the packets of the MAC sublayer may be used to show the delay of the next packet in the queue of the device, possibly along with the timestamp that shows the delay of the current packet. In a further embodiment, an additional control channel may be introduced, where new wireless terminals 120 that are active and wireless terminals 120 that become inactive notify the access point 110 accordingly, so that the access point 110 keeps an estimate of the delay of each active wireless terminal 120.
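The first option above, in which a MAC-sublayer field carries either the generation time of a packet or its delay, can be sketched as follows. The field names `generated_at` and `reported_delay` are hypothetical and are not taken from the present disclosure or from any standard; times are assumed to be expressed in a common unit such as slots.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MacPacket:
    # Hypothetical header fields, for illustration only.
    generated_at: Optional[float] = None    # time the packet was generated
    reported_delay: Optional[float] = None  # delay written by the terminal


def packet_delay(packet, received_at):
    """Return the delay of a packet as seen by the access point.

    Prefers a delay reported directly by the terminal and otherwise
    derives it from the generation timestamp.
    """
    if packet.reported_delay is not None:
        return packet.reported_delay
    if packet.generated_at is not None:
        return received_at - packet.generated_at
    raise ValueError("packet carries no delay information")
```

Deriving the delay from a generation timestamp presupposes that terminal and access point share a time reference, whereas a directly reported delay does not.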

As illustrated in figure 2 and already described above, in an embodiment, the processing circuitry 121 of the exemplary wireless terminal 120 may implement an AI module 121a configured to optimize the transmission policy of the exemplary wireless terminal 120 based on the received control signal from the delay estimation module 111a of the access point. A more detailed illustration of the processing stages implemented by the AI module 121a is shown in figure 3. Upon reception of the broadcast signal including the first delay-inspired metric, the AI-based wireless terminal 120 determines a reward that will be attributed to its most recent decision (see processing block 121b in figure 2). In RL terminology, the reward r(s, a, s') is collected for this specific transition from an original state s to a new state s' by taking action a. Here, a state may include information about the wireless terminal 120 or its transmissions, such as the history of the outcome for a given number M of the last contention slots (collision, idleness, single transmission) and the action a may include decisions whether to remain idle or to transmit and which value to select for given transmission parameters. In one possible embodiment, the reward r may be calculated by the processing circuitry 121 of the wireless terminal 120 on the basis of the first delay-inspired metric and the terminal-specific second transmission policy performance metric value in the following way (see processing stage 301 of figure 3):

$$r = \begin{cases} -d, & \text{if the device was idle or if the device transmitted successfully} \\ -d - c, & \text{if the device transmitted and the result was a collision} \end{cases}$$

where c is a pre-specified cost parameter for the AI-based wireless terminals 120 that penalizes transmissions leading to collisions. In other words, in an embodiment, the terminal-specific second transmission policy performance metric value may have a first value, such as zero, if the wireless terminal 120 was idle or the transmission was successful, and a second value, such as -c, if the transmission resulted in a collision, i.e. was not successful. The reward r may then be fed to a Deep RL (DRL) algorithm (see processing stage 303 in figure 3), along with the corresponding transition information (states, action) in order to update the transmission, i.e. access policy (see processing stage 305 in figure 3) by training the parameters of the neural network implemented by the processing circuitry 121 of the wireless terminal 120.
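The reward calculation of processing stage 301 can be sketched as follows. The sketch combines the broadcast global metric d with the terminal-specific local value, which is zero for an idle or successful slot and -c for a collision; the string labels used for the slot outcome are an assumption made for illustration.

```python
def reward(d, outcome, c):
    """Reward attributed to the most recent decision of one terminal.

    d       -- global delay-inspired metric broadcast by the access point
    outcome -- "idle", "success" or "collision" (hypothetical labels)
    c       -- pre-specified cost penalizing transmissions that collide
    """
    if outcome in ("idle", "success"):
        local_metric = 0.0
    elif outcome == "collision":
        local_metric = -c
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    # The global term is shared by all terminals; only the local term differs.
    return -d + local_metric
```

Because the term -d is identical for all terminals, a terminal cannot improve its own reward by occupying the channel at the expense of the others; only the collision penalty depends on its individual behavior.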

In an embodiment, the wireless communication network 100 may be a WiFi communication network 100. In the following an exemplary WiFi scenario will be described, where each wireless terminal 120 has a full buffer with packets to send. An RTS/CTS signaling scheme may be used for contention resolution among the wireless terminals 120 in order to deal with the hidden terminal problem. In an embodiment, a wireless terminal 120 may decide to transmit an RTS (Request-to-Send) packet or not and the access point 110 may broadcast a CTS (Clear-to-Send) packet if only one such packet is received. Then, the corresponding wireless terminal 120 may transmit its data packet over the air. In such an embodiment, the delay estimation module 111a may be implemented with the access point 110 knowing how many AI-enabled wireless terminals 120 are within the wireless communication network 100. Regarding the AI module 121a, the state x of each AI-based wireless terminal i may consist of a history window of the past M contention slots of the following form:

$$x_t^i = \left(d_{t-M+1}^i, y_{t-M+1}, \ldots, d_t^i, y_t\right)$$

where d_t^i is the delay for wireless terminal i at the t-th contention slot and y_t is the status of the channel in the (t - 1)-th contention slot, i.e. idle, collision or single successful transmission.
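The contention resolution and the history window described above can be sketched as follows. The sketch assumes that the state is a flat list of (delay, channel status) pairs for the last M contention slots and that the window is zero-filled before M slots have been observed; the integer encoding of the channel status and all names are hypothetical.

```python
from collections import deque

# Hypothetical integer encoding of the channel status.
IDLE, SUCCESS, COLLISION = 0, 1, 2


def resolve_rts(rts_senders):
    """Access point side: decide the slot status and the CTS recipient.

    A CTS is sent only if exactly one RTS packet was received.
    """
    if len(rts_senders) == 0:
        return IDLE, None
    if len(rts_senders) == 1:
        return SUCCESS, rts_senders[0]
    return COLLISION, None


class HistoryWindow:
    """Terminal side: sliding window over the last M contention slots."""

    def __init__(self, M):
        # Assumption: the window starts out filled with idle, zero-delay slots.
        self.slots = deque([(0.0, IDLE)] * M, maxlen=M)

    def push(self, delay, status):
        # Appending to a full deque silently drops the oldest slot.
        self.slots.append((delay, status))

    def as_vector(self):
        # Flatten the (delay, status) pairs into the neural network input.
        return [value for pair in self.slots for value in pair]
```

A terminal would call `push` once per contention slot and pass `as_vector()` to its neural network, so the input size stays fixed at 2M regardless of how long the terminal has been active.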

In an embodiment, the specific RL algorithm used in the AI module 121a of the AI-based wireless terminal 120 may be the actor-critic algorithm for DQN networks. The above state is used as input and the neural networks have two outputs, one for each possible action (transmit an RTS packet or not). The actions may be taken from an actor neural network using an ε-greedy strategy: with probability 1 - ε the action whose corresponding output has the largest value is taken, and with probability ε a random action is selected. The critic neural network may be a neural network with an identical architecture, where the trajectories of observations and rewards may be used to perform Q-learning, and its parameters are periodically copied to the actor network.
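The ε-greedy action selection described above may be sketched in Python as follows. The sketch assumes the two network outputs are available as a plain sequence of values; the function name epsilon_greedy and its parameters are hypothetical, and the neural networks themselves are not modelled.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Return an action index: random with probability epsilon, else the argmax."""
    source = rng if rng is not None else random
    if source.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return source.randrange(len(q_values))
    # Exploitation: pick the action whose output value is largest.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With two outputs, index 0 could stand for "do not transmit an RTS packet" and index 1 for "transmit an RTS packet"; this mapping is likewise an assumption made for illustration.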

Figure 4 shows a flow diagram illustrating a method 400 implemented by the wireless access point 110 according to an embodiment. The method 400 comprises the following steps (some of which have already been described above). The method 400 comprises a first step 401 of receiving a plurality of data packets from the plurality of wireless terminals 120, wherein each wireless terminal 120 is configured to operate an adjustable transmission policy for transmitting the plurality of data packets to the wireless access point 110. The method 400 comprises a further step 403 of determining, based on the plurality of data packets, the first (global) transmission policy performance metric of the plurality of wireless terminals 120. Moreover, the method 400 comprises a step 405 of sending the first (global) transmission policy performance metric to the plurality of wireless terminals 120.

Figure 5 shows a flow diagram illustrating a method 500 implemented by the exemplary wireless terminal 120 according to an embodiment. The method 500 comprises the following steps (some of which have already been described above). The method 500 comprises a first step 501 of sending one or more data packets to the wireless access point 110 using an adjustable transmission policy. The method 500 further comprises a step 503 of receiving from the wireless access point 110 a first (global) transmission policy performance metric of the wireless terminal 120 and a plurality of further wireless terminals 120 based on the one or more data packets and a plurality of further data packets. Moreover, the method 500 comprises a step 505 of determining, based on the one or more data packets, a terminal-specific second (local) transmission policy performance metric. The method 500 comprises a further step 507 of adjusting the adjustable transmission policy based on the first transmission (i.e. medium access) policy performance metric and the terminal-specific second transmission (i.e. medium access) policy performance metric.

Figures 6 and 7 show graphs illustrating the performance of the wireless access point 110 according to an embodiment and the plurality of wireless terminals 120 according to an embodiment. Figures 6 and 7 are based on an exemplary WiFi scenario, where the packet length is 10 times the time needed for the RTS/CTS signaling. This implies that the maximum achievable throughput is almost 0.91.
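The throughput bound mentioned above may be reproduced with a short calculation. The following Python sketch assumes that every successfully transmitted data packet is preceded by exactly one RTS/CTS exchange and that no time is lost to idle slots or collisions; this reading of the bound, as well as the function name max_throughput, is an assumption made for illustration.

```python
def max_throughput(packet_time: float, signaling_time: float) -> float:
    """Fraction of air time carrying data when each packet costs one RTS/CTS exchange."""
    return packet_time / (packet_time + signaling_time)


# Packet length equal to 10 times the RTS/CTS signaling time: 10 / (10 + 1)
bound = max_throughput(10.0, 1.0)
```

Under these assumptions the bound evaluates to 10/11, i.e. approximately 0.909, which is consistent with the stated value of almost 0.91.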

Figure 6 illustrates the dependency of the throughput on the number of devices, i.e. wireless terminals 120, for the wireless communication network 100 according to an embodiment (referred to as "Proposed" in figure 6) and compares it with a conventional system, where the devices use a fixed CSMA protocol instead. Figure 6 shows the results averaged over 10 simulation runs. As can be seen from figure 6, the proposed wireless network 100 including the wireless access point 110 according to an embodiment and the plurality of wireless terminals 120 according to an embodiment outperforms the conventional CSMA-based system for more than 4 devices and achieves a throughput that is almost constant as the number of devices, i.e. wireless terminals 120, increases. This illustrates that the wireless terminals 120 are able to learn how to share the medium and, above all, how to avoid collisions.

Figure 7 illustrates how a wireless terminal 120 according to an embodiment may co-exist with a legacy wireless terminal 130 using a fixed CSMA scheme. Figure 7 shows the empirical average of the throughput of each device for one simulation run. It illustrates that the wireless communication network 100 enables a respective wireless terminal 120 to learn to share the medium with the legacy wireless terminal 130, allowing the latter to have an equal share of the throughput while keeping the total system throughput high.

Embodiments disclosed herein enable decentralized medium access in wireless networks, since they may be based on distributed learning where the wireless terminals 120 learn in parallel, each using its own observations. The cost signal that may be used by a respective wireless terminal 120 enables fairness among AI-based devices, because a single device using the channel for too long would incur increased costs as the delays of the other devices increase. Embodiments disclosed herein enable the co-existence of the wireless terminals 120 with legacy wireless terminals 130 (an increasing delay of legacy devices will increase the cost of the AI-based devices). Embodiments disclosed herein provide a high throughput by avoiding collisions through the introduced penalty for collisions in the reward function.

The person skilled in the art will understand that the "blocks" ("units") of the various figures (method and apparatus) represent or describe functionalities of embodiments of the present disclosure (rather than necessarily individual "units" in hardware or software) and thus describe equally functions or features of apparatus embodiments as well as method embodiments (unit = step).

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described embodiment of an apparatus is merely exemplary. For example, the unit division is merely logical function division and may be another division in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.

Claims

1. A wireless access point (110), comprising: a communication interface (113) configured to receive a plurality of data packets from a plurality of wireless terminals (120) operating an adjustable transmission policy for transmitting the plurality of data packets; and a processing circuitry (111) configured to determine, based on the plurality of data packets, a first transmission policy performance metric of the plurality of wireless terminals (120), wherein the communication interface (113) is further configured to send the first transmission policy performance metric to the plurality of wireless terminals (120).

2. The wireless access point (110) of claim 1, wherein the first transmission policy performance metric is a delay of the plurality of data packets.

3. The wireless access point (110) of claim 2, wherein the processing circuitry (111) is configured to determine, based on the plurality of data packets, the delay as an average of the delays of the plurality of data packets from the plurality of wireless terminals (120).

4. The wireless access point (110) of claim 3, wherein the communication interface (113) is further configured to receive one or more further data packets from one or more legacy wireless terminals (130) transmitting the one or more further data packets using a fixed transmission policy, and wherein the processing circuitry (111) is configured to determine, based on the plurality of data packets and the one or more further data packets, the delay d using the following equation: where N_AI denotes the number of the plurality of wireless terminals (120), d_n denotes the delay for each of the plurality of wireless terminals (120), l_egacy denotes the delay since the last time at least one of the one or more legacy wireless terminals (130) has transmitted and w denotes a predefined weight.

5. The wireless access point (110) of claim 4, wherein the fixed transmission policy used by the one or more legacy wireless terminals (130) comprises a Carrier Sense Multiple Access medium access policy or an ALOHA medium access policy.

6. A method (400) for operating a wireless access point (110), comprising the following steps implemented by the wireless access point (110): receiving (401) a plurality of data packets from a plurality of wireless terminals (120) operating an adjustable transmission policy for transmitting the plurality of data packets; determining (403), based on the plurality of data packets, a first transmission policy performance metric of the plurality of wireless terminals (120); and sending (405) the first transmission policy performance metric to the plurality of wireless terminals (120).

7. A wireless terminal (120), comprising: a communication interface (123) configured to: send one or more data packets to a wireless access point (110) using an adjustable transmission policy, and receive from the wireless access point (110) a first transmission policy performance metric of the wireless terminal (120) and a plurality of further wireless terminals (120) based on the one or more data packets and a plurality of further data packets; and a processing circuitry (121) configured to: determine, based on the one or more data packets, a terminal-specific second transmission policy performance metric, and adjust the adjustable transmission policy based on the first transmission policy performance metric and the terminal-specific second transmission policy performance metric.

8. The wireless terminal (120) of claim 7, wherein the processing circuitry (121) is further configured to operate a neural network (121a), and wherein the neural network (121a) is configured to adjust the adjustable transmission policy based on the first transmission policy performance metric and the terminal-specific second transmission policy performance metric.

9. The wireless terminal (120) of claim 7 or 8, wherein the processing circuitry (121) is further configured to implement a reinforcement learning algorithm for adjusting the adjustable transmission policy based on the first transmission policy performance metric and the terminal-specific second transmission policy performance metric, wherein a reward used by the reinforcement learning algorithm is based on the first transmission policy performance metric and the terminal-specific second transmission policy performance metric.

10. The wireless terminal (120) of claim 9, wherein the first transmission policy performance metric is a delay of the one or more data packets and the plurality of further data packets from the further plurality of wireless terminals.

11. The wireless terminal (120) of claim 10, wherein the delay is an average of the delays of the one or more data packets and the plurality of further data packets from the further plurality of wireless terminals (120).

12. The wireless terminal (120) of any one of claims 9 to 11, wherein the terminal-specific second transmission policy performance metric has a first value, if the one or more data packets have been successfully transmitted by the wireless terminal (120), and a second value, if the one or more data packets have not been successfully transmitted by the wireless terminal (120).

13. A method (500) of operating a wireless terminal (120), wherein the method (500) comprises the following steps implemented by the wireless terminal (120): sending (501) one or more data packets to a wireless access point (110) using an adjustable transmission policy; receiving (503) from the wireless access point (110) a first transmission policy performance metric of the wireless terminal (120) and a plurality of further wireless terminals (120) based on the one or more data packets and a plurality of further data packets; determining (505), based on the one or more data packets, a terminal-specific second transmission policy performance metric; and adjusting (507) the adjustable transmission policy based on the first transmission policy performance metric and the terminal-specific second transmission policy performance metric.

14. A computer program product comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method (400) of claim 6 or the method (500) of claim 13 when the program code is executed by the computer or the processor.