WO2022036642A1

WO2022036642A1 - Method and apparatus for beamforming

Info

Publication number: WO2022036642A1
Application number: PCT/CN2020/110291
Authority: WO
Inventors: Dongdong HUANG; Fenfen HU; Haibin Lu
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2020-08-20
Filing date: 2020-08-20
Publication date: 2022-02-24

Abstract

Various embodiments of the present disclosure provide a method for beamforming. The method which may be performed by a network node comprises obtaining channel information of one or more terminal devices with respect to the network node. In accordance with an embodiment, the method further comprises determining beamforming information of the one or more terminal devices, according to a Markov decision process based at least in part on the channel information.

Description

METHOD AND APPARATUS FOR BEAMFORMING

FIELD OF THE INVENTION

The present disclosure generally relates to communication networks, and more specifically, to a method and apparatus for beamforming.

BACKGROUND

This section introduces aspects that may facilitate a better understanding of the disclosure. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is in the prior art or what is not in the prior art.

Communication service providers and network operators have been continually facing challenges to deliver value and convenience to consumers by, for example, providing compelling network services and performance. With the rapid development of networking and communication technologies, wireless communication networks such as fourth generation (4G) /long term evolution (LTE) and fifth generation (5G) /new radio (NR) networks are expected to achieve large traffic capacity and high end-user data rate with lower latency. In order to meet dramatically increasing network requirements, one interesting option for communication technique development is to employ multiple antenna technology such as multiple input multiple output (MIMO) technology. Multiple antenna systems allow transmitting signals focused towards certain spatial regions. This creates beams (also referred to as beamforming) whose coverage can go beyond transmissions using non-beamformed signals. From the perspective of networks, it may be advantageous to achieve potentially performance gain by implementing proper beamforming in consideration of variability of wireless communication environments and the diversity of radio application scenarios.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

A wireless communication network such as a 5G/NR network may be able to support flexible network deployments and diverse communication services. Various signal propagation environments in the network may have different effects on the accuracy of channel estimation for communication devices. For example, the signal propagation environment of the indoor space may be different from that of the outdoor space, and some outdoor channel estimation technologies may not be applicable to the indoor environment with complex spatial topological relationship. Inaccurate channel estimation may lead to inaccurate beamforming weight calculation, which may degrade network performance. Therefore, it may be desirable to achieve accurate beamforming weight calculation effectively in various communication environments.

Various embodiments of the present disclosure propose a solution for beamforming, which can enable deep reinforcement learning (DRL) based beamforming weight finetune, for example, by using a Markov decision process (MDP) to obtain accurate beamforming information according to channel information of a user equipment (UE) , so as to improve throughput with reduced interference.

According to a first aspect of the present disclosure, there is provided a method performed by a network node (e.g., a base station) . The method comprises obtaining channel information of one or more terminal devices with respect to the network node. In accordance with an exemplary embodiment, the method further comprises determining beamforming information of the one or more terminal devices, according to a MDP based at least in part on the channel information.

In accordance with an exemplary embodiment, the channel information of the one or more terminal devices may be a state input of the MDP.

In accordance with an exemplary embodiment, the channel information of the one or more terminal devices may comprise: a channel matrix per terminal device, a channel impulse response per terminal device, or a beamforming weight per terminal device.

In accordance with an exemplary embodiment, the beamforming information of the one or more terminal devices may be an action output of the MDP.

In accordance with an exemplary embodiment, the beamforming information of the one or more terminal devices may comprise: a beamforming weight tune factor per terminal device, or a beamforming weight per terminal device.

In accordance with an exemplary embodiment, when the channel information of the one or more terminal devices is the channel matrix per terminal device or the channel impulse response per terminal device, the beamforming information of the one or more terminal devices may be the beamforming weight tune factor per terminal device or the beamforming weight per terminal device.

In accordance with an exemplary embodiment, when the channel information of the one or more terminal devices is the beamforming weight per terminal device, the beamforming information of the one or more terminal devices may be the beamforming weight tune factor per terminal device.

In accordance with an exemplary embodiment, the MDP may enable optimization of communication performance of the one or more terminal devices.

In accordance with an exemplary embodiment, the communication performance of the one or more terminal devices may comprise one or more of:

- a signal to interference plus noise ratio (SINR) expected per terminal device;

- an average SNIR expected for the one or more terminal devices;

- a data rate expected per terminal device;

- an average data rate expected for the one or more terminal devices;

- traffic throughput expected per terminal device; and

- an average traffic throughput expected for the one or more terminal devices.

In accordance with an exemplary embodiment, the MDP may use an action-state value function determined by a convolution neural network.

In accordance with an exemplary embodiment, the determination of the action-state value function may comprise determining one or more parameters of the action-state value function by iterated training according to:

- channel information of one or more training devices;

- beamforming information of the one or more training devices; and

- communication performance of the one or more training devices.

In accordance with an exemplary embodiment, the beamforming information of the one or more training devices may comprise a beamforming weight tune factor randomly generated for each of the one or more training devices.

In accordance with an exemplary embodiment, the convolution neural network may have a regression functionality to provide a continuous output.

In accordance with an exemplary embodiment, the channel information, the beamforming information and communication performance of the one or more terminal devices may be used as training data of the MDP.

According to a second aspect of the present disclosure, there is provided an apparatus which may be implemented as a network node. The apparatus may comprise one or more processors and one or more memories comprising computer program codes. The one or more memories and the computer program codes may be configured to, with the one or more processors, cause the apparatus at least to obtain channel information of one or more terminal devices with respect to the network node. According to some exemplary embodiments, the one or more memories and the computer program codes may be configured to, with the one or more processors, cause the apparatus at least further to determine beamforming information of the one or more terminal devices, according to a MDP based at least in part on the channel information.

In accordance with some exemplary embodiments, the one or more memories and the computer program codes may be configured to, with the one or more processors, cause the apparatus according to the second aspect of the present disclosure at least to perform any step of the method according to the first aspect of the present disclosure.

According to a third aspect of the present disclosure, there is provided a computer-readable medium having computer program codes embodied thereon which, when executed on a computer, cause the computer to perform any step of the method according to the first aspect of the present disclosure.

According to a fourth aspect of the present disclosure, there is provided an apparatus which may be implemented as a network node. The apparatus may comprise an obtaining unit and a determining unit. In accordance with some exemplary embodiments, the obtaining unit may be operable to carry out at least the obtaining step of the method according to the first aspect of the present disclosure. The determining unit may be operable to carry out at least the determining step of the method according to the first aspect of the present disclosure.

According to a fifth aspect of the present disclosure, there is provided a method performed by a terminal device (e.g., a UE) . The method comprises obtaining channel information of the terminal device with respect to a network node. In accordance with an exemplary embodiment, the method further comprises determining beamforming information of the terminal device, according to a MDP based at least in part on the channel information.

In accordance with an exemplary embodiment, the channel information of the terminal device may be a state input of the MDP.

In accordance with an exemplary embodiment, the channel information of the terminal device may comprise: a channel matrix of the terminal device, a channel impulse response of the terminal device, or a beamforming weight of the terminal device.

In accordance with an exemplary embodiment, the beamforming information of the terminal device may be an action output of the MDP.

In accordance with an exemplary embodiment, the beamforming information of the terminal device may comprise: a beamforming weight tune factor of the terminal device, or a beamforming weight of the terminal device.

In accordance with an exemplary embodiment, when the channel information of the terminal device is the channel matrix of the terminal device or the channel impulse response of the terminal device, the beamforming information of the terminal device may be the beamforming weight tune factor of the terminal device or the beamforming weight of the terminal device.

In accordance with an exemplary embodiment, when the channel information of the terminal device is the beamforming weight of the terminal device, the beamforming information of the terminal device may be the beamforming weight tune factor of the terminal device.

In accordance with an exemplary embodiment, the MDP may enable optimization of communication performance of the terminal device.

In accordance with an exemplary embodiment, the communication performance of the terminal device may comprise one or more of:

- a SINR expected for the terminal device;

- a data rate expected for the terminal device; and

- traffic throughput expected for the terminal device.

- channel information of one or more training devices;

- beamforming information of the one or more training devices; and

- communication performance of the one or more training devices.

In accordance with an exemplary embodiment, the channel information, the beamforming information and communication performance of the terminal device may be used as training data of the MDP.

According to a sixth aspect of the present disclosure, there is provided an apparatus which may be implemented as a terminal device. The apparatus comprises one or more processors and one or more memories comprising computer program codes. The one or more memories and the computer program codes may be configured to, with the one or more processors, cause the apparatus at least to obtain channel information of the terminal device with respect to a network node. According to some exemplary embodiments, the one or more memories and the computer program codes may be configured to, with the one or more processors, cause the apparatus at least further to determine beamforming information of the terminal device, according to a MDP based at least in part on the channel information.

In accordance with some exemplary embodiments, the one or more memories and the computer program codes may be configured to, with the one or more processors, cause the apparatus according to the sixth aspect of the present disclosure at least to perform any step of the method according to the fifth aspect of the present disclosure.

According to a seventh aspect of the present disclosure, there is provided a computer-readable medium having computer program codes embodied thereon which, when executed on a computer, cause the computer to perform any step of the method according to the fifth aspect of the present disclosure.

According to an eighth aspect of the present disclosure, there is provided an apparatus which may be implemented as a terminal device. The apparatus may comprise an obtaining unit and a determining unit. In accordance with some exemplary embodiments, the obtaining unit may be operable to carry out at least the obtaining step of the method according to the fifth aspect of the present disclosure. The determining unit may be operable to carry out at least the determining step of the method according to the fifth aspect of the present disclosure.

According to a ninth aspect of the present disclosure, there is provided a method implemented in a communication system which may include a host computer, a base station and a UE. The method may comprise providing user data at the host computer. Optionally, the method may comprise, at the host computer, initiating a transmission carrying the user data to the UE via a cellular network comprising the base station which may perform any step of the method according to the first aspect of the present disclosure.

According to a tenth aspect of the present disclosure, there is provided a communication system including a host computer. The host computer may comprise processing circuitry configured to provide user data, and a communication interface configured to forward the user data to a cellular network for transmission to a UE. The cellular network may comprise a base station having a radio interface and processing circuitry. The base station’s processing circuitry may be configured to perform any step of the method according to the first aspect of the present disclosure.

According to an eleventh aspect of the present disclosure, there is provided a method implemented in a communication system which may include a host computer, a base station and a UE. The method may comprise providing user data at the host computer. Optionally, the method may comprise, at the host computer, initiating a transmission carrying the user data to the UE via a cellular network comprising the base station. The UE may perform any step of the method according to the fifth aspect of the present disclosure.

According to a twelfth aspect of the present disclosure, there is provided a communication system including a host computer. The host computer may comprise processing circuitry configured to provide user data, and a communication interface configured to forward user data to a cellular network for transmission to a UE. The UE may comprise a radio interface and processing circuitry. The UE’s processing circuitry may be configured to perform any step of the method according to the fifth aspect of the present disclosure.

According to a thirteenth aspect of the present disclosure, there is provided a method implemented in a communication system which may include a host computer, a base station and a UE. The method may comprise, at the host computer, receiving user data transmitted to the base station from the UE which may perform any step of the method according to the fifth aspect of the present disclosure.

According to a fourteenth aspect of the present disclosure, there is provided a communication system including a host computer. The host computer may comprise a communication interface configured to receive user data originating from a transmission from a UE to a base station. The UE may comprise a radio interface and processing circuitry. The UE’s processing circuitry may be configured to perform any step of the method according to the fifth aspect of the present disclosure.

According to a fifteenth aspect of the present disclosure, there is provided a method implemented in a communication system which may include a host computer, a base station and a UE. The method may comprise, at the host computer, receiving, from the base station, user data originating from a transmission which the base station has received from the UE. The base station may perform any step of the method according to the first aspect of the present disclosure.

According to a sixteenth aspect of the present disclosure, there is provided a communication system which may include a host computer. The host computer may comprise a communication interface configured to receive user data originating from a transmission from a UE to a base station. The base station may comprise a radio interface and processing circuitry. The base station’s processing circuitry may be configured to perform any step of the method according to the first aspect of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure itself, the preferable mode of use and further objectives are best understood by reference to the following detailed description of the embodiments when read in conjunction with the accompanying drawings, in which:

Fig. 1A is a diagram illustrating an overview of LTE transmitter and receiver according to an embodiment of the present disclosure;

Fig. 1B is a diagram illustrating an exemplary multipath effect according to an embodiment of the present disclosure;

Fig. 2A is a diagram illustrating an exemplary beamforming finetune procedure according to an embodiment of the present disclosure;

Fig. 2B is a diagram illustrating an exemplary MDP architecture according to an embodiment of the present disclosure;

Fig. 2C is a diagram illustrating an exemplary convolution neural network architecture according to an embodiment of the present disclosure;

Fig. 2D is a diagram illustrating an exemplary convolution kernel extracting data feature process according to an embodiment of the present disclosure;

Fig. 2E is a diagram illustrating an exemplary pooling layer process according to an embodiment of the present disclosure;

Fig. 2F is a diagram illustrating an exemplary fully connected layer process according to an embodiment of the present disclosure;

Fig. 2G is a diagram illustrating another exemplary convolution neural network architecture according to an embodiment of the present disclosure;

Fig. 3A is a diagram illustrating an exemplary MDP according to an embodiment of the present disclosure;

Fig. 3B is a diagram illustrating an exemplary MDP model with a convolution neural network according to an embodiment of the present disclosure;

Fig. 4 is a flowchart illustrating a method according to some embodiments of the present disclosure;

Fig. 5 is a flowchart illustrating another method according to some embodiments of the present disclosure;

Fig. 6A is a block diagram illustrating an apparatus according to some embodiments of the present disclosure;

Fig. 6B is a block diagram illustrating another apparatus according to some embodiments of the present disclosure;

Fig. 7 is a block diagram illustrating a telecommunication network connected via an intermediate network to a host computer in accordance with some embodiments of the present disclosure;

Fig. 8 is a block diagram illustrating a host computer communicating via a base station with a UE over a partially wireless connection in accordance with some embodiments of the present disclosure;

Fig. 9 is a flowchart illustrating a method implemented in a communication system, in accordance with an embodiment of the present disclosure;

Fig. 10 is a flowchart illustrating a method implemented in a communication system, in accordance with an embodiment of the present disclosure;

Fig. 11 is a flowchart illustrating a method implemented in a communication system, in accordance with an embodiment of the present disclosure; and

Fig. 12 is a flowchart illustrating a method implemented in a communication system, in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION

The embodiments of the present disclosure are described in detail with reference to the accompanying drawings. It should be understood that these embodiments are discussed only for the purpose of enabling those skilled persons in the art to better understand and thus implement the present disclosure, rather than suggesting any limitations on the scope of the present disclosure. Reference throughout this specification to features, advantages, or similar language does not imply that all of the features and advantages that may be realized with the present disclosure should be or are in any single embodiment of the disclosure. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present disclosure. Furthermore, the described features, advantages, and characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize that the disclosure may be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the disclosure.

As used herein, the term “communication network” refers to a network following any suitable communication standards, such as new radio (NR) , long term evolution (LTE) , LTE-Advanced, wideband code division multiple access (WCDMA) , high-speed packet access (HSPA) , and so on. Furthermore, the communications between a terminal device and a network node in the communication network may be performed according to any suitable generation communication protocols, including, but not limited to, the first generation (1G) , the second generation (2G) , 2.5G, 2.75G, the third generation (3G) , 4G, 4.5G, 5G communication protocols, and/or any other protocols either currently known or to be developed in the future.

The term “network node” refers to a network device in a communication network via which a terminal device accesses to the network and/or obtains services therefrom. The network node may refer to a base station (BS) , an access point (AP) , a multi-cell/multicast coordination entity (MCE) , a controller or any other suitable device in a wireless communication network. The BS may be, for example, a node B (NodeB or NB) , an evolved NodeB (eNodeB or eNB) , a next generation NodeB (gNodeB or gNB) , a remote radio unit (RRU) , a radio header (RH) , a remote radio head (RRH) , a relay, a low power node such as a femto, a pico, and so forth.

Yet further examples of the network node comprise multi-standard radio (MSR) radio equipment such as MSR BSs, network controllers such as radio network controllers (RNCs) or base station controllers (BSCs) , base transceiver stations (BTSs) , transmission points, transmission nodes, positioning nodes and/or the like. More generally, however, the network node may represent any suitable device (or group of devices) capable, configured, arranged, and/or operable to enable and/or provide a terminal device access to a wireless communication network or to provide some service to a terminal device that has accessed to the wireless communication network.

The term “terminal device” refers to any end device that can access a communication network and receive services therefrom. By way of example and not limitation, the terminal device may refer to a mobile terminal, a user equipment (UE) , or other suitable devices. The UE may be, for example, a subscriber station, a portable subscriber station, a mobile station (MS) or an access terminal (AT) . The terminal device may include, but not limited to, portable computers, image capture terminal devices such as digital cameras, gaming terminal devices, music storage and playback appliances, a mobile phone, a cellular phone, a smart phone, a tablet, a wearable device, a personal digital assistant (PDA) , a vehicle, and the like.

As yet another specific example, in an Internet of things (IoT) scenario, a terminal device may also be called an IoT device and represent a machine or other device that performs monitoring, sensing and/or measurements etc., and transmits the results of such monitoring, sensing and/or measurements etc. to another terminal device and/or a network equipment. The terminal device may in this case be a machine-to-machine (M2M) device, which may in a 3rd generation partnership project (3GPP) context be referred to as a machine-type communication (MTC) device.

As one particular example, the terminal device may be a UE implementing the 3GPP narrow band Internet of things (NB-IoT) standard. Particular examples of such machines or devices are sensors, metering devices such as power meters, industrial machinery, or home or personal appliances, e.g. refrigerators, televisions, personal wearables such as watches etc. In other scenarios, a terminal device may represent a vehicle or other equipment, for example, a medical instrument that is capable of monitoring, sensing and/or reporting etc. on its operational status or other functions associated with its operation.

As used herein, the terms “first” , “second” and so forth refer to different elements. The singular forms “a” and “an” are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms “comprises” , “comprising” , “has” , “having” , “includes” and/or “including” as used herein, specify the presence of stated features, elements, and/or components and the like, but do not preclude the presence or addition of one or more other features, elements, components and/or combinations thereof. The term “based on” is to be read as “based at least in part on” . The term “one embodiment” and “an embodiment” are to be read as “at least one embodiment” . The term “another embodiment” is to be read as “at least one other embodiment” . Other definitions, explicit and implicit, may be included below.

Fig. 1A is a diagram illustrating an overview of LTE transmitter and receiver according to an embodiment of the present disclosure. In an LTE network, Orthogonal Frequency Division Multiplexing (OFDM) system transmitter and receiver may be designed e.g. as shown in Fig. 1A. In the transmitter, the input of channel coding is source bit streams. Various functional components of the transmitter may perform the following operations: modulating the channel encoded bit streams to generate complex-valued symbols, inserting the pilot or guard band, performing Inverse Discrete Fourier Transform (IDFT) such as Inverse Fast Fourier Transform (IFFT) to generate the time domain OFDM symbols, copying the IQ (In-phase/Quadrature) data stream from the end as a cyclic prefix (CP) , and up-converting the baseband IQ streams to the radio frequency (RF) stream. Then the radio wave may be propagated over a wireless channel. In the receiver, a radio unit may down convert the RF stream to the baseband IQ streams, and perform a reverse procedure with respect to the transmitter to get the output bits, e.g. including removing the CP first, transforming the rest of the IQ data from time domain to frequency domain via Discrete Fourier Transform (DFT) such as Fast Fourier Transform (FFT) , performing channel equalization, demodulation and channel decoding, etc.

In order to implement channel estimation, the first step in determining the least squares estimate is to extract the pilot symbols from their known locations within the received subframe. Because the values of these pilot symbols are known, the channel response at these locations can be determined using the least squares estimate. For example, the least squares estimate may be obtained by dividing the received pilot symbols by their expected values.

As an example, the value Y _(k) of a received complex symbol k may be represented as below:

Y _(k) = H _(k) X _(k) + noise (1)

where X _(k) is the value of a transmitted complex symbol k, H _(k) is a complex channel gain experienced by symbol k, and the symbol “noise” means the noise in the channel. Some known pilot symbols may be sent to estimate the channel for a subset of resource elements (REs) within a subframe. In particular, if pilot symbol Xp _(k) is sent in a RE, an instantaneous channel estimate ^～Hp _(k) for that RE can be computed using the following formula:

^～Hp _(k) = Yp _(k) /Xp _(k) = Hp _(k) + noise (2)

where k represents the kth symbol, Yp _(k) represents the received pilot symbol value, Xp _(k) represents the known transmitted pilot symbol value, ^～Hp _(k) is the true channel response for the RE occupied by the pilot symbol, Hp _(k) is the estimated channel response by the pilot symbol, and the symbol “noise” means the noise in the channel.

In accordance with an exemplary embodiment, a frequency domain channel equalizer (e.g., based on minimum mean squared error (MMSE) ) may be used to estimate a channel matrix with pilot information, and equalize the received IQ data. Then the IQ data may be demodulated to soft bits which may be further decoded into binary bits with Viterbi algorithm. As an example, according to the channel matrix H and the noise r, the beamforming weight W may be calculated as below:

W = H ^H (HH ^H + r) ^-1 (3)

In accordance with an exemplary embodiment, the beamforming weight may be used for downlink/uplink (DL/UL) digital beamforming as below:

where

represents the current channel matrix of a user;

represents the current beamforming weight;

represents the current user data;

represents a diagonal matrix; and

represents the received data by the user.

It can be understood that due to the complex topological relationship of indoor/outdoor space, the problem of multipath effect may be often encountered in many places, which may lead to the complex signal propagation environment. The complex spatial topological relationship of wireless communications may bring many constraints to channel estimation from two aspects of accuracy and availability.

Fig. 1B is a diagram illustrating an exemplary multipath effect according to an embodiment of the present disclosure. As shown in Fig. 1B, many reflected signals in a complex signal propagation environment may directly affect the measurement accuracy for a UE, resulting in a large channel estimation error. Considering the inaccurate channel estimation due to the complex spatial topology, the beamforming weight calculation for the UE may also be inaccurate. In this case, the current throughput of the UE may not be the largest and the throughput of the whole cell also may not be the largest. In addition, a UE in a communication network may bring interference to other UEs. Therefore, it may be advantageous to find the appropriate beamforming weight of each UE, which can minimize the mutual interference and maximize the throughput of the whole cell.

Various exemplary embodiments of the present disclosure propose a solution for beamforming, which may utilize state information (e.g., channel information, etc. ) of a user to calculate beamforming information of the user, e.g. based on DRL using an MDP, so as to implement beamforming weight finetune for the user with high accuracy and energy efficiency.

Due to the variety of communication environments, different UEs may have different current state features, which may be more obvious in 5G massive multiple-input multiple-output (MIMO) communications. In an embodiment, the state information of a system may be at least partly reflected by channel information (e.g., the current channel matrix, etc. ) of a UE. According to the current state information of the system, an MDP may be used to calculate the local/global reward (e.g., SINR or throughput, etc. ) , so that a beamforming weight tune factor of the UE may be learned or estimated to maximize the local/global reward. As an example, the MDP may have three elements, including a state (e.g., a channel matrix, etc. ) , an action (e.g., a beamforming weight tune factor, which is also called “factor” for short) and a reward (e.g., SINR, etc. )

According to the standard reinforcement learning process such as MDP, an agent in a learning system may obtain current state information S of the external environment, take the exploratory behavior A to the environment, and obtain the evaluation R (also called reward) of this action and the new environment state in the environmental feedback. If an action A of the agent results in a positive reward (immediate reward) R from the environment, the trend of the agent to generate this action may be strengthened in the future; otherwise, the trend of the agent to generate this action may be weakened. During the interaction between the control behavior of the learning system and the state and evaluation in the environmental feedback, the mapping strategy from state to action may be modified in the way of learning to optimize the system performance.

Fig. 2A is a diagram illustrating an exemplary beamforming finetune procedure according to an embodiment of the present disclosure. A DRL model may be used in the exemplary beamforming finetune procedure to find the current beamforming weight tune factor of a UE and make the UE’s throughput maximum. In an embodiment, the DRL model may be trained offline, e.g., according to historical data of beamforming. As shown in Fig. 2A, the current channel information such as a channel matrix of the UE may be used as the state input of the DRL model. The output of the DRL model may be a beamforming weight tune factor (also called beamforming tune factor) for calculating the beamforming weight of the UE. In an embodiment, the expectation local/global reward corresponding to the state input may also be the output of the DRL model. According to the exemplary beamforming finetune procedure, the beamforming weight tune factor generated by the DRL model may be used to calculate the final beamforming weight for the UE, so as to implement beamforming weight finetune of the UE. In an embodiment, the new online data (e.g. the current state, action and reward applied to the beamforming finetune procedure in the Fig. 2A) may be used as training data for the DRL model.

Various MDPs are the fundamental formalism for reinforcement learning (RL) problems as well as other learning problems in stochastic domains. MDPs are frequently used to model stochastic planning problems, game playing problems and autonomous robot control problems, etc. In fact, the MDPs become the standard formalism for learning sequential decision-making.

MDPs may not differ a lot from Markov processes. A Markov process is a stochastic process in which the past of the process is not important if the current state is known, because all the past information that may be useful to predict the future states are included in the present state. What separates the MDPs from the Markov processes is the fact that the MDPs have an agent that makes decisions which influence the system over time. The agent’s actions, given the current state, may affect the future states. Since other types of Markov chains, e.g. discrete-time and continuous-time Markov chains, do not include such a decision-making agent, the MDPs may be seen as an extension of the Markov chains.

For RL problems, an environment may be modeled through a set of states and these states may be controlled through actions. The objective is to maximize the performance representation called reward of an action. The decision maker, called the agent, is supposed to choose the best action to execute at a given state of the environment. When this process is repeated, it is known as an MDP. A finite MDP may be defined as a tuple of (S, A, R) , where S is a finite set of states, A is a finite set of actions, and R is a reward function depending only on the previous state-action pair. The MDP may provide a mathematical framework for solving decision-making problems.

Fig. 2B is a diagram illustrating an exemplary MDP architecture according to an embodiment of the present disclosure. As shown in Fig. 2B, the agent π may take an action A _t on the environment at time t, and obtain a reward R _t+1 and a new state S _t+1 at time t+1 as environmental feedback. According to the current environmental feedback, the agent may adjust the future actions to the environment, so as to optimize the system performance which may be reflected by the reward at least partly.

In accordance with an exemplary embodiment, the reward R _t can be determined according to the following algorithm:

where R _t is the sum of future rewards r _t+k+1, (k=0, 1, 2, …) , γ is a buckle factor in order to avoid the algorithm not converging due to time infinity.

In accordance with an exemplary embodiment, the expected reward R _t+1 after the action A _t=a in state S _t=s may be achieved according to an action-state value function as below:

Qt (s, a) =E [R _t+1|S _t=s, A _t=a] (7)

where Qt (s, a) represents the action-state value function with variables of state s and action a, and E [R _t+1|S _t=s, A _t=a] means the expectation obtained in the case of state s and action a at time t, which may reflect the reward R _t+1 in the future (i.e. at time t+1) .

As an example, a state transition probability P (r|s, a) related to action a, state s and reward r may be represented as below:

P (r|s, a) =P _r {R _t=r|S _t-1=s, A _t-1=a] (8)

The state transition probability P (r|s, a) may be translated to the probability P _r {R _t=r|S _t-1=s, A _t-1=a] of producing the reward r at time t given that the agent is handling the state s by executing the action a at time t-1. After the action a is executed, the current state s may transfer to another state according to the state transition probability P (r|s, a) and the reward r may be obtained correspondingly. The states may be some specific representation of the environment’s situations. According to the state transition probability, the agent can make decisions about what action (s) may be taken.

In accordance with an exemplary embodiment, the action-state value function may be determined by using a convolution neural network (e.g. a LeNet model, a ResNet50 model, etc. ) . The convolution neural network may have many advantages in data feature extraction with spatial continuity. For example, the convolution neural network may be responsible for extracting the spatial topological feature of data. The convolution kernel can make weight sharing to reduce network parameters and increase the receptive field.

Fig. 2C is a diagram illustrating an exemplary convolution neural network architecture according to an embodiment of the present disclosure. The exemplary convolution neural network architecture may be applicable for a LeNet model which may have 7 layers. It can be appreciated that the LeNet model shown in Fig. 2C is just an example, and more or less alternative layers may be included in various suitable models for a convolution neural network.

According to an exemplary embodiment, the convolution neural network may contain multiple layers with different functions, for example, including but not limited to a convolution layer, a pooling layer and a full connect layer. The convolution layer may be used to train convolution kernel parameters through back-propagation. The pooling layer may be responsible for dimension reduction in order to reduce network parameters. The full connect layer may be responsible for flattening data in order to perform regression (e.g. into SoftMax regression) .

Fig. 2D is a diagram illustrating an exemplary convolution kernel extracting data feature process according to an embodiment of the present disclosure. The convolution layer may be a core building block of a convolutional network that may do most of the computational heavy lifting. For simplicity, Fig. 2D only shows the two-dimensional convolution kernel which may be used to extract low complex features of data. It can be appreciated that other types of convolution kernels with suitable dimensions also may be used to extract data features. As shown in Fig. 2D, given a multidimensional input volume, a multidimensional output volume can be calculated according to certain filter parameters (e.g. W0 and W1 in Fig. 2D) and the corresponding bias parameters (e.g. b0 and b1 in Fig. 2D) . In an embodiment, the two-dimensional filter may slide to all positions on the data and multiply the data at each position. Different convolution kernels can extract different data features, such as point, edge, line, angle, position, shape and so on.

Fig. 2E is a diagram illustrating an exemplary pooling layer process according to an embodiment of the present disclosure. The exemplary pooling layer process may be used for dimension reduction. In an embodiment, a pooling layer may be periodically inserted in-between the successive convolution layers in a convolution neural network architecture, so as to progressively reduce the spatial size of the data representation. Thus, the number of parameters and computation in the network may be reduced and the problem of overfitting also can be controlled. The pooling layer may operate independently on every depth slice of the input and resize it spatially, e.g. using the MAX operation. A common form is a pooling layer with filters of size 2x2 applied with a stride of 2 down samples every depth slice in the input by 2 along both width and height, discarding 75%of the activations. In this case, every MAX operation may be taking a maximum over 4 numbers (e.g. a little 2x2 region in some depth slice, as shown by the MaxPooling operation from a feature map to feature-map1/feature-map2 in Fig. 2E) . The depth dimension may remain unchanged in an embodiment.

Fig. 2F is a diagram illustrating an exemplary fully connected layer process according to an embodiment of the present disclosure. The exemplary fully connected layer process may be used for fully connected layer flatten data. In an embodiment, neurons in a fully connected layer may have full connections to all activations in the previous layer, as in regular neural networks. Thus, the activations may be computed with a matrix multiplication followed by a bias offset, as shown by the flatten operation for feature-map1 and feature-map2 in Fig. 2F.

Fig. 2G is a diagram illustrating another exemplary convolution neural network architecture according to an embodiment of the present disclosure. In the exemplary convolution neural network architecture, a ResNet50 model having 50 layers may be used as a residual convolution neural network to extract channel matrix data features. The depth of the residual convolution neural network may have an impact on the data feature extraction. The deeper the network is, the more features it can extract, and the stronger the expression ability is. The more layers the network has, the more features it can extract at different levels. Moreover, the deeper the network is, the more abstract the feature is, and the more semantic and spatial information it has.

It can be appreciated that the dimension, size, number, parameters and values of the matrixes shown in Figs. 2C-2G are just examples, and according to different application requirements, other suitable dimension, size, number, parameters and/or values of the matrixes also may be applicable for various embodiments of the present disclosure.

Fig. 3A is a diagram illustrating an exemplary MDP according to an embodiment of the present disclosure. In the exemplary MDP, an action A _t to the environment made by an agent may result in a reward R _t+1 and the next state S _t+1, as shown in Fig. 3A. In accordance with an exemplary embodiment, a current channel matrix of a user may be taken as the current state of the MDP, and the MDP may select an action according to the current state and receive a reward of the next state. Then the MDP may change to the next state and select another action to the environment according to the new state. As an example, the channel matrix of the user may be represented as below:

where m is the number of uplink/downlink (UL/DL) antennas, e.g. m=1; n is the number of antenna subarrays, e.g. n=16; and h _mn is the weight of one antenna subarray. The channel matrix may contain data of the current UL/DL channel of the user, e.g., including the UL/DL channel matrix for time division duplex (TDD) or frequency division duplex (FDD) . According to an embodiment, the complex data may need to concatenate with real and imaginary parts.

In accordance with an exemplary embodiment, the environment state of the MDP is a finite set, and the action at the current state may cause the MDP to switch to the next state. According to the channel transmission as described with respect to formula (1) , the value of H _(k) may be recalculated at every transmission time interval (TTI) . This satisfies the Markov property. Thus, the channel matrix (e.g., Current_User_Channel_Matrix given in formula (9) , etc. ) may be used as the environment state S of the MDP.

S = Current_User_Channel_Matrix (10)

Another important reason for using the channel matrix as the environment state of the MDP is that the channel matrix may contain rich information, such as user location. Alternatively, the beamforming weight also may be used as the environment state of the MDP, since the beamforming weight may be calculated by the channel matrix. In another embodiment, the environment state may be the channel impulse response or other channel conditions for the user.

In accordance with an exemplary embodiment, the action of the MDP is a finite set. As an example, a beamforming weight tune factor vector as below may be the action of the MDP.

where m is the number of UL/DL antennas, e.g. m=1; n is the number of antenna subarrays, e.g. n=16; and α _nm is the beamforming weight tune factor, which may be complex data containing real and imaginary parts.

According to an embodiment, the beamforming weight tune factor for training the MDP may be obtained by using the mean and standard deviation to generate random numbers. For example, if there is no historical data of the beamforming weight tune factor, the real part and the imaginary part of the beamforming weight tune factor may be generated by using the mean of 3 and standard deviation of 0.01. It can be appreciated that other suitable values of the mean and standard deviation also may be used to generate the beamforming weight tune factor for training the MDP.

In addition to the beamforming weight tune factor, the action of the MDP may also be the beamforming weight. In this case, the beamforming weight for training the MDP may be obtained through historical data, e.g., from the beamforming data collected for one or more specific UEs in a communication network.

It can be appreciated that the action of the MDP may be associated with the environment state of the MDP. For example, if the environment state of the MDP is the channel matrix or the channel impulse response, the action of the MDP may be the beamforming weight tune factor, the beamforming weight or any other suitable beamforming information which may affect the state of the MDP. Alternatively, if the environment state of the MDP is the beamforming weight, the action of the MDP may be the beamforming weight tune factor.

In accordance with an exemplary embodiment, the reward of the MDP may be a local reward for a specific UE, which may be long term benefits obtained by selecting the action for this UE according to the current state of the MDP. As an example, the local reward R _{local_t} may be calculated as below:

r _{local_t+1}=SINR _t+1 (13)

where r _{local_t+1} is the reward for the specific UE at time t+1, and SINR _t+1 is proportional to the maximum transmission rate according to Shannon theorem as below:

C=B*log ₂ (1+SINR _t+1) (14)

where C is the channel capacity, and B is the channel bandwidth.

In accordance with another exemplary embodiment, the reward of the MDP may be a global reward for a group of users (e.g., UE1, UE2, …, UEi, …, UEw) , which may be long term benefits obtained by selecting the action for these UEs according to the current state of the MDP. As an example, the global reward R _{global_t} may be calculated as below:

where r _{global_t+1} is the reward for the UEs at time t+1, and SINR _aver is the average SINR which is proportional to the maximum transmission rate according to Shannon theorem as given in formula (17) .

C=B*log ₂ (1+SINR _aver) (17)

According to an embodiment, in the case that the reward of the MDP is the global reward, the channel matrix used as the environment state of the MDP may be a channel matrix which contain the channel data for all UEs (e.g., UE1, UE2, …, UEi, …, UEw) , and the action of the MDP may contain the action applicable for all UEs.

In accordance with an exemplary embodiment, the association between the action and the state of the MDP may be reflected by an action-state value function, e.g. as given in formula (7) . After selecting the operation (i.e., action) based on the current state of the MDP, the expectation (i.e. reward) of the selected operation may be obtained according to the action-state value function. In an embodiment, a convolution neural network e.g. with one-dimensional convolution kernel may be used to fit the action-state value function.

Fig. 3B is a diagram illustrating an exemplary MDP model with a convolution neural network according to an embodiment of the present disclosure. As shown in Fig. 3B, state data may be input to the MDP model, and then according to a specific action-state value function determined by the convolutional neural network, action data and reward data may be output from the MDP model. The MDP model with the convolution neural network may be pre-trained, so that the parameters of the action-state value function may be obtained by iterated training with historical data of state, action and reward.

In accordance with an exemplary embodiment, the historical data related to beamforming of some predetermined UEs may be used as training data of the MDP model. As an example, a channel matrix per UE, a beamforming weight tune factor per UE, and the corresponding SINR per UE may be input to the convolutional neural network (e.g. a ResNet50 model, etc. ) for data feature extraction. Through deep reinforcement learning, the optimal model of the MDP may be obtained and saved as a pre-trained model for beamforming weight finetune. If it is needed to estimate a beamforming weight tune factor for a test UE, the current channel matrix of this UE may be used as input data of the pre-trained MDP model. Then the beamforming weight tune factor estimated for the test UE may be output from the pre-trained MDP model as action data (e.g., containing real and imaginary parts) .

In accordance with an exemplary embodiment, an offline data set including the historical data of the predetermined UEs may be used to train the MDP model, and an online data set including the current state of the test UE may be input to the pre-trained MDP model to estimate the corresponding action with expected reward. In an embodiment, the current state and the estimated action as well as the expected reward for the test UE may be put into the offline data set to update the parameters of the action-state value function.

According to an exemplary embodiment, some model parameters may be set to train iterations with big data many times, so as to find the model with the lowest loss. For example, parameters of a reinforcement learning model such as the MDP model may be set as below:

- Learning_rate = 0.0001;

- Optimizer = adam;

- Batch_size = 64;

- GAMMA = 0.99; and

- EVERY = 4.

In accordance with an exemplary embodiment, the normalized data (e.g. state, action and reward) of each UE may be fed for the reinforcement learning model, and the model parameters may be continuously adjusted so that the model validation set loss may be smallest. It can be appreciated that the MDP model and related parameter settings described above are just examples, and according to different application requirements, various reinforcement learning models with different configurations and parameter settings may be applicable to exemplary embodiments according to the present disclosure.

In accordance with an exemplary embodiment, a pre-trained ResNet50 model may be used as the trained convolutional neural network to extract a feature vector containing beamforming information. In an embodiment, the downloaded parameters for the ResNet50 model may not contain softmax regression. The structure of the ResNet50 model may be predefined as required. Then the pre-trained model parameters may be loaded into the predefined network structure, and a regression may be added in the full connection layer to get the beamforming information. It can be appreciated that although the convolutional neural network is described with respect to a ResNet50 model, any other suitable models (e.g. LeNet, etc. ) of a convolutional neural network may be applicable to various exemplary embodiments of the present disclosure.

In accordance with an exemplary embodiment, assuming the legacy beamforming weight is [W1, W2, W3, ……, Wn] , then the final beamforming weight BFW _final may be tuned as below:

BFW _final = [W1 + α1, W2 + α2, W3 + α3, ……, Wn + αn] (18)

where αn is a beamforming weight tune factor of Wn, which may consist of real and imaginary parts as the action output of the MDP model.

It can be appreciated that the parameter names (e.g. Current_User_Channel_Matrix, R _{local_t}, R _{global_t}, and BFW _final, etc. ) , parameter elements and parameter representations used herein are exemplary, and other names, elements and representations may also be used to indicate the same or similar information. In addition, it also can be appreciated that functions, variables and weights related to the channel matrix described herein are just examples, and other suitable function settings, the associated variables, weights and values thereof may also be applicable to implementing various embodiments.

It is noted that some embodiments of the present disclosure are mainly described in relation to LTE or NR specifications being used as non-limiting examples for certain exemplary network configurations and system deployments. As such, the description of exemplary embodiments given herein specifically refers to terminology which is directly related thereto. Such terminology is only used in the context of the presented non-limiting examples and embodiments, and does naturally not limit the present disclosure in any way. Rather, any other system configuration or radio technologies may equally be utilized as long as exemplary embodiments described herein are applicable.

Fig. 4 is a flowchart illustrating a method 400 according to some embodiments of the present disclosure. The method 400 illustrated in Fig. 4 may be performed by a network node or an apparatus communicatively coupled to the network node. In accordance with an exemplary embodiment, the network node may comprise a base station, an AP, a transmission point or any other suitable entity that may communicate with one or more terminal devices such as UEs according to specific communication protocols.

According to the exemplary method 400 illustrated in Fig. 4, the network node may obtain channel information of one or more terminal devices with respect to the network node, as shown in block 402. According to a MDP based at least in part on the channel information, the network node may determine beamforming information of the one or more terminal devices, as shown in block 404.

In accordance with an exemplary embodiment, the channel information of the one or more terminal devices may be used as a state input of the MDP, e.g., as described with respect to Fig. 2B, Fig. 3A and Fig. 3B. According to an exemplary embodiment, the channel information of the one or more terminal devices may comprise a channel matrix per terminal device (e.g., the channel matrix in formula (9) , etc. ) . According to another exemplary embodiment, the channel information of the one or more terminal devices may comprise a channel impulse response per terminal device. Alternatively, the channel information of the one or more terminal devices may comprise a beamforming weight per terminal device.

In accordance with an exemplary embodiment, the beamforming information of the one or more terminal devices may be an action output of the MDP. According to an exemplary embodiment, the beamforming information of the one or more terminal devices may comprise: a beamforming weight tune factor per terminal device (e.g., the beamforming weight tune factor in formula (11) , etc. ) , or a beamforming weight per terminal device.

In accordance with an exemplary embodiment, the MDP may enable optimization of communication performance of the one or more terminal devices. According to an exemplary embodiment, the communication performance of the one or more terminal devices may comprise one or more of:

- a SINR expected per terminal device;

- an average SNIR expected for the one or more terminal devices;

- a data rate expected per terminal device;

- an average data rate expected for the one or more terminal devices;

- traffic throughput expected per terminal device; and

- an average traffic throughput expected for the one or more terminal devices.

In accordance with an exemplary embodiment, when global optimization is applied for beamforming, all channel data of the one or more terminal devices may be considered in the MDP as a whole. In this case, channel matrixes of different terminal devices may be integrated to form one channel matrix containing all channel data of the one or more terminal devices. The channel matrix formed for the one or more terminal devices may be provided to the MDP as the state input. Similarly, channel impulse responses of different terminal devices or beamforming weights of different terminal devices also may be integrated and input to the MDP, when global optimization is required. Correspondingly, the action output of the MDP may be beamforming weight tune factors of different terminal devices or beamforming weights of different terminal devices, which may implement global optimization of the network performance.

In accordance with an exemplary embodiment, the MDP may use an action-state value function (e.g., the action-state value function in formula (7) , etc. ) determined by a convolution neural network (e.g., a LeNet model, a ResNet50 model, etc. ) . According to an exemplary embodiment, the convolution neural network may have a regression functionality to provide a continuous output.

- channel information of one or more training devices (e.g., a group of UEs having historical data such as state, action and reward) ;

- beamforming information of the one or more training devices; and

- communication performance of the one or more training devices.

In accordance with an exemplary embodiment, the beamforming information of the one or more training devices may comprise a beamforming weight tune factor randomly generated for each of the one or more training devices (e.g., using the mean and standard deviation with proper values) .

In accordance with an exemplary embodiment, the channel information, the beamforming information and communication performance of the one or more terminal devices may be used as training data of the MDP. In this case, the one or more parameters of the action-state value function used by the MDP may be updated correspondingly.

Fig. 5 is a flowchart illustrating a method 500 according to some embodiments of the present disclosure. The method 500 illustrated in Fig. 5 may be performed by a terminal device or an apparatus communicatively coupled to the terminal device. In accordance with an exemplary embodiment, the terminal device such as a UE may be capable of communicating with a network node (e.g., a base station, an AP, a transmission point, etc. ) according to specific communication protocols.

According to the exemplary method 500 illustrated in Fig. 5, the terminal device may obtain channel information of the terminal device with respect to a network node (e.g., the network node as described with respect to Fig. 4) , as shown in block 502. According to a MDP based at least in part on the channel information, the terminal device may determine beamforming information of the terminal device, as shown in block 504.

In accordance with an exemplary embodiment, the channel information of the terminal device may be used as a state input of the MDP, e.g., as described with respect to Fig. 2B, Fig. 3A and Fig. 3B. According to an exemplary embodiment, the channel information of the terminal device may comprise: a channel matrix of the terminal device (e.g., the channel matrix in formula (9) , etc. ) , a channel impulse response of the terminal device, or a beamforming weight of the terminal device.

In accordance with an exemplary embodiment, the beamforming information of the terminal device may be an action output of the MDP. According to an exemplary embodiment, the beamforming information of the terminal device may comprise: a beamforming weight tune factor of the terminal device (e.g., the beamforming weight tune factor in formula (11) , etc. ) , or a beamforming weight of the terminal device.

In accordance with an exemplary embodiment, the MDP may enable optimization of communication performance of the terminal device. According to an exemplary embodiment, the communication performance of the terminal device may comprise one or more of:

- a SINR expected for the terminal device;

- a data rate expected for the terminal device; and

- traffic throughput expected for the terminal device.

- beamforming information of the one or more training devices; and

- communication performance of the one or more training devices.

In accordance with an exemplary embodiment, the channel information, the beamforming information and communication performance of the terminal device may be used as training data of the MDP. In this case, the one or more parameters of the action-state value function used by the MDP may be updated correspondingly.

It can be appreciated that the one or more training devices as described with respect to the method 500 may be the same as or different from the one or more training devices as described with respect to the method 400. Similarly, the action-state value function used by the MDP as described with respect to the method 500 may be the same as or different from the action-state value function used by the MDP as described according to the method 400.

Various exemplary embodiments according to the present disclosure may enable DRL based beamforming weight finetune. In accordance with some exemplary embodiments, a DRL model such as an MDP model using a convolution neural network may be pre-trained according to training data (e.g., state, action and reward data) collected for some predetermined UEs in various communication environments. Channel information of a UE for which beamforming weight finetune may be needed can be used as input data of the pre-trained MDP model, so as to get beamforming information of the UE as output data from the MDP model. Application of various exemplary embodiments can advantageously improve the accuracy of beamforming weight estimation with enhanced throughput and energy efficiency.

The various blocks shown in Figs. 4-5 may be viewed as method steps, and/or as operations that result from operation of computer program code, and/or as a plurality of coupled logic circuit elements constructed to carry out the associated function (s) . The schematic flow chart diagrams described above are generally set forth as logical flow chart diagrams. As such, the depicted order and labeled steps are indicative of specific embodiments of the presented methods. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated methods. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.

Fig. 6A is a block diagram illustrating an apparatus 610 according to various embodiments of the present disclosure. As shown in Fig. 6A, the apparatus 610 may comprise one or more processors such as processor 611 and one or more memories such as memory 612 storing computer program codes 613. The memory 612 may be non-transitory machine/processor/computer readable storage medium. In accordance with some exemplary embodiments, the apparatus 610 may be implemented as an integrated circuit chip or module that can be plugged or installed into a network node as described with respect to Fig. 4, or a terminal device as described with respect to Fig. 5. In such case, the apparatus 610 may be implemented as a network node as described with respect to Fig. 4, or a terminal device as described with respect to Fig. 5.

In some implementations, the one or more memories 612 and the computer program codes 613 may be configured to, with the one or more processors 611, cause the apparatus 610 at least to perform any operation of the method as described in connection with Fig. 4. In other implementations, the one or more memories 612 and the computer program codes 613 may be configured to, with the one or more processors 611, cause the apparatus 610 at least to perform any operation of the method as described in connection with Fig. 5. Alternatively or additionally, the one or more memories 612 and the computer program codes 613 may be configured to, with the one or more processors 611, cause the apparatus 610 at least to perform more or less operations to implement the proposed methods according to the exemplary embodiments of the present disclosure.

Fig. 6B is a block diagram illustrating an apparatus 620 according to some embodiments of the present disclosure. As shown in Fig. 6B, the apparatus 620 may comprise an obtaining unit 621 and a determining unit 622. In an exemplary embodiment, the apparatus 620 may be implemented in a network node such as a base station. In this case, the obtaining unit 621 may be operable to carry out the operation in block 402, and the determining unit 622 may be operable to carry out the operation in block 404. In another exemplary embodiment, the apparatus 620 may be implemented in a terminal device such as a UE. In this case, the obtaining unit 621 may be operable to carry out the operation in block 502, and the determining unit 622 may be operable to carry out the operation in block 504. Optionally, the obtaining unit 621 and/or the determining unit 622 may be operable to carry out more or less operations to implement the proposed methods according to the exemplary embodiments of the present disclosure.

Fig. 7 is a block diagram illustrating a telecommunication network connected via an intermediate network to a host computer in accordance with some embodiments of the present disclosure.

With reference to Fig. 7, in accordance with an embodiment, a communication system includes a telecommunication network 710, such as a 3GPP-type cellular network, which comprises an access network 711, such as a radio access network, and a core network 714. The access network 711 comprises a plurality of

base stations

712a, 712b, 712c, such as NBs, eNBs, gNBs or other types of wireless access points, each defining a

corresponding coverage area

713a, 713b, 713c. Each

base station

712a, 712b, 712c is connectable to the core network 714 over a wired or wireless connection 715. A first UE 791 located in a coverage area 713c is configured to wirelessly connect to, or be paged by, the corresponding base station 712c. A second UE 792 in a coverage area 713a is wirelessly connectable to the corresponding base station 712a. While a plurality of

UEs

791, 792 are illustrated in this example, the disclosed embodiments are equally applicable to a situation where a sole UE is in the coverage area or where a sole UE is connecting to the corresponding base station 712.

The telecommunication network 710 is itself connected to a host computer 730, which may be embodied in the hardware and/or software of a standalone server, a cloud-implemented server, a distributed server or as processing resources in a server farm. The host computer 730 may be under the ownership or control of a service provider, or may be operated by the service provider or on behalf of the service provider.

Connections

721 and 722 between the telecommunication network 710 and the host computer 730 may extend directly from the core network 714 to the host computer 730 or may go via an optional intermediate network 720. An intermediate network 720 may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network 720, if any, may be a backbone network or the Internet; in particular, the intermediate network 720 may comprise two or more sub-networks (not shown) .

The communication system of Fig. 7 as a whole enables connectivity between the connected

UEs

791, 792 and the host computer 730. The connectivity may be described as an over-the-top (OTT) connection 750. The host computer 730 and the connected

UEs

791, 792 are configured to communicate data and/or signaling via the OTT connection 750, using the access network 711, the core network 714, any intermediate network 720 and possible further infrastructure (not shown) as intermediaries. The OTT connection 750 may be transparent in the sense that the participating communication devices through which the OTT connection 750 passes are unaware of routing of uplink and downlink communications. For example, the base station 712 may not or need not be informed about the past routing of an incoming downlink communication with data originating from the host computer 730 to be forwarded (e.g., handed over) to a connected UE 791. Similarly, the base station 712 need not be aware of the future routing of an outgoing uplink communication originating from the UE 791 towards the host computer 730.

Fig. 8 is a block diagram illustrating a host computer communicating via a base station with a UE over a partially wireless connection in accordance with some embodiments of the present disclosure.

Example implementations, in accordance with an embodiment, of the UE, base station and host computer discussed in the preceding paragraphs will now be described with reference to Fig. 8. In a communication system 800, a host computer 810 comprises hardware 815 including a communication interface 816 configured to set up and maintain a wired or wireless connection with an interface of a different communication device of the communication system 800. The host computer 810 further comprises a processing circuitry 818, which may have storage and/or processing capabilities. In particular, the processing circuitry 818 may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The host computer 810 further comprises software 811, which is stored in or accessible by the host computer 810 and executable by the processing circuitry 818. The software 811 includes a host application 812. The host application 812 may be operable to provide a service to a remote user, such as UE 830 connecting via an OTT connection 850 terminating at the UE 830 and the host computer 810. In providing the service to the remote user, the host application 812 may provide user data which is transmitted using the OTT connection 850.

The communication system 800 further includes a base station 820 provided in a telecommunication system and comprising hardware 825 enabling it to communicate with the host computer 810 and with the UE 830. The hardware 825 may include a communication interface 826 for setting up and maintaining a wired or wireless connection with an interface of a different communication device of the communication system 800, as well as a radio interface 827 for setting up and maintaining at least a wireless connection 870 with the UE 830 located in a coverage area (not shown in Fig. 8) served by the base station 820. The communication interface 826 may be configured to facilitate a connection 860 to the host computer 810. The connection 860 may be direct or it may pass through a core network (not shown in Fig. 8) of the telecommunication system and/or through one or more intermediate networks outside the telecommunication system. In the embodiment shown, the hardware 825 of the base station 820 further includes a processing circuitry 828, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The base station 820 further has software 821 stored internally or accessible via an external connection.

The communication system 800 further includes the UE 830 already referred to. Its hardware 835 may include a radio interface 837 configured to set up and maintain a wireless connection 870 with a base station serving a coverage area in which the UE 830 is currently located. The hardware 835 of the UE 830 further includes a processing circuitry 838, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The UE 830 further comprises software 831, which is stored in or accessible by the UE 830 and executable by the processing circuitry 838. The software 831 includes a client application 832. The client application 832 may be operable to provide a service to a human or non-human user via the UE 830, with the support of the host computer 810. In the host computer 810, an executing host application 812 may communicate with the executing client application 832 via the OTT connection 850 terminating at the UE 830 and the host computer 810. In providing the service to the user, the client application 832 may receive request data from the host application 812 and provide user data in response to the request data. The OTT connection 850 may transfer both the request data and the user data. The client application 832 may interact with the user to generate the user data that it provides.

It is noted that the host computer 810, the base station 820 and the UE 830 illustrated in Fig. 8 may be similar or identical to the host computer 730, one of

base stations

712a, 712b, 712c and one of

UEs

791, 792 of Fig. 7, respectively. This is to say, the inner workings of these entities may be as shown in Fig. 8 and independently, the surrounding network topology may be that of Fig. 7.

In Fig. 8, the OTT connection 850 has been drawn abstractly to illustrate the communication between the host computer 810 and the UE 830 via the base station 820, without explicit reference to any intermediary devices and the precise routing of messages via these devices. Network infrastructure may determine the routing, which it may be configured to hide from the UE 830 or from the service provider operating the host computer 810, or both. While the OTT connection 850 is active, the network infrastructure may further take decisions by which it dynamically changes the routing (e.g., on the basis of load balancing consideration or reconfiguration of the network) .

Wireless connection 870 between the UE 830 and the base station 820 is in accordance with the teachings of the embodiments described throughout this disclosure. One or more of the various embodiments improve the performance of OTT services provided to the UE 830 using the OTT connection 850, in which the wireless connection 870 forms the last segment. More precisely, the teachings of these embodiments may improve the latency and the power consumption, and thereby provide benefits such as lower complexity, reduced time required to access a cell, better responsiveness, extended battery lifetime, etc.

A measurement procedure may be provided for the purpose of monitoring data rate, latency and other factors on which the one or more embodiments improve. There may further be an optional network functionality for reconfiguring the OTT connection 850 between the host computer 810 and the UE 830, in response to variations in the measurement results. The measurement procedure and/or the network functionality for reconfiguring the OTT connection 850 may be implemented in software 811 and hardware 815 of the host computer 810 or in software 831 and hardware 835 of the UE 830, or both. In embodiments, sensors (not shown) may be deployed in or in association with communication devices through which the OTT connection 850 passes; the sensors may participate in the measurement procedure by supplying values of the monitored quantities exemplified above, or supplying values of other physical quantities from which the

software

811, 831 may compute or estimate the monitored quantities. The reconfiguring of the OTT connection 850 may include message format, retransmission settings, preferred routing etc.; the reconfiguring need not affect the base station 820, and it may be unknown or imperceptible to the base station 820. Such procedures and functionalities may be known and practiced in the art. In certain embodiments, measurements may involve proprietary UE signaling facilitating the host computer 810’s measurements of throughput, propagation times, latency and the like. The measurements may be implemented in that the

software

811 and 831 causes messages to be transmitted, in particular empty or ‘dummy’ messages, using the OTT connection 850 while it monitors propagation times, errors etc.

Fig. 9 is a flowchart illustrating a method implemented in a communication system, in accordance with an embodiment. The communication system includes a host computer, a base station and a UE which may be those described with reference to Fig. 7 and Fig. 8. For simplicity of the present disclosure, only drawing references to Fig. 9 will be included in this section. In step 910, the host computer provides user data. In substep 911 (which may be optional) of step 910, the host computer provides the user data by executing a host application. In step 920, the host computer initiates a transmission carrying the user data to the UE. In step 930 (which may be optional) , the base station transmits to the UE the user data which was carried in the transmission that the host computer initiated, in accordance with the teachings of the embodiments described throughout this disclosure. In step 940 (which may also be optional) , the UE executes a client application associated with the host application executed by the host computer.

Fig. 10 is a flowchart illustrating a method implemented in a communication system, in accordance with an embodiment. The communication system includes a host computer, a base station and a UE which may be those described with reference to Fig. 7 and Fig. 8. For simplicity of the present disclosure, only drawing references to Fig. 10 will be included in this section. In step 1010 of the method, the host computer provides user data. In an optional substep (not shown) the host computer provides the user data by executing a host application. In step 1020, the host computer initiates a transmission carrying the user data to the UE. The transmission may pass via the base station, in accordance with the teachings of the embodiments described throughout this disclosure. In step 1030 (which may be optional) , the UE receives the user data carried in the transmission.

Fig. 11 is a flowchart illustrating a method implemented in a communication system, in accordance with an embodiment. The communication system includes a host computer, a base station and a UE which may be those described with reference to Fig. 7 and Fig. 8. For simplicity of the present disclosure, only drawing references to Fig. 11 will be included in this section. In step 1110 (which may be optional) , the UE receives input data provided by the host computer. Additionally or alternatively, in step 1120, the UE provides user data. In substep 1121 (which may be optional) of step 1120, the UE provides the user data by executing a client application. In substep 1111 (which may be optional) of step 1110, the UE executes a client application which provides the user data in reaction to the received input data provided by the host computer. In providing the user data, the executed client application may further consider user input received from the user. Regardless of the specific manner in which the user data was provided, the UE initiates, in substep 1130 (which may be optional) , transmission of the user data to the host computer. In step 1140 of the method, the host computer receives the user data transmitted from the UE, in accordance with the teachings of the embodiments described throughout this disclosure.

Fig. 12 is a flowchart illustrating a method implemented in a communication system, in accordance with an embodiment. The communication system includes a host computer, a base station and a UE which may be those described with reference to Fig. 7 and Fig. 8. For simplicity of the present disclosure, only drawing references to Fig. 12 will be included in this section. In step 1210 (which may be optional) , in accordance with the teachings of the embodiments described throughout this disclosure, the base station receives user data from the UE. In step 1220 (which may be optional) , the base station initiates transmission of the received user data to the host computer. In step 1230 (which may be optional) , the host computer receives the user data carried in the transmission initiated by the base station.

According to some exemplary embodiments, there is provided a method implemented in a communication system which may include a host computer, a base station and a UE. The method may comprise providing user data at the host computer. Optionally, the method may comprise, at the host computer, initiating a transmission carrying the user data to the UE via a cellular network comprising the base station which may perform any step of the exemplary method 400 as describe with respect to Fig. 4.

According to some exemplary embodiments, there is provided a communication system including a host computer. The host computer may comprise processing circuitry configured to provide user data, and a communication interface configured to forward the user data to a cellular network for transmission to a UE. The cellular network may comprise a base station having a radio interface and processing circuitry. The base station’s processing circuitry may be configured to perform any step of the exemplary method 400 as describe with respect to Fig. 4.

According to some exemplary embodiments, there is provided a method implemented in a communication system which may include a host computer, a base station and a UE. The method may comprise providing user data at the host computer. Optionally, the method may comprise, at the host computer, initiating a transmission carrying the user data to the UE via a cellular network comprising the base station. The UE may perform any step of the exemplary method 500 as describe with respect to Fig. 5.

According to some exemplary embodiments, there is provided a communication system including a host computer. The host computer may comprise processing circuitry configured to provide user data, and a communication interface configured to forward user data to a cellular network for transmission to a UE. The UE may comprise a radio interface and processing circuitry. The UE’s processing circuitry may be configured to perform any step of the exemplary method 500 as describe with respect to Fig. 5.

According to some exemplary embodiments, there is provided a method implemented in a communication system which may include a host computer, a base station and a UE. The method may comprise, at the host computer, receiving user data transmitted to the base station from the UE which may perform any step of the exemplary method 500 as describe with respect to Fig. 5.

According to some exemplary embodiments, there is provided a communication system including a host computer. The host computer may comprise a communication interface configured to receive user data originating from a transmission from a UE to a base station. The UE may comprise a radio interface and processing circuitry. The UE’s processing circuitry may be configured to perform any step of the exemplary method 500 as describe with respect to Fig. 5.

According to some exemplary embodiments, there is provided a method implemented in a communication system which may include a host computer, a base station and a UE. The method may comprise, at the host computer, receiving, from the base station, user data originating from a transmission which the base station has received from the UE. The base station may perform any step of the exemplary method 400 as describe with respect to Fig. 4.

According to some exemplary embodiments, there is provided a communication system which may include a host computer. The host computer may comprise a communication interface configured to receive user data originating from a transmission from a UE to a base station. The base station may comprise a radio interface and processing circuitry. The base station’s processing circuitry may be configured to perform any step of the exemplary method 400 as describe with respect to Fig. 4.

In general, the various exemplary embodiments may be implemented in hardware or special purpose chips, circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the disclosure is not limited thereto. While various aspects of the exemplary embodiments of this disclosure may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

As such, it should be appreciated that at least some aspects of the exemplary embodiments of the disclosure may be practiced in various components such as integrated circuit chips and modules. It should thus be appreciated that the exemplary embodiments of this disclosure may be realized in an apparatus that is embodied as an integrated circuit, where the integrated circuit may comprise circuitry (as well as possibly firmware) for embodying at least one or more of a data processor, a digital signal processor, baseband circuitry and radio frequency circuitry that are configurable so as to operate in accordance with the exemplary embodiments of this disclosure.

It should be appreciated that at least some aspects of the exemplary embodiments of the disclosure may be embodied in computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device. The computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid state memory, random access memory (RAM) , etc. As will be appreciated by one of skill in the art, the function of the program modules may be combined or distributed as desired in various embodiments. In addition, the function may be embodied in whole or partly in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA) , and the like.

The present disclosure includes any novel feature or combination of features disclosed herein either explicitly or any generalization thereof. Various modifications and adaptations to the foregoing exemplary embodiments of this disclosure may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings. However, any and all modifications will still fall within the scope of the non-limiting and exemplary embodiments of this disclosure.

Claims

A method (400) performed by a network node, comprising:

obtaining (402) channel information of one or more terminal devices with respect to the network node; and

determining (404) beamforming information of the one or more terminal devices, according to a Markov decision process based at least in part on the channel information.
The method according to claim 1, wherein the channel information of the one or more terminal devices is a state input of the Markov decision process.
The method according to claim 1 or 2, wherein the channel information of the one or more terminal devices comprises:

a channel matrix per terminal device;

a channel impulse response per terminal device; or

a beamforming weight per terminal device.
The method according to any of claims 1-3, wherein the beamforming information of the one or more terminal devices is an action output of the Markov decision process.
The method according to any of claims 1-4, wherein the beamforming information of the one or more terminal devices comprises:

a beamforming weight tune factor per terminal device; or

a beamforming weight per terminal device.
The method according to any of claims 1-5, wherein the Markov decision process enables optimization of communication performance of the one or more terminal devices.
The method according to claim 6, wherein the communication performance of the one or more terminal devices comprises one or more of:

a signal to interference plus noise ratio expected per terminal device;

an average signal to interference plus noise ratio expected for the one or more terminal devices;

a data rate expected per terminal device;

an average data rate expected for the one or more terminal devices;

traffic throughput expected per terminal device; and

an average traffic throughput expected for the one or more terminal devices.
The method according to any of claims 1-7, wherein the Markov decision process uses an action-state value function determined by a convolution neural network.
The method according to claim 8, wherein the determination of the action-state value function comprises determining one or more parameters of the action-state value function by iterated training according to:

channel information of one or more training devices;

beamforming information of the one or more training devices; and

communication performance of the one or more training devices.
The method according to claim 9, wherein the beamforming information of the one or more training devices comprises a beamforming weight tune factor randomly generated for each of the one or more training devices.
The method according to any of claims 8-10, wherein the convolution neural network has a regression functionality to provide a continuous output.
The method according to any of claims 1-11, wherein the channel information, the beamforming information and communication performance of the one or more terminal devices are used as training data of the Markov decision process.
A network node (610) , comprising:

one or more processors (611) ; and

one or more memories (612) storing computer program codes (613) ,

the one or more memories (612) and the computer program codes (613) configured to, with the one or more processors (611) , cause the network node (610) at least to:

obtain channel information of one or more terminal devices with respect to the network node; and

determine beamforming information of the one or more terminal devices, according to a Markov decision process based at least in part on the channel information.
The network node according to claim 13, wherein the one or more memories and the computer program codes are configured to, with the one or more processors, cause the network node to perform the method according to any one of claims 2-12.
A computer-readable medium having computer program codes (613) embodied thereon which, when executed on a computer, cause the computer to perform any step of the method according to any one of claims 1-12.
A method (500) performed by a terminal device, comprising:

obtaining (502) channel information of the terminal device with respect to a network node; and

determining (504) beamforming information of the terminal device, according to a Markov decision process based at least in part on the channel information.
The method according to claim 16, wherein the channel information of the terminal device is a state input of the Markov decision process.
The method according to claim 16 or 17, wherein the channel information of the terminal device comprises:

a channel matrix of the terminal device;

a channel impulse response of the terminal device; or

a beamforming weight of the terminal device.
The method according to any of claims 16-18, wherein the beamforming information of the terminal device is an action output of the Markov decision process.
The method according to any of claims 16-19, wherein the beamforming information of the terminal device comprises:

a beamforming weight tune factor of the terminal device; or

a beamforming weight of the terminal device.
The method according to any of claims 16-20, wherein the Markov decision process enables optimization of communication performance of the terminal device.
The method according to claim 21, wherein the communication performance of the terminal device comprises one or more of:

a signal to interference plus noise ratio expected for the terminal device;

a data rate expected for the terminal device; and

traffic throughput expected for the terminal device.
The method according to any of claims 16-22, wherein the Markov decision process uses an action-state value function determined by a convolution neural network.
The method according to claim 23, wherein the determination of the action-state value function comprises determining one or more parameters of the action-state value function by iterated training according to:

channel information of one or more training devices;

beamforming information of the one or more training devices; and

communication performance of the one or more training devices.
The method according to claim 24, wherein the beamforming information of the one or more training devices comprises a beamforming weight tune factor randomly generated for each of the one or more training devices.
The method according to any of claims 23-25, wherein the convolution neural network has a regression functionality to provide a continuous output.
The method according to any of claims 16-26, wherein the channel information, the beamforming information and communication performance of the terminal device are used as training data of the Markov decision process.
A terminal device (610) , comprising:

one or more processors (611) ; and

one or more memories (612) storing computer program codes (613) ,

the one or more memories (612) and the computer program codes (613) configured to, with the one or more processors (611) , cause the terminal device (610) at least to:

obtain channel information of the terminal device with respect to a network node; and

determine beamforming information of the terminal device, according to a Markov decision process based at least in part on the channel information.
The terminal device according to claim 28, wherein the one or more memories and the computer program codes are configured to, with the one or more processors, cause the terminal device to perform the method according to any one of claims 17-27.
A computer-readable medium having computer program codes (613) embodied thereon which, when executed on a computer, cause the computer to perform any step of the method according to any one of claims 16-27.