CN110266350A

CN110266350A - Beam allocation method and device

Info

Publication number: CN110266350A
Application number: CN201910363472.6A
Authority: CN
Inventors: 延凯悦; 王友祥; 王波; 韩潇; 潘安劼
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2019-04-30
Filing date: 2019-04-30
Publication date: 2019-09-20

Abstract

This application provides a beam allocation method and device, relating to the communications field, for allocating the beams transmitted by a base station. The method comprises: determining the beam set corresponding to each terminal in the system, wherein the beam set includes at least one beam, and each beam is a beam between the base station and the terminal that meets a first preset condition; determining, using a Q learning algorithm, the beam selected by each terminal when the performance of the base station meets a second preset condition; and allocating the beam selected by each terminal to the corresponding terminal for data transmission. Beam selection can thus be converted into a corresponding data function: by calculating the performance value of the terminal when the terminal selects different beams, the beams selected by the terminals when the base station performance value is maximum are determined and allocated to each terminal for beam transmission, which greatly improves the transmission performance of the base station.

Description

Beam allocation method and device

Technical Field

The present application relates to the field of communications, and in particular, to a method and an apparatus for beam allocation.

Background

With the increasing demand of users for network resources, the existing low-frequency resources are already very limited, and therefore, the development and utilization of millimeter wave spectrum resources are enhanced. Millimeter waves have the advantages of rich frequency spectrum resources, narrow beams, high energy, strong directivity and high transmission quality, but meanwhile, millimeter waves are fast in attenuation, short in communication distance and large in electromagnetic wave loss, and high-gain beams are generally generated by combining a large-scale Multiple-Input Multiple-Output (MIMO) technology on a base station side, so that the large communication link loss is made up.

In the massive MIMO technology, communication between a base station and a user equipment is generally line of sight (LOS) communication or non-line of sight (NLOS) communication that undergoes only one reflection. In this case, the LOS beam allocation method is generally employed. That is, in the initial access stage of the terminal, an access beam is selected according to the line of sight between the terminal and the base station (i.e., a beam that propagates along a straight line, or through only a few refractions, between the base station and the terminal is selected), and in the subsequent data transmission process, in order to save overhead and improve efficiency, data transmission is performed on the initially selected beam. Alternatively, the beam with the maximum Reference Signal Receiving Power (RSRP) among the beams measured by the terminal is selected for transmission. However, in an actual application scenario, if the distance between a plurality of terminals is small, interference is likely to occur between the beams of those terminals, and data collisions may even occur, so that the performance of the base station is greatly reduced.

Disclosure of Invention

The embodiment of the application provides a beam distribution method and device, which can convert beam selection into a corresponding data function, calculate a beam selected by a terminal when a base station performance value is maximum by calculating a performance value of the terminal when the terminal selects different beams, and distribute the beam to each terminal for beam transmission, thereby greatly improving the transmission performance of the base station.

In order to achieve the purpose, the technical scheme is as follows:

In a first aspect, the present application provides a beam allocation method, including: determining a beam set corresponding to each terminal in the communication system, wherein the beam set includes at least one beam; calculating, according to a preset rule, the performance of the base station after each terminal selects one beam from its corresponding beam set for simulated data transmission; and when the performance of the base station meets a preset condition, allocating the beam selected by each terminal to that terminal for data transmission.

In a second aspect, the present application provides a beam allocating apparatus, including: a processing unit, configured to determine a beam set corresponding to each terminal in the communication system, wherein the beam set includes at least one beam; the processing unit is further configured to calculate, according to a preset rule, the performance of the base station after each terminal selects one beam from its corresponding beam set for simulated data transmission; the processing unit is further configured to allocate the beam selected by each terminal to that terminal for data transmission when the performance of the base station meets a preset condition.

In a third aspect, the present application provides a beam allocating apparatus, including: a processor and a memory; wherein the memory is configured to store one or more programs, the one or more programs include computer executable instructions, and when the beam allocation apparatus runs, the processor executes the computer executable instructions stored in the memory, so as to cause the beam allocation apparatus to perform the beam allocation method according to the first aspect and any one of the implementation manners of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the beam allocation method of the first aspect and any one of its implementation manners.

In a fifth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the beam allocation method of the first aspect and any one of its implementations.

In the beam allocation method provided by the embodiment of the application, a beam set corresponding to each terminal in the communication system is determined, wherein the beam set includes at least one beam; the performance of the base station after each terminal selects one beam from its corresponding beam set for simulated data transmission is calculated according to a preset rule; and when the performance of the base station meets a preset condition, the beam selected by each terminal is allocated to that terminal for data transmission. Beam selection can thus be converted into a corresponding data function: by calculating the performance value of the terminal when the terminal selects different beams, the beams selected by the terminals when the base station performance value is maximum are determined and allocated to each terminal for beam transmission, so that the transmission performance of the base station is greatly improved.

Drawings

Fig. 1 is a system architecture diagram of a beam distribution system according to an embodiment of the present application;

Fig. 2 is a first flowchart of a beam allocation method according to an embodiment of the present application;

Fig. 3 is a second flowchart of a beam allocation method according to an embodiment of the present application;

Fig. 4 is a simulation comparison diagram of the base station throughput of the Q learning algorithm beam allocation scheme provided in the embodiment of the present application, the LOS path beam allocation scheme, the interference cancellation algorithm beam allocation scheme, and the enhanced beam selection algorithm beam allocation scheme;

Fig. 5 is a simulation comparison diagram of the terminal transmission rate fairness of the Q learning algorithm beam allocation scheme provided in the embodiment of the present application, the LOS path beam allocation scheme, the interference cancellation algorithm beam allocation scheme, and the enhanced beam selection algorithm beam allocation scheme;

Fig. 6 is a first schematic structural diagram of a beam allocation apparatus according to an embodiment of the present application;

Fig. 7 is a second schematic structural diagram of a beam allocation apparatus according to an embodiment of the present application;

Fig. 8 is a third schematic structural diagram of a beam allocation apparatus according to an embodiment of the present application.

Detailed Description

The following describes the beam allocation method and apparatus provided in the present application in detail with reference to the accompanying drawings.

The terms "first" and "second", etc. in the description and drawings of the present application are used for distinguishing between different objects and not for describing a particular order of the objects.

Furthermore, the terms "including" and "having," and any variations thereof, as referred to in the description of the present application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that in the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate examples, illustrations or explanations. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.

In the description of the present application, the meaning of "a plurality" means two or more unless otherwise specified.

The technology to which this application relates is explained below to facilitate the understanding of the reader:

LOS path beam: a beam propagating along a straight line between the base station and the terminal.

NLOS path beam: a beam that has undergone at least one reflection between the base station and the terminal.

A MIMO antenna system: a plurality of antennas (usually n × n square array antennas) are deployed at the transmitting end and the receiving end. When a signal is transmitted, the content is divided into multiple parts, which are transmitted to the receiving end through a plurality of different antennas. The receiving end combines the received signals to obtain the signal sent by the transmitting end. The MIMO technology can greatly reduce the signal traffic sent by a single antenna at the transmitting end, thereby improving the transmission distance and receiving range of signals and improving the transmission speed without occupying additional spectrum resources. Meanwhile, because the channels between each transmitting end and each receiving end are different, the MIMO technology can also greatly improve the channel capacity of the transmitting end.

The beam allocation method provided in the embodiment of the present application is applied to the beam allocation system 100 shown in fig. 1.

As shown in fig. 1, the beam distribution system 100 includes a base station 101 and a plurality of terminals 102.

The base station 101 may be capable of transmitting a plurality of beams, and the terminal 102 may be capable of searching for at least one beam and selecting a corresponding beam for data transmission.

Among the multiple beams transmitted by the base station 101, each terminal 102 can detect one LOS path beam and N_S NLOS path beams, where N_S is an integer greater than or equal to 0. After the beam allocation is completed, the terminal performs data transmission using the selected beam.

The embodiment of the present application provides a beam allocation method, which is applied to the beam allocation system 100. The method is used for completing the optimal beam allocation of each terminal and realizing the maximization of the performance of the base station. As shown in fig. 2, the method includes S201-S203:

S201, determining a beam set corresponding to each terminal in the communication system.

Wherein the set of beams includes at least one beam.

Specifically, in this step, the RSRP of each beam may be determined, and the beams whose RSRP meets the preset condition are formed into the beam set. The specific process is as follows:

The terminal carries out beam detection and acquires at least one beam transmitted by the base station. The at least one beam comprises one LOS path beam and N_S NLOS path beams, where N_S is an integer greater than or equal to 0.

Channel estimation is performed on the detected beams to obtain the received signal energy distribution of each beam (namely the RSRP of each beam). The beams with an RSRP larger than the preset threshold are taken as alternative beams (these alternative beams are the beams of the beam set) and form the beam set. In an optional implementation of this step, the LOS path beam is placed in the beam set regardless of whether its RSRP is greater than the preset threshold.
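The beam set construction in S201 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `build_beam_set`, the dBm unit, the threshold value and the sample measurements are all assumptions made for the example.

```python
from typing import Dict, List


def build_beam_set(rsrp_dbm: Dict[int, float], los_beam: int,
                   threshold_dbm: float) -> List[int]:
    """Return the candidate beam indices for one terminal.

    Beams whose RSRP exceeds the threshold are kept. The LOS path beam is
    always kept, which corresponds to the optional variant in the text.
    """
    candidates = [b for b, p in rsrp_dbm.items() if p > threshold_dbm]
    if los_beam not in candidates:
        candidates.append(los_beam)
    return sorted(candidates)


# Hypothetical measurements: beam index -> RSRP in dBm.
measured = {3: -82.0, 7: -95.5, 11: -101.0, 20: -88.4}
beam_set = build_beam_set(measured, los_beam=11, threshold_dbm=-96.0)
print(beam_set)  # [3, 7, 11, 20]
```

Beam 11 falls below the assumed threshold but is retained because it is treated as the LOS path beam.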

Illustratively, the base station is a millimeter wave massive MIMO base station that supports K terminals for data transmission at the same time. It is assumed that each user employs single-antenna reception and that the base station side employs massive MIMO techniques for beamforming transmission. The base station side can form a plurality of narrow beams in different directions. The narrow beams have a certain width and are arranged in sequence without overlapping within their respective beam widths. The set of all narrow beams formed by the base station can achieve full coverage of the 3D space. This set of beams may be represented as:

ε＝{e_q|1≤q≤Q}

where Q denotes the total number of beams that the base station can transmit and e_q denotes the index of the q-th beam.

The base station side sends beams in different directions in different time slices according to the transmit beam codebook set. Each beam covers one specific area and carries a reference signal. In the prior art, a terminal determines the beam with the maximum RSRP value as the optimal access beam through beam scanning and reports it to the base station, thereby completing beam allocation.

In one implementation of this step, each terminal performs a total of two beam selection processes from power-on to data transmission. The first process takes place when the terminal accesses the network. In this process, the terminal selects a wide beam through the synchronization signal block (SSB), and the network access process ends once the selection is completed. After the terminal has accessed the network, when the terminal needs to perform data transmission, the measurement and selection of the user beam are performed through a channel state information reference signal (CSI-RS).

For the user beams used for data transmission, for example, after the user determines a plurality of user beams as a beam set, the beam set is reported to the massive MIMO base station. In a practical application scenario, the channel may be affected by blockage from moving or static objects. The terminal can detect a plurality of beams simultaneously, including one LOS path beam and N_S NLOS path beams, where N_S is an integer greater than or equal to 0. The more beams the terminal selects and reports, the better the transmission quality of the finally selected beam is likely to be.

S202, calculating, according to a preset rule, the performance of the base station after each terminal selects one beam from its corresponding beam set for simulated data transmission.

The preset rule may be a Q learning algorithm.

Specifically, assume that the terminal detects one LOS path beam and N_S NLOS path beams between itself and the base station. The LOS path beam of the k-th UE and the c-th NLOS path beam of the k-th UE (1≤c≤N_S) are each assigned a symbol, and the set of alternative beams for the k-th UE, CS_k, is the collection of these beams:

CS_k＝{LOS path beam, 1st NLOS path beam, ..., N_S-th NLOS path beam}

where N_T＝|CS_k|＝N_S+1 represents the number of elements in the alternative beam set.

Assuming that the terminal side does not adopt the beamforming technology but adopts the omni-directional reception, the data transmission rate of the kth terminal in the system without considering the inter-cell interference is as follows:

r_k＝B·log₂(1+θ_k)

where θ_k is the signal-to-noise ratio of the base station transmission signal received by the k-th UE, and B is the channel bandwidth.

The performance function of the base station for the k-th terminal (in this embodiment, the performance of the terminal is expressed by the downlink transmission rate of the base station for the terminal) is defined as follows:

u_k＝ln(r_k)

The problem of solving the performance optimization of the base station in the present application is converted into a problem of solving the maximum of the performance function of the base station for all the terminals.

The maximum value of the base station performance function is solved as follows:

max Σ_{k=1}^{K} u_k＝max Σ_{k=1}^{K} ln(r_k)

where 1≤k≤K.

The performance value of the base station after each terminal selects one beam is calculated according to this formula.
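The rate, per-terminal performance and base station performance defined above can be evaluated as in the following sketch. The function names and the sample θ values are assumptions chosen for illustration; only the formulas r_k = B·log₂(1+θ_k) and u_k = ln(r_k) and the 100 MHz bandwidth come from the text.

```python
import math
from typing import Sequence


def rate(bandwidth_hz: float, theta: float) -> float:
    """r_k = B * log2(1 + theta_k), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + theta)


def utility(rate_bps: float) -> float:
    """u_k = ln(r_k)."""
    return math.log(rate_bps)


def base_station_performance(bandwidth_hz: float,
                             thetas: Sequence[float]) -> float:
    """Sum of u_k over all K terminals."""
    return sum(utility(rate(bandwidth_hz, t)) for t in thetas)


B = 100e6  # 100 MHz, the bandwidth used in the simulation section
even = base_station_performance(B, [3.0, 3.0])     # two similar terminals
skewed = base_station_performance(B, [15.0, 0.2])  # one strong, one weak
print(even, skewed)
```

With these sample values the skewed case has the higher total rate but the lower sum of logarithms, which shows how the logarithmic performance function rewards allocations that do not starve individual terminals.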

S203, when the performance of the base station meets the preset condition, allocating the beam selected by each terminal to that terminal for data transmission.

The preset condition may be that the performance of the base station is maximum.

Specifically, the performance value of the base station after each terminal selects different beams for simulated data transmission is calculated multiple times, and the actions executed by the terminals when the performance value of the base station meets the preset condition are selected. The beam corresponding to each such action is determined and allocated to the corresponding terminal, and the terminal performs data transmission using that beam. This ensures that the performance value of the base station is maximized.

In an implementation manner of the embodiment of the present application, the base station performance calculation and the beam selection process in steps S202 to S203 may be implemented by using a Q learning algorithm. The problem of optimizing the performance of the base station is decomposed into a parallel beam allocation problem for the K terminals in the system, which is then solved using the Q learning algorithm from reinforcement learning. As shown in fig. 3, this process can be divided into S301 to S307:

S301, determining the l-th action.

The l-th action is an action in which the terminal selects one beam for data transmission; l is an integer greater than or equal to 1. Each terminal is taken as an agent in the Q learning algorithm, and the agent's selection of one beam for data transmission is taken as one action.

There are a total of K agents in the system. Each agent corresponds to a set of actions. Taking the k-th agent as an example, the action set can be expressed as:

V_k＝{v_kl|1≤l≤N_T}

The terminal has N_T beams in total, and each beam corresponds to one action, giving N_T actions.

Define v_kl (1≤l≤N_T) as the action in which the agent selects the l-th beam in its alternative beam set CS_k as its data transmission beam.

The state of the base station is determined.

Wherein, the state of the base station is defined as follows: the overall performance of the massive MIMO system described above is taken as the environment. The environment is set to be a single-state environment, and the state environment is not involved in the process of executing each action by the agent (namely, the massive MIMO system is assumed to be stable and is not influenced by the surrounding environment or the performance of the device per se).

S302, determining the performance value of the base station after each terminal executes the l-th action.

Here k denotes the k-th terminal, k is an integer greater than or equal to 1, and v_kl indicates that the k-th terminal performs the l-th action.

Specifically, the reward function after each terminal executes the l-th action is determined. The reward function captures how the environment of the base station is affected after the agent performs an action (i.e., the overall performance of the base station changes after the terminal selects a beam for transmission).

In this application, the influence on the performance of the base station after the agent performs each corresponding action in the action set is defined as a reward function. Therefore, the reward function after each agent executes the corresponding action is the performance sum of the base station after the terminals select their beams to transmit data. The reward function may be expressed as the sum Σ_{k=1}^{K} u_k, evaluated with each terminal k transmitting on the beam of its selected action v_kl (1≤l≤N_T),

where the reward function is the performance of the base station after the terminal selects the l-th beam, k denotes the k-th terminal in the system, K denotes the total number of terminals in the system, l denotes the l-th beam in the terminal's beam set, and N_T denotes the total number of beams in the beam set.

S303, acquiring a Q value table corresponding to each terminal.

The Q value table comprises a preset Q value corresponding to each l-th action. Here t denotes the number of times the Q value has been updated; t is increased by 1 every time the Q value is updated, and t is an integer greater than or equal to 0.

The Q value table is defined as follows: the Q value table is a table written by a worker in advance. Each terminal corresponds to one Q value table, each Q value table has N_T Q values, and each Q value corresponds to one action. The initial magnitude of each Q value in the Q value table may be set empirically by a worker or an expert, or set directly to 0. Each Q value is then continuously updated according to the calculation results of the Q learning algorithm.
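A possible in-memory layout for the Q value tables is sketched below, assuming the zero initialization mentioned in the text. The dictionary-of-lists layout, the function name `init_q_tables` and the sample beam sets are illustrative assumptions.

```python
from typing import Dict, List


def init_q_tables(beam_sets: Dict[int, List[int]],
                  initial_q: float = 0.0) -> Dict[int, List[float]]:
    """One Q table per terminal, one Q value per candidate beam (action)."""
    return {k: [initial_q] * len(beams) for k, beams in beam_sets.items()}


# Hypothetical beam sets: terminal index k -> candidate beam indices (CS_k).
beam_sets = {1: [3, 7, 11], 2: [11, 20]}
q_tables = init_q_tables(beam_sets)
print(q_tables)  # {1: [0.0, 0.0, 0.0], 2: [0.0, 0.0]}
```

Position l-1 in each list holds the Q value of the l-th action of that terminal, so each table has N_T entries.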

After the agent executes the action corresponding to the l-th beam, the corresponding reward function is obtained. The reward function is compared with the Q value corresponding to the l-th action in the Q value table, and the Q value is updated accordingly.

S304, updating the Q value.

The updated Q value is the maximum of the Q value corresponding to the l-th action and the performance value of the base station.

Specifically, the Q value corresponding to the l-th action and the performance value of the base station are determined, and the Q value for the next period is set to the larger of the two:

Q value (period t+1)＝max{Q value (period t), performance value of the base station}

where t denotes the t-th period in which the base station executes the Q learning algorithm (each execution of S301-S306 is one period), and t is a positive integer greater than or equal to 1.

The principle for updating the Q value is as follows. When the reward function of an action in the current period is greater than the original Q value of that action, the base station performance obtained when the current terminal executes this action in parallel with the other terminals' current actions is considered better than the performance obtained in earlier periods, when the terminal executed the same action while the other terminals executed other actions. The original Q value is therefore replaced with the newly obtained reward function. This loop is executed until the Q values converge (approach a maximum value that no longer changes), which yields the maximum base station performance and the corresponding beam allocation scheme.

When the reward function of an action in the current period is smaller than the original Q value of that action, the Q value is not updated; that is, actions that yield lower base station performance than in earlier periods are discarded.
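The update principle described above amounts to keeping a running maximum per action. A minimal sketch, with the function name `update_q` and the sample values chosen purely for illustration:

```python
from typing import List


def update_q(q_table: List[float], action: int, reward: float) -> bool:
    """Overwrite Q[action] with the reward only if the reward is larger.

    Returns True when the table changed.
    """
    if reward > q_table[action]:
        q_table[action] = reward
        return True
    return False


q = [0.0, 36.9, 0.0]
changed_up = update_q(q, 1, 38.2)    # larger reward: Q value replaced
changed_down = update_q(q, 1, 37.0)  # smaller reward: Q value kept
print(q, changed_up, changed_down)   # [0.0, 38.2, 0.0] True False
```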

In the conventional calculation method, the Q value is generally expressed as follows:

where one term indicates a change in the environmental state of the base station due to factors other than the terminals' beam selection. Since the environment of the base station in the present application is a single-state environment, the above formula can be simplified as:

S305, replacing the corresponding Q value in the Q value table with the newly obtained reward function.

That is, a Q value in the Q value table is updated only when the newly obtained reward function is greater than the current Q value.
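For orientation, the textbook tabular Q learning update and the single-state rule described in the text can be written side by side. The symbols s, a, α, γ and R_t are standard textbook notation, and Q_t(v_kl) is a notation assumed here for the Q value of action v_kl in period t; none of these symbols is confirmed as the notation of the original formulas.

```latex
% Textbook tabular Q learning update (multi-state case):
Q_{t+1}(s, a) = (1 - \alpha)\, Q_t(s, a)
              + \alpha \left[ R_t + \gamma \max_{a'} Q_t(s', a') \right]

% Rule described in the text for the single-state environment,
% where R_t is the base station performance value obtained in period t:
Q_{t+1}(v_{kl}) = \max\left\{ Q_t(v_{kl}),\; R_t \right\}
```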

S306, determining the (l + 1) th action according to a preset action selection rule, and adding 1 to the value of l in the S1-S4.

After the K terminals in the system execute one action selection in parallel, the next action is selected for each terminal.

The preset action selection rule is as follows. In the embodiment of the application, an ε-greedy strategy is adopted as the action selection criterion, specifically:

A random number is generated and compared with a preset coefficient. If the random number is larger than the preset coefficient, the action corresponding to the maximum Q value in the Q value table is selected as the (l + 1)-th action. If the random number is smaller than the preset coefficient, one action is randomly selected as the (l + 1)-th action.

Concretely, a random number x_k, x_k∈[0,1], is first generated and compared with a preset coefficient ε, where ε∈(0, 1).

If x_k<ε, one action is chosen at random from V_k as the (l + 1)-th action; if x_k>ε, the action corresponding to the current maximum Q value in the Q value table is selected as the (l + 1)-th action. With this method, the amount of calculation can be greatly reduced while still obtaining the optimum performance of the base station.
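The ε-greedy selection rule can be sketched as below. The function name `select_action` and the optional `rng` parameter are assumptions for the example. The text does not say what happens when the random number equals ε exactly; this sketch treats that case as greedy.

```python
import random
from typing import List, Optional


def select_action(q_table: List[float], epsilon: float,
                  rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy choice over one terminal's Q table."""
    rng = rng if rng is not None else random.Random()
    x = rng.random()  # x_k drawn from [0, 1)
    if x < epsilon:
        # Explore: pick any action from the action set at random.
        return rng.randrange(len(q_table))
    # Exploit: pick the action with the largest current Q value.
    return max(range(len(q_table)), key=q_table.__getitem__)


q = [0.0, 38.2, 12.0]
print(select_action(q, epsilon=0.0))  # always greedy, so this prints 1
print(select_action(q, epsilon=0.1))  # greedy about 90% of the time
```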

S307, repeating steps S302-S306 until the Q values converge, determining the beam selected by each terminal in its current action, and allocating that beam to the terminal for data transmission.
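Steps S301-S307 can be put together in a toy end-to-end sketch. Everything about the environment here is an assumption made so the example runs: the `gain` table, the interference model inside `performance`, and the fixed number of periods used in place of a convergence test.

```python
import math
import random
from typing import Dict

B = 100e6  # channel bandwidth in Hz


def performance(choice: Dict[int, int],
                gain: Dict[int, Dict[int, float]]) -> float:
    """Toy reward: sum over terminals of ln(B * log2(1 + theta_k)).

    Assumed interference model: a terminal receives interference equal to
    its own beam gain from every other terminal that picked the same beam.
    """
    total = 0.0
    for k, beam in choice.items():
        clashes = sum(1 for j, b in choice.items() if j != k and b == beam)
        theta = gain[k][beam] / (1.0 + clashes * gain[k][beam])
        total += math.log(B * math.log2(1.0 + theta))
    return total


def allocate(gain: Dict[int, Dict[int, float]], epsilon: float = 0.1,
             periods: int = 500, seed: int = 0) -> Dict[int, int]:
    rng = random.Random(seed)
    beams = {k: sorted(g) for k, g in gain.items()}    # action sets V_k
    q = {k: [0.0] * len(b) for k, b in beams.items()}  # Q value tables
    # S301: every terminal starts with a random action.
    act = {k: rng.randrange(len(b)) for k, b in beams.items()}
    for _ in range(periods):
        choice = {k: beams[k][a] for k, a in act.items()}
        reward = performance(choice, gain)             # S302
        for k, a in act.items():                       # S303-S305
            if reward > q[k][a]:
                q[k][a] = reward
        for k in act:                                  # S306
            if rng.random() < epsilon:
                act[k] = rng.randrange(len(q[k]))
            else:
                act[k] = max(range(len(q[k])), key=q[k].__getitem__)
    # S307: report the beam with the largest Q value for each terminal.
    return {k: beams[k][max(range(len(q[k])), key=q[k].__getitem__)]
            for k in q}


# Hypothetical gains: terminal -> {beam index -> linear gain}.
# Both terminals see beam 5 as their strongest beam.
gain = {1: {5: 20.0, 6: 8.0}, 2: {5: 18.0, 9: 10.0}}
result = allocate(gain)
print(result, performance(result, gain))
```

In this toy setting both terminals would pick beam 5 under a strongest-beam rule and collide, whereas the learned allocation places them on different beams. Which non-colliding allocation is returned depends on the seed, ε and the number of periods.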

In the beam allocation method provided by the embodiment of the application, a beam set corresponding to each terminal in the communication system is determined, wherein the beam set includes at least one beam; the performance of the base station after each terminal selects one beam from its corresponding beam set for simulated data transmission is calculated according to a preset rule; and when the performance of the base station meets a preset condition, the beam selected by each terminal is allocated to that terminal for data transmission. Beam selection can thus be converted into a corresponding data function: by calculating the performance value of the terminal when the terminal selects different beams, the beams selected by the terminals when the base station performance value is maximum are determined and allocated to each terminal for beam transmission, so that the transmission performance of the base station is greatly improved. Meanwhile, the beam allocation method provided by the embodiment of the application also takes fairness among the terminals into consideration, avoiding situations in which the overall performance of the base station is good but the performance for some terminals is poor.

In an implementation manner of the embodiment of the present application, the allocation scheme obtained above may be simulated through MATLAB, and it is verified that the allocation scheme obtained in the embodiment of the present application improves the throughput of the base station and the fairness between the terminals compared with the beam allocation method in the prior art. The specific simulation process is as follows:

A single cell of a millimeter wave massive MIMO system is selected. The base station operates in the 28 GHz band with a bandwidth of 100 MHz, the cell radius is 50 meters, the base station is located at the center of the cell, and all terminals are uniformly distributed in the coverage area of the base station. A 3D beamforming model is adopted at the base station side, forming 48 beams to achieve full coverage, with 12 beams in the horizontal direction and 4 beams in the vertical direction. The greedy coefficient ε is set to 0.1.
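The simulation parameters can be collected in a small configuration object, sketched here in Python rather than MATLAB. The class name `SimConfig`, the field names and the mapping of beam index q to a grid position are illustrative assumptions; only the numeric values come from the text.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SimConfig:
    carrier_hz: float = 28e9     # 28 GHz band
    bandwidth_hz: float = 100e6  # 100 MHz
    cell_radius_m: float = 50.0
    horizontal_beams: int = 12
    vertical_beams: int = 4
    epsilon: float = 0.1         # greedy coefficient


cfg = SimConfig()
# Beam index q = 1..Q mapped to a (horizontal, vertical) grid position.
grid = product(range(cfg.horizontal_beams), range(cfg.vertical_beams))
codebook = dict(enumerate(grid, start=1))
print(len(codebook))  # 48
```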

The base station throughput of the Q learning algorithm beam allocation scheme provided in the embodiment of the present application is compared with that of the conventional LOS path beam allocation scheme, the interference beam elimination algorithm beam allocation scheme, and the enhanced beam selection algorithm beam allocation scheme. The result is shown in fig. 4.

As can be seen from fig. 4, with the increasing number of terminals in the system, the improvement of the base station throughput (in Gbit/s) by the scheme for performing beam allocation by using the Q learning algorithm provided in the embodiment of the present application is more and more obvious.

When the number of terminals is increased to about 20, in the conventional LOS path beam allocation scheme, the interference beam elimination algorithm beam allocation scheme, and the enhanced beam selection algorithm beam allocation scheme, strong interference is generated between beams when each terminal transmits data. The average transmission rate of the terminals decreases as the number of terminals increases. The total throughput (downlink transmission rate) of the base station tends to a constant value.

The scheme for performing beam allocation by using the Q learning algorithm provided by the application takes a global view of the system. Each terminal can be assigned an appropriate beam for data transmission, resulting in less interference between the terminals. The total throughput (downlink transmission rate) of the base station increases with the number of terminals.

Therefore, the technical scheme provided by the application can greatly avoid data transmission interference and beam allocation conflict among the terminals, and selects the optimal beam allocation scheme for the base station. The performance of the base station is greatly improved.

Fairness between the terminals is then simulated. The Raj Jain fairness parameter of the terminals' downlink transmission rates is defined and calculated as follows:

J＝(Σ_{k=1}^{K} r_k)²/(K·Σ_{k=1}^{K} r_k²)

The value range of the transmission fairness parameter is (0, 1). The closer the parameter value is to 0, the worse the fairness among the terminals; the closer the parameter value is to 1, the better the fairness among the terminals.
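The fairness parameter is Jain's fairness index, (Σ r_k)² / (K·Σ r_k²), which can be computed as in the following sketch. The function name `jain_index` and the sample rates are assumptions for illustration.

```python
from typing import Sequence


def jain_index(rates: Sequence[float]) -> float:
    """Jain's fairness index: (sum r)^2 / (K * sum r^2)."""
    k = len(rates)
    total = sum(rates)
    squares = sum(r * r for r in rates)
    return (total * total) / (k * squares)


print(jain_index([2.0, 2.0, 2.0, 2.0]))  # equal rates: 1.0
print(jain_index([8.0, 0.0, 0.0, 0.0]))  # one terminal takes all: 0.25
```

With K terminals the index is 1 when all rates are equal and 1/K when a single terminal receives the entire rate.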

The fairness between the terminals is compared with the scheme of beam allocation by using the Q learning algorithm, the scheme of beam allocation by using the traditional LOS path beam allocation scheme, the scheme of beam allocation by using the interference beam elimination algorithm and the scheme of beam allocation by using the enhanced beam selection algorithm provided by the real-time example of the application. The results are shown in FIG. 5.

As can be seen from FIG. 5, in the conventional LOS path beam allocation scheme, the interference beam elimination algorithm beam allocation scheme, and the enhanced beam selection algorithm beam allocation scheme, the transmission fairness parameter decreases quickly as the number of terminals increases, i.e., the transmission fairness among the terminals is strongly affected. The transmission fairness parameter of the Q learning beam allocation scheme provided by the application decreases noticeably more slowly than that of the other schemes, so better transmission fairness is maintained among the terminals.

Therefore, the scheme for performing beam allocation by using the Q learning algorithm provided by the application focuses on global optimization, can adjust conflicting beams, and can also schedule the data transmission beams of different UEs from a global perspective, so that each UE selects the most appropriate beam in its candidate beam set, thereby obtaining the optimal transmission efficiency and the best fairness in terminal transmission rate.

In the embodiment of the present application, the beam allocation apparatus may be divided into functional modules or functional units according to the foregoing method example. For example, each functional module or functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware, or in the form of a software functional module or functional unit. The division of the modules or units in the embodiment of the present application is schematic and is only a logical function division; there may be another division manner in actual implementation.

As shown in FIG. 6, the present embodiment provides a beam allocation apparatus, which is applied in the beam allocation system 100 and is used for performing the aforementioned beam allocation method. The apparatus comprises:

a processing unit 601, configured to determine a beam set corresponding to each terminal in the communication system; wherein the set of beams includes at least one beam.

The processing unit 601 is further configured to calculate, according to a preset rule, performance of the base station after each terminal selects one beam from the corresponding beam set to perform analog data transmission.

The processing unit 601 is further configured to allocate the beam selected by each terminal to each terminal for data transmission when the performance of the base station meets a preset condition.

Optionally, the processing unit 601 is further configured to execute the following steps: determining an l-th action, where the l-th action is an action in which the terminal selects any beam to perform data transmission, and l is an integer greater than or equal to 1;

S1, determining the performance value of the base station after each terminal respectively executes the l-th action, where k denotes the k-th terminal among the terminals, k is an integer greater than or equal to 1, and v_kl indicates that the k-th terminal performs the l-th action.

S2, obtaining a Q value table corresponding to each terminal, where the Q value table includes a preset Q value corresponding to each l-th action, t denotes the number of times the Q value has been updated, the value of t is increased by 1 each time the Q value is updated, and t is an integer greater than or equal to 0.

S3, determining the Q value corresponding to the l-th action and the performance value of the base station, and letting:

The result is taken as the performance of the base station after each terminal selects one beam from the corresponding beam set to perform analog data transmission.

Optionally, the processing unit 601 is further configured to execute the following steps:

S4, replacing the corresponding Q value in the Q value table with the updated Q value.

S5, determining the (l+1)-th action according to a preset action selection rule, and adding 1 to the value of l in S1-S4.

The above-mentioned S1-S5 are repeatedly performed until the Q values converge, and the beam selected when each of the terminals performs the current action is determined.

The beam selected when each terminal executes the current action is then allocated to that terminal for data transmission.
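The loop formed by S1-S5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the update formula is not legible in this text, so the standard tabular Q learning update Q <- (1 - alpha) * Q + alpha * (reward + gamma * max Q) is swapped in; the names `q_learning_allocate`, `performance`, `choose`, `alpha`, `gamma`, `tol` and `max_iters` are hypothetical; the action selection rule is passed in as a callable because the text describes it separately; and returning each terminal's highest-valued beam at the end is also an assumption.

```python
import random
from typing import Callable, List, Sequence


def q_learning_allocate(
    beam_sets: Sequence[Sequence[int]],
    performance: Callable[[List[int]], float],
    choose: Callable[[Sequence[float]], int],
    alpha: float = 0.5,
    gamma: float = 0.9,
    tol: float = 1e-6,
    max_iters: int = 1000,
) -> List[int]:
    """Return one beam per terminal after a tabular Q learning loop."""
    # One Q row per terminal, one preset Q value (0.0) per candidate beam.
    q = [[0.0] * len(beams) for beams in beam_sets]
    # Initial action: every terminal picks an index into its own beam set.
    actions = [choose(row) for row in q]
    for _ in range(max_iters):
        # S1: base station performance for the joint beam selection.
        selected = [beams[a] for beams, a in zip(beam_sets, actions)]
        reward = performance(selected)
        largest_change = 0.0
        for row, a in zip(q, actions):
            # S2-S3: read the old Q value and form the updated one
            # (standard update rule, assumed here).
            updated = (1 - alpha) * row[a] + alpha * (reward + gamma * max(row))
            largest_change = max(largest_change, abs(updated - row[a]))
            # S4: overwrite the entry in the Q value table.
            row[a] = updated
        if largest_change < tol:
            break  # Q values are treated as converged.
        # S5: determine the next action for every terminal.
        actions = [choose(row) for row in q]
    # Assumed final step: each terminal keeps its highest-valued beam.
    return [beams[max(range(len(row)), key=row.__getitem__)]
            for beams, row in zip(beam_sets, q)]


# Toy usage: two terminals, hypothetical beam identifiers and gains,
# with a penalty when both terminals end up on adjacent beams.
rng = random.Random(0)
beam_sets = [[10, 11], [11, 12]]
gains = {10: 3.0, 11: 5.0, 12: 4.0}


def toy_performance(selected: List[int]) -> float:
    penalty = 6.0 if abs(selected[0] - selected[1]) <= 1 else 0.0
    return sum(gains[b] for b in selected) - penalty


allocation = q_learning_allocate(
    beam_sets, toy_performance, lambda row: rng.randrange(len(row)))
```

Because every terminal receives the same base station performance value as its reward, the Q values of one terminal depend on what the other terminals select, which is why the loop evaluates the joint selection rather than each terminal in isolation.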

Optionally, the processing unit 601 is further configured to: generate a random number; compare the random number with a preset coefficient; if the random number is greater than the preset coefficient, select the action corresponding to the maximum Q value in the Q value table as the (l+1)-th action; and if the random number is smaller than the preset coefficient, randomly select one action as the (l+1)-th action.
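This selection rule is what is commonly called an epsilon-greedy policy. A minimal sketch follows, assuming the random number is drawn uniformly from [0, 1) and the preset coefficient lies in the same range; the function name `select_next_action` and its parameters are hypothetical. The text does not say what happens when the two values are equal, so the sketch treats that case as a random selection.

```python
import random
from typing import Optional, Sequence


def select_next_action(
    q_row: Sequence[float],
    coefficient: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the index of the next action from one terminal's Q values."""
    rng = rng if rng is not None else random.Random()
    draw = rng.random()  # uniform in [0.0, 1.0)
    if draw > coefficient:
        # Random number larger than the coefficient: take the action
        # with the maximum Q value.
        return max(range(len(q_row)), key=lambda a: q_row[a])
    # Otherwise pick any action at random (equality is handled here
    # by assumption, since the text leaves it open).
    return rng.randrange(len(q_row))


# Hypothetical Q values for three candidate beams.
next_action = select_next_action([0.1, 0.9, 0.3], coefficient=0.2,
                                 rng=random.Random(42))
```

A larger coefficient makes random selection more frequent, which trades slower convergence for a wider search over beam combinations.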

Optionally, on the basis of fig. 6, as shown in fig. 7, the apparatus further includes an obtaining unit 701; the obtaining unit 701 is configured to obtain at least one beam transmitted by a base station. The processing unit 601 is further configured to determine reference signal received power RSRP of each beam. The processing unit 601 is further configured to form the beam set by using beams whose RSRP meets a preset condition.
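Forming the beam set from RSRP measurements can be sketched as below. The text only says that the RSRP must meet a preset condition; the sketch assumes the condition is a minimum threshold in dBm, and the names `build_beam_set`, `rsrp_dbm` and `threshold_dbm` as well as the sample values are hypothetical.

```python
from typing import Dict, List


def build_beam_set(rsrp_dbm: Dict[int, float], threshold_dbm: float) -> List[int]:
    """Keep beams whose RSRP is at or above the threshold, strongest first."""
    kept = [beam for beam, power in rsrp_dbm.items() if power >= threshold_dbm]
    return sorted(kept, key=lambda beam: rsrp_dbm[beam], reverse=True)


# Hypothetical per-beam RSRP measurements for one terminal (beam id -> dBm).
measured = {3: -95.0, 7: -82.5, 12: -101.0, 15: -88.0}
beam_set = build_beam_set(measured, threshold_dbm=-96.0)
```

Since the beam set is required to contain at least one beam, a real implementation would need a rule for the case where no beam passes the threshold; the text does not specify one, so the sketch simply returns an empty list in that case.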

FIG. 8 shows a schematic diagram of another possible structure of the beam allocation apparatus in the above embodiment. The beam allocation apparatus includes a processor 802 and a communication interface 803. The processor 802 is configured to control and manage the actions of the beam allocation apparatus, for example, to perform the steps performed by the processing unit 601 described above, and/or to perform other processes for the techniques described herein. The communication interface 803 is used to support communication of the beam allocation apparatus with other network entities, e.g., to perform the steps performed by the obtaining unit 701 described above, and/or to perform other processes for the techniques described herein. The beam allocation apparatus may further comprise a memory 801 and a bus 804, the memory 801 being used to store program codes and data of the beam allocation apparatus.

Wherein the memory 801 may be a memory in the beam allocation apparatus or the like, which may include a volatile memory, such as a random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.

The processor 802 may be any logic block, module, or circuit that can implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor may be a central processing unit, a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The bus 804 may be an Extended Industry Standard Architecture (EISA) bus or the like. The bus 804 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.

Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.

Embodiments of the present application provide a computer program product comprising instructions, which when run on a computer, cause the computer to perform the beam allocation method of the above method embodiments.

An embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is caused to execute a beam allocation method in a method flow shown in the foregoing method embodiment.

The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a register, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, any suitable combination of the above, or any other form of computer readable storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). In embodiments of the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A beam allocation method, applied to a communication system, wherein the communication system comprises a base station and at least one terminal, and each terminal utilizes the base station to transmit data; the method comprises:

determining a beam set corresponding to each terminal in the communication system; wherein the set of beams includes at least one beam;

calculating, according to a preset rule, the performance of the base station after each terminal respectively selects one beam from the corresponding beam set to perform analog data transmission;

and when the performance of the base station meets a preset condition, allocating the beam selected by each terminal to each terminal for data transmission.

2. The method according to claim 1, wherein the calculating, according to a preset rule, the performance of the base station after each terminal selects one beam from the corresponding beam set for analog data transmission comprises:

determining an l-th action, wherein the l-th action is an action in which the terminal selects any beam to perform data transmission, and l is an integer greater than or equal to 1;

S1, determining the performance value of the base station after each terminal respectively executes the l-th action, wherein k denotes the k-th terminal among the terminals, k is an integer greater than or equal to 1, and v_kl indicates that the k-th terminal performs the l-th action;

S2, obtaining a Q value table corresponding to each terminal, wherein the Q value table comprises a preset Q value corresponding to each l-th action, t denotes the number of times the Q value has been updated, the value of t is increased by 1 each time the Q value is updated, and t is an integer greater than or equal to 0;

3. The method according to claim 2, wherein the allocating, when the performance of the base station satisfies a preset condition, the beam selected by each terminal to each terminal for data transmission comprises:

S4, replacing the corresponding Q value in the Q value table with the updated Q value;

S5, determining the (l+1)-th action according to a preset action selection rule, and adding 1 to the value of l in S1-S4;

repeatedly executing the above S1-S5 until the Q value converges, and determining the beam selected by each terminal when executing the current action;

4. The method according to claim 3, wherein the determining the (l + 1) th action according to the preset action selection rule comprises:

generating a random number;

comparing the random number with the preset coefficient;

if the random number is larger than the preset coefficient, selecting the action corresponding to the maximum Q value in the Q value table as the (l + 1) th action;

and if the random number is smaller than the preset coefficient, randomly selecting one action as the (l + 1) th action.

5. The method of any of claims 1-4, wherein the determining the set of beams corresponding to each terminal in the system comprises:

acquiring at least one beam transmitted by a base station;

determining a reference signal received power, RSRP, of each beam;

and forming the beams with the RSRP meeting the preset condition into the beam set.

6. An apparatus for beam allocation, the apparatus comprising:

a processing unit, configured to determine a beam set corresponding to each terminal in the communication system; wherein the set of beams includes at least one beam;

the processing unit is further configured to calculate, according to a preset rule, a performance of the base station after each terminal selects one beam from the corresponding beam set to perform analog data transmission;

the processing unit is further configured to allocate the beam selected by each terminal to each terminal for data transmission when the performance of the base station meets a preset condition.

7. The beam allocation apparatus of claim 6, wherein the processing unit is further configured to:

S1, determining the performance value of the base station after each terminal executes the l-th action, wherein k denotes the k-th terminal among the terminals, k is an integer greater than or equal to 1, and v_kl indicates that the k-th terminal performs the l-th action;

S2, obtaining a Q value table corresponding to each terminal, wherein the Q value table comprises a preset Q value corresponding to each l-th action, t denotes the number of times the Q value has been updated, the value of t is increased by 1 each time the Q value is updated, and t is an integer greater than or equal to 0;

8. The beam allocation apparatus of claim 7, wherein the processing unit is further configured to:

S4, replacing the corresponding Q value in the Q value table with the updated Q value;

9. The beam allocation apparatus of claim 8, wherein the processing unit is further configured to:

generating a random number;

comparing the random number with the preset coefficient;

10. The beam allocation apparatus according to any one of claims 6-9, wherein the apparatus further comprises:

an obtaining unit, configured to obtain at least one beam transmitted by a base station;

the processing unit is further configured to determine a reference signal received power, RSRP, of each beam;

the processing unit is further configured to form the beam set from beams whose RSRP meets a preset condition.

11. A beam allocation apparatus, characterized in that the beam allocation apparatus comprises: a processor and a memory; wherein the memory is used for storing one or more programs, the one or more programs comprising computer executable instructions, which when executed by the processor, cause the beam allocation apparatus to perform the beam allocation method of any one of claims 1 to 5.

12. A computer readable storage medium having instructions stored thereon, which when run on a computer, cause the computer to perform the beam allocation method of any one of claims 1 to 5.