CN113811009B - Multi-base-station network resource intelligent allocation method based on space-time feature extraction - Google Patents
Multi-base-station network resource intelligent allocation method based on space-time feature extraction
- Publication number
- CN113811009B CN202111118071.8A
- Authority
- CN
- China
- Prior art keywords
- network
- vector
- algorithm
- current
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/53—Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a multi-base-station network resource intelligent allocation method based on spatio-temporal feature extraction. The method extracts the position information and spatial features of each 5G base station through a graph attention mechanism, learns the behavior habits of network users through a long short-term memory mechanism to extract temporal features, and analyzes the fluctuation of each slice's data packets over space and time. Compared with resource allocation strategies based on optimization and genetic algorithms and with strategies based on traditional reinforcement learning, the method obtains a higher system return, namely higher spectral efficiency and better user experience, while adapting to a dynamically changing environment with greater flexibility and robustness.
Description
Technical Field
The invention relates to the technical field of wireless communication, in particular to a multi-base-station network resource intelligent allocation method based on space-time characteristic extraction.
Background
Currently, 5G networks have become an indispensable part of the development of the digital society. Compared with 4G networks, 5G networks provide a massive range of services that meet far broader demands, most of which cannot be realized by 4G.
The ITU defines three main application scenarios for 5G: enhanced mobile broadband (eMBB), massive machine-type communication (mMTC), and ultra-reliable low-latency communication (URLLC). eMBB, with its high bandwidth, is mainly applied to services such as AR/VR (augmented reality/virtual reality); mMTC, with its high connection density, serves the Internet of Things and smart homes; and URLLC, with its low latency and high reliability, can be applied to services such as autonomous driving and remote operation.
However, if a 5G network used a dedicated network for each specific service, as 4G networks do, resources would be wasted to a large extent. As mentioned above, different 5G services place different requirements on communication delay, bandwidth, mobility, reliability and other performance metrics, so multiple dedicated networks would be required to cover all services, resulting in huge deployment costs.
Therefore, researchers have proposed Network Slicing (NS) technology. Network slicing can flexibly allocate existing network resources according to different user requirements. Compared with a single network, it can provide higher-performance logical networks, flexibly allocate limited bandwidth resources, and allocate network resources reasonably without mutual interference, with higher reliability and security. To meet changing user requirements and the frequent switching between base stations caused by user mobility, optimizing the deployment and adjusting the resource allocation of network slices in real time is a significant challenge for current 5G services. The key technical indicators are: maximizing Spectrum Efficiency (SE) to reduce resource cost and serve more subscribers, while satisfying the Service Level Agreements (SLAs) of slice subscribers as far as possible to improve the user Service Satisfaction Rate (SSR).
Traditional dedicated resource allocation schemes, as well as resource allocation strategies based on optimization or heuristic algorithms, often rely on strict constraints and complex derivations to form a specific optimization problem. Such methods lack flexibility and scalability, and cannot respond well when user characteristics and the proportions of users with different performance requirements change. It is therefore necessary to allocate spectrum resources to different slices dynamically and intelligently according to users' service requests, so as to maximize SE while guaranteeing a basic SSR.
Reinforcement learning learns an optimal behavior strategy that maximizes revenue by constantly interacting with the environment, capturing state information from the environment, and selecting actions in a trial-and-error manner. Traditional reinforcement learning struggles with continuous or high-dimensional state spaces, so deep-learning feature extraction and prediction methods have been introduced into reinforcement learning: a deep neural network extracts deep features of the state and represents the state value function, allowing deep reinforcement learning algorithms to predict optimal action-selection strategies over larger state spaces. Typical deep reinforcement learning algorithms include the Deep Q Network (DQN) and Actor-Critic methods such as A2C.
Although convolutional neural networks have achieved great success in processing grid-structured information, the data involved in many tasks of interest cannot be represented by a grid-like structure and instead lie in irregular domains, where graph structures are used. Interest in generalizing convolution to the graph domain keeps growing, and graph convolutional neural networks continue to evolve from it. The graph attention mechanism is a representative graph convolutional network mechanism that introduces multi-head masked attention to assign different influence weights to neighbor nodes.
In addition, user movement causes the demand of the same user to switch continuously between different base stations, so user behavior must be predicted and the demand met in time. The long short-term memory mechanism, a typical recurrent neural network mechanism, can integrate and discard information over a time series and extract the temporal features of a sequence.
Through these two mechanisms, cooperation among the nodes in the graph can be strengthened, changes in user behavior can be predicted in advance and information aggregated, and the system becomes more robust to neighbor-node noise and to user movement.
Disclosure of Invention
The aim of the invention is to provide a multi-base-station network resource intelligent allocation method based on spatio-temporal feature extraction. Compared with traditional optimization and heuristic algorithms, the proposed method has better flexibility and scalability. Compared with other reinforcement learning algorithms, it strengthens cooperation between base stations to predict the change trend of data packets, and predicts trends in user behavior so as to reduce the negative influence that changes in the number of users at each base station, caused by user mobility, have on the prediction of the state-action value function. By adopting a reinforcement learning algorithm based on spatio-temporal feature extraction for resource allocation prediction in a multi-base-station cooperative wireless network, prediction accuracy can be improved and wireless network performance greatly enhanced.
In order to achieve the purpose, the invention provides the following technical scheme:
the application discloses a multi-base-station network resource intelligent allocation method based on space-time feature extraction, which is characterized by comprising the following steps:
S11, dividing the algorithm network structure G into a state vector encoding network Embed, a long short-term memory network LSTM, a graph attention network GAT and a deep Q network DQN;
S12, the state vector encoding network Embed consists of two fully connected layers and is written as
h_m = Embed(s_m) = σ(W_e s_m + b_e), where W_e and b_e are the weight matrix and bias of the layer and σ is the activation function; the N-dimensional state vector s_m of agent m in the multi-agent reinforcement learning is input into the state vector encoding network Embed, which outputs the K-dimensional encoded vector h_m;
S13, taking the encoded vector h_m of the current agent m and the encoded vectors h_j, j∈D_m, of the agents on its adjacent nodes in the directed graph as the input vectors of the graph attention network GAT, where D_m denotes the set of agents on nodes adjacent to the current agent m in the directed graph; calculating the attention influence coefficients and normalizing them; multiplying the normalized attention influence coefficients by the input vectors to calculate the first-layer output h'_m of the graph attention network GAT; the attention influence coefficients, the normalization and the first-layer output are each expressed by their own formulas, and the second-layer output h''_m of the graph attention network GAT is obtained from the first-layer outputs in the same way;
S14, for the current agent m, combining the first-layer outputs of the graph attention network GAT over the T consecutive time steps up to the current time into one sequence and the second-layer outputs over the same T consecutive time steps into another sequence, and taking the two sequences as the input vector sequences of the long short-term memory network LSTM, which integrates the temporal features of the sequences; the LSTM consists of a number of units, each containing three structures, a memory gate, a forgetting gate and an output gate; each unit takes the output vectors C_{t-1} and h_{t-1} of the previous unit and the vector x_t at the current time as inputs and outputs the integrated information C_t and h_t, where C_{t-1} represents the integrated information of all vectors at the first t-1 moments and h_{t-1} represents the information in the vector at time t-1 that is related to the current moment; data are processed with the memory gate, the forgetting gate and the output gate as the core, and the LSTM finally outputs one summary vector for each of the two input sequences;
S15, the deep Q network DQN consists of multiple fully connected layers; the first-layer output h'_m and second-layer output h''_m of the graph attention network GAT and the output vectors produced by the long short-term memory network LSTM are taken as the input of the deep Q network DQN, which outputs the return values of the different actions executed in the current state; the action with the highest return is selected and executed to interact with the environment;
S16, after the network structure is defined, the weight matrices in the algorithm network are randomly initialized from a Gaussian distribution, and a target network Ĝ is constructed whose structure is exactly the same as the algorithm network structure G and whose weights are initialized by copying the weight parameters of G;
S2, executing resource allocation;
S3, repeating the resource allocation of step S2 N_pre times and then training the algorithm network structure G;
S4, each time the algorithm network structure G of step S3 has been trained X times, assigning the weight parameters of G to the target network Ĝ, thereby updating Ĝ;
S5, after step S3 has been executed N_train times, the training process of the algorithm network structure G is complete.
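By way of non-limiting illustration, the overall interaction-and-training loop of steps S2-S5 could be organized as in the following Python sketch. The helper names (env, epsilon_schedule, ReplayBuffer, train_step) are hypothetical placeholders introduced only for this sketch and are not part of the claimed method.

```python
# Minimal sketch of the S2-S5 loop, assuming hypothetical helpers: an environment
# object `env`, an `epsilon_schedule`, a `ReplayBuffer`, and a `train_step` routine.
import random

def run(env, G, G_target, buffer, n_pre=2000, n_train=10000, x_sync=200, p=32, gamma=0.9):
    state_hist = env.reset()                                  # s_t stacked with the previous T-1 states
    trained = 0
    for step in range(n_pre + n_train):
        eps = epsilon_schedule(step)                          # rises from 0 toward eps_max
        if random.random() > eps:
            action = env.sample_valid_action()                # explore: random valid action
        else:
            action = int(G.predict(state_hist[None, ...], verbose=0).argmax())  # exploit
        next_hist, reward = env.step(action)                  # system benefit value r_t
        buffer.add((state_hist, action, reward, next_hist))   # store the quadruple (S23)
        state_hist = next_hist
        if step >= n_pre:                                     # S3: buffer is warm, start training G
            train_step(G, G_target, buffer.sample(p), gamma)
            trained += 1
            if trained % x_sync == 0:                         # S4: sync the target network every X updates
                G_target.set_weights(G.get_weights())
    return G                                                  # S5: trained allocation network
```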
Preferably, in substep S13 the attention influence coefficient is calculated as
e_mj = ATT(W_s h_m, W_t h_j) = (W_s h_m)^T (W_t h_j),
the attention influence coefficient is normalized as
α_mj = softmax_j(e_mj) = exp(e_mj) / Σ_{k∈D_m} exp(e_mk),
and the first-layer output of the graph attention network is calculated as
h'_m = σ( Σ_{j∈D_m} α_mj W h_j ),
where W_s, W_t and W are the weight matrices of the layer and are also the network parameters to be trained.
Preferably, in step S14 the memory gate is calculated as i_t = σ(W_i [h_{t-1}, x_t] + b_i), the forgetting gate as f_t = σ(W_f [h_{t-1}, x_t] + b_f), and the output gate as o_t = σ(W_o [h_{t-1}, x_t] + b_o); the integrated information is calculated as C̃_t = tanh(W_C [h_{t-1}, x_t] + b_C), C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t and h_t = o_t ⊙ tanh(C_t), where W_i, W_f, W_o, W_C, b_i, b_f, b_o, b_C are the weight matrices and biases of the layer and are also the network parameters to be trained, and tanh is an activation function.
Preferably, the step S2 includes the following substeps:
S21, the wireless resource manager obtains the network state vector s_t of each base station at the current time t, the number of base stations being M, and draws a random number from the uniform distribution on (0, 1); if the random number is larger than ε, the wireless resource manager randomly selects a valid action for each base station; if the random number is less than or equal to ε, the wireless resource manager combines s_t with the state vectors of the previous T-1 time points and inputs them into the network G of step S1, and each base station obtains the action a_t with the maximum return value; after the action a_t is executed, the wireless resource manager receives the system benefit value r_t and observes the network state vector s_{t+1} at the next moment;
S22, the wireless resource system manager sets two hyper-parameters c_1, c_2 and a threshold c_3 and calculates the immediate return, the calculation using, among other quantities, the mean value of the SSRs of the slices in each base station obtained from the system; the value of c_1 is 3 to 6, the value of c_2 is 1 to 3, and the value of c_3 is 0.75 to 1;
S23, the wireless resource manager stores the quadruple (s_t, a_t, r_t, s_{t+1}) in a cache area whose size takes a value of 3000 to 10000.
Preferably, the step S3 includes the following process: p quadruples are selected from the cache area as training samples; the p network state vectors s_t are each combined with the state vectors of their T-1 previous moments to obtain a matrix [s_1, s_2, …, s_p]^T, which is input into the algorithm network structure G constructed in step S1 to obtain the return values generated by executing different actions in the p states; the return values corresponding to [a_1, a_2, …, a_p]^T are selected and recorded as the predicted return values G(s_1, a_1), G(s_2, a_2), …, G(s_p, a_p) under the current network parameters; the p network state vectors s_{t+1} in the samples are each combined with the state vectors of their T-1 previous moments to obtain a matrix that is input into the target network Ĝ constructed in step S1 to obtain the return values generated by executing different actions in the p states, and for each state the maximum return value is selected and recorded as max_a Ĝ(s'_1, a), …, max_a Ĝ(s'_p, a); the loss function of the algorithm network structure G is L = (1/p) Σ_{i=1}^{p} ( r_i + γ·max_a Ĝ(s'_i, a) − G(s_i, a_i) )², where r_i is the instant return corresponding to each sample and γ is a discount factor taking a value of 0.75 to 0.9; the weight parameters of the algorithm network structure G are trained by applying a batch gradient descent method.
Preferably, the step S5 includes the following process: the wireless resource manager combines the current network state vector s_t with the state vectors of the previous T-1 moments and inputs them into the algorithm network structure G; the algorithm network structure G outputs the return value corresponding to each action for each base-station agent, and the action corresponding to the maximum return value is selected as the allocation strategy of the current base station and executed.
Preferably, the value of X is 100 to 500, the value of N_pre is 500 to 3000, and the value of N_train is 1000 to 5000.
Preferably, the number p of the quadruples is 32.
Preferably, the batch gradient descent method is Adam, and the learning rate is 0.001.
Preferably, the initial value of ε in the substep S21 is 0, and after each step ε is increased according to ε = max(0, ε_max − e^(−train_step/decay_step)), where ε_max takes a value of 0.85 to 0.95, train_step is the number of training steps at the current moment, and decay_step takes a value of 2000 to 4000.
The invention has the beneficial effects that:
(1) The invention uses a graph attention mechanism and a long short-term memory mechanism to preprocess the state vector, extracting temporal and spatial features, enlarging the receptive field under limited communication conditions, and strengthening cooperation among base stations and the prediction of user behavior. Through network training, the influence weights of the surrounding base stations on the current base station are obtained, which increases the positive influence of effective variables, reduces the negative influence caused by noise and user movement, and enhances the robustness of the system.
(2) The method estimates the state-action value function with deep reinforcement learning and selects the optimal resource allocation strategy. The reinforcement learning algorithm generates the sample data required for training through interaction with the environment and requires no empirical or prior assumptions about the distribution of the state-action value function, so it can adapt to more complex scenarios and offers better flexibility.
(3) Compared with traditional resource sharing and numerical analysis algorithms, the radio resource allocation strategy obtained through multi-base-station cooperation achieves a higher system benefit value, that is, it improves the utilization of spectrum resources while guaranteeing a basic user service satisfaction rate, thereby improving the user experience.
The features and advantages of the present invention will be described in detail by embodiments in conjunction with the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a multi-base-station cooperative network resource allocation method based on temporal feature extraction reinforcement learning according to the present invention;
FIG. 2 shows how the system benefit values of the method of the present invention and of several comparison allocation methods change during radio resource allocation when the parameters specified in the embodiment below are used.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings.
Referring to FIG. 1, the multi-base-station cooperative network resource allocation method based on temporal feature extraction and reinforcement learning of the present invention specifically includes the following steps:
S1, constructing the algorithm network structure G and the target network Ĝ, which specifically comprises the following substeps:
S11, the algorithm network structure G of the method is divided into a state vector encoding network Embed, a long short-term memory network LSTM, a graph attention network GAT and a deep Q network DQN.
S12, the state vector encoding network Embed consists of two fully connected layers and is written as
h_m = Embed(s_m) = σ(W_e s_m + b_e),    (1)
where W_e and b_e are the weight matrix and bias of the layer and σ is the "ReLU" activation function. The N-dimensional state vector s_m (the state vector of the m-th agent) in the multi-agent reinforcement learning is input into Embed, which outputs the K-dimensional encoded vector h_m.
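As a non-limiting illustration, a minimal Keras sketch of such a two-layer encoding network is shown below; the dimensions N and K and the choice of tf.keras are placeholders for illustration only and are not prescribed by the embodiment.

```python
# Minimal sketch of the Embed network (step S12): two fully connected layers mapping
# the N-dimensional state s_m to a K-dimensional code h_m with ReLU activations.
import tensorflow as tf

def build_embed(n_in=12, k_out=64):          # N and K are illustrative values only
    s_m = tf.keras.Input(shape=(n_in,), name="state_vector")
    x = tf.keras.layers.Dense(k_out, activation="relu")(s_m)   # h = ReLU(W_e s + b_e)
    h_m = tf.keras.layers.Dense(k_out, activation="relu")(x)   # second fully connected layer
    return tf.keras.Model(s_m, h_m, name="Embed")

embed = build_embed()
embed.summary()
```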
S13, the encoded vector h_m of the current agent m and the encoded vectors h_j, j∈D_m, of the agents on its adjacent nodes in the directed graph (where D_m denotes the set of agents on nodes adjacent to the current agent m, and Euclidean distance is used as the criterion for constructing the directed graph) are used as the input vectors of the graph attention network, from which the attention influence coefficient is calculated and normalized:
e_mj = ATT(W_s h_m, W_t h_j) = (W_s h_m)^T (W_t h_j),    (2)
α_mj = softmax_j(e_mj) = exp(e_mj) / Σ_{k∈D_m} exp(e_mk).    (3)
The normalized attention influence coefficients are multiplied by the input vectors, and the first-layer output of the graph attention network is calculated through formula (4),
h'_m = σ( Σ_{j∈D_m} α_mj W h_j ),    (4)
where W_s, W_t and W are the weight matrices of the layer and are also the network parameters to be trained; the multi-head attention parameter K takes a value of 2 to 20.
The graph attention network has two layers; the second layer has the same structure as the first, and its output h''_m is computed from the first-layer outputs by the same formulas.
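A compact numpy sketch of one such attention layer is given below as a non-limiting illustration; the single-head form is shown for clarity (the embodiment combines K attention heads), and all dimensions and names are placeholders.

```python
# Minimal single-head sketch of the graph-attention layer of step S13: bilinear scores
# e_mj = (W_s h_m)^T (W_t h_j), softmax over the neighbour set D_m, weighted aggregation.
import numpy as np

def gat_layer(H, neighbours, W_s, W_t, W):
    """H: (M, K) encoded agent states; neighbours[m]: list of indices j in D_m."""
    M = H.shape[0]
    out = np.zeros((M, W.shape[0]))
    for m in range(M):
        js = neighbours[m]
        e = np.array([(W_s @ H[m]) @ (W_t @ H[j]) for j in js])    # attention scores e_mj
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()                                        # softmax normalisation over D_m
        agg = sum(a * (W @ H[j]) for a, j in zip(alpha, js))        # weighted neighbour aggregation
        out[m] = np.maximum(agg, 0.0)                               # ReLU-style activation sigma
    return out
```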
S14, for the current agent m, the outputs of the two GAT layers over the T consecutive time steps up to the current time are combined into two sequences, which are used as the input vector sequences of the long short-term memory network LSTM so that the temporal features of the sequences are integrated. The LSTM consists of a number of units; each unit contains three structures, a memory gate, a forgetting gate and an output gate, and takes the output vectors C_{t-1} and h_{t-1} of the previous unit and the vector x_t at the current time as inputs, where C_{t-1} represents the integrated information of all vectors at the first t-1 moments and h_{t-1} represents the information in the vector at time t-1 that is related to the current moment.
The output of the memory gate is calculated as
i_t = σ(W_i [h_{t-1}, x_t] + b_i),
the output of the forgetting gate as
f_t = σ(W_f [h_{t-1}, x_t] + b_f),
and the output of the output gate as
o_t = σ(W_o [h_{t-1}, x_t] + b_o).
The integrated information is calculated as
C̃_t = tanh(W_C [h_{t-1}, x_t] + b_C),  C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t,  h_t = o_t ⊙ tanh(C_t),
where W_i, W_f, W_o, W_C, b_i, b_f, b_o, b_C are the weight matrices and biases of the layer and are also the network parameters to be trained, and tanh is an activation function.
With these three gates as the core of the data processing, the LSTM finally outputs one summary vector for each of the two input sequences.
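The following numpy sketch of a single LSTM unit is a non-limiting reconstruction of the gate equations above; the weight shapes and the concatenated input [h_{t-1}, x_t] follow the standard LSTM formulation and are assumptions of this sketch.

```python
# Minimal sketch of one LSTM unit (step S14) with memory (input), forgetting and output gates.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_unit(x_t, h_prev, C_prev, W_i, W_f, W_o, W_C, b_i, b_f, b_o, b_C):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    i_t = sigmoid(W_i @ z + b_i)             # memory gate: which new information to keep
    f_t = sigmoid(W_f @ z + b_f)             # forgetting gate: which old information to discard
    o_t = sigmoid(W_o @ z + b_o)             # output gate
    C_hat = np.tanh(W_C @ z + b_C)           # candidate integrated information
    C_t = f_t * C_prev + i_t * C_hat         # integrate and discard over the time series
    h_t = o_t * np.tanh(C_t)                 # information related to the current moment
    return h_t, C_t
```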
S15, the deep Q network DQN consists of multiple fully connected layers. The output vectors produced by the two-layer graph attention network GAT and by the long short-term memory network LSTM are used as the input of the deep Q network DQN, which outputs the return values of the different actions executed in the current state; the action with the highest return is selected and executed to interact with the environment.
S16, after the network structure is defined, the weight matrices in the algorithm network are randomly initialized from a Gaussian distribution, and a target network Ĝ is constructed whose structure is exactly the same as that of the algorithm network structure G and whose weights are initialized by copying the weight parameters of G.
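As a non-limiting sketch, the fully connected DQN head and the weight-copied target network could be assembled in Keras as follows; the layer widths, feature dimension and initializer settings are illustrative placeholders rather than the claimed architecture.

```python
# Minimal sketch of steps S15-S16: a multi-layer fully connected head mapping the
# concatenated GAT/LSTM features to one return value per action, plus a target
# network initialised by copying the weights of G.
import tensorflow as tf

def build_dqn_head(feat_dim=256, n_actions=171):
    feats = tf.keras.Input(shape=(feat_dim,), name="gat_lstm_features")
    x = tf.keras.layers.Dense(128, activation="relu",
                              kernel_initializer="random_normal")(feats)  # Gaussian-style init
    x = tf.keras.layers.Dense(64, activation="relu",
                              kernel_initializer="random_normal")(x)
    q = tf.keras.layers.Dense(n_actions, name="action_return_values")(x)
    return tf.keras.Model(feats, q, name="DQN_head")

G_head = build_dqn_head()
G_target_head = build_dqn_head()                     # identical structure
G_target_head.set_weights(G_head.get_weights())      # initialise the target by copying G
```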
S2, performing resource allocation, specifically including the following substeps:
S21, the wireless resource manager obtains the network state vector s_t of each base station at the current time t, the number of base stations being M. Adopting an ε-greedy algorithm, the wireless resource manager draws a random number from the uniform distribution on (0, 1); if the random number is greater than ε, the wireless resource manager randomly selects a valid action for each base station. If the random number is less than or equal to ε, the wireless resource manager combines s_t with the state vectors of the previous T-1 time points and inputs them into the network G of step S1, and each base station obtains the action a_t with the maximum return value. After the action a_t is executed, the wireless resource manager receives the system benefit value r_t and observes the network state vector s_{t+1} at the next moment. The initial value of ε is 0 and, after each step, ε is increased according to
ε = max(0, ε_max − e^(−train_step/decay_step)),
where ε_max takes a value of 0.85 to 0.95, train_step is the number of training steps at the current moment, and decay_step takes a value of 2000 to 4000.
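The schedule and the selection rule can be sketched in a few lines of Python; the closed-form expression below follows the reading of the formula above (ε starting at 0 and rising toward ε_max) and is offered as an illustration rather than the definitive implementation.

```python
# Minimal sketch of the ε-greedy rule of step S21: ε grows from 0 toward eps_max;
# exploration (random valid action) when the drawn number exceeds ε, otherwise the
# action with the maximum predicted return value is taken.
import math
import random

def epsilon_schedule(train_step, eps_max=0.95, decay_step=2000):
    return max(0.0, eps_max - math.exp(-train_step / decay_step))

def select_action(q_values, train_step, n_actions=171):
    if random.random() > epsilon_schedule(train_step):
        return random.randrange(n_actions)                      # explore
    return max(range(n_actions), key=lambda a: q_values[a])     # exploit: argmax return value
```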
S22, the wireless resource system manager sets two hyper-parameters c_1 and c_2 and a threshold c_3, and calculates the immediate return; the calculation uses, among other quantities, the mean value of the SSRs of the slices in each base station obtained from the system. c_1 is set to a value of 3 to 6, c_2 to a value of 1 to 3, and c_3 to a value of 0.75 to 1.
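The exact immediate-return formula is not reproduced in this text, so the sketch below is only an assumed illustration: it combines a spectrum-efficiency term weighted by c_1 with an SSR term weighted by c_2 and gated by the threshold c_3. Both the combination and the gating are assumptions of this sketch, not the claimed formula.

```python
# Hedged sketch of an immediate-return calculation in the spirit of step S22.
# ASSUMPTION: the reward combines spectrum efficiency (SE) and the mean per-slice SSR,
# penalising SSR below the threshold c_3; the patent's actual formula may differ.
def immediate_return(se, ssr_mean, c1=5.5, c2=2.0, c3=0.8):
    ssr_term = ssr_mean if ssr_mean >= c3 else ssr_mean - c3   # assumed threshold gating
    return c1 * se + c2 * ssr_term
```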
S23, the wireless resource manager stores the quadruple (s_t, a_t, r_t, s_{t+1}) in a cache area whose size takes a value of 3000 to 10000. If the cache area is full, a first-in first-out policy is adopted: the quadruple stored earliest is deleted and the latest quadruple is stored.
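A minimal sketch of such a first-in first-out cache area with uniform sampling of p training samples is shown below; the class name and default capacity are illustrative.

```python
# Minimal sketch of the experience cache of step S23: fixed capacity, FIFO eviction,
# uniform sampling of p quadruples (s_t, a_t, r_t, s_{t+1}) for training.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=5000):
        self.data = deque(maxlen=capacity)     # oldest quadruple is dropped automatically when full

    def add(self, quadruple):
        self.data.append(quadruple)

    def sample(self, p=32):
        return random.sample(list(self.data), p)
```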
S3, the resource allocation of step S2 is repeated N_pre times, with N_pre taking a value of 500 to 3000, so that the cache area holds enough data for training the current network parameters. The process of training the network G is as follows:
p quadruples are selected from the cache area as training samples. The p network state vectors s_t are each combined with the state vectors of their T-1 previous moments to obtain a matrix [s_1, s_2, …, s_p]^T, which is input into the algorithm network structure G constructed in step S1 to obtain the return values generated by executing different actions in the p states; the return values corresponding to [a_1, a_2, …, a_p]^T are selected and recorded as the predicted return values G(s_1, a_1), G(s_2, a_2), …, G(s_p, a_p) under the current network parameters.
The p network state vectors s_{t+1} in the samples are each combined with the state vectors of their T-1 previous moments to obtain a matrix that is input into the target network Ĝ constructed in step S1 to obtain the return values generated by executing different actions in the p states; for each state the maximum return value is selected and recorded as max_a Ĝ(s'_1, a), …, max_a Ĝ(s'_p, a).
The loss function of the algorithm network structure G is:
L = (1/p) Σ_{i=1}^{p} ( r_i + γ·max_a Ĝ(s'_i, a) − G(s_i, a_i) )²,
where r_i is the instant return corresponding to each sample and γ is a discount factor taking a value of 0.75 to 0.9. The weight parameters of the algorithm network structure G are trained by applying a batch gradient descent method, with Adam selected as the optimizer and the learning rate set to 0.001.
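One gradient step of this procedure could look as follows in Keras; the tensor handling (one-hot action masking, dtype casts) is illustrative, and any equivalent implementation of the loss above would serve.

```python
# Minimal sketch of one training step of step S3: predicted returns G(s_i, a_i),
# bootstrapped targets r_i + gamma * max_a G_target(s'_i, a), squared-error loss,
# and a batch gradient-descent update with Adam (learning rate 0.001).
import numpy as np
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

def train_step(G, G_target, batch, gamma=0.9):
    states, actions, rewards, next_states = map(np.array, zip(*batch))
    states = states.astype(np.float32)
    actions = actions.astype(np.int32)
    next_q = G_target.predict(next_states, verbose=0)                    # return values of the target Ĝ
    targets = (rewards + gamma * next_q.max(axis=1)).astype(np.float32)  # r_i + γ·max_a Ĝ(s'_i, a)
    with tf.GradientTape() as tape:
        q_all = G(states, training=True)                                 # (p, n_actions)
        mask = tf.one_hot(actions, q_all.shape[-1])
        q_sa = tf.reduce_sum(q_all * mask, axis=1)                       # G(s_i, a_i)
        loss = tf.reduce_mean(tf.square(targets - q_sa))                 # squared TD error
    grads = tape.gradient(loss, G.trainable_variables)
    optimizer.apply_gradients(zip(grads, G.trainable_variables))
    return float(loss)
```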
S4, each time the algorithm network structure G of step S3 has been trained X times, with X taking a value of 100 to 500, the weight parameters of the network G are assigned to the target network Ĝ, thereby updating Ĝ.
S5, step S3 is executed N_train times, with N_train taking a value of 1000 to 5000, which completes the training process of the algorithm network structure G. The wireless resource manager then combines the current network state vector s_t with the state vectors of the previous T-1 moments and inputs them into the algorithm network structure G; the algorithm network structure G outputs the return value corresponding to each action for each base-station agent, and the action corresponding to the maximum return value is selected as the allocation strategy of the current base station and executed.
On a server configured as shown in Table 1, a simulation environment was written in Python, the network framework was built with keras, and tests were performed taking three different types of service (call, video and ultra-reliable low-latency service) as an example. The system has 19 base stations, i.e. M = 19, arranged in a honeycomb layout; the total bandwidth of each base station is 10 MHz and the allocation granularity is set to 0.5 MHz, giving a total of 171 allocation strategies, i.e. the number of valid actions is 171. The discount factor γ is set to 0.9 and the multi-head attention coefficient K is 8. Furthermore, ε_max takes the value 0.95 and decay_step the value 2000. The cache area has a size of 5000, N_pre is 2000 and N_train is 10000. The optimizer in the batch gradient descent algorithm used to train the algorithm network structure G is Adam, with a learning rate of 0.001. The other parameters are as follows:
X = 200, c_1 = 5.5, c_2 = 2, c_3 = 0.8, p = 32.
TABLE 1 System test platform parameters

Item | Configuration
---|---
Processor | Intel i9-9900KF 3.6 GHz
Memory | 64 GB
Graphics card | NVIDIA GTX 2080
Software platform | keras 2.2.4
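For convenience, the embodiment's hyper-parameters can be collected in a single configuration dictionary; the sketch below simply restates the values listed above, and the key names are illustrative.

```python
# Hyper-parameters of the embodiment gathered into one place (values from the text above).
CONFIG = {
    "num_base_stations": 19,      # M, honeycomb layout
    "total_bandwidth_mhz": 10.0,  # per base station
    "granularity_mhz": 0.5,
    "num_actions": 171,           # number of valid allocation strategies
    "gamma": 0.9,                 # discount factor
    "attention_heads": 8,         # multi-head attention coefficient K
    "eps_max": 0.95,
    "decay_step": 2000,
    "buffer_size": 5000,
    "N_pre": 2000,
    "N_train": 10000,
    "X": 200,                     # target-network update interval
    "c1": 5.5,
    "c2": 2.0,
    "c3": 0.8,
    "batch_size_p": 32,
    "optimizer": "Adam",
    "learning_rate": 0.001,
}
```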
The method of the present invention is compared with several resource allocation methods, including the hard slicing algorithm (Hard Slicing), the DQN algorithm, the LSTM-A2C algorithm, and a GAT-DQN algorithm without LSTM. The hard slicing algorithm distributes the total bandwidth of the base station uniformly to each network slice; the LSTM-A2C algorithm combines a long short-term memory network with deep reinforcement learning. Referring to FIG. 2, the figure shows how the system benefit values obtained by the various methods change during radio resource allocation, where the system benefit value is the average return value of the 19 base stations. For ease of analysis, the median value is plotted every 100 steps. The curves show that in the first 4000 steps, because the deep reinforcement learning algorithms still need to train their network parameters, their return values fluctuate more and their median return is lower than that of the uniform-allocation method. Once network training is finished, i.e. after 4000 steps, the system benefit value of each deep reinforcement learning algorithm improves markedly, and the method of the present invention performs best, with better system stability and a higher system benefit value.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.
Claims (10)
1. A multi-base station network resource intelligent allocation method based on space-time feature extraction is characterized by comprising the following steps:
S11, dividing the algorithm network structure G into a state vector encoding network Embed, a long short-term memory network LSTM, a graph attention network GAT and a deep Q network DQN;
S12, the state vector encoding network Embed consists of two fully connected layers and is written as
h_m = Embed(s_m) = σ(W_e s_m + b_e), where W_e and b_e are the weight matrix and bias of the layer and σ is the activation function; the N-dimensional state vector s_m of agent m in the multi-agent reinforcement learning is input into the state vector encoding network Embed, which outputs the K-dimensional encoded vector h_m;
S13, taking the encoded vector h_m of the current agent m and the encoded vectors h_j, j∈D_m, of the agents on its adjacent nodes in the directed graph as the input vectors of the graph attention network GAT, where D_m denotes the set of agents on nodes adjacent to the current agent m in the directed graph; calculating the attention influence coefficients and normalizing them; multiplying the normalized attention influence coefficients by the input vectors to calculate the first-layer output h'_m of the graph attention network GAT; the attention influence coefficients, the normalization and the first-layer output are each expressed by their own formulas, and the second-layer output h''_m of the graph attention network GAT is obtained from the first-layer outputs in the same way;
S14, for the current agent m, combining the first-layer outputs of the graph attention network GAT over the T consecutive time steps up to the current time into one sequence and the second-layer outputs over the same T consecutive time steps into another sequence, and taking the two sequences as the input vector sequences of the long short-term memory network LSTM, which integrates the temporal features of the sequences; the LSTM consists of a number of units, each containing three structures, a memory gate, a forgetting gate and an output gate; each unit takes the output vectors C_{t-1} and h_{t-1} of the previous unit and the vector x_t at the current time as inputs and outputs the integrated information C_t and h_t, where C_{t-1} represents the integrated information of all vectors at the first t-1 moments and h_{t-1} represents the information in the vector at time t-1 that is related to the current moment; data are processed with the memory gate, the forgetting gate and the output gate as the core, and the LSTM finally outputs one summary vector for each of the two input sequences;
S15, the deep Q network DQN consists of multiple fully connected layers; the first-layer output h'_m and second-layer output h''_m of the graph attention network GAT and the output vectors produced by the long short-term memory network LSTM are taken as the input of the deep Q network DQN, which outputs the return values of the different actions executed in the current state; the action with the highest return is selected and executed to interact with the environment;
S16, after the network structure is defined, the weight matrices in the algorithm network are randomly initialized from a Gaussian distribution, and a target network Ĝ is constructed whose structure is exactly the same as the algorithm network structure G and whose weights are initialized by copying the weight parameters of G;
S2, executing resource allocation;
S3, repeating the resource allocation of step S2 N_pre times and then training the algorithm network structure G;
S4, each time the algorithm network structure G of step S3 has been trained X times, assigning the weight parameters of G to the target network Ĝ, thereby updating Ĝ;
S5, after step S3 has been executed N_train times, the training process of the algorithm network structure G is complete.
2. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein: in the substep S13 the attention influence coefficient is calculated as e_mj = ATT(W_s h_m, W_t h_j) = (W_s h_m)^T (W_t h_j), the attention influence coefficient is normalized as α_mj = softmax_j(e_mj) = exp(e_mj) / Σ_{k∈D_m} exp(e_mk), and the first-layer output of the graph attention network is calculated as h'_m = σ( Σ_{j∈D_m} α_mj W h_j ), where W_s, W_t and W are the weight matrices of the layer and are also the network parameters to be trained.
3. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein: in the step S14 the memory gate is calculated as i_t = σ(W_i [h_{t-1}, x_t] + b_i), the forgetting gate as f_t = σ(W_f [h_{t-1}, x_t] + b_f), and the output gate as o_t = σ(W_o [h_{t-1}, x_t] + b_o); the integrated information is calculated as C̃_t = tanh(W_C [h_{t-1}, x_t] + b_C), C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t and h_t = o_t ⊙ tanh(C_t), where W_i, W_f, W_o, W_C, b_i, b_f, b_o, b_C are the weight matrices and biases of the layer and are also the network parameters to be trained, and tanh is an activation function.
4. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein the step S2 includes the following substeps:
S21, the wireless resource manager obtains the network state vector s_t of each base station at the current time t, the number of base stations being M, and draws a random number from the uniform distribution on (0, 1); if the random number is larger than ε, the wireless resource manager randomly selects a valid action for each base station; if the random number is less than or equal to ε, the wireless resource manager combines s_t with the state vectors of the previous T-1 time points and inputs them into the network G of step S1, and each base station obtains the action a_t with the maximum return value; after the action a_t is executed, the wireless resource manager receives the system benefit value r_t and observes the network state vector s_{t+1} at the next moment;
S22, the wireless resource system manager sets two hyper-parameters c_1, c_2 and a threshold c_3 and calculates the immediate return, the calculation using, among other quantities, the mean value of the SSRs of the slices in each base station obtained from the system; the value of c_1 is 3 to 6, the value of c_2 is 1 to 3, and the value of c_3 is 0.75 to 1;
S23, the wireless resource manager stores the quadruple (s_t, a_t, r_t, s_{t+1}) in a cache area whose size takes a value of 3000 to 10000.
5. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein the step S3 includes the following process: p quadruples are selected from the cache area as training samples; the p network state vectors s_t are each combined with the state vectors of their T-1 previous moments to obtain a matrix [s_1, s_2, …, s_p]^T, which is input into the algorithm network structure G constructed in step S1 to obtain the return values generated by executing different actions in the p states; the return values corresponding to [a_1, a_2, …, a_p]^T are selected and recorded as the predicted return values G(s_1, a_1), G(s_2, a_2), …, G(s_p, a_p) under the current network parameters; the p network state vectors s_{t+1} in the samples are each combined with the state vectors of their T-1 previous moments to obtain a matrix that is input into the target network Ĝ constructed in step S1 to obtain the return values generated by executing different actions in the p states, and for each state the maximum return value is selected and recorded as max_a Ĝ(s'_1, a), …, max_a Ĝ(s'_p, a); the loss function of the algorithm network structure G is L = (1/p) Σ_{i=1}^{p} ( r_i + γ·max_a Ĝ(s'_i, a) − G(s_i, a_i) )², where r_i is the instant return corresponding to each sample and γ is a discount factor taking a value of 0.75 to 0.9; the weight parameters of the algorithm network structure G are trained by applying a batch gradient descent method.
6. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein the step S5 includes the following process: the wireless resource manager combines the current network state vector s_t with the state vectors of the previous T-1 moments and inputs them into the algorithm network structure G; the algorithm network structure G outputs the return value corresponding to each action for each base-station agent, and the action corresponding to the maximum return value is selected as the allocation strategy of the current base station and executed.
7. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein: the value of X is 100 to 500, the value of N_pre is 500 to 3000, and the value of N_train is 1000 to 5000.
8. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 5, wherein: the number p of the quadruples is 32.
9. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 5, wherein: the batch gradient descent method is Adam, and the learning rate is 0.001.
10. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 4, wherein: in the sub-step S21 the initial value of ε is 0, and after each step ε is increased according to ε = max(0, ε_max − e^(−train_step/decay_step)), where ε_max takes a value of 0.85 to 0.95, train_step is the number of training steps at the current moment, and decay_step takes a value of 2000 to 4000.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111118071.8A CN113811009B (en) | 2021-09-24 | 2021-09-24 | Multi-base-station network resource intelligent allocation method based on space-time feature extraction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111118071.8A CN113811009B (en) | 2021-09-24 | 2021-09-24 | Multi-base-station network resource intelligent allocation method based on space-time feature extraction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113811009A CN113811009A (en) | 2021-12-17 |
CN113811009B true CN113811009B (en) | 2022-04-12 |
Family
ID=78896400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111118071.8A Active CN113811009B (en) | 2021-09-24 | 2021-09-24 | Multi-base-station network resource intelligent allocation method based on space-time feature extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113811009B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114615183B (en) * | 2022-03-14 | 2023-09-05 | 广东技术师范大学 | Routing method, device, computer equipment and storage medium based on resource prediction |
CN117313551A (en) * | 2023-11-28 | 2023-12-29 | 中国科学院合肥物质科学研究院 | Radionuclide diffusion prediction method and system based on GAT-LSTM |
CN118093057B (en) * | 2024-04-24 | 2024-07-05 | 武汉攀升鼎承科技有限公司 | Notebook computer system resource optimization method and system based on user using habit |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111212019A (en) * | 2018-11-22 | 2020-05-29 | 阿里巴巴集团控股有限公司 | User account access control method, device and equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111182637B (en) * | 2019-12-24 | 2022-06-21 | 浙江大学 | Wireless network resource allocation method based on generation countermeasure reinforcement learning |
CN112749005B (en) * | 2020-07-10 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Resource data processing method, device, computer equipment and storage medium |
CN112396492A (en) * | 2020-11-19 | 2021-02-23 | 天津大学 | Conversation recommendation method based on graph attention network and bidirectional long-short term memory network |
CN112512070B (en) * | 2021-02-05 | 2021-05-11 | 之江实验室 | Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning |
CN113051822B (en) * | 2021-03-25 | 2024-09-24 | 浙江工业大学 | Industrial system anomaly detection method based on graph attention network and LSTM automatic coding model |
-
2021
- 2021-09-24 CN CN202111118071.8A patent/CN113811009B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111212019A (en) * | 2018-11-22 | 2020-05-29 | 阿里巴巴集团控股有限公司 | User account access control method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113811009A (en) | 2021-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113811009B (en) | Multi-base-station network resource intelligent allocation method based on space-time feature extraction | |
CN110334201B (en) | Intention identification method, device and system | |
CN112512070B (en) | Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning | |
CN111339433B (en) | Information recommendation method and device based on artificial intelligence and electronic equipment | |
US20210097646A1 (en) | Method and apparatus for enhancing video frame resolution | |
KR20190119548A (en) | Method and apparatus for processing image noise | |
CN106448670A (en) | Dialogue automatic reply system based on deep learning and reinforcement learning | |
Zhang et al. | Optimization of image transmission in cooperative semantic communication networks | |
CN113852432B (en) | Spectrum Prediction Sensing Method Based on RCS-GRU Model | |
CN107943583A (en) | Processing method, device, storage medium and the electronic equipment of application program | |
CN116775807A (en) | Natural language processing and model training method, equipment and storage medium | |
CN112183742A (en) | Neural network hybrid quantization method based on progressive quantization and Hessian information | |
CN112766467B (en) | Image identification method based on convolution neural network model | |
CN110633735B (en) | Progressive depth convolution network image identification method and device based on wavelet transformation | |
CN114490618A (en) | Ant-lion algorithm-based data filling method, device, equipment and storage medium | |
CN117350304A (en) | Multi-round dialogue context vector enhancement method and system | |
Kaushik et al. | Traffic prediction in telecom systems using deep learning | |
Zhang et al. | Machine learning based protocol classification in unlicensed 5 GHz bands | |
CN117177279A (en) | Multi-user multi-task computing unloading method, device and medium containing throughput prediction | |
CN114449536B (en) | 5G ultra-dense network multi-user access selection method based on deep reinforcement learning | |
CN111813538A (en) | Edge computing resource allocation method | |
CN112633491A (en) | Method and device for training neural network | |
Usha et al. | Dynamic spectrum sensing in cognitive radio networks using ML model | |
CN112906640B (en) | Space-time situation prediction method and device based on deep learning and readable storage medium | |
CN114363671A (en) | Multimedia resource pushing method, model training method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |