CN113811009B - Multi-base-station network resource intelligent allocation method based on space-time feature extraction - Google Patents

Multi-base-station network resource intelligent allocation method based on space-time feature extraction

Info

Publication number
CN113811009B
Authority
CN
China
Prior art keywords: network, vector, algorithm, current, value
Legal status
Active
Application number
CN202111118071.8A
Other languages
Chinese (zh)
Other versions
CN113811009A (en)
Inventor
李荣鹏
肖柏狄
郭荣斌
赵志峰
张宏纲
Current Assignee
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202111118071.8A
Publication of CN113811009A
Application granted
Publication of CN113811009B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 72/00 Local resource management
    • H04W 72/50 Allocation or scheduling criteria for wireless resources
    • H04W 72/53 Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14 Network analysis or design
    • H04L 41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an intelligent multi-base-station network resource allocation method based on spatio-temporal feature extraction. The method extracts the position information and spatial features of each 5G base station through a graph attention mechanism, learns the behavior habits of network users through a long short-term memory mechanism to extract temporal features, and analyzes the spatial and temporal fluctuation of the data packets of each slice. Compared with resource allocation strategies based on optimization algorithms and genetic algorithms, and with strategies based on traditional reinforcement learning, it achieves a higher system return, i.e. higher spectral efficiency and better user experience. At the same time, the method adapts to dynamically changing environments and offers greater flexibility and robustness.

Description

Multi-base-station network resource intelligent allocation method based on space-time feature extraction
Technical Field
The invention relates to the technical field of wireless communication, and in particular to an intelligent multi-base-station network resource allocation method based on spatio-temporal feature extraction.
Background
Currently, 5G networks have become an indispensable link in the development of the digital society. Compared with 4G networks, 5G provides a massive range of services that meet far broader demands, most of which cannot be realized by 4G.
The ITU defines three main application scenarios for 5G: enhanced mobile broadband (eMBB), massive machine-type communication (mMTC), and ultra-reliable low-latency communication (URLLC). With its high bandwidth, eMBB mainly serves services such as AR/VR (augmented reality/virtual reality); with its high connection density, mMTC serves the Internet of Things, smart homes and similar services; and URLLC, with its low latency and high reliability, can be applied to services such as autonomous driving and remote operation.
However, if a 5G network were to build a dedicated network for each specific service, as 4G networks do, resources would largely be wasted. As noted above, different 5G services place different requirements on communication delay, bandwidth, mobility, reliability and other performance metrics, so many dedicated networks would be needed to cover all services, resulting in huge deployment costs.
Researchers have therefore proposed network slicing (NS) technology. Network slicing can flexibly allocate the existing network resources according to different user requirements. Compared with a single network, it can provide higher-performance logical networks, flexibly allocate the limited bandwidth resources, and assign network resources reasonably and without mutual interference, with higher reliability and security. To cope with changing user demands and the frequent switching between base stations caused by user mobility, optimizing the deployment and adjusting the resource allocation of network slices in real time is a major challenge for current 5G services. The key technical indicators are: maximizing spectrum efficiency (SE), to reduce resource cost and serve more subscribers, while satisfying the service level agreement (SLA) of slice subscribers as far as possible, to improve the user service satisfaction rate (SSR).
Traditional dedicated resource allocation schemes, as well as allocation strategies based on optimization or heuristic algorithms, usually rely on strict constraints and complex derivations to formulate a specific optimization problem. They lack flexibility and scalability, and they cannot respond well when user characteristics and the proportions of different user types change. It is therefore necessary to allocate spectrum resources to different slices dynamically and intelligently according to users' service requests, so as to maximize SE while guaranteeing a basic SSR.
Reinforcement learning learns, in a trial-and-error manner, the optimal behavior policy that maximizes return by continuously interacting with the environment, capturing state information from the environment, and selecting actions accordingly. Traditional reinforcement learning struggles with continuous or high-dimensional state spaces, so deep-learning feature extraction and prediction methods have been introduced into reinforcement learning: a deep neural network extracts deep features of the state and represents the state-action value function, allowing deep reinforcement learning algorithms to predict the optimal action-selection policy over much larger state spaces. Typical deep reinforcement learning algorithms include the Deep Q-Network (DQN) and Advantage Actor-Critic (A2C).
Although convolutional neural networks have achieved great success in processing structured information, the data in many tasks of interest cannot be represented on a grid-like structure and instead live in an irregular domain, where a graph structure is the natural representation. Interest in generalizing convolution to the graph domain keeps growing, and graph convolutional neural networks continue to evolve from it. The graph attention mechanism is a representative graph convolutional mechanism; it introduces multi-head masked attention to assign different influence weights to neighboring nodes.
In addition, user mobility causes the demand of the same user to switch continuously between different base stations, so user behavior needs to be predicted and the demand met in time. The long short-term memory mechanism, a typical recurrent neural network mechanism, can integrate and discard information along a time sequence and extract the temporal features of the sequence.
Through these two mechanisms, the cooperation between nodes in the graph can be strengthened, changes in user behavior can be predicted in advance and information aggregated, and the system becomes more robust to the noise of neighboring nodes and to user movement.
Disclosure of Invention
The invention aims to provide an intelligent multi-base-station network resource allocation method based on spatio-temporal feature extraction. Compared with traditional optimization and heuristic algorithms, the proposed method offers better flexibility and scalability. Compared with other reinforcement learning algorithms, it strengthens the cooperation between base stations in predicting the trend of data-packet volume, and it predicts the trend of user behavior so as to reduce the negative impact that mobility-induced changes in the number of users per base station have on the prediction of the state-action value function. Adopting a reinforcement learning algorithm based on temporal feature extraction for resource allocation in a multi-base-station cooperative wireless network therefore improves prediction accuracy and greatly improves wireless network performance.
In order to achieve the purpose, the invention provides the following technical scheme:
the application discloses a multi-base-station network resource intelligent allocation method based on space-time feature extraction, which is characterized by comprising the following steps:
S1, building and initializing the algorithm network G and the target network Ḡ;
S11, the algorithm network G is divided into a state-vector encoding network Embed, a long short-term memory network LSTM, a graph attention network GAT and a deep Q network DQN;
S12, the state-vector encoding network Embed is composed of two fully connected layers and is written as h_m = Embed(s_m) = σ(W_e·s_m + b_e), where W_e and b_e are the weight matrix and bias of this layer and σ is the activation function; the N-dimensional state vector s_m of agent m in the multi-agent reinforcement learning problem is input into the state-vector encoding network Embed, which outputs a K-dimensional encoded vector h_m;
S13, the encoded vector h_m of the current agent m and the encoded vectors h_j, j ∈ D_m, of the agents on its adjacent nodes in the directed graph are used as the input vectors of the graph attention network GAT, where D_m denotes the set of agents on nodes adjacent to the current agent m in the directed graph; the attention influence coefficients are computed and normalized, and the normalized attention influence coefficients are multiplied by the input vectors to compute the first-layer output of the graph attention network GAT; the computation of the attention influence coefficients, the normalization and the first-layer output are combined into a single expression, h'_m = GAT(h_m, {h_j : j ∈ D_m}), and the second-layer output of the graph attention network GAT is h''_m = GAT(h'_m, {h'_j : j ∈ D_m});
S14, for the current agent m, the first-layer outputs of the graph attention network GAT over the T consecutive time steps up to the current time are combined into a sequence H'_m = [h'_m(t-T+1), …, h'_m(t)], and the second-layer outputs over the same T consecutive time steps are combined into a sequence H''_m = [h''_m(t-T+1), …, h''_m(t)]; H'_m and H''_m are used as the input vector sequences of the long short-term memory network LSTM, which integrates the temporal features of the sequences; the long short-term memory network LSTM is composed of several cells, each containing three structures, namely a memory gate, a forget gate and an output gate, which take the previous cell's output vectors C_{t-1} and h_{t-1} together with the current-time vector x_t as input and output the integrated information C_t and h_t; with the memory gate, forget gate and output gate as the core of the data processing, the long short-term memory network LSTM finally outputs the vectors l'_m and l''_m, where C_{t-1} represents the integrated information of all vectors at the first t-1 time steps and h_{t-1} represents the information in the vector at time t-1 that is related to the current time;
S15, the deep Q network DQN is composed of several fully connected layers; the first-layer output h'_m of the graph attention network GAT, its second-layer output h''_m and the LSTM-processed output vectors l'_m and l''_m are used as the input of the deep Q network DQN, which outputs the return values of the different actions that can be executed in the current state, and the action with the highest return is selected and executed to interact with the environment;
S16, after the network structure is defined, the weight matrices of the algorithm network are randomly initialized from a Gaussian distribution and a target network Ḡ is constructed; its structure is exactly the same as that of the algorithm network G, and its own weights are initialized by copying the weight parameters of G;
s2, executing resource allocation;
S3, repeating the resource allocation of step S2 N_pre times, then training the algorithm network G;
S4, each time the algorithm network G of step S3 has been trained X times, assigning the weight parameters of the algorithm network G to the target network Ḡ, thereby updating Ḡ;
S5, executing step S3 N_train times, which completes the training process of the algorithm network G.
Preferably, in substep S13 the attention influence coefficient is computed as e_mj = ATT(W_s·h_m, W_t·h_j) = (W_s·h_m)^T (W_t·h_j), the attention influence coefficient is normalized as α_mj = softmax_j(e_mj) = exp(e_mj) / Σ_{k∈D_m} exp(e_mk), and the first-layer output of the graph attention network is computed as h'_m = σ( Σ_{j∈D_m} α_mj·W·h_j ), applied in parallel over the attention heads, where W_s, W_t and W are the weight matrices of this layer and are also the network parameters to be trained.
Preferably, in step S14 the memory gate is computed as i_t = σ(W_i·[h_{t-1}, x_t] + b_i), the forget gate as f_t = σ(W_f·[h_{t-1}, x_t] + b_f), and the output gate as o_t = σ(W_o·[h_{t-1}, x_t] + b_o); the integrated information is computed as C̃_t = tanh(W_C·[h_{t-1}, x_t] + b_C), C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t and h_t = o_t ⊙ tanh(C_t), where ⊙ denotes element-wise multiplication, W_i, W_f, W_o, W_C, b_i, b_f, b_o and b_C are the weight matrices and biases of this layer and are the network parameters to be trained, and tanh is an activation function.
Preferably, the step S2 includes the following substeps:
S21, the radio resource manager obtains the network state vector of each base station at the current time t; with M base stations, s_t = [s_t^1, s_t^2, …, s_t^M]. A random number is drawn from the uniform distribution on (0, 1): if the random number is larger than ε, the radio resource manager randomly selects a valid action for each base station; if the random number is less than or equal to ε, the radio resource manager combines s_t with the state vectors of the previous T-1 time steps and inputs them to the network G of step S1, and each base station obtains the action with the maximum return value, a_t = argmax_a G(s_t, a). After executing action a_t, the radio resource manager receives the system benefit value returned by the environment and observes the network state vector s_{t+1} at the next moment;
S22, setting two hyper-parameters c by the wireless resource system manager1、c2And a threshold value c3The real-time report is calculated,
Figure GDA00035045718400000412
Figure GDA00035045718400000413
wherein
Figure GDA00035045718400000414
Represents the mean value of the SSR slices in each base station acquired from the system, wherein c1Is 3 to 6, c2Is 1 to 3, c3The value of (a) is 0.75-1;
S23, the radio resource manager stores the quadruple (s_t, a_t, r_t, s_{t+1}) in a replay buffer D of size N_D, where N_D takes a value of 3000 to 10000.
Preferably, step S3 includes the following process: p quadruples are selected from the replay buffer D as training samples; the p network state vectors s_t, each combined with the state vectors of its T-1 previous time steps, form a matrix [s_1, s_2, …, s_p]^T that is input into the algorithm network G built in step S1 to obtain the return values of the different actions in the p states, and the return values corresponding to [a_1, a_2, …, a_p]^T are selected and recorded as the predicted return values G(s_1, a_1), G(s_2, a_2), …, G(s_p, a_p) under the current network parameters; the p next-state vectors s_{t+1} in the samples, each combined with the state vectors of its T-1 previous time steps, form a matrix [s'_1, s'_2, …, s'_p]^T that is input into the target network Ḡ built in step S1 to obtain the return values of the different actions in the p next states, and for each sample the maximum return value, max_a Ḡ(s'_i, a), is selected; the loss function of the algorithm network G is L = (1/p) Σ_{i=1}^{p} ( r_i + γ·max_a Ḡ(s'_i, a) - G(s_i, a_i) )², where r_i is the instant return of each sample and γ is the discount factor, taking a value of 0.75 to 0.9, and the weight parameters of the algorithm network G are trained by a batch gradient descent method.
Preferably, step S5 includes the following process: the radio resource manager combines the current network state vector s_t with the state vectors of the previous T-1 time steps and inputs them into the algorithm network G; the algorithm network G outputs the return value of each action for every base-station agent, and the action with the maximum return value is selected as the allocation strategy of the current base station and executed.
Preferably, X takes a value of 100 to 500, N_pre a value of 500 to 3000, and N_train a value of 1000 to 5000.
Preferably, the number p of the quadruples is 32.
Preferably, the batch gradient descent method is Adam, and the learning rate is 0.001.
Preferably, the initial value of ε in substep S21 is 0, and at each step ε is increased according to ε = max(0, ε_max - e^(-train_step/decay_step)), where ε_max takes a value of 0.85 to 0.95, train_step is the number of training steps at the current moment, and decay_step takes a value of 2000 to 4000.
The invention has the beneficial effects that:
(1) The invention uses a graph attention mechanism and a long short-term memory mechanism to preprocess the state vectors, extracting temporal and spatial features, enlarging the receptive field under limited communication conditions, and strengthening the cooperation between base stations and the prediction of user behavior. Through network training, the influence weights of the surrounding base stations on the current base station are obtained, increasing the positive influence of informative variables while reducing the negative influence of noise and user movement, which enhances the robustness of the system.
(2) The method estimates the state-action value function with deep reinforcement learning and selects the optimal resource allocation strategy. The reinforcement learning algorithm can generate the sample data needed for training by interacting with the environment, without any empirical or prior assumptions about the distribution of the state-action value function, so the method adapts to more complex scenarios and offers better flexibility.
(3) Compared with traditional resource sharing and numerical analysis algorithms, the radio resource allocation strategy obtained through the cooperation of multiple base stations achieves a higher system benefit value, i.e. the utilization of spectrum resources is improved while the basic user service satisfaction rate is guaranteed, thereby improving the user experience.
The features and advantages of the present invention will be described in detail by embodiments in conjunction with the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a multi-base-station cooperative network resource allocation method based on temporal feature extraction reinforcement learning according to the present invention;
FIG. 2 shows how the system benefit values of the method of the present invention and of several comparison allocation methods change during radio resource allocation, using the parameters specified in the following embodiment.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings.
Referring to FIG. 1, the flow of the multi-base-station cooperative network resource allocation method based on temporal-feature-extraction reinforcement learning of the present invention includes the following steps:
S1, building and initializing the algorithm network G and the target network Ḡ, which specifically includes the following substeps:
S11, the algorithm network G of the method is divided into a state-vector encoding network Embed, a long short-term memory network LSTM, a graph attention network GAT and a deep Q network DQN.
S12, the state-vector encoding network Embed is composed of two fully connected layers and is written as
h_m = Embed(s_m) = σ(W_e·s_m + b_e),     (1)
where W_e and b_e are the weight matrix and bias of this layer and σ is the ReLU activation function. The N-dimensional state vector s_m (the state vector of the m-th agent) in the multi-agent reinforcement learning problem is input into Embed, which outputs the K-dimensional encoded vector h_m.
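As a minimal illustration of this encoding step, the sketch below implements formula (1) in NumPy; the dimensions N and K, the weight names and the use of a single affine map are placeholders for illustration, not values fixed by the patent:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def make_embed(n_in, k_out, rng):
        """Weights of the encoder h_m = sigma(W_e s_m + b_e)."""
        return {"W_e": rng.normal(0.0, 0.1, (k_out, n_in)), "b_e": np.zeros(k_out)}

    def embed(params, s_m):
        """Encode the N-dimensional state vector s_m into a K-dimensional vector h_m."""
        return relu(params["W_e"] @ s_m + params["b_e"])

    rng = np.random.default_rng(0)
    enc = make_embed(n_in=12, k_out=32, rng=rng)   # N = 12, K = 32 are illustrative only
    h_m = embed(enc, rng.normal(size=12))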
S13, the encoded vector h_m of the current agent m and the encoded vectors h_j, j ∈ D_m, of the agents on its adjacent nodes in the directed graph (where D_m denotes the set of agents on nodes adjacent to the current agent m, and Euclidean distance is used as the criterion for constructing the directed graph) are used as the input vectors of the graph attention network. They are used to compute the attention influence coefficients, which are then normalized:
e_mj = ATT(W_s·h_m, W_t·h_j) = (W_s·h_m)^T (W_t·h_j),     (2)
α_mj = softmax_j(e_mj) = exp(e_mj) / Σ_{k∈D_m} exp(e_mk).     (3)
The normalized attention influence coefficients are multiplied by the input vectors, and the first-layer output of the graph attention network is computed through formula (4), where the multi-head attention parameter K takes a value of 2 to 20:
h'_m = σ( Σ_{j∈D_m} α_mj·W·h_j ), applied in parallel over the K attention heads.     (4)
The three steps of computing the attention influence coefficients, normalizing them and computing the output are combined into the single expression
h'_m = GAT(h_m, {h_j : j ∈ D_m}).
The graph attention network has two layers; the second layer has the same structure as the first and is written as
h''_m = GAT(h'_m, {h'_j : j ∈ D_m}),
where W_s, W_t and W are the weight matrices of this layer and are also network parameters to be trained.
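A single-head sketch of formulas (2) to (4) in NumPy, assuming each neighborhood D_m is given as a list of node indices; the multi-head case would repeat the same computation with K independent sets of weights, and the graph size and dimensions below are illustrative:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def gat_layer(h, neighbors, W_s, W_t, W):
        """One graph-attention layer: h is (num_nodes, dim), neighbors[m] lists the
        indices j in D_m, and the return value stacks the outputs h'_m of formula (4)."""
        out = np.zeros((h.shape[0], W.shape[0]))
        for m, D_m in enumerate(neighbors):
            e = np.array([(W_s @ h[m]) @ (W_t @ h[j]) for j in D_m])        # formula (2)
            alpha = np.exp(e - e.max())                                     # formula (3)
            alpha /= alpha.sum()
            out[m] = relu(sum(a * (W @ h[j]) for a, j in zip(alpha, D_m)))  # formula (4)
        return out

    rng = np.random.default_rng(1)
    h = rng.normal(size=(4, 8))                     # 4 nodes with 8-dimensional encodings
    nbrs = [[1, 2], [0, 2, 3], [0, 1], [1]]         # a small directed graph
    W_s, W_t, W = (rng.normal(size=(8, 8)) for _ in range(3))
    h1 = gat_layer(h, nbrs, W_s, W_t, W)            # first-layer output h'_m
    h2 = gat_layer(h1, nbrs, W_s, W_t, W)           # second-layer output h''_m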
S14, for the current agent m, the outputs of the two GAT layers over the T consecutive time steps up to the current time are combined into the sequences H'_m = [h'_m(t-T+1), …, h'_m(t)] and H''_m = [h''_m(t-T+1), …, h''_m(t)], which are used as the input vector sequences of the long short-term memory network LSTM; the temporal features of the sequences are thereby integrated. The long short-term memory network LSTM comprises several cells, each containing three structures, namely a memory gate, a forget gate and an output gate, which take the previous cell's output vectors C_{t-1} and h_{t-1} together with the current-time vector x_t as input and output C_t and h_t, where C_{t-1} represents the integrated information of all vectors at the first t-1 time steps and h_{t-1} represents the information in the vector at time t-1 that is associated with the current time.
The output of the memory gate is computed as
i_t = σ(W_i·[h_{t-1}, x_t] + b_i),
the output of the forget gate as
f_t = σ(W_f·[h_{t-1}, x_t] + b_f),
and the output of the output gate as
o_t = σ(W_o·[h_{t-1}, x_t] + b_o).
The integrated information is computed as
C̃_t = tanh(W_C·[h_{t-1}, x_t] + b_C),  C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t,  h_t = o_t ⊙ tanh(C_t),
where ⊙ denotes element-wise multiplication, W_i, W_f, W_o, W_C, b_i, b_f, b_o and b_C are the weight matrices and biases of this layer and are network parameters to be trained, and tanh is an activation function.
With these three gates as the core of the data processing, the long short-term memory network LSTM finally outputs the vectors l'_m and l''_m, one for each input sequence.
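A sketch of one LSTM cell step and of running a cell over a GAT-output sequence, following the gate equations above; the layer sizes, the zero initial state and the parameter names are illustrative assumptions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_cell(p, c_prev, h_prev, x_t):
        """One step: (C_{t-1}, h_{t-1}, x_t) -> (C_t, h_t) via memory/forget/output gates."""
        z = np.concatenate([h_prev, x_t])
        i_t = sigmoid(p["W_i"] @ z + p["b_i"])        # memory (input) gate
        f_t = sigmoid(p["W_f"] @ z + p["b_f"])        # forget gate
        o_t = sigmoid(p["W_o"] @ z + p["b_o"])        # output gate
        c_tilde = np.tanh(p["W_C"] @ z + p["b_C"])    # candidate information
        c_t = f_t * c_prev + i_t * c_tilde            # integrated cell state C_t
        h_t = o_t * np.tanh(c_t)                      # hidden output h_t
        return c_t, h_t

    def run_lstm(p, seq, hidden):
        """Feed a sequence (e.g. H'_m over T steps) and return the final hidden vector."""
        c_t, h_t = np.zeros(hidden), np.zeros(hidden)
        for x_t in seq:
            c_t, h_t = lstm_cell(p, c_t, h_t, x_t)
        return h_t

    rng = np.random.default_rng(2)
    dim_in, hidden = 8, 16
    p = {k: rng.normal(0.0, 0.1, (hidden, hidden + dim_in)) for k in ("W_i", "W_f", "W_o", "W_C")}
    p.update({k: np.zeros(hidden) for k in ("b_i", "b_f", "b_o", "b_C")})
    l_m = run_lstm(p, rng.normal(size=(5, dim_in)), hidden)   # e.g. T = 5 steps of h'_m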
S15, the deep Q network DQN is composed of several fully connected layers; the output vectors of the two-layer graph attention network GAT and the LSTM-processed output vectors are used as the input of the deep Q network DQN, which outputs the return values of the different actions that can be executed in the current state, and the action with the highest return is selected and executed to interact with the environment.
S16, after the network structure is defined, the weight matrices of the algorithm network are randomly initialized from a Gaussian distribution and a target network Ḡ is constructed; its structure is exactly the same as that of the algorithm network G, and its own weights are initialized by copying the weight parameters of G.
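The following sketch illustrates S15 and S16: a small fully connected Q head over the concatenated GAT and LSTM features, Gaussian weight initialization, and a target network created as a parameter copy. The layer sizes, the feature dimension and the 171-action output are taken from the embodiment below purely as an illustration:

    import copy
    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def make_mlp(sizes, rng):
        """Gaussian-initialized fully connected stack, e.g. sizes = [feat_dim, 64, num_actions]."""
        return [{"W": rng.normal(0.0, 0.1, (o, i)), "b": np.zeros(o)}
                for i, o in zip(sizes[:-1], sizes[1:])]

    def q_values(mlp, features):
        """Return one Q/return value per action for a concatenated feature vector."""
        x = features
        for layer in mlp[:-1]:
            x = relu(layer["W"] @ x + layer["b"])
        return mlp[-1]["W"] @ x + mlp[-1]["b"]

    rng = np.random.default_rng(3)
    feats = rng.normal(size=48)                     # stand-in for [h'_m, h''_m, l'_m, l''_m]
    G_head = make_mlp([feats.size, 64, 171], rng)   # 171 valid actions, as in the embodiment
    G_target = copy.deepcopy(G_head)                # target network initialized by copying G
    best_action = int(np.argmax(q_values(G_head, feats)))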
S2, performing resource allocation, specifically including the following substeps:
S21, the radio resource manager obtains the network state vector of each base station at the current time t; with M base stations, s_t = [s_t^1, s_t^2, …, s_t^M]. Following an ε-greedy algorithm, the radio resource manager draws a random number from the uniform distribution on (0, 1). If the random number is greater than ε, the radio resource manager randomly selects a valid action for each base station. If the random number is less than or equal to ε, the radio resource manager combines s_t with the state vectors of the previous T-1 time steps and inputs them to the network G of step S1, and each base station obtains the action with the maximum return value, a_t = argmax_a G(s_t, a). After executing action a_t, the radio resource manager receives the system benefit value returned by the environment and observes the network state vector s_{t+1} at the next moment. The initial value of ε is 0, and after each step ε is increased according to ε = max(0, ε_max - e^(-train_step/decay_step)), where ε_max takes a value of 0.85 to 0.95, train_step is the number of training steps at the current moment, and decay_step takes a value of 2000 to 4000.
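A sketch of this ε schedule and the ε-greedy choice, under the reading that ε rises from 0 toward ε_max as training proceeds; the schedule constants follow the ranges above, and the Q-value row is a stand-in for the output of the network G:

    import numpy as np

    def epsilon(train_step, eps_max=0.95, decay_step=2000):
        """Exploration schedule: starts at 0 and rises toward eps_max as training proceeds."""
        return max(0.0, eps_max - np.exp(-train_step / decay_step))

    def choose_action(q_row, train_step, rng):
        """Epsilon-greedy selection: exploit argmax Q with probability epsilon, else explore."""
        if rng.uniform(0.0, 1.0) <= epsilon(train_step):
            return int(np.argmax(q_row))            # action with the maximum return value
        return int(rng.integers(len(q_row)))        # random valid action

    rng = np.random.default_rng(4)
    a_t = choose_action(rng.normal(size=171), train_step=3000, rng=rng)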
S22, the radio resource manager sets two hyper-parameters c_1 and c_2 and a threshold c_3, and computes the instant return r_t from the system benefit value and the mean SSR of the slices in each base station obtained from the system (denoted SSR_avg). c_1 takes a value of 3 to 6, c_2 a value of 1 to 3, and c_3 a value of 0.75 to 1.
S23, the radio resource manager stores the quadruple (s_t, a_t, r_t, s_{t+1}) in a replay buffer D of size N_D, where N_D takes a value of 3000 to 10000. If D is full, the earliest stored quadruple is deleted and the newest one is stored, i.e. a first-in, first-out policy is used.
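A sketch of this first-in, first-out replay buffer, using a bounded deque so that the oldest quadruple is discarded automatically once the capacity is reached; the capacity and batch size shown are values from the ranges stated in this embodiment:

    import random
    from collections import deque

    class ReplayBuffer:
        """Stores (s_t, a_t, r_t, s_{t+1}) quadruples with first-in, first-out eviction."""

        def __init__(self, capacity=5000):
            self.data = deque(maxlen=capacity)      # a full deque drops its oldest entry on append

        def add(self, s_t, a_t, r_t, s_next):
            self.data.append((s_t, a_t, r_t, s_next))

        def sample(self, p=32):
            """Draw p quadruples uniformly at random as a training mini-batch."""
            return random.sample(list(self.data), p)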
S3, the resource allocation of step S2 is repeated N_pre times, with N_pre taking a value of 500 to 3000, so that the replay buffer holds enough data for training the current network parameters. The process of training the network G is as follows:
p quadruples are selected from the replay buffer D as training samples; the p network state vectors s_t, each combined with the state vectors of its T-1 previous time steps, form a matrix [s_1, s_2, …, s_p]^T that is input into the algorithm network G built in step S1 to obtain the return values of the different actions in the p states, and the return values corresponding to [a_1, a_2, …, a_p]^T are selected and recorded as the predicted return values G(s_1, a_1), G(s_2, a_2), …, G(s_p, a_p) under the current network parameters.
The p next-state vectors s_{t+1} in the samples, each combined with the state vectors of its T-1 previous time steps, form a matrix [s'_1, s'_2, …, s'_p]^T that is input into the target network Ḡ built in step S1 to obtain the return values of the different actions in the p next states; for each sample the action with the maximum return value is selected and the value is recorded as max_a Ḡ(s'_i, a). The loss function of the algorithm network G is
L = (1/p) Σ_{i=1}^{p} ( r_i + γ·max_a Ḡ(s'_i, a) - G(s_i, a_i) )²,
where r_i is the instant return of each sample and γ is the discount factor, taking a value of 0.75 to 0.9. The weight parameters of the algorithm network G are trained by a batch gradient descent method, with Adam selected as the optimizer and the learning rate set to 0.001.
S4, each time the algorithm network G of step S3 has been trained X times, where X takes a value of 100 to 500, the weight parameters of G are assigned to the target network Ḡ, thereby updating Ḡ.
S5, step S3 is executed N_train times, with N_train taking a value of 1000 to 5000, which completes the training process of the algorithm network G. The radio resource manager then combines the current network state vector s_t with the state vectors of the previous T-1 time steps and inputs them into the algorithm network G; the algorithm network G outputs the return value of each action for every base-station agent, and the action with the maximum return value is selected as the allocation strategy of the current base station and executed.
On a server configured as shown in Table 1, a simulation environment was written in Python, the networks were built with keras, and tests were carried out using three different types of services (voice calls, video, and ultra-reliable low-latency services) as an example. The system has 19 base stations, i.e. M = 19, arranged in a honeycomb (cellular) layout; the total bandwidth of each base station is 10 M and the allocation granularity is set to 0.5 M, giving 171 allocation strategies in total, i.e. 171 valid actions. The discount factor γ is set to 0.9 and the multi-head attention coefficient K is 8. Furthermore, ε_max is 0.95 and decay_step is 2000. The replay buffer D has a size of 5000, N_pre is 2000, and N_train is 10000. The optimizer of the batch gradient descent used to train the algorithm network G is Adam, with a learning rate of 0.001. The other parameters are as follows:
X = 200, c_1 = 5.5, c_2 = 2, c_3 = 0.8, p = 32.
TABLE 1 System test platform parameters
Processor: Intel i9-9900KF 3.6 GHz
Memory: 64 GB
Graphics card: NVIDIA GTX 2080
Software platform: keras 2.2.4
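For reference, the hyperparameters of this embodiment collected into one Python dictionary (the key names are illustrative; the values are those listed above):

    config = {
        "num_base_stations": 19,     # M, honeycomb layout
        "num_actions": 171,          # bandwidth allocation strategies per base station
        "gamma": 0.9,                # discount factor
        "attention_heads": 8,        # multi-head attention coefficient K
        "eps_max": 0.95,
        "decay_step": 2000,
        "buffer_size": 5000,
        "N_pre": 2000,
        "N_train": 10000,
        "optimizer": "Adam",
        "learning_rate": 0.001,
        "X": 200,                    # target-network update period
        "c1": 5.5, "c2": 2, "c3": 0.8,
        "batch_size_p": 32,
    }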
The method of the present invention is compared with several resource allocation methods, including the hard slicing algorithm, the DQN algorithm, the LSTM-A2C algorithm, and a GAT-DQN algorithm without the LSTM. The hard slicing algorithm distributes the total bandwidth of a base station evenly over the network slices; the LSTM-A2C algorithm combines long short-term memory networks with deep reinforcement learning. Referring to FIG. 2, the figure shows how the system benefit values obtained by the various methods change during radio resource allocation, where the system benefit value is the average return value of the 19 base stations. For ease of analysis, the median value is plotted every 100 steps. The curves show that in the first 4000 steps, because the deep reinforcement learning algorithms still have to train their network parameters, their return values fluctuate more and their median return is lower than that of the even-allocation (hard slicing) method. Once network training is complete, i.e. after 4000 steps, the system benefit value of every deep reinforcement learning algorithm improves markedly, and the method of the present invention performs best, with better system stability and a higher system benefit value.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents or improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A multi-base station network resource intelligent allocation method based on space-time feature extraction is characterized by comprising the following steps:
S1, building and initializing the algorithm network G and the target network Ḡ;
S11, the algorithm network G is divided into a state-vector encoding network Embed, a long short-term memory network LSTM, a graph attention network GAT and a deep Q network DQN;
S12, the state-vector encoding network Embed is composed of two fully connected layers and is written as h_m = Embed(s_m) = σ(W_e·s_m + b_e), where W_e and b_e are the weight matrix and bias of this layer and σ is the activation function; the N-dimensional state vector s_m of agent m in the multi-agent reinforcement learning problem is input into the state-vector encoding network Embed, which outputs a K-dimensional encoded vector h_m;
S13, the encoded vector h_m of the current agent m and the encoded vectors h_j, j ∈ D_m, of the agents on its adjacent nodes in the directed graph are used as the input vectors of the graph attention network GAT, where D_m denotes the set of agents on nodes adjacent to the current agent m in the directed graph; the attention influence coefficients are computed and normalized, and the normalized attention influence coefficients are multiplied by the input vectors to compute the first-layer output of the graph attention network GAT; the computation of the attention influence coefficients, the normalization and the first-layer output are combined into a single expression, h'_m = GAT(h_m, {h_j : j ∈ D_m}), and the second-layer output of the graph attention network GAT is h''_m = GAT(h'_m, {h'_j : j ∈ D_m});
S14, for the current agent m, the first-layer outputs of the graph attention network GAT over the T consecutive time steps up to the current time are combined into a sequence H'_m = [h'_m(t-T+1), …, h'_m(t)], and the second-layer outputs over the same T consecutive time steps are combined into a sequence H''_m = [h''_m(t-T+1), …, h''_m(t)]; H'_m and H''_m are used as the input vector sequences of the long short-term memory network LSTM, which integrates the temporal features of the sequences; the long short-term memory network LSTM is composed of several cells, each containing three structures, namely a memory gate, a forget gate and an output gate, which take the previous cell's output vectors C_{t-1} and h_{t-1} together with the current-time vector x_t as input and output the integrated information C_t and h_t; with the memory gate, forget gate and output gate as the core of the data processing, the long short-term memory network LSTM finally outputs the vectors l'_m and l''_m, where C_{t-1} represents the integrated information of all vectors at the first t-1 time steps and h_{t-1} represents the information in the vector at time t-1 that is related to the current time;
S15, the deep Q network DQN is composed of several fully connected layers; the first-layer output h'_m of the graph attention network GAT, its second-layer output h''_m and the LSTM-processed output vectors l'_m and l''_m are used as the input of the deep Q network DQN, which outputs the return values of the different actions that can be executed in the current state, and the action with the highest return is selected and executed to interact with the environment;
S16, after the network structure is defined, the weight matrices of the algorithm network are randomly initialized from a Gaussian distribution and a target network Ḡ is constructed; its structure is exactly the same as that of the algorithm network G, and its own weights are initialized by copying the weight parameters of G;
s2, executing resource allocation;
S3, repeating the resource allocation of step S2 N_pre times, then training the algorithm network G;
S4, each time the algorithm network G of step S3 has been trained X times, assigning the weight parameters of the algorithm network G to the target network Ḡ, thereby updating Ḡ;
S5, executing step S3 N_train times, which completes the training process of the algorithm network G.
2. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein: in substep S13 the attention influence coefficient is computed as e_mj = ATT(W_s·h_m, W_t·h_j) = (W_s·h_m)^T (W_t·h_j), the attention influence coefficient is normalized as α_mj = softmax_j(e_mj) = exp(e_mj) / Σ_{k∈D_m} exp(e_mk), and the first-layer output of the graph attention network is computed as h'_m = σ( Σ_{j∈D_m} α_mj·W·h_j ), where W_s, W_t and W are the weight matrices of this layer and are also the network parameters to be trained.
3. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein: in step S14 the memory gate is computed as i_t = σ(W_i·[h_{t-1}, x_t] + b_i), the forget gate as f_t = σ(W_f·[h_{t-1}, x_t] + b_f), and the output gate as o_t = σ(W_o·[h_{t-1}, x_t] + b_o); the integrated information is computed as C̃_t = tanh(W_C·[h_{t-1}, x_t] + b_C), C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t and h_t = o_t ⊙ tanh(C_t), where ⊙ denotes element-wise multiplication, W_i, W_f, W_o, W_C, b_i, b_f, b_o and b_C are the weight matrices and biases of this layer and are the network parameters to be trained, and tanh is an activation function.
4. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein the step S2 includes the following substeps:
S21, the radio resource manager obtains the network state vector of each base station at the current time t; with M base stations, s_t = [s_t^1, s_t^2, …, s_t^M]; a random number is drawn from the uniform distribution on (0, 1): if the random number is larger than ε, the radio resource manager randomly selects a valid action for each base station; if the random number is less than or equal to ε, the radio resource manager combines s_t with the state vectors of the previous T-1 time steps and inputs them to the network G of step S1, and each base station obtains the action with the maximum return value, a_t = argmax_a G(s_t, a); after executing action a_t, the radio resource manager receives the system benefit value returned by the environment and observes the network state vector s_{t+1} at the next moment;
S22, the radio resource manager sets two hyper-parameters c_1 and c_2 and a threshold c_3, and computes the instant return r_t from the system benefit value and the mean SSR of the slices in each base station obtained from the system (denoted SSR_avg); c_1 takes a value of 3 to 6, c_2 a value of 1 to 3, and c_3 a value of 0.75 to 1;
S23, the radio resource manager stores the quadruple (s_t, a_t, r_t, s_{t+1}) in a replay buffer D of size N_D, where N_D takes a value of 3000 to 10000.
5. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein the step S3 includes the following process: p quadruples are selected from the replay buffer D as training samples; the p network state vectors s_t, each combined with the state vectors of its T-1 previous time steps, form a matrix [s_1, s_2, …, s_p]^T that is input into the algorithm network G built in step S1 to obtain the return values of the different actions in the p states, and the return values corresponding to [a_1, a_2, …, a_p]^T are selected and recorded as the predicted return values G(s_1, a_1), G(s_2, a_2), …, G(s_p, a_p) under the current network parameters; the p next-state vectors s_{t+1} in the samples, each combined with the state vectors of its T-1 previous time steps, form a matrix [s'_1, s'_2, …, s'_p]^T that is input into the target network Ḡ built in step S1 to obtain the return values of the different actions in the p next states, and for each sample the maximum return value, max_a Ḡ(s'_i, a), is selected; the loss function of the algorithm network G is L = (1/p) Σ_{i=1}^{p} ( r_i + γ·max_a Ḡ(s'_i, a) - G(s_i, a_i) )², where r_i is the instant return of each sample and γ is the discount factor, taking a value of 0.75 to 0.9, and the weight parameters of the algorithm network G are trained by a batch gradient descent method.
6. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein the step S5 includes the following process: the radio resource manager combines the current network state vector s_t with the state vectors of the previous T-1 time steps and inputs them into the algorithm network G; the algorithm network G outputs the return value of each action for every base-station agent, and the action with the maximum return value is selected as the allocation strategy of the current base station and executed.
7. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 1, wherein: X takes a value of 100 to 500, N_pre a value of 500 to 3000, and N_train a value of 1000 to 5000.
8. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 5, wherein: the number p of the quadruples is 32.
9. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 5, wherein: the batch gradient descent method is Adam, and the learning rate is 0.001.
10. The method for intelligently allocating network resources of multiple base stations based on spatio-temporal feature extraction as claimed in claim 4, wherein: in substep S21 the initial value of ε is 0, and at each step ε is increased according to ε = max(0, ε_max - e^(-train_step/decay_step)), where ε_max takes a value of 0.85 to 0.95, train_step is the number of training steps at the current moment, and decay_step takes a value of 2000 to 4000.
CN202111118071.8A 2021-09-24 2021-09-24 Multi-base-station network resource intelligent allocation method based on space-time feature extraction Active CN113811009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111118071.8A CN113811009B (en) 2021-09-24 2021-09-24 Multi-base-station network resource intelligent allocation method based on space-time feature extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111118071.8A CN113811009B (en) 2021-09-24 2021-09-24 Multi-base-station network resource intelligent allocation method based on space-time feature extraction

Publications (2)

Publication Number Publication Date
CN113811009A CN113811009A (en) 2021-12-17
CN113811009B 2022-04-12

Family

ID=78896400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111118071.8A Active CN113811009B (en) 2021-09-24 2021-09-24 Multi-base-station network resource intelligent allocation method based on space-time feature extraction

Country Status (1)

Country Link
CN (1) CN113811009B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615183B (en) * 2022-03-14 2023-09-05 广东技术师范大学 Routing method, device, computer equipment and storage medium based on resource prediction
CN117313551A (en) * 2023-11-28 2023-12-29 中国科学院合肥物质科学研究院 Radionuclide diffusion prediction method and system based on GAT-LSTM
CN118093057B (en) * 2024-04-24 2024-07-05 武汉攀升鼎承科技有限公司 Notebook computer system resource optimization method and system based on user using habit

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111212019A (en) * 2018-11-22 2020-05-29 阿里巴巴集团控股有限公司 User account access control method, device and equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111182637B (en) * 2019-12-24 2022-06-21 浙江大学 Wireless network resource allocation method based on generation countermeasure reinforcement learning
CN112749005B (en) * 2020-07-10 2023-10-31 腾讯科技(深圳)有限公司 Resource data processing method, device, computer equipment and storage medium
CN112396492A (en) * 2020-11-19 2021-02-23 天津大学 Conversation recommendation method based on graph attention network and bidirectional long-short term memory network
CN112512070B (en) * 2021-02-05 2021-05-11 之江实验室 Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning
CN113051822B (en) * 2021-03-25 2024-09-24 浙江工业大学 Industrial system anomaly detection method based on graph attention network and LSTM automatic coding model

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111212019A (en) * 2018-11-22 2020-05-29 阿里巴巴集团控股有限公司 User account access control method, device and equipment

Also Published As

Publication number Publication date
CN113811009A (en) 2021-12-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant