CN117640417A - Ultra-dense Internet of things resource allocation method and system based on GCN-DDPG - Google Patents

Ultra-dense Internet of things resource allocation method and system based on GCN-DDPG

Info

Publication number
CN117640417A
Authority
CN
China
Prior art keywords
conflict
network
internet
dense
resource allocation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311658595.5A
Other languages
Chinese (zh)
Inventor
黄杰
李幸星
杨凡
杨成
喻涛
张仕龙
姚凤航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology
Priority to CN202311658595.5A
Publication of CN117640417A
Legal status: Pending


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of the D2D-assisted ultra-dense Internet of things, and particularly discloses an ultra-dense Internet of things resource allocation method and system based on GCN-DDPG, comprising the following steps: constructing a communication model of the ultra-dense Internet of things; establishing a conflict graph according to the communication model; simplifying the conflict graph into a conflict hypergraph; and, based on the conflict hypergraph, performing resource allocation for the ultra-dense Internet of things with a deep reinforcement learning model based on a graph convolutional neural network, namely the GCN-DDPG model. The invention analyzes the resource multiplexing conflict relationship by constructing a conflict graph model, converts the conflict graph model into a conflict hypergraph model so that the conflict relationships among multiple transmission links (TLs) can be analyzed simultaneously, and converts the resource allocation problem into a vertex strong coloring problem on the hypergraph. Finally, a conflict-free resource allocation strategy based on a graph convolution reinforcement learning algorithm (GCN-DDPG) is provided. Experimental results show that, compared with the comparison algorithms, the proposed algorithm achieves a higher network resource reuse rate and throughput, and performs better in D2D-assisted UD-IoE.

Description

Ultra-dense Internet of things resource allocation method and system based on GCN-DDPG
Technical Field
The invention relates to the technical field of D2D-assisted ultra-dense Internet of things, in particular to an ultra-dense Internet of things resource allocation method and system based on GCN-DDPG.
Background
With the development of wireless communication technology, the number of Internet of Everything (IoE) devices (IoED, Internet of Everything Devices) is growing rapidly, which means that the IoE will evolve into the Ultra-Dense Internet of Everything (UD-IoE) in the future and will face challenges such as limited network throughput and limited spectrum resource utilization. Device-to-Device (D2D) communication can be used to mitigate these challenges in UD-IoE: D2D communication supports direct communication between two IoEDs and allows resources to be shared between the two devices. However, when the communication ranges of IoEDs overlap and the same channel is used for transmission, serious interference occurs. Therefore, D2D-assisted UD-IoE must rely on an effective resource management method to guarantee network performance and avoid serious interference.
Aiming at D2D-assisted UD-IoE resource management, the existing research focuses on network throughput performance, and an optimization model is built and solved by adopting an optimization method. However, the above method is highly dependent on the accuracy of the optimization model, and the calculation amount increases very rapidly with the network scale, so that it is difficult to adapt to an ultra-dense environment with a huge amount of IoE devices.
With the development of artificial intelligence (AI), machine learning has become a very effective technique for handling large amounts of data, computationally heavy tasks, and mathematically complex nonlinear non-convex problems. Currently, more and more researchers apply reinforcement learning to resource management and allocation in wireless communication systems. However, existing research does not consider the massive hidden-terminal interference problem in the dense deployment scenario of the ultra-dense Internet of things. In the ultra-dense Internet of things, the dense deployment of massive IoEDs causes the communication ranges of terminals to overlap densely, which creates a large amount of potential hidden-terminal interference, greatly increases the possibility of resource multiplexing conflicts, and severely increases the difficulty of resource management.
Disclosure of Invention
The invention provides a GCN-DDPG-based ultra-dense Internet of things resource allocation method and system, which solve the technical problem of how to effectively avoid resource conflicts between communication links in D2D-assisted UD-IoE.
In order to solve the technical problems, the invention provides an ultra-dense Internet of things resource allocation method based on GCN-DDPG, which comprises the following steps:
s1, constructing a communication model of the ultra-dense Internet of things;
the communication model comprises a D2D-assisted UD-IoE layer and a BBU pool, wherein D2D denotes device-to-device, UD-IoE denotes the ultra-dense Internet of things, and BBU denotes a baseband unit; the UD-IoE layer comprises N IoEDs, M communication links and a plurality of RRHs, where IoED denotes an Internet of Everything device and RRH denotes a remote radio head; each RRH is connected to the BBU pool by a high-speed fronthaul link and is responsible for providing primary coverage and auxiliary access; the IoEDs adopt the D2D communication mode, extend the communication range through multi-hop communication links, and are finally connected to the BBU pool through the RRHs of the UD-IoE layer; the BBU pool, consisting of BBUs and computing servers, collects all environmental information in the network and allocates resources to the communication links;
s2, establishing a conflict graph according to the communication model;
s3, obtaining all maximal cliques in the conflict graph according to the maximal-clique generation theorem and a greedy algorithm, and, according to the relationship between hyperedges and maximal cliques, simplifying the conflict graph into a conflict hypergraph while keeping the adjacency relationships of the vertices unchanged;
s4, based on the conflict hypergraph, performing resource allocation for the ultra-dense Internet of things with a deep reinforcement learning model based on a graph convolutional neural network, namely the GCN-DDPG model;
the GCN-DDPG model comprises an actor network and a critic network;
a two-layer graph convolutional neural network is constructed in the actor network; it takes as input the adjacency matrix and the feature matrix of the conflict hypergraph, where the feature matrix is the resource allocation matrix, i.e. the vertex coloring matrix; node features are then obtained, and suitable colors are selected according to the node features, i.e. resource allocation is performed, yielding a colored feature matrix;
a two-layer graph convolutional neural network is also constructed in the critic network; it takes as input the adjacency matrix of the conflict hypergraph and the colored feature matrix, and outputs a node feature, i.e. an evaluation of the selected action.
Further, in step S4, the GCN-DDPG model further comprises an actor target network and a critic target network;
in the training process, the BBU pool acts as the agent; the agent inputs the current state $s_t$ into the actor network to obtain the action $a_t = \mu(s_t \mid \theta^{\mu})$, where $\mu(\cdot)$ denotes the actor network and $\theta^{\mu}$ its parameters; the agent executes $a_t$ in the environment, then obtains the immediate reward $r_t$ and the next state $s_{t+1}$, and stores $(s_t, a_t, r_t, s_{t+1})$ in an experience replay buffer for further training; when the experience pool is full, the agent randomly samples $N_c$ transitions to form a mini-batch, in which the estimated value $Q(s_n, a_n \mid \theta^{Q})$ of the state-action pair $(s_n, a_n)$ of the $n$-th transition is obtained from the critic network, $\theta^{Q}$ denoting the parameters of the critic network;
the target value of the state-action pair $(s_n, a_n)$ is calculated as:
$$y_n = r_n + \gamma\, Q'\!\left(s_{n+1}, \mu'(s_{n+1} \mid \theta^{\mu'}) \,\middle|\, \theta^{Q'}\right)$$
where $\gamma$ is the discount factor, $Q'(\cdot)$ denotes the critic target network with parameters $\theta^{Q'}$, and $\mu'(\cdot)$ denotes the actor target network with parameters $\theta^{\mu'}$.
Further, the critic network is trained by minimizing the mean square error, where the mean square error $L(\theta^{Q})$ is calculated as:
$$L(\theta^{Q}) = \frac{1}{N_c}\sum_{n=1}^{N_c}\left(y_n - Q(s_n, a_n \mid \theta^{Q})\right)^{2}$$
the parameters of the critic network are updated by gradient descent; the gradient of $L(\theta^{Q})$ is obtained by differentiating $L(\theta^{Q})$ with respect to $\theta^{Q}$.
Further, the actor network is trained by maximizing the expected return under the initial distribution of environment states, where the expected return $J(\theta^{\mu})$ is calculated as:
$$J(\theta^{\mu}) = \mathbb{E}\!\left[Q\!\left(s, \mu(s \mid \theta^{\mu}) \,\middle|\, \theta^{Q}\right)\right]$$
the parameters of the actor network are updated by gradient descent, and the gradient of $J(\theta^{\mu})$ is calculated by the chain rule of differentiation;
the parameters of the critic target network and of the actor target network are then updated from the updated parameters of the critic network and of the actor network, respectively.
Further, in the actor network or the critic network, the operation of the graph convolutional neural network is expressed as:
$$f(X, A) = \sigma\!\left(\hat{A}\,\mathrm{ReLU}(\hat{A} X W_0)\, W_1\right)$$
where $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is the normalized form of the self-customized matrix $\tilde{A} = A + I$, $A$ denotes the adjacency matrix of the conflict hypergraph, $I$ denotes the identity matrix, $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $\tilde{A}_{ij}$ denotes the element in row $i$ and column $j$ of $\tilde{A}$, $W_k$ denotes the weight matrix of layer $k+1$ with $k = 0, 1$, $\sigma(\cdot)$ is an activation function, $\mathrm{ReLU}(\cdot)$ denotes the ReLU activation function, and $X$ denotes the feature matrix.
Further, the node features $H^{k}$ in each layer are updated according to the following formula:
$$H^{k+1} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{k}\, W^{k}\right)$$
Further, the conflict hypergraph is denoted $G_H = \{V_H, E_H\}$, where $V_H$ is the vertex set, the vertices representing communication links, and $E_H$ is the hyperedge set representing the conflict relationships between communication links; any hyperedge of $E_H$ is a subset of $V_H$, and the vertices in a hyperedge all have the same adjacency relationship with the other vertices; the relationship between any vertex $v$ and any hyperedge $e$ of $G_H$ is represented by the incidence matrix $H$, whose element $h(v, e)$ in any row and column takes the value:
$$h(v, e) = \begin{cases} 1, & v \in e \\ 0, & \text{otherwise} \end{cases}$$
where $h(v, e) = 1$ indicates that vertex $v$ is associated with hyperedge $e$, i.e. hyperedge $e$ contains vertex $v$.
Further, step S2 specifically comprises the steps of:
s21, establishing the conflict graph of the communication model, expressed as:
$$G_C = (V_C, E_C)$$
where $V_C = \{e_1, e_2, \ldots, e_M\}$ is the vertex set representing the communication links and $E_C$ is the edge set representing the resource conflict relationships between communication links; the relationship between the vertices of $V_C$ and the edges of $E_C$ is represented by the adjacency matrix $G_C$:
$$G_C = \left[b_{nm}\right]_{M \times M}$$
where the element $b_{nm} = (e_n, e_m)$ in row $n$ and column $m$ of the adjacency matrix $G_C$ takes the value:
$$b_{nm} = \begin{cases} 1, & e_n \text{ and } e_m \text{ conflict} \\ 0, & \text{otherwise} \end{cases}$$
s22, simplifying the adjacency matrix $G_C$ as:
$$G_C = \bar{G}_{C1} \vee \bar{G}_{C2}$$
where $\vee$ denotes the element-wise logical OR, $\bar{G}_{C1}$ denotes $G_{C1}$ with its main diagonal elements set to 0, $G_{C1}$ is the adjacency matrix of the conflict graph recording direct conflicts, $I$ is the identity matrix, $\bar{G}_{C2}$ denotes $G_{C2}$ with its main diagonal elements set to 0, and $G_{C2}$ is the adjacency matrix of the conflict graph recording hidden-terminal conflicts; a direct conflict means that two communication links use the same channel at the same time and have the same transmitting or receiving IoED; a hidden-terminal conflict means that two communication links use the same channel at the same time and the transmitting or receiving IoED of one IoED pair is within the communication range of the IoEDs of the other IoED pair.
The invention also provides a GCN-DDPG-based ultra-dense Internet of things resource allocation system, which comprises a processing module for executing steps S1 to S4 of the above method.
According to the GCN-DDPG-based ultra-dense Internet of things resource allocation method and system, the resource multiplexing conflict relationship is analyzed by constructing a conflict graph model, the conflict graph model is converted into a conflict hypergraph model so that the conflict relationships among multiple transmission links (TLs) can be analyzed simultaneously, and the resource allocation problem is converted into a vertex strong coloring problem on the hypergraph. Finally, a conflict-free resource allocation strategy based on a graph convolution reinforcement learning algorithm is provided. In particular, the beneficial effects are that:
1) Aiming at the hidden-terminal interference problem in D2D-assisted UD-IoE, the resource multiplexing conflict relationships among TLs are analyzed and the conflict types among TLs are classified. Then, based on the resource conflict relationships in the UD-IoE layer, a conflict graph model is established, and a matrix transformation method is designed to construct the conflict graph, which intuitively reflects the conflict relationships at the logical layer.
2) To address the inability of the conflict graph model to analyze resource conflicts among multiple TLs simultaneously, the conflict graph model is converted into a conflict hypergraph through maximal-clique and hypergraph theory, the conflict-free resource allocation problem is converted into a vertex coloring problem on the hypergraph, and a method for computing the hypergraph coloring is designed.
3) For the hypergraph vertex coloring problem, a graph-convolution-based deep deterministic policy gradient (GCN-DDPG) algorithm/model is proposed. The algorithm uses an actor-critic network to learn the D2D-assisted UD-IoE resource allocation process and dynamically adjusts the resource allocation scheme according to the sample data in the experience replay pool. Under the condition that UD-IoE remains conflict-free, the resource multiplexing rate is improved. Simulation results show that the algorithm improves the network throughput of UD-IoE.
4) Simulation results show that the GCN-DDPG has higher resource multiplexing rate and network throughput.
Drawings
Fig. 1 is a diagram of an exemplary scenario of an ultra-dense internet of things provided by an embodiment of the present invention;
fig. 2 is a diagram illustrating D2D communication provided by an embodiment of the present invention;
FIG. 3 is an illustration of a conflict graph provided by an embodiment of the present invention;
FIG. 4 is an illustration of a conflict hypergraph provided by an embodiment of the invention;
FIG. 5 is a schematic diagram of a GCN-DDPG-based resource conflict-free allocation algorithm provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a graph convolutional neural network provided by an embodiment of the present invention;
FIG. 7 is a graph comparing the maximum network throughput of UD-IoE under four algorithms provided by an embodiment of the present invention;
FIG. 8 is a graph comparing the minimum network throughput of UD-IoE under four algorithms provided by an embodiment of the present invention;
FIG. 9 is a graph showing a comparison of resource multiplexing rates of UD-IoE under four algorithms provided by an embodiment of the present invention;
fig. 10 is a graph comparing signal-to-interference-and-noise ratios (SINR) of UD-IoE under four algorithms provided by an embodiment of the present invention.
Detailed Description
The following examples are given for the purpose of illustration only and are not to be construed as limiting the invention; the drawings are included for reference and description only and are not to be construed as limiting the scope of the invention, since many variations are possible without departing from the spirit and scope of the invention.
In order to avoid resource conflict between communication links in D2D-assisted UD-IoE as effectively as possible, the embodiment of the invention provides an ultra-dense Internet of things resource allocation method based on GCN-DDPG, which specifically comprises the following steps:
s1, constructing a communication model of the ultra-dense Internet of things;
s2, establishing a conflict graph according to the communication model;
s3, obtaining all maximal cliques in the conflict graph according to the maximal-clique generation theorem and a greedy algorithm, and, according to the relationship between hyperedges and maximal cliques, simplifying the conflict graph into a conflict hypergraph while keeping the adjacency relationships between the vertices unchanged;
and S4, performing resource allocation on the ultra-dense Internet of things by adopting a deep reinforcement learning model based on a graph convolution neural network, namely a GCN-DDPG model based on the conflict hypergraph.
(1) Step S1: communication model
The scenario is the ultra-dense Internet of things, comprising a baseband unit (BBU) pool and a D2D-assisted UD-IoE layer, as shown in fig. 1. The UD-IoE layer contains N IoEDs ({IoED 1, IoED 2, ..., IoED N}), M TLs ({TL1, TL2, ..., TL M}) between the IoEDs, and multiple remote radio heads (RRHs). Each RRH is connected to the BBU pool through a high-speed fronthaul link and is responsible for basic coverage and auxiliary access. The IoEDs adopt the D2D communication mode, further extend the communication range through multi-hop TLs, and are finally connected to the BBU pool through the RRHs of the UD-IoE layer. The BBU pool consists of BBUs and servers and is responsible for collecting all environmental information in the network and allocating resources to the TLs in UD-IoE. In the ultra-dense Internet of things, the dense deployment of massive IoEDs causes the communication ranges of terminals to overlap densely, which creates a large amount of potential hidden-terminal interference. Therefore, this embodiment mainly considers the hidden-terminal interference problem in UD-IoE.
In this example, let $D_{ij}$ denote the Euclidean distance between any two IoEDs $i$ and $j$; all IoEDs are assumed to have the same transmit power, and a Rayleigh fading model is adopted. $S_{ij}$ is the signal received by the receiving end IoED $i$ from the transmitting end IoED $j$. The signal-to-interference-plus-noise ratio (SINR) of IoED $i$ can be expressed as:
$$\mathrm{SINR}_i = \frac{S_{ij}}{N_e + K} \tag{1}$$
where $N_e$ is the ambient noise and $K$ represents the interference from the transmissions of other IoEDs.
When the SINR reaches a certain threshold $s_{th}$, the receiving end can decode normally, and the transmission rate can be expressed as:
$$R = B \log_2\!\left(1 + \mathrm{SINR}_i\right)$$
where $B$ is the channel bandwidth.
Assuming that all IoEDs have the same transmit power $P$ and the same communication radius $R_T$, the received power at the edge of the communication range can be expressed as $P R_T^{-\alpha}$, where $\alpha$ is the path attenuation factor.
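For reference, the link model above can be prototyped in a few lines. The sketch below is illustrative only: the distance-based path-loss form, the lumped interference term $K$, and all parameter values are assumptions rather than the patent's exact expressions.

```python
import numpy as np

# Minimal sketch of the link model, assuming a simple distance-based path loss
# P * d^(-alpha) for both the desired signal and each interferer.
def sinr_and_rate(P, d_signal, d_interferers, alpha=3.0, noise=1e-9, B=1e6):
    """SINR at a receiver and the corresponding rate R = B * log2(1 + SINR)."""
    s = P * d_signal ** (-alpha)                                  # desired received power
    k = float(np.sum([P * d ** (-alpha) for d in d_interferers])) # interference term K
    sinr = s / (noise + k)                                        # formula (1): S / (N_e + K)
    rate = B * np.log2(1.0 + sinr)                                # achievable rate
    return sinr, rate

# Example: one interferer twice as far away as the desired transmitter.
sinr, rate = sinr_and_rate(P=0.1, d_signal=50.0, d_interferers=[100.0])
```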
(2) Step S2: establishing conflict graph model
The graph model can be represented by $G = \{V, E\}$, where $V = \{v_1, v_2, \ldots, v_N\}$ is the vertex set representing all IoEDs and $E = \{e_1, e_2, \ldots, e_M\}$ is the edge set representing the TLs. The relationship between vertices and edges is expressed as:
$$(e_m, v_n) = \begin{cases} 1, & \text{edge } e_m \text{ is incident to vertex } v_n \\ 0, & \text{otherwise} \end{cases}$$
When $(e_m, v_n) = 1$, vertex $v_n$ is associated with edge $e_m$, i.e. the IoED can communicate with the other IoED connected to it through the TL; otherwise it cannot. The incidence matrix $H$ can therefore be written as:
$$H = \left[a_{mn}\right]_{M \times N}$$
where $a_{mn}$ denotes the value of $(e_m, v_n)$.
In the graph model, there are two resource conflict-free situations in D2D communication: 1) a vertex is not associated with any edge, i.e. it is an isolated vertex; 2) there are only two vertices and one edge, which means that the TL does not conflict with other TLs when resources are allocated. In both cases, communication resources can be allocated to the TL. To facilitate analysis of the massive hidden-terminal interference problem of the densely deployed ultra-dense Internet of things, fig. 2 gives an example that includes 20 IoEDs and 32 TLs.
To avoid resource conflicts between TLs in D2D-assisted UD-IoE, the potential conflict types are classified and analyzed as follows.
Direct conflict: TL1 and TL2 conflict directly if they use the same channel at the same time. Example: in fig. 2, IoED1 and IoED3 send information to IoED2 at the same time and therefore conflict directly.
Hidden-terminal conflict: TL1 and TL3 may experience a hidden-terminal conflict if they use the same channel at the same time. Example: in fig. 2, IoED1 sends a message to IoED2 while IoED3 sends a message to IoED4, so a hidden-terminal conflict may occur.
The direct conflict problem is essentially a typical edge coloring problem and can be solved by an edge coloring algorithm. However, the hidden-terminal conflict problem is not suited to typical edge coloring algorithms. Therefore, a conflict model needs to be further established and analyzed.
Let $G_C = \{V_C, E_C\}$ denote the conflict graph model, where $V_C = E = \{e_1, e_2, \ldots, e_M\}$ is the set of all TLs and $E_C$ is the edge set representing the resource conflict relationships between TLs. The adjacency relationships between the vertices are represented by the matrix $G_C$:
$$G_C = \left[b_{nm}\right]_{M \times M}$$
where $b_{nm} = (e_n, e_m)$; if $e_n$ and $e_m$ conflict then $b_{nm} = 1$, otherwise $b_{nm} = 0$.
In D2D-assisted UD-IoE, the reachability of TL $i$ and TL $j$ indicates that a path exists from TL $i$ to TL $j$, and the path length is the number of vertices in the path. If there are many paths between TL $i$ and TL $j$ but any one of them conflicts, they cannot share resources. Therefore, the number of paths causing conflicts between TLs need not be considered when analyzing the conflict relationship. In terms of reachability, TLs that are in direct conflict with each other can be reached in one step, and TLs that are in hidden-terminal conflict with each other can be reached in two steps. The direct conflicts are computed as follows:
$$G_{C1} = G_I \odot G_I^{T} \tag{6}$$
where $G_{C1}$ is the adjacency matrix of the conflict graph recording direct conflicts, $G_I$ is the incidence matrix of the graph model, and $G_I^{T}$ is its transpose. The operator $\odot$ denotes the Boolean matrix product, in which any two elements $c$ and $d$ are combined as follows:
$$c \wedge d = \begin{cases} 1, & c = 1 \text{ and } d = 1 \\ 0, & \text{otherwise} \end{cases}$$
The main diagonal elements of $G_{C1}$ are 1. To obtain the direct conflict relationships between different TLs, the main diagonal elements are set to 0, i.e.:
$$\bar{G}_{C1} = G_{C1} - I \tag{8}$$
where $I$ denotes the identity matrix.
Similar to equation (6), the hidden-terminal conflicts are computed as follows:
$$G_{C2} = G_{C1} \odot G_{C1}$$
As in equation (8), the main diagonal elements of $G_{C2}$ are set to 0, i.e.:
$$\bar{G}_{C2} = G_{C2} - I \tag{10}$$
From equations (8) and (10), the adjacency matrix $G_C$ of the conflict graph is obtained as:
$$G_C = \bar{G}_{C1} \vee \bar{G}_{C2}$$
where $\vee$ denotes the element-wise logical OR.
according to G C The conflict graph of fig. 2 may be found as shown in fig. 3.
(3) Step S3: construction of conflict hypergrams
Fig. 3 shows the collision relationship between TL. Since the edges in the conflict graph include only 2 nodes, it is difficult to reflect the resource conflict relationship of multiple TL. Due to the dense deployment of massive terminals, resource conflicts of multiple TL are the main cause of hidden terminal interference. Therefore, it is necessary to further simplify the conflict graph so that it keeps the conflict relationship between vertices unchanged and simultaneously analyze the conflict relationship between a plurality of TL.
To analyze the conflict relationships among multiple TLs simultaneously, the conflict graph is simplified into a conflict hypergraph based on clique and hypergraph theory. Let $G_H = \{V_H, E_H\}$ denote the conflict hypergraph, where $V_H$ is the vertex set and $E_H$ is the hyperedge set; all vertices in a hyperedge have the same adjacency relationship. The hypergraph can be represented by an incidence matrix $H$, whose elements take the values:
$$h(v, e) = \begin{cases} 1, & v \in e \\ 0, & \text{otherwise} \end{cases}$$
$h(v, e) = 1$ means that vertex $v$ lies on hyperedge $e$, i.e. $v$ and $e$ are associated. For any vertices $v_1, v_2 \in e$, $v_1$ and $v_2$ are adjacent; the neighborhood of $v_1$ is denoted $N(v_1)$.
Because of the high density of connections between vertices in the conflict graph $G_C = \{V_C, E_C\}$, a number of complete subgraphs $G_s = \{V_s, E_s\}$ can be generated. Each complete subgraph is a clique in which all vertices are adjacent to one another, and it can be represented by a hyperedge. Thus, according to the relationship between hyperedges and maximal cliques, the conflict graph can be simplified into a conflict hypergraph while keeping the vertex adjacency unchanged. All maximal cliques in the conflict graph can be found using the maximal-clique generation theorem and a greedy algorithm, and the conflict graph is converted into a conflict hypergraph, as shown in fig. 4. In fig. 4, the vertices represent TLs and the hyperedges represent the conflict relationships between them. Accordingly, conflict-free resource allocation can be transformed into a vertex strong coloring problem on the conflict hypergraph.
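The clique-to-hyperedge conversion can be sketched as follows. networkx's find_cliques is used here as a stand-in for the maximal-clique enumeration, since the text only states that a greedy algorithm based on the maximal-clique generation theorem is used, and size-one cliques (isolated vertices) are dropped because they carry no conflict.

```python
import numpy as np
import networkx as nx

def conflict_hypergraph(G_C):
    """G_C: M x M conflict adjacency matrix; returns the hypergraph incidence matrix H (M x |E_H|)."""
    G_C = np.asarray(G_C)
    g = nx.from_numpy_array(G_C)
    cliques = [c for c in nx.find_cliques(g) if len(c) >= 2]   # maximal cliques become hyperedges
    H = np.zeros((G_C.shape[0], len(cliques)), dtype=int)
    for e, clique in enumerate(cliques):
        H[list(clique), e] = 1                                  # h(v, e) = 1 iff vertex v lies on hyperedge e
    return H
```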
(4) Step S4: conflict-free resource allocation model
The problem to be solved is the strong coloring of the vertices of the conflict hypergraph. This embodiment proposes a graph-convolution-based reinforcement learning conflict-free resource allocation algorithm (GCN-DDPG): a resource allocation MDP model that considers the two conflict types of the ultra-dense Internet of things is constructed, and the optimal resource allocation policy $\pi^{*}$ is solved by combining a graph convolution reinforcement learning deep deterministic policy gradient algorithm.
For the hypergraph vertex strong coloring problem, the problem is first converted into a sequential decision problem through a Markov decision process, with the BBU pool acting as the agent; combined with the deep deterministic policy gradient algorithm, the agent continuously searches for a hypergraph vertex coloring scheme through training. The specific definitions of the state S, the action A and the reward function are as follows (a sketch of the conflict-free condition underlying the reward follows this list):
1) State S: $S = \{H, K_t\}$, where $H$ is the incidence matrix of the conflict hypergraph and $K_t$ represents the resource allocation of all TLs.
2) Action A: according to the current state, the agent makes an observation and produces a corresponding set of communication link resource allocations, i.e. the action set.
3) Reward R: the agent executes the corresponding action in the current state and obtains the corresponding return.
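As referenced above, the following sketch shows the feasibility condition behind the reward: a resource (color) assignment is a valid vertex strong coloring only if no two vertices of any hyperedge share a color. The reward shaping itself is not specified here, so only the check is shown.

```python
import numpy as np

def is_strong_coloring(H, colors):
    """H: |V_H| x |E_H| incidence matrix; colors: one resource (color) index per vertex."""
    colors = np.asarray(colors)
    for e in range(H.shape[1]):
        members = np.flatnonzero(H[:, e])                 # vertices lying on hyperedge e
        if len(set(colors[members])) < len(members):      # two vertices of e share a color -> conflict
            return False
    return True
```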
As shown in fig. 5, the GCN-DDPG algorithm structure proposed in this embodiment consists of an actor network for generating actions and a critic network for evaluating the generated actions. To improve stability and convergence, the algorithm creates four networks, namely an actor network, an actor target network, a critic network and a critic target network, and updates the target networks using soft updates.
The specific algorithm steps are as follows. First, the agent inputs the current state $s_t$ into the actor network to obtain the action $a_t = \mu(s_t \mid \theta^{\mu})$, where $\mu(\cdot)$ denotes the actor network and $\theta^{\mu}$ its parameters. The agent executes $a_t$ in the environment, then obtains the immediate reward $r_t$ and the next state $s_{t+1}$. The tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay buffer for further training. When the experience pool is full, $N_c$ transitions are randomly sampled to form a mini-batch; let $(s_n, a_n, r_n, s_{n+1})$ be the $n$-th transition in the mini-batch. The estimated value of the state-action pair $(s_n, a_n)$ is obtained from the critic network and denoted $Q(s_n, a_n \mid \theta^{Q})$, where $Q(s, a)$ denotes the critic network and $\theta^{Q}$ its parameters. The target value of the state-action pair $(s_n, a_n)$ can be found from the Bellman equation, namely:
$$y_n = r_n + \gamma\, Q'\!\left(s_{n+1}, \mu'(s_{n+1} \mid \theta^{\mu'}) \,\middle|\, \theta^{Q'}\right)$$
where $\gamma$ is the discount factor, $Q'(\cdot)$ denotes the critic target network with parameters $\theta^{Q'}$, and $\mu'(\cdot)$ denotes the actor target network with parameters $\theta^{\mu'}$.
The critic network can be trained by minimizing the mean square error (MSE) loss function, i.e.:
$$L(\theta^{Q}) = \frac{1}{N_c}\sum_{n=1}^{N_c}\left(y_n - Q(s_n, a_n \mid \theta^{Q})\right)^{2}$$
The gradient of the critic network can be obtained by differentiating $L(\theta^{Q})$ with respect to $\theta^{Q}$, namely:
$$\nabla_{\theta^{Q}} L(\theta^{Q}) = -\frac{2}{N_c}\sum_{n=1}^{N_c}\left(y_n - Q(s_n, a_n \mid \theta^{Q})\right)\nabla_{\theta^{Q}} Q(s_n, a_n \mid \theta^{Q})$$
The actor network can be trained by maximizing the expected return of the initial distribution of environment states, namely:
$$J(\theta^{\mu}) = \mathbb{E}\!\left[Q\!\left(s, \mu(s \mid \theta^{\mu}) \,\middle|\, \theta^{Q}\right)\right]$$
and the parameters of the actor network are updated along the gradient of $J(\theta^{\mu})$, computed by the chain rule of differentiation.
Then, the target networks are updated with soft updates, i.e. the parameters of the critic target network and of the actor target network are updated from the newly updated parameters of the critic network and of the actor network, respectively.
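A minimal sketch of one such update step is given below, assuming PyTorch modules for the actor, critic and their targets; the soft-update rate tau, the optimizers and the network interfaces are illustrative assumptions, while the Bellman target and critic loss follow the formulas above.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_tgt, critic_tgt, actor_opt, critic_opt,
                gamma=0.99, tau=0.005):
    s, a, r, s_next = batch                                     # mini-batch of N_c transitions
    with torch.no_grad():
        y = r + gamma * critic_tgt(s_next, actor_tgt(s_next))  # Bellman target y_n
    critic_loss = F.mse_loss(critic(s, a), y)                   # L(theta^Q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()                    # gradient ascent on J(theta^mu)
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for tgt, src in ((critic_tgt, critic), (actor_tgt, actor)): # soft update of target networks
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```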
Two-layer GCN (graph convolutional neural network) models are built in the actor network and the critic network to extract features; the model can be expressed as:
$$f(X, A) = \sigma\!\left(\hat{A}\,\mathrm{ReLU}(\hat{A} X W_0)\, W_1\right) \tag{18}$$
where $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$, $\tilde{A} = A + I$, $A$ denotes the adjacency matrix of a graph and $I$ the identity matrix; $\sigma(\cdot)$ and $\mathrm{ReLU}(\cdot)$ are activation functions, and $W_k$ denotes the weight matrix of layer $k+1$.
For a graph adjacency matrix $A$ and feature matrix $X$, the feature matrix is the resource allocation matrix, i.e. the vertex coloring matrix. In the graph convolutional neural network, the nodes in each layer are updated according to the following propagation rule:
$$H^{k+1} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{k}\, W^{k}\right)$$
where $\tilde{A} = A + I$, $\sigma(\cdot)$ is an activation function, $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and $W^{k}$ denotes the weight matrix of layer $k$, whose values are updated as the network trains.
Two graph convolution layers are constructed in the actor network; the adjacency matrix and the feature matrix of the hypergraph are input, node features are obtained, and suitable colors are selected according to the node features, i.e. resource allocation is performed. A graph convolutional neural network is likewise established in the critic network; the adjacency matrix of the hypergraph and the colored feature matrix are input, and a node feature is output, i.e. the selected action is evaluated. The specific flow is shown in fig. 6, and a minimal numerical sketch of the two-layer forward pass is given below.
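The sketch below implements the two-layer forward pass of formula (18); the choice of a sigmoid for the outer activation and the weight shapes are assumptions for illustration.

```python
import numpy as np

def gcn_forward(A, X, W0, W1):
    """A: adjacency matrix of the conflict hypergraph; X: feature (coloring) matrix."""
    A_tilde = A + np.eye(A.shape[0])                       # add self-loops: A~ = A + I
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    A_hat = d_inv_sqrt @ A_tilde @ d_inv_sqrt              # A^ = D~^(-1/2) A~ D~^(-1/2)
    H1 = np.maximum(A_hat @ X @ W0, 0.0)                   # ReLU(A^ X W0)
    Z = A_hat @ H1 @ W1                                    # second graph convolution layer
    return 1.0 / (1.0 + np.exp(-Z))                        # outer activation sigma(.) taken as a sigmoid
```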
To facilitate implementation, the embodiment of the invention also provides a GCN-DDPG-based ultra-dense Internet of things resource allocation system, which is provided with a processing module for executing steps S1 to S4 of the method.
According to the GCN-DDPG-based ultra-dense Internet of things resource allocation method and system provided by the embodiment of the invention, the resource multiplexing conflict relationship is analyzed by constructing a conflict graph model, the conflict graph model is converted into a conflict hypergraph model so that the conflict relationships among multiple transmission links (TLs) can be analyzed simultaneously, and the resource allocation problem is converted into a vertex strong coloring problem on the hypergraph. Finally, a conflict-free resource allocation strategy based on a graph convolution reinforcement learning algorithm is provided. In particular, the beneficial effects are that:
1) Aiming at the hidden-terminal interference problem in D2D-assisted UD-IoE, the resource multiplexing conflict relationships among TLs are analyzed and the conflict types among TLs are classified. Then, based on the resource conflict relationships in the UD-IoE layer, a conflict graph model is established, and a matrix transformation method is designed to construct the conflict graph, which intuitively reflects the conflict relationships at the logical layer.
2) To address the inability of the conflict graph model to analyze resource conflicts among multiple TLs simultaneously, the conflict graph model is converted into a conflict hypergraph through maximal-clique and hypergraph theory, the conflict-free resource allocation problem is converted into a vertex coloring problem on the hypergraph, and a method for computing the hypergraph coloring is designed.
3) For the hypergraph vertex coloring problem, a graph-convolution-based deep deterministic policy gradient (GCN-DDPG) algorithm/model is proposed. The algorithm uses an actor-critic network to learn the D2D-assisted UD-IoE resource allocation process and dynamically adjusts the resource allocation scheme according to the sample data in the experience replay pool. Under the condition that UD-IoE remains conflict-free, the resource multiplexing rate is improved. Simulation results show that the algorithm improves the network throughput of UD-IoE.
Simulation results and performance analysis are presented below.
The experiments were run on a PC hardware platform with an Intel(R) Xeon(R) Gold 6242R CPU @ 3.10 GHz, an NVIDIA RTX 3080 Ti GPU, and 64 GB of memory. In this embodiment, the simulation is carried out in an ultra-dense Internet of things scenario; the scenario parameters are listed in Table 1, the simulation experiments are performed according to the parameter settings of Table 1, and comparative experimental data for network throughput and resource multiplexing rate are obtained. The network throughput and resource multiplexing rate of the proposed algorithm (GCN-DDPG) are compared with those of three other algorithms: a random-matching-based resource allocation (RM) algorithm, a greedy-matching-based resource allocation (GA) algorithm, and a maximum-node-degree-matching-based resource allocation (MND) algorithm.
Table 1 list of simulation experiment parameters
1) Network throughput: this performance index evaluates the network throughput of UD-IoE after the resource allocation algorithm has allocated resources to all communication links. It is determined by the spectrum bandwidth $B$ of the communication links, the average signal-to-interference-plus-noise ratio, and the maximum tolerable bit error rate $P_b$.
2) Resource multiplexing rate: this performance index evaluates the communication resource reuse achieved by the resource allocation algorithm once all UD-IoE communication links are free of interference. It is determined by the number of communication links served and $\eta^{*}$, the amount of communication resources ultimately used.
3) Signal-to-interference-and-noise ratio: the ratio of the signal to the sum of interference and noise in the system is calculated as shown in formula (1).
Fig. 7 compares the maximum network throughput of UD-IoE, where the maximum throughput refers to the throughput of the system without interference. Fig. 7 shows that the network throughput obtained by the different algorithms in UD-IoE increases as the number of TLs increases, and that the throughput of the algorithm adopted in this embodiment is significantly higher than that of the other comparison algorithms. Fig. 7 verifies that the proposed algorithm can effectively improve the system network throughput.
Fig. 8 compares the minimum network throughput of UD-IoE, i.e. the network throughput when the system is subject to interference, for the proposed algorithm and the three comparison algorithms at different numbers of communication links. As the number of TLs increases, the network throughput obtained by the proposed algorithm and the three comparison algorithms shows an increasing trend, and the proposed algorithm effectively improves the minimum network throughput compared with comparison algorithms 1, 2 and 3. Fig. 8 verifies that the proposed algorithm can effectively improve the anti-interference capability of the system.
Fig. 9 compares the resource multiplexing rates in UD-IoE; for a communication system, the higher the resource multiplexing rate, the better the performance. Fig. 9 shows the average resource multiplexing rate of the proposed algorithm and the three comparison algorithms at different numbers of transmission links; the proposed algorithm achieves a higher resource multiplexing rate than the other three comparison algorithms. Fig. 9 verifies that the proposed algorithm can effectively improve the resource multiplexing rate.
Fig. 10 compares the average SINR of UD-IoE; for a communication system, a higher SINR value means lower interference in the network. Fig. 10 shows the SINR values of the proposed algorithm and the other three comparison algorithms at different numbers of transmission links; the proposed algorithm achieves a higher SINR value than the other three algorithms. By optimizing the resource allocation, the proposed algorithm improves the signal-to-interference-plus-noise ratio, guarantees the transmission rate requirement, and effectively avoids serious co-channel interference.
Experimental results show that, compared with the comparison algorithms, the method achieves a higher network resource reuse rate and throughput, and better performance in D2D-assisted UD-IoE.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples; any other changes, modifications, substitutions, combinations and simplifications that do not depart from the spirit and principle of the present invention are equivalent replacements and are included in the protection scope of the present invention.

Claims (9)

1. The ultra-dense Internet of things resource allocation method based on GCN-DDPG is characterized by comprising the following steps:
s1, constructing a communication model of the ultra-dense Internet of things;
the communication model comprises a D2D-assisted UD-IoE layer and a BBU pool, wherein D2D denotes device-to-device, UD-IoE denotes the ultra-dense Internet of things, and BBU denotes a baseband unit; the UD-IoE layer comprises N IoEDs, M communication links and a plurality of RRHs, where IoED denotes an Internet of Everything device and RRH denotes a remote radio head; each RRH is connected to the BBU pool by a high-speed fronthaul link and is responsible for providing primary coverage and auxiliary access; the IoEDs adopt the D2D communication mode, extend the communication range through multi-hop communication links, and are finally connected to the BBU pool through the RRHs of the UD-IoE layer; the BBU pool, consisting of BBUs and computing servers, collects all environmental information in the network and allocates resources to the communication links;
s2, establishing a conflict graph according to the communication model;
s3, obtaining all maximal cliques in the conflict graph according to the maximal-clique generation theorem and a greedy algorithm, and, according to the relationship between hyperedges and maximal cliques, simplifying the conflict graph into a conflict hypergraph while keeping the adjacency relationships of the vertices unchanged;
s4, based on the conflict hypergraph, performing resource allocation for the ultra-dense Internet of things with a deep reinforcement learning model based on a graph convolutional neural network, namely the GCN-DDPG model;
the GCN-DDPG model comprises an actor network and a critic network;
a two-layer graph convolutional neural network is constructed in the actor network; it takes as input the adjacency matrix and the feature matrix of the conflict hypergraph, where the feature matrix is the resource allocation matrix, i.e. the vertex coloring matrix; node features are then obtained, and suitable colors are selected according to the node features, i.e. resource allocation is performed, yielding a colored feature matrix;
a two-layer graph convolutional neural network is also constructed in the critic network; it takes as input the adjacency matrix of the conflict hypergraph and the colored feature matrix, and outputs a node feature, i.e. an evaluation of the selected action.
2. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 1, wherein in step S4 the GCN-DDPG model further comprises an actor target network and a critic target network;
in the training process, the BBU pool acts as the agent; the agent inputs the current state $s_t$ into the actor network to obtain the action $a_t = \mu(s_t \mid \theta^{\mu})$, where $\mu(\cdot)$ denotes the actor network and $\theta^{\mu}$ its parameters; the agent executes $a_t$ in the environment, then obtains the immediate reward $r_t$ and the next state $s_{t+1}$, and stores $(s_t, a_t, r_t, s_{t+1})$ in an experience replay buffer for further training; when the experience pool is full, the agent randomly samples $N_c$ transitions to form a mini-batch, in which the estimated value $Q(s_n, a_n \mid \theta^{Q})$ of the state-action pair $(s_n, a_n)$ of the $n$-th transition is obtained from the critic network, $\theta^{Q}$ denoting the parameters of the critic network;
the target value of the state-action pair $(s_n, a_n)$ is calculated as:
$$y_n = r_n + \gamma\, Q'\!\left(s_{n+1}, \mu'(s_{n+1} \mid \theta^{\mu'}) \,\middle|\, \theta^{Q'}\right)$$
where $\gamma$ is the discount factor, $Q'(\cdot)$ denotes the critic target network with parameters $\theta^{Q'}$, and $\mu'(\cdot)$ denotes the actor target network with parameters $\theta^{\mu'}$.
3. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 2, wherein the critic network is trained by minimizing the mean square error, the mean square error $L(\theta^{Q})$ being calculated as:
$$L(\theta^{Q}) = \frac{1}{N_c}\sum_{n=1}^{N_c}\left(y_n - Q(s_n, a_n \mid \theta^{Q})\right)^{2}$$
the parameters of the critic network are updated by gradient descent; the gradient of $L(\theta^{Q})$ is obtained by differentiating $L(\theta^{Q})$ with respect to $\theta^{Q}$.
4. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 3, wherein the actor network is trained by maximizing the expected return of the initial distribution of environment states, the expected return $J(\theta^{\mu})$ being calculated as:
$$J(\theta^{\mu}) = \mathbb{E}\!\left[Q\!\left(s, \mu(s \mid \theta^{\mu}) \,\middle|\, \theta^{Q}\right)\right]$$
the parameters of the actor network are updated by gradient descent, and the gradient of $J(\theta^{\mu})$ is calculated by the chain rule of differentiation;
the parameters of the critic target network and of the actor target network are then updated from the updated parameters of the critic network and of the actor network, respectively.
5. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 1, wherein in the actor network or the critic network the operation of the graph convolutional neural network is expressed as:
$$f(X, A) = \sigma\!\left(\hat{A}\,\mathrm{ReLU}(\hat{A} X W_0)\, W_1\right)$$
where $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is the normalized form of the self-customized matrix $\tilde{A} = A + I$, $A$ denotes the adjacency matrix of the conflict hypergraph, $I$ denotes the identity matrix, $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $\tilde{A}_{ij}$ denotes the element in row $i$ and column $j$ of $\tilde{A}$, $W_k$ denotes the weight matrix of layer $k+1$ with $k = 0, 1$, $\sigma(\cdot)$ is an activation function, $\mathrm{ReLU}(\cdot)$ denotes the ReLU activation function, and $X$ denotes the feature matrix.
6. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 5, wherein the node features $H^{k}$ in each layer are updated according to the following formula:
$$H^{k+1} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{k}\, W^{k}\right)$$
7. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to any one of claims 1 to 6, wherein the conflict hypergraph is denoted $G_H = \{V_H, E_H\}$, where $V_H$ is the vertex set, the vertices representing communication links, and $E_H$ is the hyperedge set representing the conflict relationships between communication links; any hyperedge of $E_H$ is a subset of $V_H$, and the vertices in a hyperedge all have the same adjacency relationship with the other vertices; the relationship between any vertex $v$ and any hyperedge $e$ of $G_H$ is represented by the incidence matrix $H$, whose element $h(v, e)$ in any row and column takes the value:
$$h(v, e) = \begin{cases} 1, & v \in e \\ 0, & \text{otherwise} \end{cases}$$
where $h(v, e) = 1$ indicates that vertex $v$ is associated with hyperedge $e$, i.e. hyperedge $e$ contains vertex $v$.
8. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 7, wherein step S2 specifically comprises the steps of:
s21, establishing the conflict graph of the communication model, expressed as:
$$G_C = (V_C, E_C)$$
where $V_C = \{e_1, e_2, \ldots, e_M\}$ is the vertex set representing the communication links and $E_C$ is the edge set representing the resource conflict relationships between communication links; the relationship between the vertices of $V_C$ and the edges of $E_C$ is represented by the adjacency matrix $G_C$:
$$G_C = \left[b_{nm}\right]_{M \times M}$$
where the element $b_{nm} = (e_n, e_m)$ in row $n$ and column $m$ of the adjacency matrix $G_C$ takes the value:
$$b_{nm} = \begin{cases} 1, & e_n \text{ and } e_m \text{ conflict} \\ 0, & \text{otherwise} \end{cases}$$
s22, simplifying the adjacency matrix $G_C$ as:
$$G_C = \bar{G}_{C1} \vee \bar{G}_{C2}$$
where $\vee$ denotes the element-wise logical OR, $\bar{G}_{C1}$ denotes $G_{C1}$ with its main diagonal elements set to 0, $G_{C1}$ is the adjacency matrix of the conflict graph recording direct conflicts, $I$ is the identity matrix, $\bar{G}_{C2}$ denotes $G_{C2}$ with its main diagonal elements set to 0, and $G_{C2}$ is the adjacency matrix of the conflict graph recording hidden-terminal conflicts; a direct conflict means that two communication links use the same channel at the same time and have the same transmitting or receiving IoED; a hidden-terminal conflict means that two communication links use the same channel at the same time and the transmitting or receiving IoED of one IoED pair is within the communication range of the IoEDs of the other IoED pair.
9. A GCN-DDPG-based ultra-dense Internet of things resource allocation system, characterized by comprising a processing module for performing steps S1 to S4 of the method according to any one of claims 1 to 8.
CN202311658595.5A 2023-12-06 2023-12-06 Ultra-dense Internet of things resource allocation method and system based on GCN-DDPG Pending CN117640417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311658595.5A CN117640417A (en) 2023-12-06 2023-12-06 Ultra-dense Internet of things resource allocation method and system based on GCN-DDPG

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311658595.5A CN117640417A (en) 2023-12-06 2023-12-06 Ultra-dense Internet of things resource allocation method and system based on GCN-DDPG

Publications (1)

Publication Number Publication Date
CN117640417A true CN117640417A (en) 2024-03-01

Family

ID=90019715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311658595.5A Pending CN117640417A (en) 2023-12-06 2023-12-06 Ultra-dense Internet of things resource allocation method and system based on GCN-DDPG

Country Status (1)

Country Link
CN (1) CN117640417A (en)

Similar Documents

Publication Publication Date Title
CN109639377B (en) Spectrum resource management method based on deep reinforcement learning
Song et al. Wireless device-to-device communications and networks
Li et al. Downlink transmit power control in ultra-dense UAV network based on mean field game and deep reinforcement learning
CN111526592B (en) Non-cooperative multi-agent power control method used in wireless interference channel
CN114499629A (en) Dynamic resource allocation method for beam-hopping satellite system based on deep reinforcement learning
CN114698128B (en) Anti-interference channel selection method and system for cognitive satellite-ground network
Krishnan et al. Optimizing throughput performance in distributed MIMO Wi-Fi networks using deep reinforcement learning
CN111314928A (en) Wireless ad hoc network performance prediction method based on improved BP neural network
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
CN113239632A (en) Wireless performance prediction method and device, electronic equipment and storage medium
Nie et al. Pilot allocation and power optimization of massive MIMO cellular networks with underlaid D2D communications
Perlaza et al. On the base station selection and base station sharing in self-configuring networks
CN117440442A (en) Internet of things resource conflict-free distribution method and system based on graph reinforcement learning
CN115038155B (en) Ultra-dense multi-access-point dynamic cooperative transmission method
CN116634450A (en) Dynamic air-ground heterogeneous network user association enhancement method based on reinforcement learning
CN115811788A (en) D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning
CN117640417A (en) Ultra-dense Internet of things resource allocation method and system based on GCN-DDPG
Rahim et al. Joint devices and IRSs association for terahertz communications in Industrial IoT networks
KR102555696B1 (en) Device and method for allocating resource in vehicle to everything communication based on non-orhthogonal multiple access
Jiang et al. Dynamic spectrum access for femtocell networks: A graph neural network based learning approach
CN117715218B (en) Hypergraph-based D2D auxiliary ultra-dense Internet of things resource management method and system
Wang et al. Decentralized wireless resource allocation with graph neural networks
CN108736991B (en) Group intelligent frequency spectrum switching method based on classification
Zhang et al. Intelligent joint beamforming and distributed power control for UAV-assisted ultra-dense network: A hierarchical optimization approach
Yang et al. Deep reinforcement learning in NOMA-assisted UAV networks for path selection and resource offloading

Legal Events

Date Code Title Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination