CN117640417B - GCN-DDPG-based ultra-dense Internet of things resource allocation method and system - Google Patents


Info

Publication number: CN117640417B
Application number: CN202311658595.5A
Authority: CN (China)
Prior art keywords: conflict, network, matrix, internet, dense
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN117640417A
Inventors: 黄杰, 李幸星, 杨凡, 杨成, 喻涛, 张仕龙, 姚凤航
Current Assignee: Chongqing University of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Chongqing University of Technology
Application filed by Chongqing University of Technology
Priority to CN202311658595.5A
Publication of CN117640417A; application granted; publication of CN117640417B


Abstract

The invention relates to the technical field of the D2D-assisted ultra-dense Internet of things, and in particular discloses a GCN-DDPG-based ultra-dense Internet of things resource allocation method and system, wherein the method comprises the following steps: constructing a communication model of the ultra-dense Internet of things; establishing a conflict graph according to the communication model; simplifying the conflict graph into a conflict hypergraph; and, based on the conflict hypergraph, performing resource allocation on the ultra-dense Internet of things by adopting a deep reinforcement learning model based on a graph convolutional neural network, namely a GCN-DDPG model. The invention analyzes the resource multiplexing conflict relation by constructing a conflict graph model, converts the conflict graph model into a conflict hypergraph model so that the conflict relations among multiple transmission links (TL) can be analyzed simultaneously, and converts the resource allocation problem into the vertex strong coloring problem of the hypergraph. Finally, a conflict-free resource allocation strategy based on a graph convolution reinforcement learning algorithm (GCN-DDPG) is provided. Experimental results show that, compared with the comparison algorithms, the proposed algorithm achieves a higher network resource reuse rate and throughput, and performs better in D2D-assisted UD-IoE.

Description

GCN-DDPG-based ultra-dense Internet of things resource allocation method and system
Technical Field
The invention relates to the technical field of D2D-assisted ultra-dense Internet of things, in particular to a GCN-DDPG-based ultra-dense Internet of things resource allocation method and system.
Background
With the development of wireless communication technology, the Internet of Everything (IoE) is growing rapidly and the number of Internet of Everything devices (IoED) is increasing quickly, which means that the Internet of things will evolve into the ultra-dense Internet of things (UD-IoE) in the future and face challenges such as limited network throughput and limited spectrum resource utilization. Device-to-Device (D2D) communication may be used to mitigate such challenges in UD-IoE. D2D communication supports direct communication between two IoED and shares resources between the two devices. However, when the communication ranges of IoED overlap and the same channel is used for transmission, serious interference occurs. Therefore, D2D-assisted UD-IoE must rely on an effective resource management method, so that network performance is guaranteed and serious interference is avoided.
Aiming at D2D-assisted UD-IoE resource management, the existing research focuses on network throughput performance, and an optimization model is built and solved by adopting an optimization method. However, the above method is highly dependent on the accuracy of the optimization model, and the calculation amount increases very rapidly with the network scale, so that it is difficult to adapt to an ultra-dense environment with a huge amount of IoE devices.
With the development of artificial intelligence (AI), machine learning has become a very effective technique for dealing with large amounts of data, computationally heavy tasks, and mathematically complex nonlinear non-convex problems. Currently, more and more researchers apply reinforcement learning to resource management and allocation in wireless communication systems. However, the existing research does not consider the problem of massive hidden terminal interference in the dense deployment scenario of the ultra-dense Internet of things. In the ultra-dense Internet of things, the dense deployment of massive IoED causes the communication ranges of the terminals to overlap densely, producing a large amount of potential hidden terminal interference; this greatly increases the possibility of resource multiplexing conflicts and seriously increases the difficulty of resource management.
Disclosure of Invention
The invention provides a GCN-DDPG-based ultra-dense Internet of things resource allocation method and system, which solve the technical problem of how to effectively avoid resource conflicts between communication links in D2D-assisted UD-IoE.
In order to solve the technical problems, the invention provides an ultra-dense Internet of things resource allocation method based on GCN-DDPG, which comprises the following steps:
S1, constructing a communication model of the ultra-dense Internet of things;
The communication model comprises a D2D-assisted UD-IoE layer and a BBU pool, wherein D2D denotes Device-to-Device (end-to-end) communication, UD-IoE denotes the ultra-dense Internet of things, and BBU denotes a baseband unit; the UD-IoE layer comprises N IoED, M communication links and a plurality of RRHs, where IoED denotes Internet of things equipment and RRH denotes a remote radio head; each RRH is connected to the BBU pool by a high-speed forward link and is responsible for providing primary coverage and auxiliary access; the IoED adopt a D2D communication mode, expand the communication range via multi-hop communication links, and finally connect to the BBU pool through the RRHs of the UD-IoE layer; the BBU pool, consisting of BBUs and computing servers, collects all environmental information in the network and allocates resources to the communication links;
s2, establishing a conflict graph according to the communication model;
S3, according to the maximal clique generation theorem and a greedy algorithm, all maximal cliques in the conflict graph are obtained, and according to the relationship between hyperedges and maximal cliques, the conflict graph is simplified into a conflict hypergraph while the adjacency relationships of the vertices are kept unchanged;
S4, performing resource allocation on the ultra-dense Internet of things by adopting a deep reinforcement learning model based on a graph convolution neural network, namely a GCN-DDPG model based on the conflict hypergraph;
the GCN-DDPG model comprises an actor network and a criticizer network;
Constructing a two-layer graph convolutional neural network in the actor network, inputting the adjacency matrix and the feature matrix of the conflict hypergraph, wherein the feature matrix is the resource allocation matrix, i.e. the vertex coloring matrix; then obtaining node features and selecting suitable colors according to the node features, i.e. performing resource allocation, to obtain a colored feature matrix;
and constructing a two-layer graph convolution neural network in the criticizing network, inputting the adjacency matrix of the conflict hypergraph and the colored feature matrix, and outputting a node feature, namely evaluating the selected action.
Further, in the step S4, the GCN-DDPG model further includes an actor target network and a criticizer target network;
During training, the BBU pool serves as the agent. The agent inputs the current state s_t into the actor network to obtain an action a_t = μ(s_t|θ^μ), where μ(·) represents the actor network and θ^μ represents the parameters of the actor network; the agent executes a_t in the environment, then obtains a timely reward r_t and the next state s_{t+1}, and stores (s_t, a_t, r_t, s_{t+1}) in an experience replay buffer for further training. When the experience pool is full, the agent randomly selects Nc data to form a mini-batch, wherein the estimated value Q(s_n, a_n|θ^Q) of the state-action pair (s_n, a_n) of the n-th data is obtained from the criticizer network, and θ^Q represents the parameters of the criticizer network;
The target value y_n for the state-action pair (s_n, a_n) is calculated by the following equation:

y_n = r_n + γ·Q′(s_{n+1}, μ′(s_{n+1}|θ^{μ′}) | θ^{Q′})

wherein γ is a discount factor, Q′(·|θ^{Q′}) represents the criticizer target network, θ^{Q′} represents the parameters of the criticizer target network, μ′(·|θ^{μ′}) represents the actor target network, and θ^{μ′} represents the parameters of the actor target network.
Further, the criticizer network is trained by minimizing the mean square error, where the mean square error L(θ^Q) is calculated by:

L(θ^Q) = (1/Nc) Σ_{n=1}^{Nc} (y_n − Q(s_n, a_n|θ^Q))²

The parameters of the criticizer network are updated through a gradient descent method; the gradient of L(θ^Q) is obtained by differentiating L(θ^Q) with respect to θ^Q.
Further, the actor network is trained by maximizing the expected return under the initial distribution of environment states, where the expected return J(θ^μ) is calculated by:

J(θ^μ) = E[Q(s, μ(s|θ^μ)|θ^Q)]

The parameters of the actor network are updated through a gradient method; the gradient of J(θ^μ) is obtained through the chain rule of differentiation:

∇_{θ^μ} J(θ^μ) = E[∇_a Q(s, a|θ^Q)|_{a=μ(s)} · ∇_{θ^μ} μ(s|θ^μ)]
Further, the parameters of the criticizer target network and of the actor target network are updated from the updated parameters of the criticizer network and of the actor network, respectively.
Further, in the actor network or the criticizer network, the operation of the graph convolutional neural network is expressed as:

f(X, A) = σ(Â ReLU(Â X W_0) W_1)

wherein Â = D̃^{-1/2} Ã D̃^{-1/2} is the simplified (normalized) form of Ã = A + I, A represents the adjacency matrix of the conflict hypergraph, I represents the identity matrix, D̃ is the diagonal matrix with D̃_ii = Σ_j Ã_ij, Ã_ij represents the element of the i-th row and j-th column of Ã, W_k represents the weight matrix of the (k+1)-th layer (k = 0, 1), σ(·) is an activation function, ReLU(·) is the ReLU activation function, and X represents the feature matrix.
Further, the nodes H^k in each layer are updated according to the following equation:

H^{k+1} = σ(D̃^{-1/2} Ã D̃^{-1/2} H^k W^k)

where H^0 = X.
Further, the conflict hypergraph is denoted as G_H = {V_H, E_H}, where V_H is the vertex set (vertices represent communication links) and E_H is the hyperedge set (a hyperedge represents a conflict relationship among communication links); any hyperedge in E_H is a subset of V_H, and the vertices in a hyperedge all have the same relationship with the other vertices. G_H represents the relationship between any vertex v and any hyperedge e by an association matrix H, whose element h(v, e) for any row and column takes the values:

h(v, e) = 1 if v ∈ e, and h(v, e) = 0 otherwise

where h(v, e) = 1 indicates that vertex v is associated with hyperedge e, i.e., hyperedge e contains vertex v.
Further, the step S2 specifically includes the steps of:
S21, establishing a conflict graph of the communication model, wherein the conflict graph is expressed as:
GC=(VC,EC)
Where V_C = {e_1, e_2, ..., e_M} is the set of vertices representing communication links, E_C is the set of edges representing resource conflict relationships between communication links, and the relationship between the vertices in V_C and the edges in E_C is represented by the adjacency matrix G_C, wherein the element b_nm = (e_n, e_m) of the n-th row and m-th column of G_C takes the values:

b_nm = 1 if communication link e_n conflicts with e_m, and b_nm = 0 otherwise.
S22, simplifying the adjacency matrix G_C as the Boolean sum (logical OR):

G_C = G̃_C1 + G̃_C2

wherein G̃_C1 represents the matrix obtained by setting the main diagonal elements of G_C1 to 0, G_C1 is the adjacency matrix of the conflict graph recording direct conflicts, and I is the identity matrix (so that G̃_C1 = G_C1 − I); G̃_C2 is obtained by setting the main diagonal elements of G_C2 to 0, where G_C2 is the adjacency matrix of the conflict graph recording hidden terminal conflicts. A direct conflict means that two communication links use the same channel at the same time and have the same transmitting or receiving IoED; a hidden terminal conflict means that two communication links use the same channel at the same time and the transmitting or receiving IoED of one IoED pair is within the communication range of an IoED of the other pair.
The invention also provides a GCN-DDPG-based ultra-dense Internet of things resource allocation system, which comprises a processing module for executing steps S1 to S4 of the above method.
According to the GCN-DDPG-based ultra-dense Internet of things resource allocation method and system, the resource multiplexing conflict relation is analyzed by constructing the conflict graph model, the conflict graph model is converted into the conflict hypergraph model, the conflict relation among a plurality of transmission links (transmission link, TL) is analyzed at the same time, and the resource allocation problem is converted into the vertex strong coloring problem of the hypergraph. And finally, providing a conflict-free resource allocation strategy based on a graph convolution reinforcement learning algorithm. In particular, the beneficial effects are that:
1) Aiming at the problem of hidden terminal interference in D2D-assisted UD-IoE, the resource multiplexing conflict relation among the TL is analyzed, and the conflict types among the TL are classified. Then, based on the resource conflict relation between UD-IoE layers, a conflict graph model is established, a matrix transformation method is designed to construct a conflict graph, and the conflict relation between logic layers is intuitively reflected.
2) Aiming at the problem that the conflict graph model cannot analyze resource conflicts among multiple TL simultaneously, the method converts the conflict graph model into a conflict hypergraph through maximal-clique and hypergraph theory, converts the conflict-free resource allocation problem into the vertex coloring problem of the hypergraph, and designs a method for computing the hypergraph coloring.
3) Aiming at the hypergraph vertex coloring problem, a graph-convolution-based deep deterministic policy gradient (GCN-DDPG) algorithm/model is provided. The algorithm adopts an Actor-Critic (criticizer) network to learn the D2D-assisted UD-IoE resource allocation process, and dynamically adjusts the resource allocation scheme according to sample data in the experience replay pool. Under the condition of ensuring that UD-IoE is conflict-free, the resource multiplexing rate is improved. Simulation results show that the algorithm improves the network throughput of UD-IoE.
4) Simulation results show that GCN-DDPG has higher resource multiplexing rate and network throughput.
Drawings
Fig. 1 is a diagram of an exemplary scenario of an ultra-dense internet of things provided by an embodiment of the present invention;
Fig. 2 is a diagram illustrating D2D communication provided by an embodiment of the present invention;
FIG. 3 is an illustration of a conflict graph provided by an embodiment of the present invention;
FIG. 4 is an illustration of a conflict hypergraph provided by an embodiment of the invention;
FIG. 5 is a schematic diagram of a GCN-DDPG-based resource conflict-free allocation algorithm provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a graph convolutional neural network provided by an embodiment of the present invention;
FIG. 7 is a graph comparing the maximum network throughput of UD-IoE under four algorithms provided by an embodiment of the present invention;
FIG. 8 is a graph comparing the maximum network throughput of UD-IoE under four algorithms provided by an embodiment of the present invention;
FIG. 9 is a graph showing a comparison of resource multiplexing rates of UD-IoE under four algorithms provided by an embodiment of the present invention;
Fig. 10 is a graph comparing signal-to-interference-and-noise ratios (SINR) of UD-IoE under four algorithms provided by an embodiment of the present invention.
Detailed Description
The following examples are given for the purpose of illustration only and are not to be construed as limiting the invention; the drawings are included for reference and description only and are likewise not to be construed as limiting the scope of the invention, as many variations are possible without departing from its spirit and scope.
In order to avoid resource conflict between communication links in D2D-assisted UD-IoE as effectively as possible, the embodiment of the invention provides an ultra-dense Internet of things resource allocation method based on GCN-DDPG, which specifically comprises the following steps:
S1, constructing a communication model of the ultra-dense Internet of things;
S2, establishing a conflict graph according to the communication model;
S3, according to the maximal clique generation theorem and a greedy algorithm, all maximal cliques in the conflict graph are obtained, and according to the relationship between hyperedges and maximal cliques, the conflict graph is simplified into a conflict hypergraph while the adjacency relationships between the vertices are kept unchanged;
and S4, performing resource allocation on the ultra-dense Internet of things by adopting a deep reinforcement learning model based on a graph convolution neural network, namely a GCN-DDPG model based on the conflict hypergraph.
(1) Step S1: communication model
The scene is the ultra-dense Internet of things, comprising a baseband unit (BBU) pool and a D2D-assisted UD-IoE layer, as shown in figure 1. UD-IoE contains N IoED ({IoED 1, IoED 2, …, IoED N}) with M TL ({TL 1, TL 2, …, TL M}) between the IoED, and multiple remote radio heads (RRH). Each RRH is connected with the BBU pool through a high-speed transmission link and is responsible for basic coverage and auxiliary access. The IoED adopt a D2D communication mode, further expand the communication range via multi-hop TL, and finally connect with the BBU pool through the RRHs of the UD-IoE layer. The BBU pool consists of BBUs and servers, and is responsible for collecting all environment information in the network and allocating resources to the TL in UD-IoE. In the ultra-dense Internet of things, the dense deployment of massive IoED causes the communication ranges of the terminals to overlap densely, producing a large amount of potential hidden terminal interference. Therefore, this embodiment mainly considers the hidden terminal interference problem in UD-IoE.
Let D_ij denote the Euclidean distance between any two IoED i and j. Assume that all IoED have the same transmission power and that a Rayleigh fading model is used. Let S_ij be the power of the signal that receiving end IoED i receives from transmitting end IoED j. The signal-to-interference-plus-noise ratio (SINR) of IoED i can then be expressed as:

SINR_i = S_ij / (N_e + Σ_k S_ik)   (1)

where N_e is the ambient noise and the sum over k represents the interference transmitted by other IoED.
When the SINR reaches a specific threshold s_th, the receiving end can decode normally; according to equation (1), the transmission rate can be expressed as:

R = B log2(1 + SINR_i)   (2)

where B is the channel bandwidth.
Assuming that all IoED have the same transmission power P and the same communication radius R_T, the received power is denoted as S_ij = P·h_ij·D_ij^{-α}, where h_ij is the Rayleigh fading gain and α is the path attenuation factor.
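For illustration, the per-link quantities above can be sketched in a few lines of Python. This is a minimal sketch: the function and parameter names are illustrative rather than part of the invention, and `received_power` assumes the P·h·D^−α path-loss form since the original expression is an image that did not survive extraction.

```python
import math

def received_power(p_tx, distance, alpha, fading_gain=1.0):
    """Assumed path-loss model S_ij = P * h_ij * D_ij^(-alpha),
    with h_ij a Rayleigh fading gain (1.0 = no fading)."""
    return p_tx * fading_gain * distance ** (-alpha)

def sinr(signal_power, interference_powers, noise_power):
    """Equation (1): desired power over ambient noise plus the sum of
    the interfering transmissions from other IoED."""
    return signal_power / (noise_power + sum(interference_powers))

def transmission_rate(bandwidth_hz, sinr_value):
    """Equation (2): Shannon-style rate R = B * log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1.0 + sinr_value)
```

For example, with unit transmit power, distance 2 and attenuation factor 2, `received_power(1.0, 2.0, 2.0)` gives 0.25, which can then be fed through `sinr` and `transmission_rate`.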
(2) Step S2: establishing conflict graph model
The graph model may be represented by G = {V, E}, where V = {v_1, v_2, …, v_N} is the vertex set representing all IoED and E = {e_1, e_2, …, e_M} is the edge set representing the TL. The relationship of vertices to edges is expressed as:

(e_m, v_n) = 1 if vertex v_n is associated with edge e_m, and (e_m, v_n) = 0 otherwise.

When (e_m, v_n) = 1, vertex v_n is associated with edge e_m, i.e. the IoED can communicate with the other IoED connected via the TL; otherwise it cannot. Therefore, the association matrix H can be written with a_mn denoting the value of (e_m, v_n).
In the graph model, two resource conflict-free situations exist in D2D communication, which are respectively: 1) The vertex is not associated with any edge, i.e., it is an isolated vertex. 2) There are only two vertices and one edge, which means that the TL does not collide with other TL when allocating resources. In both cases, communication resources may be allocated to TL. To facilitate analysis of the massive hidden terminal interference problem for dense deployments of ultra-dense internet of things, fig. 2 gives an example, which includes 20 IoED and 32 TL.
To avoid resource conflicts between the TL in D2D-assisted UD-IoE, the potential conflict types are classified and analyzed as follows.
Direct conflict: TL1 and TL2 collide directly if the same channel is used at the same time. Example: IoED1 and IoED in fig. 2 directly collide when simultaneously transmitting information to IoED2.
Hidden terminal conflict: TL1 and TL3 may experience a hidden terminal collision if the same channel is used at the same time. Example: in fig. 2, IoED sends a message to IoED2 while IoED3 sends a message to IoED, and a hidden terminal collision may occur.
The direct conflict problem is essentially a typical edge-coloring problem and can be solved by an edge-coloring algorithm. However, the hidden terminal conflict problem is not suitable for typical edge-coloring algorithms. Therefore, it is necessary to further build and analyze a conflict model.
Let G_c = {V_c, E_c} denote the conflict graph model, where V_c = E = {e_1, e_2, …, e_M} denotes the set of all TL and E_c denotes the set of edges representing resource conflict relationships between the TL. The adjacency between vertices is represented by the matrix G_c, with entries:

b_nm = (e_n, e_m), where b_nm = 1 if e_n conflicts with e_m, and b_nm = 0 otherwise.
In D2D-assisted UD-IoE, reachability of TL_i and TL_j indicates that a path exists from TL_i to TL_j, the path length being the number of vertices in the path. If there are many paths between TL_i and TL_j but any one of them conflicts, they cannot share resources. Therefore, the number of paths causing conflicts between TL need not be considered when analyzing the conflict relationship. In terms of reachability, TL in direct conflict with each other can be reached in one step, and TL in hidden-terminal conflict with each other can be reached in two steps. The direct conflicts are calculated as follows:

G_C1 = G_I ⊙ G_I^T   (6)

wherein G_C1 is the conflict-graph adjacency matrix recording direct conflicts, G_I is the association matrix of the graph model, and G_I^T is its transpose. The Boolean operation ⊙ denotes a matrix product in which elements c and d are combined by logical AND (c·d = 1 only if c = d = 1) and the partial products are combined by logical OR.

Every main diagonal element of G_C1 is 1. To obtain the direct-conflict relationships between different TL, the main diagonal elements are set to 0, i.e.:

G̃_C1 = G_C1 − I   (8)

where I represents the identity matrix.
Similar to equation (6), the hidden terminal conflicts are computed via two-step reachability:

G_C2 = G̃_C1 ⊙ G̃_C1   (9)

The main diagonal elements of G_C2 are then set to 0, as in equation (8), yielding G̃_C2.   (10)
According to equations (8) and (10), the adjacency matrix G_C of the conflict graph is obtained as the Boolean sum (logical OR):

G_C = G̃_C1 + G̃_C2   (11)
The conflict graph of FIG. 2 obtained from G_C is shown in FIG. 3.
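The matrix construction of steps S21-S22 can be sketched in pure Python as follows. This is a minimal illustration, not the patented implementation: the Boolean matrix product stands in for the ⊙ operation of equations (6)-(11), and reading hidden-terminal conflicts as two-step reachability through the direct-conflict matrix is an assumption based on the reachability remark above (the original equations are images). All function names are illustrative.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is 1 iff some k has
    a[i][k] = b[k][j] = 1 (logical AND combined by logical OR)."""
    return [[int(any(a[i][k] and b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))] for i in range(len(a))]

def zero_diagonal(m):
    """Set the main diagonal to 0, as in equation (8)."""
    return [[0 if i == j else m[i][j] for j in range(len(m))]
            for i in range(len(m))]

def conflict_adjacency(g_i):
    """Build the conflict-graph adjacency matrix G_C from the TL-by-IoED
    association matrix g_i: direct conflicts are TLs sharing an IoED,
    hidden-terminal conflicts are pairs reachable in two steps."""
    g_i_t = [list(row) for row in zip(*g_i)]            # transpose G_I
    g_c1 = zero_diagonal(bool_matmul(g_i, g_i_t))       # direct conflicts
    g_c2 = zero_diagonal(bool_matmul(g_c1, g_c1))       # hidden-terminal conflicts
    n = len(g_c1)
    return [[int(g_c1[i][j] or g_c2[i][j]) for j in range(n)]
            for i in range(n)]
```

For a chain of three TL over four IoED (TL1 on IoED 1-2, TL2 on IoED 2-3, TL3 on IoED 3-4), TL1/TL2 and TL2/TL3 conflict directly, while TL1/TL3 is a hidden-terminal conflict, so G_C has all off-diagonal entries equal to 1.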
(3) Step S3: construction of conflict hypergrams
Fig. 3 shows the conflict relationships between the TL. Since each edge in the conflict graph connects only 2 nodes, it is difficult to reflect the resource conflict relationship of multiple TL. Owing to the dense deployment of massive terminals, resource conflicts among multiple TL are the main cause of hidden terminal interference. Therefore, it is necessary to further simplify the conflict graph so that, while the conflict relationships between vertices are kept unchanged, the conflict relationships among multiple TL can be analyzed simultaneously.
In order to analyze the conflict relationships among multiple TL simultaneously, the conflict graph is simplified into a conflict hypergraph based on clique and hypergraph theory. Let G_H = {V_H, E_H} denote the conflict hypergraph, with V_H the vertex set and E_H the hyperedge set. All vertices in a hyperedge have the same relationship. The hypergraph can be represented by an association matrix H, whose elements take the values:

h(v, e) = 1 if v ∈ e, and h(v, e) = 0 otherwise.

h(v, e) = 1 means that vertex v lies on hyperedge e, i.e. v and e are associated. Any two vertices v_1, v_2 ∈ e are adjacent, and the set of vertices adjacent to v_1 is denoted N(v_1).
Because of the high density of connections between vertices in the conflict graph G_c = {V_c, E_c}, many complete subgraphs G_s = {V_s, E_s} can be generated. Each complete subgraph is a clique in which all vertices are adjacent to one another, and can therefore be represented by a hyperedge. Hence, according to the relationship between hyperedges and maximal cliques, the conflict graph can be simplified into a conflict hypergraph while the adjacency relationships between vertices are kept unchanged. All maximal cliques in the conflict graph can be found according to the maximal clique generation theorem and a greedy algorithm, and the conflict graph is thus converted into a conflict hypergraph, as shown in fig. 4. In fig. 4, vertices represent TL and hyperedges represent the conflict relationships between them. Accordingly, conflict-free resource allocation translates into a vertex strong coloring problem on the conflict hypergraph.
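The conversion from conflict graph to conflict hypergraph can be illustrated as follows. This sketch substitutes the classic Bron-Kerbosch enumeration for the greedy maximal-clique search named in the text, so it is an assumed stand-in rather than the patented procedure; every maximal clique becomes one hyperedge, and the incidence matrix realizes h(v, e).

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph given as a
    0/1 adjacency matrix, via Bron-Kerbosch (without pivoting)."""
    n = len(adj)
    neighbors = [{j for j in range(n) if adj[i][j]} for i in range(n)]
    cliques = []

    def expand(r, p, x):
        if not p and not x:          # r cannot be extended: maximal clique
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & neighbors[v], x & neighbors[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(range(n)), set())
    return cliques

def hypergraph_incidence(adj):
    """Incidence matrix H of the conflict hypergraph: row v, column e,
    with h(v, e) = 1 iff vertex v lies in hyperedge (maximal clique) e."""
    cliques = maximal_cliques(adj)
    return [[int(v in c) for c in cliques] for v in range(len(adj))]
```

Note that a plain two-vertex edge is itself a maximal clique, so the hypergraph representation subsumes the ordinary conflict graph as a special case.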
(4) Step S4: conflict-free resource allocation model
To solve the vertex strong coloring problem of the conflict hypergraph, this embodiment proposes a graph-convolution-based reinforcement learning conflict-free resource allocation algorithm (GCN-DDPG), constructs a resource allocation MDP model that considers the two conflict types of the ultra-dense Internet of things, and solves the optimal resource allocation strategy π* by combining graph convolution with the deep deterministic policy gradient algorithm.
For the hypergraph vertex strong coloring problem, the problem is first converted into a sequential decision problem through a Markov decision process; the BBU pool serves as the agent, and, combined with the deep deterministic policy gradient algorithm, the agent continuously searches for a hypergraph vertex coloring scheme through training. State S, action A and the reward function are defined as follows:
1) State S: S = {H, K_t}, where H is the association matrix of the conflict hypergraph and K_t represents the resource allocation of all TL.
2) Action A: according to the current state, the agent makes an observation and produces a set of corresponding communication link resource allocations, i.e. the action set.
3) Reward R: the agent executes the corresponding action in the current state and obtains the corresponding return.
As shown in fig. 5, the proposed GCN-DDPG algorithm structure is composed of an actor network and a critic (criticizer) network; the former is used to generate actions and the latter to evaluate the generated actions. For stability and convergence, four networks are created: actor, actor-target, critic, and critic-target, and the target networks are updated with soft updates.
The specific algorithm steps are as follows. First, the agent inputs the current state s_t into the actor network, obtaining the action a_t = μ(s_t|θ^μ), where μ(·) represents the actor network and θ^μ its parameters. The agent performs a_t in the environment, then receives a timely reward r_t and the next state s_{t+1}, and stores (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer for further training. When the experience pool is full, Nc data are randomly selected to make up a mini-batch. Let (s_n, a_n, r_n, s_{n+1}) be the n-th set of data in the mini-batch. The estimate for the state-action pair (s_n, a_n) is obtained from the critic network and denoted Q(s_n, a_n|θ^Q), where Q(s, a) represents the critic network and θ^Q its parameters. The target value for (s_n, a_n) can be derived from the Bellman equation, namely:

y_n = r_n + γ·Q′(s_{n+1}, μ′(s_{n+1}|θ^{μ′}) | θ^{Q′})

where γ is the discount factor, Q′ denotes the critic-target network with parameters θ^{Q′}, and μ′ denotes the actor-target network with parameters θ^{μ′}.
The critic network may be trained by minimizing the mean square error (MSE) loss function, i.e.:

L(θ^Q) = (1/Nc) Σ_{n=1}^{Nc} (y_n − Q(s_n, a_n|θ^Q))²

The gradient of the critic network can be obtained by differentiating L(θ^Q) with respect to θ^Q, namely:

∇_{θ^Q} L(θ^Q) = (2/Nc) Σ_{n=1}^{Nc} (Q(s_n, a_n|θ^Q) − y_n) ∇_{θ^Q} Q(s_n, a_n|θ^Q)
The actor network may be trained by maximizing the expected return under the initial distribution of environment states, i.e.:

J(θ^μ) = E[Q(s, μ(s|θ^μ)|θ^Q)]

whose gradient follows from the chain rule:

∇_{θ^μ} J(θ^μ) = E[∇_a Q(s, a|θ^Q)|_{a=μ(s)} · ∇_{θ^μ} μ(s|θ^μ)]

and the parameters of the actor network are updated with this gradient.
Then, the target networks are updated using soft updates: the parameters of the critic-target network and of the actor-target network are moved toward the updated parameters of the critic network and of the actor network, respectively.
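The scalar bookkeeping of the update rules above can be illustrated as follows. This is a minimal sketch: the function names and the soft update coefficient tau are illustrative (the text names soft updates but does not fix a coefficient), and real parameters would be network weight tensors rather than flat lists of floats.

```python
def bellman_target(reward, next_q_value, gamma):
    """Target value y_n = r_n + gamma * Q'(s_{n+1}, mu'(s_{n+1}))."""
    return reward + gamma * next_q_value

def mse_loss(targets, estimates):
    """Critic loss L: mean squared error between y_n and Q(s_n, a_n)."""
    return sum((y - q) ** 2 for y, q in zip(targets, estimates)) / len(targets)

def soft_update(target_params, online_params, tau):
    """Soft target update theta' <- tau * theta + (1 - tau) * theta',
    applied element-wise to a flat parameter list."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]
```

With a small tau (e.g. 0.01), the target networks track the online networks slowly, which is what stabilizes the bootstrapped targets y_n during training.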
Two-layer GCN (graph convolutional neural network) models are built in the actor and critic networks to extract features, which can be expressed as:

f(X, A) = σ(Â ReLU(Â X W_0) W_1)   (18)

where Â = D̃^{-1/2} Ã D̃^{-1/2} with Ã = A + I, A represents the adjacency matrix of the graph, and I represents the identity matrix; σ(·) is an activation function, ReLU(·) is the ReLU activation function, and W_k represents the weight matrix of the (k+1)-th layer.
For a graph adjacency matrix A and feature matrix X (here, the feature matrix is the resource allocation matrix, i.e. the vertex coloring matrix), the nodes in each layer of a graph convolutional neural network are updated according to the following propagation rule:

H^{k+1} = σ(D̃^{-1/2} Ã D̃^{-1/2} H^k W^k)

where Ã = A + I, σ(·) is an activation function, D̃ is the diagonal matrix with D̃_ii = Σ_j Ã_ij, H^0 = X, and W^k represents the weight matrix of the k-th layer, whose values are updated as the network trains.
Two convolution layers are constructed in the actor network; the adjacency matrix and the feature matrix of the hypergraph are input, node features are obtained, and suitable colors are selected according to the node features, i.e. resource allocation is performed. In the critic network, a graph convolutional neural network is likewise built; the adjacency matrix of the hypergraph and the colored feature matrix are input, and a node feature is output, i.e. the selected action is evaluated. The specific flow is shown in fig. 6.
In order to facilitate implementation, the embodiment of the invention also provides an ultra-dense Internet of things resource allocation system based on GCN-DDPG, which is provided with a processing module for executing steps S1 to S4 in the method.
According to the GCN-DDPG-based ultra-dense Internet of things resource allocation method and system provided by the embodiment of the invention, the resource multiplexing conflict relation is analyzed by constructing the conflict graph model, the conflict graph model is converted into the conflict hypergraph model, the conflict relation among a plurality of transmission links (transmission link, TL) is analyzed at the same time, and the resource allocation problem is converted into the vertex strong coloring problem of the hypergraph. And finally, providing a conflict-free resource allocation strategy based on a graph convolution reinforcement learning algorithm. In particular, the beneficial effects are that:
1) Aiming at the hidden-terminal interference problem in D2D-assisted UD-IoE, the resource multiplexing conflict relations among TLs are analyzed and the conflict types among TLs are classified. Based on the resource conflict relations of the UD-IoE layer, a conflict graph model is then established, and a matrix transformation method is designed to construct the conflict graph, intuitively reflecting the conflict relations at the logical layer.
2) Aiming at the problem that the conflict graph model cannot analyze resource conflicts among multiple TLs simultaneously, the conflict graph model is converted into a conflict hypergraph via maximal clique and hypergraph theory, the conflict-free resource allocation problem is converted into the vertex coloring problem of the hypergraph, and a method for computing the hypergraph coloring is designed.
3) Aiming at the hypergraph vertex coloring problem, a deep deterministic policy gradient algorithm based on graph convolution (GCN-DDPG) is proposed. The algorithm adopts an Actor-Critic network to learn the D2D-assisted UD-IoE resource allocation process and dynamically adjusts the resource allocation scheme according to sample data in the experience replay pool, improving the resource multiplexing rate while guaranteeing that UD-IoE is conflict-free. Simulation results show that the algorithm improves the network throughput of UD-IoE.
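The conversion in item 2) (maximal cliques of the conflict graph become hyperedges of the conflict hypergraph) can be sketched with the classic Bron-Kerbosch enumeration, used here in place of the unspecified greedy procedure; the 4-link conflict graph is a made-up example:

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph (Bron-Kerbosch).

    adj: dict mapping vertex -> set of neighbours. Each maximal clique
    becomes one hyperedge of the conflict hypergraph.
    """
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(frozenset(R))   # R is maximal: record it
            return
        for v in list(P):
            expand(R | {v}, P & adj[v], X & adj[v])
            P.remove(v)                    # v has been fully explored
            X.add(v)

    expand(set(), set(adj), set())
    return cliques

# toy conflict graph: links 0, 1, 2 mutually conflict; link 3 conflicts with 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
hyperedges = maximal_cliques(adj)
print(sorted(sorted(e) for e in hyperedges))  # [[0, 1, 2], [2, 3]]
```

Each returned clique is a set of mutually conflicting links and maps directly to one hyperedge; vertex strong coloring then requires all links within a hyperedge to receive distinct colors.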
Simulation results and performance analysis are presented below.
The experiments were run on a PC platform with an Intel(R) Xeon(R) Gold 6242R CPU @ 3.10 GHz, an NVIDIA RTX 3080 Ti GPU, and 64 GB of memory. In this embodiment, the simulation is performed in an ultra-dense Internet of things scenario with the scene parameters listed in table 1; simulation experiments are carried out according to these parameter settings, and comparative data on network throughput and resource multiplexing rate are obtained. The method (GCN-DDPG) is compared with three other algorithms in terms of network throughput and resource multiplexing rate: a random-matching resource allocation (RM) algorithm, a greedy-matching resource allocation (GA) algorithm, and a maximum-node-degree-matching resource allocation (MND) algorithm.
Table 1 list of simulation experiment parameters
1) Network throughput: this metric evaluates the network throughput of UD-IoE after the resource allocation algorithm has allocated resources to all communication links, and can be expressed as follows:
where B is the spectrum bandwidth of the communication link, γ̄ denotes the average signal-to-interference-plus-noise ratio, and P_b is the maximum tolerable error rate.
2) Resource multiplexing rate: this metric evaluates the communication-resource multiplexing rate achieved by the resource allocation algorithm once all UD-IoE communication links are interference-free, and can be expressed as follows:
where η* is the number of communication resources ultimately used.
3) Signal-to-interference-plus-noise ratio: the ratio of the signal power to the sum of the interference and noise power in the system, calculated as shown in formula (1).
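For illustration only, the three performance indexes can be computed as in the sketch below; the Shannon-rate form of the per-link throughput and all numeric inputs are assumptions, since formula (1) and the exact throughput expression are not reproduced in this excerpt:

```python
import math

def network_throughput(bandwidth_hz, avg_sinr, n_links):
    """Aggregate throughput assuming each link achieves the Shannon-style
    rate B * log2(1 + SINR); a stand-in for the patent's exact formula,
    which also involves the maximum tolerable error rate P_b."""
    return n_links * bandwidth_hz * math.log2(1.0 + avg_sinr)

def resource_reuse_rate(n_links, n_resources_used):
    """Average number of links per communication resource: M / eta*,
    where eta* is the number of resources ultimately used."""
    return n_links / n_resources_used

def sinr(signal_power, interference_powers, noise_power):
    """Generic SINR: signal power over the sum of co-channel interference
    powers plus noise power (all in linear scale)."""
    return signal_power / (sum(interference_powers) + noise_power)

M = 40                                            # number of transmission links
gamma = sinr(1.0, [0.05, 0.05], 0.05)             # two interferers plus noise
throughput = network_throughput(1e6, gamma, M)    # 1 MHz bandwidth per link
reuse = resource_reuse_rate(M, 8)                 # 8 channels ultimately used
print(round(throughput / 1e6, 1), reuse)
```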
Fig. 7 compares the maximum network throughput in UD-IoE, where maximum throughput refers to the throughput of the system without interference. It shows that the network throughput obtained by the different algorithms in UD-IoE increases with the number of TLs, and that the algorithm adopted in this example is significantly higher than the other comparison algorithms, verifying that it can effectively improve the system network throughput.
Fig. 8 compares the minimum network throughput in UD-IoE, where minimum throughput refers to the network throughput when the system is subject to interference, for the present algorithm and the 3 comparison algorithms at different numbers of communication links. As the number of TLs increases, the network throughput obtained by all four algorithms shows an increasing trend, and the present algorithm effectively improves the minimum network throughput compared with comparison algorithms 1, 2 and 3, verifying that it can effectively improve the anti-interference capability of the system.
Fig. 9 compares the resource multiplexing rates in UD-IoE; for a communication system, the higher the resource multiplexing rate, the better the performance. It shows the average resource multiplexing rate of the present algorithm and the 3 comparison algorithms at different numbers of transmission links; the present algorithm achieves a higher resource multiplexing rate than the other three, verifying that it can effectively improve the resource multiplexing rate.
Fig. 10 compares the average SINR in UD-IoE; for a communication system, a higher SINR value means lower interference in the network. It compares the SINR values of the present algorithm and the other 3 comparison algorithms at different numbers of transmission links; the present algorithm achieves a higher SINR than the other three. By optimizing the resource allocation, the proposed algorithm improves the signal-to-interference-plus-noise ratio, meets the transmission-rate requirement, and effectively avoids severe co-channel interference.
Experimental results show that, compared with the comparison algorithms, the method achieves a higher network resource reuse rate and throughput, and performs better in D2D-assisted UD-IoE.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any other changes, modifications, substitutions, combinations and simplifications that do not depart from the spirit and principle of the present invention shall be regarded as equivalent replacements and are included within the protection scope of the present invention.

Claims (7)

1. A GCN-DDPG-based ultra-dense Internet of things resource allocation method, characterized by comprising the following steps:
S1, constructing a communication model of the ultra-dense Internet of things;
The communication model comprises a D2D-assisted UD-IoE layer and a BBU pool, wherein D2D denotes device-to-device, UD-IoE denotes the ultra-dense Internet of things, and BBU denotes a baseband unit; the UD-IoE layer comprises N IoEDs, M communication links and a plurality of RRHs, where IoED denotes an Internet of things device and RRH denotes a remote radio head; each RRH is connected to the BBU pool by a high-speed fronthaul link and is responsible for providing primary coverage and secondary access; the IoEDs adopt a D2D communication mode, extend the communication range via multi-hop communication links, and finally connect to the BBU pool through the RRHs of the UD-IoE layer; the BBU pool, consisting of BBUs and computing servers, collects all environmental information in the network and allocates resources to the communication links;
S2, establishing a conflict graph according to the communication model; the step S2 specifically includes the steps of:
S21, establishing a conflict graph of the communication model, wherein the conflict graph is expressed as:
GC=(VC,EC)
where V_C = {e_1, e_2, ..., e_M} is the vertex set, vertices representing communication links, E_C is the edge set, edges representing resource conflict relationships between communication links, and the relationship between the vertices in V_C and the edges in E_C is represented by the adjacency matrix G_C:
wherein the element b_nm = (e_n, e_m) in the n-th row and m-th column of the adjacency matrix G_C takes the following values:
S22, simplifying the adjacency matrix G_C as:
wherein the first term denotes the matrix obtained by setting the main diagonal elements of G_C1 to 0, G_C1 being the adjacency matrix of the conflict graph recording direct conflicts, I is the identity matrix, and the second term is obtained by setting the main diagonal elements of G_C2 to 0, G_C2 being the adjacency matrix of the conflict graph recording hidden-terminal conflicts; a direct conflict means that two communication links use the same channel at the same time and share the same transmitting or receiving IoED; a hidden-terminal conflict means that two communication links use the same channel at the same time and the transmitting or receiving IoED of one IoED pair is within the communication range of an IoED of the other IoED pair;
S3, obtaining all maximal cliques in the conflict graph according to the maximal-clique generation theorem and a greedy algorithm, and, according to the relationship between hyperedges and maximal cliques, simplifying the conflict graph into a conflict hypergraph while keeping the adjacency relations of the vertices unchanged;
S4, performing resource allocation on the ultra-dense Internet of things by adopting a deep reinforcement learning model based on a graph convolution neural network, namely a GCN-DDPG model based on the conflict hypergraph;
the GCN-DDPG model comprises an actor network and a criticizer network;
Constructing a two-layer graph convolutional neural network in the actor network, inputting the adjacency matrix and the feature matrix of the conflict hypergraph, wherein the feature matrix is the resource allocation matrix, i.e. the vertex coloring matrix, then obtaining node features, and selecting a suitable color according to the node features, i.e. performing resource allocation, to obtain a colored feature matrix;
Constructing a two-layer graph convolutional neural network in the criticizer network, inputting the adjacency matrix of the conflict hypergraph and the colored feature matrix, and outputting a node feature, i.e. evaluating the selected action;
In the step S4, the GCN-DDPG model further includes an actor target network and a criticizer target network;
During training, the BBU pool serves as the agent; the agent inputs the current state s_t into the actor network to obtain an action a_t = μ(s_t | θ_μ), where μ(·) denotes the actor network and θ_μ its parameters; the agent executes a_t in the environment, obtains an immediate reward r_t and the next state s_t+1, and stores (s_t, a_t, r_t, s_t+1) in an experience replay buffer for further training; when the experience pool is full, the agent randomly samples Nc transitions to form a mini-batch, in which the estimated value Q(s_n, a_n | θ_Q) of the state-action pair (s_n, a_n) of the n-th transition is obtained from the criticizer network, θ_Q denoting the parameters of the criticizer network;
The target value y_n for the state-action pair (s_n, a_n) is calculated by:
y_n = r_n + γ Q'(s_(n+1), μ'(s_(n+1) | θ_μ') | θ_Q')
where γ is the discount factor, Q'(·) denotes the criticizer target network with parameters θ_Q', and μ'(·) denotes the actor target network with parameters θ_μ'.
2. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 1, wherein the criticizer network is trained by minimizing a mean square error, the mean square error L(θ_Q) being calculated by:
L(θ_Q) = (1/Nc) Σ_(n=1)^(Nc) (y_n − Q(s_n, a_n | θ_Q))^2
where y_n is the target value for the state-action pair (s_n, a_n);
updating the parameters of the criticizer network by a gradient descent method, the gradient of L(θ_Q) being obtained by differentiating L(θ_Q) with respect to θ_Q.
3. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 2, wherein the actor network is trained by maximizing the expected return over the initial distribution of environment states, the expected return J(θ_μ) being calculated by:
J(θ_μ) = E[Q(s, μ(s | θ_μ) | θ_Q)]
where the expectation is taken over the initial distribution of environment states;
updating the parameters of the actor network by a gradient descent method, the gradient of J(θ_μ) being obtained via the chain rule of differentiation;
Updating the parameters of the criticizing target network and the parameters of the actor target network to obtain updated parameters of the criticizing target network and parameters of the actor network respectively.
4. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 1, wherein, in the actor network or the criticizer network, the operation of the graph convolutional neural network is expressed as:
f(X, A) = σ(Â ReLU(Â X W_0) W_1)
wherein Â = D^(-1/2)(A + I)D^(-1/2) is the customized simplified form of the adjacency matrix, A denotes the adjacency matrix of the conflict hypergraph, I denotes the identity matrix, D is the diagonal degree matrix with D_ii = Σ_j Ã_ij, Ã_ij denoting the element in the i-th row and j-th column of Ã = A + I, W_k denotes the weight matrix of the (k+1)-th layer, k = 0, 1, σ(·) is an activation function, ReLU(·) denotes the ReLU activation function, and X denotes the feature matrix.
5. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to claim 4, wherein the nodes H_k in each layer are updated according to:
H_(k+1) = σ(Â H_k W_k)
6. The GCN-DDPG-based ultra-dense Internet of things resource allocation method according to any one of claims 1 to 5, wherein the conflict hypergraph is denoted as G_H = {V_H, E_H}, where V_H is the vertex set, vertices representing communication links, and E_H is the hyperedge set, hyperedges representing conflict relationships between communication links; any hyperedge in E_H is a subset of V_H, and the vertices within a hyperedge have the same relationship with the other vertices; G_H represents the relationship of any vertex v with any hyperedge e by an incidence matrix H, the element h(v, e) in row v and column e of H taking the following values:
h(v, e) = 1 if vertex v belongs to hyperedge e, and h(v, e) = 0 otherwise;
where h(v, e) = 1 indicates that vertex v is associated with hyperedge e, i.e. hyperedge e contains vertex v.
7. A GCN-DDPG-based ultra-dense Internet of things resource allocation system, characterized by comprising a processing module for performing steps S1 to S4 of the method of any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311658595.5A CN117640417B (en) 2023-12-06 GCN-DDPG-based ultra-dense Internet of things resource allocation method and system


Publications (2)

Publication Number Publication Date
CN117640417A CN117640417A (en) 2024-03-01
CN117640417B true CN117640417B (en) 2024-07-16


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547325A (en) * 2022-01-14 2022-05-27 北京帝测科技股份有限公司 Probabilistic hypergraph-driven geoscience knowledge graph reasoning optimization system and method
CN116321469A (en) * 2023-03-28 2023-06-23 南京邮电大学 Method for avoiding channel conflict of large-scale self-organizing network based on conflict graph


