CN116506918A - Relay selection method based on cache region prediction - Google Patents

Relay selection method based on cache region prediction

Info

Publication number
CN116506918A
CN116506918A (application CN202310505985.2A)
Authority
CN
China
Prior art keywords
relay
node
buffer
state
data packet
Prior art date
Legal status
Pending
Application number
CN202310505985.2A
Other languages
Chinese (zh)
Inventor
智慧
费洁
王雅宁
段苗苗
黄彧
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202310505985.2A
Publication of CN116506918A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/22 Communication route or path selection, e.g. power-based or shortest path routing using selective relaying for reaching a BTS [Base Transceiver Station] or an access point
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/04 Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources
    • H04W40/08 Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources based on transmission power
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/12 Communication route or path selection, e.g. power-based or shortest path routing based on transmission quality or channel quality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/20 Communication route or path selection, e.g. power-based or shortest path routing based on geographic position or location
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a relay selection method based on buffer prediction, which comprises the following steps: setting the parameters of the communication environment, i.e. the buffer-aided relay forwarding system; constructing an LSTM-DQN network and determining its state space, action space and reward function; the agent selects an action from the action space according to the initial state in the state space, i.e. it decides which relay node is selected in the communication environment and whether that relay receives or transmits, thereby obtaining the next state, and this process is repeated continually until the maximum reward value, i.e. the maximum link capacity, is finally obtained. The invention establishes an application scenario in which the end user's own buffer demand is limited and varies, so that the buffer available for cooperative communication changes; when the relay user's own buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, and the selection of a relay node for receiving or transmitting data packets is realized.

Description

Relay selection method based on cache region prediction
Technical Field
The invention relates to the technical field of cooperative communication, in particular to a relay selection method based on cache prediction.
Background
A traditional cellular network divides its coverage area into cells for communication. Large-scale and small-scale fading occur during transmission, so users at the cell edge receive weak signals, interference between adjacent cells further degrades user signal quality, and the energy consumption of the base station is greatly increased. Relay technology can effectively alleviate these problems: one or more relay nodes are placed between the source node and the destination node, and each relay node receives the signal, processes it and retransmits it. Shortening the transmission distance effectively mitigates the various fading and path-loss problems in the communication process, guarantees communication quality, enlarges the coverage of the signal, improves the overall performance of the wireless network, increases network throughput, and reduces system energy consumption.
Cooperative communication improves the throughput of the wireless network and enlarges the range over which signals can be communicated. However, in the half-duplex operation mode of a conventional cooperative network, a relay node cannot obtain the best receiving and transmitting channels at the same time, so the quality of the final signal is not guaranteed. Buffer-aided relaying was proposed to solve this problem effectively. Compared with traditional relay schemes, buffer-aided relay cooperative communication performs remarkably better in aspects such as improving system throughput, reducing the outage probability of the system, and lowering the signal-to-noise ratio required for reliable transmission.
A mobile terminal is a computing device that can be used while moving; in the field of communications it usually refers to a smart device. When a terminal acts as a relay, its buffer is limited, and its own user also has buffering requirements. Most existing buffer-aided cooperative relay selection schemes only consider the relay devoting its entire buffer to cooperative forwarding and ignore the relay user's own buffering requirement. If the relay instead reserves a fixed portion of its buffer for forwarding, the buffer left to the relay user is also fixed: when the relay user's demand is large and the relay's forwarding task is small, part of the forwarding buffer sits idle while the relay user's demand is not satisfied, so the relay user's experience degrades and buffer resources are wasted. Satisfying the user's own demand first and then improving the utilization of the buffer is therefore a key problem to be solved in relay cooperative communication.
Disclosure of Invention
In order to overcome the drawback of fixed division of a relay's limited buffer, the invention aims to provide a relay selection method based on buffer prediction, which jointly considers the packet loss rate and the end user's buffer demand, and can improve buffer utilization in a wireless network on the premise that the user's own demand is satisfied.
In order to achieve the above purpose, the present invention adopts the following technical scheme: a relay selection method based on cache prediction, the method comprising the sequential steps of:
(1) Setting the parameters of the communication environment, i.e. the buffer-aided relay forwarding system: determining the number of relay nodes, the position coordinates of the source node and of the destination node, and determining the total buffer size, the channel coefficients, the transmit power, the noise power and the target data rate;
(2) Constructing an LSTM-DQN network, and determining a state space, an action space and a reward function;
(3) The agent selects an action from the action space according to the initial state in the state space, i.e. it decides which relay node is selected in the communication environment and whether that relay receives or transmits, thereby obtaining the next state; this process is repeated continually until the maximum reward value, i.e. the maximum link capacity, is finally obtained.
The step (1) specifically refers to the following: the buffer-aided relay forwarding system comprises a source node S, a destination node D and relay nodes R_k, 1 ≤ k ≤ K, where K is the number of relay nodes; the source node and the destination node are located in a 100 m × 100 m area, and the relay nodes are end users. It is assumed that every node has a single antenna and works in half-duplex mode, that there is no direct link between the source node and the destination node, and that communication must be completed through relay forwarding. Time is assumed to be divided into slots of equal length; in each time slot the source node S sends a data packet with fixed power P. The buffer of each relay node is limited, and the total buffer size is L+1, covering both the buffer demand of the relay user and the buffer used to assist forwarding; since the relay user's own buffer demand occupies at least one data packet, at most L packets can be buffered for assisted forwarding in each time slot.

Assuming the buffer demand of each user is Lu, the buffer available for assisted forwarding is L+1-Lu. Let L_k denote the number of data packets stored in the buffer of relay node R_k, with 0 ≤ L_k ≤ L. In each time slot, the links available to relay node R_k differ for different values of L_k:

(1a) L_k = 0: no data packet can be sent, and only the source node-relay node link, i.e. the S-R link, is available;

(1b) 0 < L_k < L+1-Lu: both the source node-relay node link, i.e. the S-R link, and the relay node-destination node link, i.e. the R-D link, can be used;

(1c) L_k = L+1-Lu: only the relay node-destination node link, i.e. the R-D link, is available, and no buffer space is left to store new data packets.

First, a decision is made according to the past relay channel states and the historical buffer demand of the terminal relay user: a relay is selected and a data packet is sent to it only if the relay's buffer can still store the packet. When the k-th relay node receives a data packet sent by the source node S, the corresponding buffer occupancy increases by one packet; when the k-th relay node successfully sends a data packet to the destination node D, the corresponding buffer occupancy decreases by one packet. A relay node can send a data packet to the destination node D only after having successfully received it. It is assumed that the source node S always has a data packet to transmit to the destination node D, that the channel coefficients follow a Rayleigh distribution, remain constant within one time slot and change independently between time slots, and that the signal finally received at the destination node D is corrupted by additive white Gaussian noise with zero mean and variance δ².

In a given time slot, when a source node-relay node link is selected, the source node S transmits a single data packet to the corresponding relay R_k, which stores it in its buffer; the signal received at R_k is:

y_{S,R_k} = sqrt(P / d_{S,R_k}^α) · h_{S,R_k} · x_S + n_{R_k},

where x_S is the data signal from S, n_{R_k} is additive white Gaussian noise with variance δ², P is the transmit power, h_{S,R_k} is the channel coefficient from the source node to the relay node, d_{S,R_k} is the distance from the source node to the relay node, and α is the path loss exponent. If a relay-to-destination link is selected, a data packet is transmitted from the relay buffer to the destination, and the signal received at the destination is:

y_{R_k,D} = sqrt(P / d_{R_k,D}^α) · h_{R_k,D} · x_{R_k} + n_D,

where x_{R_k} is the data signal from R_k, n_D denotes the additive white Gaussian noise at destination node D with variance δ², h_{R_k,D} is the channel coefficient from the relay node to the destination node, and d_{R_k,D} is the distance from the relay node to the destination node. The link capacity C_{m,n} between node m and node n is:

C_{m,n} = log2(1 + P · |h_{m,n}|² / (d_{m,n}^α · δ²)),

where h_{m,n} is the channel coefficient from node m to node n, d_{m,n} is the distance from node m to node n, and δ² is the additive white Gaussian noise power.

When C_{m,n} ≤ η, the corresponding link is in outage, where η is the target data rate.
The step (2) specifically refers to the following: an LSTM network is added to the deep reinforcement learning network DQN to form an LSTM-DQN network, and data of L consecutive time steps are input to the LSTM network; the network consists of a number of LSTM units, and an LSTM unit contains three gates, namely an input gate, a forget gate and an output gate.

The state space, action space and reward of the LSTM-DQN network are as follows:

State space: at time t, the observed state is o_t = (R_{t-1}, h_{S,R}, h_{R,D}), where R_{t-1} denotes the usage of the user buffer at time t-1, h_{S,R} is the channel coefficient from the source node to the relay node, and h_{R,D} is the channel coefficient from the relay node to the destination node; the state space is defined as s_t = [o_{t+1-N}, ..., o_t], where N is the number of past observations to be captured.

Action space: based on the current state s_t of the buffer-aided relay forwarding system with a limited and varying buffer, a decision must be made on which relay is selected and whether that relay receives or transmits; the environment is the buffer-aided relay network, and an action selects one link for data transmission, which is equivalent to determining m_{k,j}, j ∈ {0,1},

where k denotes the relay node index, j = 0 means the relay receives a data packet, and j = 1 means the relay transmits a data packet. If a relay network has k relay nodes there are 2k transmission links, and in each time slot either one link is selected for transmission or no link is selected, so the total number of actions is 2k+1.

Reward function: the reward is tied to the optimization objective, and throughput is taken as the reward function.
The step (3) specifically comprises the following steps:
(3a) In the deep reinforcement learning network DQN, the learner and decision maker is called the agent, and the part it interacts with is called the environment. At time slot t the environment state is s_t; according to the current state the agent decides the next action, i.e. which link, if any, is selected for data transmission. The action for state s_t is chosen with an ε-greedy strategy, where ε ∈ (0,1) is the greedy coefficient and n is the number of training iterations; ε is initially set to 1 to obtain good exploration and gradually decreases with the number of iterations;

(3b) Once the agent selects action a_t, i.e. selects a relay node and decides whether that relay receives or transmits, it obtains a reward value and the next state. If a_t corresponds to selecting an S-R or an R-D link, the corresponding buffer length increases by 1 or decreases by 1, respectively; otherwise the buffer length remains unchanged. Meanwhile, the channel state changes independently from one time slot to the next, and the state then transitions to s_{t+1} according to the new buffer length and channel state;

(3c) The current state, the action performed, the reward value obtained after performing the action, and the next state are combined into a tuple (s_t, a_t, r_t, s_{t+1}) and stored in the experience pool;

(3d) Return to step (3a) with state s_{t+1}, repeat the process and generate further tuples, until the state reaches the terminal state and the obtained reward value reaches its maximum.
According to the above technical scheme, the beneficial effects of the invention are as follows. First, when the relay user's own buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, which reduces the packet loss rate; when the relay user's own buffer demand is large, the buffer the relay can allocate to assist forwarding is quite limited, and reinforcement learning jointly considers the channel state and the relay's historical buffer demand to select a suitable link for data packet transmission. Second, compared with existing relay selection methods based on a fixed buffer, the invention adds an LSTM network to the deep reinforcement learning network DQN, so that reinforcement learning better matches the scenario in which the buffer available to the end user for cooperative communication varies, and uses the historical user buffer demand together with the source node-relay node and relay node-destination node channel states as the state. Third, an application scenario is established in which the end user's buffer demand is limited and varies so that the buffer available for cooperative communication changes; when the relay user's buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, and the selection of a relay node for receiving or transmitting data packets is realized.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a buffer-assisted relay forwarding system according to the present invention;
FIG. 3 is a schematic diagram of an LSTM network;
FIG. 4 is a schematic diagram of LSTM cell structure;
FIG. 5 is a process flow diagram of an LSTM-DQN network;
FIG. 6 is a block diagram of the main network and the target network in the LSTM-DQN network.
Detailed Description
As shown in fig. 1, a relay selection method based on cache prediction includes the following sequential steps:
(1) Setting the parameters of the communication environment, i.e. the buffer-aided relay forwarding system: determining the number of relay nodes, the position coordinates of the source node and of the destination node, and determining the total buffer size, the channel coefficients, the transmit power, the noise power and the target data rate;
(2) Constructing an LSTM-DQN network, and determining a state space, an action space and a reward function;
(3) The agent selects an action from the action space according to the initial state in the state space, i.e. it decides which relay node is selected in the communication environment and whether that relay receives or transmits, thereby obtaining the next state; this process is repeated continually until the maximum reward value, i.e. the maximum link capacity, is finally obtained.
The step (1) specifically refers to the following: the buffer-aided relay forwarding system comprises a source node S, a destination node D and relay nodes R_k, 1 ≤ k ≤ K, where K is the number of relay nodes; the source node and the destination node are located in a 100 m × 100 m area, and the relay nodes are end users. It is assumed that every node has a single antenna and works in half-duplex mode, that there is no direct link between the source node and the destination node, and that communication must be completed through relay forwarding. Time is assumed to be divided into slots of equal length; in each time slot the source node S sends a data packet with fixed power P. The buffer of each relay node is limited, and the total buffer size is L+1, covering both the buffer demand of the relay user and the buffer used to assist forwarding; since the relay user's own buffer demand occupies at least one data packet, at most L packets can be buffered for assisted forwarding in each time slot.

Assuming the buffer demand of each user is Lu, the buffer available for assisted forwarding is L+1-Lu. Let L_k denote the number of data packets stored in the buffer of relay node R_k, with 0 ≤ L_k ≤ L. In each time slot, the links available to relay node R_k differ for different values of L_k (a short code sketch of this rule follows the list below):
(1a)L k =0: no data packet is sent, and only an active node-relay node link, namely an S-R link, is available;
(1b)0<L k <L+1-Lu: both the source node-relay node link, i.e., the S-R link, and the relay node-destination node link, i.e., the R-D link, can be used;
(1c)L k =l+1-Lu: only the relay node-destination node link, i.e., the R-D link, is available, and no buffer is used to store new packets;
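For illustration, the link-availability rule of cases (1a)-(1c) can be written as a small Python helper; the function name and argument names are assumptions introduced here for clarity, not part of the claimed method.

    def available_links(L_k, L, Lu):
        """Links relay R_k may use in the current slot.

        L_k: packets currently buffered at relay R_k for forwarding.
        L + 1: total buffer size; Lu: buffer demand of the relay user,
        so at most L + 1 - Lu packets can be buffered for forwarding.
        """
        links = []
        if L_k < L + 1 - Lu:   # free forwarding space left: relay may receive (S-R)
            links.append("S-R")
        if L_k > 0:            # packets waiting: relay may transmit (R-D)
            links.append("R-D")
        return links           # (1a) only S-R, (1b) both, (1c) only R-D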
First, a decision is made according to the past relay channel states and the historical buffer demand of the terminal relay user: a relay is selected and a data packet is sent to it only if the relay's buffer can still store the packet. When the k-th relay node receives a data packet sent by the source node S, the corresponding buffer occupancy increases by one packet; when the k-th relay node successfully sends a data packet to the destination node D, the corresponding buffer occupancy decreases by one packet. A relay node can send a data packet to the destination node D only after having successfully received it. It is assumed that the source node S always has a data packet to transmit to the destination node D, that the channel coefficients follow a Rayleigh distribution, remain constant within one time slot and change independently between time slots, and that the signal finally received at the destination node D is corrupted by additive white Gaussian noise with zero mean and variance δ².

In a given time slot, when a source node-relay node link is selected, the source node S transmits a single data packet to the corresponding relay R_k, which stores it in its buffer; the signal received at R_k is:

y_{S,R_k} = sqrt(P / d_{S,R_k}^α) · h_{S,R_k} · x_S + n_{R_k},

where x_S is the data signal from S, n_{R_k} is additive white Gaussian noise with variance δ², P is the transmit power, h_{S,R_k} is the channel coefficient from the source node to the relay node, d_{S,R_k} is the distance from the source node to the relay node, and α is the path loss exponent. If a relay-to-destination link is selected, a data packet is transmitted from the relay buffer to the destination, and the signal received at the destination is:

y_{R_k,D} = sqrt(P / d_{R_k,D}^α) · h_{R_k,D} · x_{R_k} + n_D,

where x_{R_k} is the data signal from R_k, n_D denotes the additive white Gaussian noise at destination node D with variance δ², h_{R_k,D} is the channel coefficient from the relay node to the destination node, and d_{R_k,D} is the distance from the relay node to the destination node. The link capacity C_{m,n} between node m and node n is:

C_{m,n} = log2(1 + P · |h_{m,n}|² / (d_{m,n}^α · δ²)),

where h_{m,n} is the channel coefficient from node m to node n, d_{m,n} is the distance from node m to node n, and δ² is the additive white Gaussian noise power.

When C_{m,n} ≤ η, the corresponding link is in outage, where η is the target data rate.
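A minimal Python sketch of this link capacity and outage test is given below, assuming the capacity expression reconstructed above; the variable and function names are illustrative.

    import math

    def link_capacity(P, h, d, alpha, noise_power):
        """C_{m,n} = log2(1 + P*|h|^2 / (d^alpha * delta^2)) for the link m -> n."""
        snr = P * abs(h) ** 2 / (d ** alpha * noise_power)
        return math.log2(1 + snr)

    def in_outage(capacity, eta):
        """The link is in outage when its capacity does not exceed the target rate eta."""
        return capacity <= eta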
The step (2) specifically refers to the following: an LSTM network is added to the deep reinforcement learning network DQN to form an LSTM-DQN network, and data of L consecutive time steps are input to the LSTM network; the network consists of a number of LSTM units, and an LSTM unit contains three gates, namely an input gate, a forget gate and an output gate.

The state space, action space and reward of the LSTM-DQN network are as follows:

State space: at time t, the observed state is o_t = (R_{t-1}, h_{S,R}, h_{R,D}), where R_{t-1} denotes the usage of the user buffer at time t-1, h_{S,R} is the channel coefficient from the source node to the relay node, and h_{R,D} is the channel coefficient from the relay node to the destination node; the state space is defined as s_t = [o_{t+1-N}, ..., o_t], where N is the number of past observations to be captured.

Action space: based on the current state s_t of the buffer-aided relay forwarding system with a limited and varying buffer, a decision must be made on which relay is selected and whether that relay receives or transmits; the environment is the buffer-aided relay network, and an action selects one link for data transmission, which is equivalent to determining m_{k,j}, j ∈ {0,1},

where k denotes the relay node index, j = 0 means the relay receives a data packet, and j = 1 means the relay transmits a data packet. If a relay network has k relay nodes there are 2k transmission links, and in each time slot either one link is selected for transmission or no link is selected, so the total number of actions is 2k+1.

Reward function: the reward is tied to the optimization objective, and throughput is taken as the reward function.
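How the observation, the action index and the reward can be encoded is sketched below in Python; the flattening of the observation into a real-valued vector (using channel magnitudes) and the action numbering are assumptions made for illustration, since the patent does not fix them.

    import numpy as np

    def make_observation(buffer_usage_prev, h_sr, h_rd):
        """o_t = (R_{t-1}, h_{S,R}, h_{R,D}) flattened into one real-valued vector."""
        return np.concatenate([np.atleast_1d(buffer_usage_prev),
                               np.abs(np.atleast_1d(h_sr)),
                               np.abs(np.atleast_1d(h_rd))])

    def decode_action(a, k):
        """Map an action index in {0, ..., 2k} to a decision.

        a == 2k means no link is selected in this slot; otherwise relay a // 2
        receives (j = 0) or transmits (j = 1) according to a % 2.
        """
        if a == 2 * k:
            return None
        return a // 2, a % 2

    # Reward: the throughput achieved by the selected link in the slot (0 if idle).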
The step (3) specifically comprises the following steps:
(3a) In the deep reinforcement learning network DQN, the learner and decision maker is called the agent, and the part it interacts with is called the environment. At time slot t the environment state is s_t; according to the current state the agent decides the next action, i.e. which link, if any, is selected for data transmission. The action for state s_t is chosen with an ε-greedy strategy, where ε ∈ (0,1) is the greedy coefficient and n is the number of training iterations; ε is initially set to 1 to obtain good exploration and gradually decreases with the number of iterations;

(3b) Once the agent selects action a_t, i.e. selects a relay node and decides whether that relay receives or transmits, it obtains a reward value and the next state. If a_t corresponds to selecting an S-R or an R-D link, the corresponding buffer length increases by 1 or decreases by 1, respectively; otherwise the buffer length remains unchanged. Meanwhile, the channel state changes independently from one time slot to the next, and the state then transitions to s_{t+1} according to the new buffer length and channel state;

(3c) The current state, the action performed, the reward value obtained after performing the action, and the next state are combined into a tuple (s_t, a_t, r_t, s_{t+1}) and stored in the experience pool;

(3d) Return to step (3a) with state s_{t+1}, repeat the process and generate further tuples, until the state reaches the terminal state and the obtained reward value reaches its maximum; the interaction loop of steps (3a)-(3d) is sketched in the code below.
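In the following minimal Python sketch, env (the buffer-aided relay environment), agent.q_values and the epsilon decay schedule are placeholders, since the patent does not fix their exact form.

    import random
    import numpy as np
    from collections import deque

    def run_interaction(env, agent, num_iterations, num_actions, pool_size=10000):
        """Steps (3a)-(3d): epsilon-greedy action selection and experience collection."""
        experience_pool = deque(maxlen=pool_size)   # stores (s_t, a_t, r_t, s_{t+1})
        epsilon = 1.0                               # start fully exploratory
        state = env.reset()
        for n in range(num_iterations):
            # (3a) epsilon-greedy choice over the 2k+1 actions
            if random.random() < epsilon:
                action = random.randrange(num_actions)
            else:
                action = int(np.argmax(agent.q_values(state)))
            # (3b) apply the action: buffer length +1/-1, channels change independently
            next_state, reward, done = env.step(action)
            # (3c) store the transition tuple in the experience pool
            experience_pool.append((state, action, reward, next_state))
            # (3d) continue from the new state until the terminal state is reached
            state = env.reset() if done else next_state
            epsilon = max(0.05, epsilon * 0.995)    # assumed decay schedule
        return experience_pool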
The key idea of the LSTM-DQN framework provided by the invention is to enable the relay user to perform effective relay forwarding under the partial state observability caused by the relay user's own buffer demand and similar factors. To achieve this, adding an LSTM network to the DQN not only preserves internal state but also aggregates state observations over time, which gives the relay-assisted communication network the ability to infer future states from its history. Specifically, data of L consecutive time steps are input to an LSTM network, which consists of a number of LSTM units. In general, an LSTM unit contains three gates: an input gate, a forget gate and an output gate. What allows LSTM to stand out from a plain RNN is the cell state, the horizontal line running through the cell in FIG. 4, which can be understood as the "memory" of the recurrent neural network over its inputs: c_t denotes this memory after time t, and the vector summarizes all input information the network has seen before time t+1. The forget gate decides which part of the long-term memory c_{t-1} is retained and which part is forgotten. The input (memory) gate decides what new information is stored in the cell state. Finally, the output gate determines the output value based on the cell state. The gate equations are listed below.
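For reference, these are the standard LSTM gate equations corresponding to the description above; they are the usual textbook formulas rather than equations reproduced from the patent, and here o_t denotes the output gate activation, distinct from the observation o_t used earlier.

    f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)            % forget gate
    i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)            % input (memory) gate
    \tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)     % candidate memory
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   % cell state update
    o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)            % output gate
    h_t = o_t \odot \tanh(c_t)                        % hidden state / unit output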
As shown in FIG. 2, the proposed buffer-aided relay forwarding system consists of one source node S, one destination node D and K relay nodes R_k, 1 ≤ k ≤ K. The relay nodes considered here are end users: the terminal's buffer is limited, and the terminal has buffering requirements of its own.
FIG. 3 shows an unrolled LSTM network: data of L consecutive time steps are input to the LSTM network, which is made up of a plurality of LSTM cells, as shown in FIG. 4.
FIG. 5 and FIG. 6 show the LSTM-DQN framework for the relay selection environment with limited and varying buffer-aided forwarding. The key idea of the proposed LSTM-DQN framework is to enable the relay user to perform effective relay forwarding under the partial state observability caused by its own buffering requirement and similar factors. A sketch of such a network is given below.
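This is a minimal PyTorch sketch of such an LSTM-DQN; the hidden size and layer arrangement are assumptions for illustration, since the patent does not fix them.

    import torch
    import torch.nn as nn

    class LSTMDQN(nn.Module):
        """LSTM over the last N observations followed by a Q-value head,
        one output per action (2k+1 actions in total)."""
        def __init__(self, obs_dim, num_actions, hidden_dim=64):
            super().__init__()
            self.lstm = nn.LSTM(input_size=obs_dim, hidden_size=hidden_dim,
                                batch_first=True)
            self.q_head = nn.Linear(hidden_dim, num_actions)

        def forward(self, obs_seq):
            # obs_seq: (batch, N, obs_dim), the stacked observations o_{t+1-N}, ..., o_t
            _, (h_n, _) = self.lstm(obs_seq)    # final hidden state summarizes the history
            return self.q_head(h_n[-1])         # Q-value for each of the 2k+1 actions

As in FIG. 6, a main network of this structure would be trained while a target network with the same structure is updated more slowly to stabilize learning.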
In summary, when the relay user's own buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, which reduces the packet loss rate; when the relay user's own buffer demand is large, the buffer the relay can allocate to assist forwarding is quite limited, and reinforcement learning jointly considers the channel state and the relay's historical buffer demand to select a suitable link for data packet transmission. The invention adds an LSTM network to the deep reinforcement learning network DQN, so that reinforcement learning better fits the scenario in which the buffer available to the end user for cooperative communication varies, and uses the historical user buffer demand together with the source node-relay node and relay node-destination node channel states as the state. The method establishes an application scenario in which the end user's buffer demand is limited and varies so that the buffer available for cooperative communication changes; when the relay user's buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, and the selection of a relay node for receiving or transmitting data packets is realized.

Claims (4)

1. A relay selection method based on cache prediction is characterized in that: the method comprises the following steps in sequence:
(1) Setting the parameters of the communication environment, i.e. the buffer-aided relay forwarding system: determining the number of relay nodes, the position coordinates of the source node and of the destination node, and determining the total buffer size, the channel coefficients, the transmit power, the noise power and the target data rate;
(2) Constructing an LSTM-DQN network, and determining a state space, an action space and a reward function;
(3) The agent selects an action from the action space according to the initial state in the state space, i.e. it decides which relay node is selected in the communication environment and whether that relay receives or transmits, thereby obtaining the next state; this process is repeated continually until the maximum reward value, i.e. the maximum link capacity, is finally obtained.
2. The relay selection method based on cache prediction according to claim 1, wherein the step (1) specifically refers to the following: the buffer-aided relay forwarding system comprises a source node S, a destination node D and relay nodes R_k, 1 ≤ k ≤ K, where K is the number of relay nodes; the source node and the destination node are located in a 100 m × 100 m area, and the relay nodes are end users. It is assumed that every node has a single antenna and works in half-duplex mode, that there is no direct link between the source node and the destination node, and that communication must be completed through relay forwarding. Time is assumed to be divided into slots of equal length; in each time slot the source node S sends a data packet with fixed power P. The buffer of each relay node is limited, and the total buffer size is L+1, covering both the buffer demand of the relay user and the buffer used to assist forwarding; since the relay user's own buffer demand occupies at least one data packet, at most L packets can be buffered for assisted forwarding in each time slot.

Assuming the buffer demand of each user is Lu, the buffer available for assisted forwarding is L+1-Lu. Let L_k denote the number of data packets stored in the buffer of relay node R_k, with 0 ≤ L_k ≤ L. In each time slot, the links available to relay node R_k differ for different values of L_k:

(1a) L_k = 0: no data packet can be sent, and only the source node-relay node link, i.e. the S-R link, is available;

(1b) 0 < L_k < L+1-Lu: both the source node-relay node link, i.e. the S-R link, and the relay node-destination node link, i.e. the R-D link, can be used;

(1c) L_k = L+1-Lu: only the relay node-destination node link, i.e. the R-D link, is available, and no buffer space is left to store new data packets.

First, a decision is made according to the past relay channel states and the historical buffer demand of the terminal relay user: a relay is selected and a data packet is sent to it only if the relay's buffer can still store the packet. When the k-th relay node receives a data packet sent by the source node S, the corresponding buffer occupancy increases by one packet; when the k-th relay node successfully sends a data packet to the destination node D, the corresponding buffer occupancy decreases by one packet. A relay node can send a data packet to the destination node D only after having successfully received it. It is assumed that the source node S always has a data packet to transmit to the destination node D, that the channel coefficients follow a Rayleigh distribution, remain constant within one time slot and change independently between time slots, and that the signal finally received at the destination node D is corrupted by additive white Gaussian noise with zero mean and variance δ².

In a given time slot, when a source node-relay node link is selected, the source node S transmits a single data packet to the corresponding relay R_k, which stores it in its buffer; the signal received at R_k is:

y_{S,R_k} = sqrt(P / d_{S,R_k}^α) · h_{S,R_k} · x_S + n_{R_k},

where x_S is the data signal from S, n_{R_k} is additive white Gaussian noise with variance δ², P is the transmit power, h_{S,R_k} is the channel coefficient from the source node to the relay node, d_{S,R_k} is the distance from the source node to the relay node, and α is the path loss exponent. If a relay-to-destination link is selected, a data packet is transmitted from the relay buffer to the destination, and the signal received at the destination is:

y_{R_k,D} = sqrt(P / d_{R_k,D}^α) · h_{R_k,D} · x_{R_k} + n_D,

where x_{R_k} is the data signal from R_k, n_D denotes the additive white Gaussian noise at destination node D with variance δ², h_{R_k,D} is the channel coefficient from the relay node to the destination node, and d_{R_k,D} is the distance from the relay node to the destination node. The link capacity C_{m,n} between node m and node n is:

C_{m,n} = log2(1 + P · |h_{m,n}|² / (d_{m,n}^α · δ²)),

where h_{m,n} is the channel coefficient from node m to node n, d_{m,n} is the distance from node m to node n, and δ² is the additive white Gaussian noise power.

When C_{m,n} ≤ η, the corresponding link is in outage, where η is the target data rate.
3. The relay selection method based on cache prediction according to claim 1, wherein the step (2) specifically refers to the following: an LSTM network is added to the deep reinforcement learning network DQN to form an LSTM-DQN network, and data of L consecutive time steps are input to the LSTM network; the network consists of a number of LSTM units, and an LSTM unit contains three gates, namely an input gate, a forget gate and an output gate.

The state space, action space and reward of the LSTM-DQN network are as follows:

State space: at time t, the observed state is o_t = (R_{t-1}, h_{S,R}, h_{R,D}), where R_{t-1} denotes the usage of the user buffer at time t-1, h_{S,R} is the channel coefficient from the source node to the relay node, and h_{R,D} is the channel coefficient from the relay node to the destination node; the state space is defined as s_t = [o_{t+1-N}, ..., o_t], where N is the number of past observations to be captured.

Action space: based on the current state s_t of the buffer-aided relay forwarding system with a limited and varying buffer, a decision must be made on which relay is selected and whether that relay receives or transmits; the environment is the buffer-aided relay network, and an action selects one link for data transmission, which is equivalent to determining m_{k,j}, j ∈ {0,1}, where k denotes the relay node index, j = 0 means the relay receives a data packet, and j = 1 means the relay transmits a data packet. If a relay network has k relay nodes there are 2k transmission links, and in each time slot either one link is selected for transmission or no link is selected, so the total number of actions is 2k+1.

Reward function: the reward is tied to the optimization objective, and throughput is taken as the reward function.
4. The relay selection method based on cache prediction according to claim 1, wherein: the step (3) specifically comprises the following steps:
(3a) In the deep reinforcement learning network DQN, the learner and decision maker is called the agent, and the part it interacts with is called the environment. At time slot t the environment state is s_t; according to the current state the agent decides the next action, i.e. which link, if any, is selected for data transmission. The action for state s_t is chosen with an ε-greedy strategy, where ε ∈ (0,1) is the greedy coefficient and n is the number of training iterations; ε is initially set to 1 to obtain good exploration and gradually decreases with the number of iterations;

(3b) Once the agent selects action a_t, i.e. selects a relay node and decides whether that relay receives or transmits, it obtains a reward value and the next state. If a_t corresponds to selecting an S-R or an R-D link, the corresponding buffer length increases by 1 or decreases by 1, respectively; otherwise the buffer length remains unchanged. Meanwhile, the channel state changes independently from one time slot to the next, and the state then transitions to s_{t+1} according to the new buffer length and channel state;

(3c) The current state, the action performed, the reward value obtained after performing the action, and the next state are combined into a tuple (s_t, a_t, r_t, s_{t+1}) and stored in the experience pool;

(3d) Return to step (3a) with state s_{t+1}, repeat the process and generate further tuples, until the state reaches the terminal state and the obtained reward value reaches its maximum.
CN202310505985.2A 2023-05-08 2023-05-08 Relay selection method based on cache region prediction Pending CN116506918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310505985.2A CN116506918A (en) 2023-05-08 2023-05-08 Relay selection method based on cache region prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310505985.2A CN116506918A (en) 2023-05-08 2023-05-08 Relay selection method based on cache region prediction

Publications (1)

Publication Number Publication Date
CN116506918A true CN116506918A (en) 2023-07-28

Family

ID=87321389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310505985.2A Pending CN116506918A (en) 2023-05-08 2023-05-08 Relay selection method based on cache region prediction

Country Status (1)

Country Link
CN (1) CN116506918A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117914378A (en) * 2023-12-12 2024-04-19 深圳市物联微电子有限公司 5G repeater signal processing method and system
CN117914378B (en) * 2023-12-12 2024-06-18 深圳市物联微电子有限公司 5G repeater signal processing method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination