CN116506918A - Relay selection method based on cache region prediction - Google Patents

Relay selection method based on cache region prediction

Info

Publication number
CN116506918A
CN116506918A (application CN202310505985.2A)
Authority
CN
China
Prior art keywords
relay
node
buffer
state
data packet
Prior art date
Legal status
Pending
Application number
CN202310505985.2A
Other languages
Chinese (zh)
Inventor
智慧
费洁
王雅宁
段苗苗
黄彧
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202310505985.2A
Publication of CN116506918A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/22 Communication route or path selection, e.g. power-based or shortest path routing using selective relaying for reaching a BTS [Base Transceiver Station] or an access point
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/04 Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources
    • H04W40/08 Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources based on transmission power
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/12 Communication route or path selection, e.g. power-based or shortest path routing based on transmission quality or channel quality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/20 Communication route or path selection, e.g. power-based or shortest path routing based on geographic position or location
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a relay selection method based on buffer prediction, which comprises the following steps: setting the parameters of the communication environment, i.e. the buffer-aided relay forwarding system; constructing an LSTM-DQN network and determining its state space, action space and reward function; the agent selects an action from the action space according to the initial state in the state space, i.e. it decides which relay node is selected in the communication environment and whether that relay receives or transmits, thereby obtaining the next state, and this process is repeated continually until the maximum reward value, i.e. the maximum link capacity, is finally obtained. The invention establishes an application scenario in which the end user's own buffer demand is limited and varies, so that the buffer available for cooperative communication changes; when the relay user's own buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, and the selection of a relay node for receiving or transmitting data packets is realized.

Description

Relay selection method based on cache region prediction
Technical Field
The invention relates to the technical field of cooperative communication, in particular to a relay selection method based on cache prediction.
Background
A traditional cellular network divides its coverage area into cells for communication. Large-scale and small-scale fading occur during transmission, so users at the cell edge receive weak signals, interference between adjacent cells further degrades user signal quality, and the energy consumption of the base station is greatly increased. Relay technology can effectively alleviate these problems: one or more relay nodes are placed between the source node and the destination node, and each relay node receives the signal, processes it and retransmits it. Shortening the transmission distance effectively mitigates the various fading and path-loss problems in the communication process, guarantees communication quality, enlarges the coverage of the signal, improves the overall performance of the wireless network, increases network throughput, and reduces system energy consumption.
Cooperative communication improves the throughput of the wireless network and enlarges the range over which signals can be communicated. However, in the half-duplex operation mode of a conventional cooperative network, a relay node cannot obtain the best receiving and transmitting channels at the same time, so the quality of the final signal is not guaranteed. Buffer-aided relaying was proposed to solve this problem effectively. Compared with traditional relay schemes, buffer-aided relay cooperative communication performs remarkably better in aspects such as improving system throughput, reducing the outage probability of the system, and lowering the signal-to-noise ratio required for reliable transmission.
A mobile terminal is a computing device that can be used while moving; in the field of communications it usually refers to a smart device. When a terminal acts as a relay, its buffer is limited, and its own user also has buffering requirements. Most existing buffer-aided cooperative relay selection schemes only consider the relay devoting its entire buffer to cooperative forwarding and ignore the relay user's own buffering requirement. If the relay instead reserves a fixed portion of its buffer for forwarding, the buffer left to the relay user is also fixed: when the relay user's demand is large and the relay's forwarding task is small, part of the forwarding buffer sits idle while the relay user's demand is not satisfied, so the relay user's experience degrades and buffer resources are wasted. Satisfying the user's own demand first and then improving the utilization of the buffer is therefore a key problem to be solved in relay cooperative communication.
Disclosure of Invention
In order to overcome the drawback of fixed division of a relay's limited buffer, the invention aims to provide a relay selection method based on buffer prediction, which jointly considers the packet loss rate and the end user's buffer demand, and can improve buffer utilization in a wireless network on the premise that the user's own demand is satisfied.
In order to achieve the above purpose, the present invention adopts the following technical scheme: a relay selection method based on cache prediction, the method comprising the sequential steps of:
(1) Setting the parameters of the communication environment, i.e. the buffer-aided relay forwarding system: determining the number of relay nodes, the position coordinates of the source node and of the destination node, and determining the total buffer size, the channel coefficients, the transmit power, the noise power and the target data rate;
(2) Constructing an LSTM-DQN network, and determining a state space, an action space and a reward function;
(3) The agent selects an action from the action space according to the initial state in the state space, i.e. it decides which relay node is selected in the communication environment and whether that relay receives or transmits, thereby obtaining the next state; this process is repeated continually until the maximum reward value, i.e. the maximum link capacity, is finally obtained.
The step (1) specifically refers to the following: the buffer-aided relay forwarding system comprises a source node S, a destination node D and relay nodes R_k, 1 ≤ k ≤ K, where K is the number of relay nodes; the source node and the destination node are located in a 100 m × 100 m area, and the relay nodes are end users. It is assumed that every node has a single antenna and works in half-duplex mode, that there is no direct link between the source node and the destination node, and that communication must be completed through relay forwarding. Time is assumed to be divided into slots of equal length; in each time slot the source node S sends a data packet with fixed power P. The buffer of each relay node is limited, and the total buffer size is L+1, covering both the buffer demand of the relay user and the buffer used to assist forwarding; since the relay user's own buffer demand occupies at least one data packet, at most L packets can be buffered for assisted forwarding in each time slot.

Assuming the buffer demand of each user is Lu, the buffer available for assisted forwarding is L+1-Lu. Let L_k denote the number of data packets stored in the buffer of relay node R_k, with 0 ≤ L_k ≤ L. In each time slot, the links available to relay node R_k differ for different values of L_k:

(1a) L_k = 0: no data packet can be sent, and only the source node-relay node link, i.e. the S-R link, is available;

(1b) 0 < L_k < L+1-Lu: both the source node-relay node link, i.e. the S-R link, and the relay node-destination node link, i.e. the R-D link, can be used;

(1c) L_k = L+1-Lu: only the relay node-destination node link, i.e. the R-D link, is available, and no buffer space is left to store new data packets.

First, a decision is made according to the past relay channel states and the historical buffer demand of the terminal relay user: a relay is selected and a data packet is sent to it only if the relay's buffer can still store the packet. When the k-th relay node receives a data packet sent by the source node S, the corresponding buffer occupancy increases by one packet; when the k-th relay node successfully sends a data packet to the destination node D, the corresponding buffer occupancy decreases by one packet. A relay node can send a data packet to the destination node D only after having successfully received it. It is assumed that the source node S always has a data packet to transmit to the destination node D, that the channel coefficients follow a Rayleigh distribution, remain constant within one time slot and change independently between time slots, and that the signal finally received at the destination node D is corrupted by additive white Gaussian noise with zero mean and variance δ².

In a given time slot, when a source node-relay node link is selected, the source node S transmits a single data packet to the corresponding relay R_k, which stores it in its buffer; the signal received at R_k is:

y_{S,R_k} = sqrt(P / d_{S,R_k}^α) · h_{S,R_k} · x_S + n_{R_k},

where x_S is the data signal from S, n_{R_k} is additive white Gaussian noise with variance δ², P is the transmit power, h_{S,R_k} is the channel coefficient from the source node to the relay node, d_{S,R_k} is the distance from the source node to the relay node, and α is the path loss exponent. If a relay-to-destination link is selected, a data packet is transmitted from the relay buffer to the destination, and the signal received at the destination is:

y_{R_k,D} = sqrt(P / d_{R_k,D}^α) · h_{R_k,D} · x_{R_k} + n_D,

where x_{R_k} is the data signal from R_k, n_D denotes the additive white Gaussian noise at destination node D with variance δ², h_{R_k,D} is the channel coefficient from the relay node to the destination node, and d_{R_k,D} is the distance from the relay node to the destination node. The link capacity C_{m,n} between node m and node n is:

C_{m,n} = log2(1 + P · |h_{m,n}|² / (d_{m,n}^α · δ²)),

where h_{m,n} is the channel coefficient from node m to node n, d_{m,n} is the distance from node m to node n, and δ² is the additive white Gaussian noise power.

When C_{m,n} ≤ η, the corresponding link is in outage, where η is the target data rate.
The step (2) specifically refers to the following: an LSTM network is added to the deep reinforcement learning network DQN to form an LSTM-DQN network, and data of L consecutive time steps are input to the LSTM network; the network consists of a number of LSTM units, and an LSTM unit contains three gates, namely an input gate, a forget gate and an output gate.

The state space, action space and reward of the LSTM-DQN network are as follows:

State space: at time t, the observed state is o_t = (R_{t-1}, h_{S,R}, h_{R,D}), where R_{t-1} denotes the usage of the user buffer at time t-1, h_{S,R} is the channel coefficient from the source node to the relay node, and h_{R,D} is the channel coefficient from the relay node to the destination node; the state space is defined as s_t = [o_{t+1-N}, ..., o_t], where N is the number of past observations to be captured.

Action space: based on the current state s_t of the buffer-aided relay forwarding system with a limited and varying buffer, a decision must be made on which relay is selected and whether that relay receives or transmits; the environment is the buffer-aided relay network, and an action selects one link for data transmission, which is equivalent to determining m_{k,j}, j ∈ {0,1},

where k denotes the relay node index, j = 0 means the relay receives a data packet, and j = 1 means the relay transmits a data packet. If a relay network has k relay nodes there are 2k transmission links, and in each time slot either one link is selected for transmission or no link is selected, so the total number of actions is 2k+1.

Reward function: the reward is tied to the optimization objective, and throughput is taken as the reward function.
The step (3) specifically comprises the following steps:
(3a) In the deep reinforcement learning network DQN, the learner and decision maker is called the agent, and the part it interacts with is called the environment. At time slot t the environment state is s_t; according to the current state the agent decides the next action, i.e. which link, if any, is selected for data transmission. The action for state s_t is chosen with an ε-greedy strategy, where ε ∈ (0,1) is the greedy coefficient and n is the number of training iterations; ε is initially set to 1 to obtain good exploration and gradually decreases with the number of iterations;

(3b) Once the agent selects action a_t, i.e. selects a relay node and decides whether that relay receives or transmits, it obtains a reward value and the next state. If a_t corresponds to selecting an S-R or an R-D link, the corresponding buffer length increases by 1 or decreases by 1, respectively; otherwise the buffer length remains unchanged. Meanwhile, the channel state changes independently from one time slot to the next, and the state then transitions to s_{t+1} according to the new buffer length and channel state;

(3c) The current state, the action performed, the reward value obtained after performing the action, and the next state are combined into a tuple (s_t, a_t, r_t, s_{t+1}) and stored in the experience pool;

(3d) Return to step (3a) with state s_{t+1}, repeat the process and generate further tuples, until the state reaches the terminal state and the obtained reward value reaches its maximum.
According to the above technical scheme, the beneficial effects of the invention are as follows. First, when the relay user's own buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, which reduces the packet loss rate; when the relay user's own buffer demand is large, the buffer the relay can allocate to assist forwarding is quite limited, and reinforcement learning jointly considers the channel state and the relay's historical buffer demand to select a suitable link for data packet transmission. Second, compared with existing relay selection methods based on a fixed buffer, the invention adds an LSTM network to the deep reinforcement learning network DQN, so that reinforcement learning better matches the scenario in which the buffer available to the end user for cooperative communication varies, and uses the historical user buffer demand together with the source node-relay node and relay node-destination node channel states as the state. Third, an application scenario is established in which the end user's buffer demand is limited and varies so that the buffer available for cooperative communication changes; when the relay user's buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, and the selection of a relay node for receiving or transmitting data packets is realized.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a buffer-assisted relay forwarding system according to the present invention;
FIG. 3 is a schematic diagram of an LSTM network;
FIG. 4 is a schematic diagram of LSTM cell structure;
FIG. 5 is a process flow diagram of an LSTM-DQN network;
FIG. 6 is a block diagram of the main network and the target network in the LSTM-DQN network.
Detailed Description
As shown in fig. 1, a relay selection method based on cache prediction includes the following sequential steps:
(1) Setting the parameters of the communication environment, i.e. the buffer-aided relay forwarding system: determining the number of relay nodes, the position coordinates of the source node and of the destination node, and determining the total buffer size, the channel coefficients, the transmit power, the noise power and the target data rate;
(2) Constructing an LSTM-DQN network, and determining a state space, an action space and a reward function;
(3) The agent selects an action from the action space according to the initial state in the state space, i.e. it decides which relay node is selected in the communication environment and whether that relay receives or transmits, thereby obtaining the next state; this process is repeated continually until the maximum reward value, i.e. the maximum link capacity, is finally obtained.
The step (1) specifically refers to the following: the buffer-aided relay forwarding system comprises a source node S, a destination node D and relay nodes R_k, 1 ≤ k ≤ K, where K is the number of relay nodes; the source node and the destination node are located in a 100 m × 100 m area, and the relay nodes are end users. It is assumed that every node has a single antenna and works in half-duplex mode, that there is no direct link between the source node and the destination node, and that communication must be completed through relay forwarding. Time is assumed to be divided into slots of equal length; in each time slot the source node S sends a data packet with fixed power P. The buffer of each relay node is limited, and the total buffer size is L+1, covering both the buffer demand of the relay user and the buffer used to assist forwarding; since the relay user's own buffer demand occupies at least one data packet, at most L packets can be buffered for assisted forwarding in each time slot.

Assuming the buffer demand of each user is Lu, the buffer available for assisted forwarding is L+1-Lu. Let L_k denote the number of data packets stored in the buffer of relay node R_k, with 0 ≤ L_k ≤ L. In each time slot, the links available to relay node R_k differ for different values of L_k (a short code sketch of this rule follows the list below):
(1a)L k =0: no data packet is sent, and only an active node-relay node link, namely an S-R link, is available;
(1b)0<L k <L+1-Lu: both the source node-relay node link, i.e., the S-R link, and the relay node-destination node link, i.e., the R-D link, can be used;
(1c)L k =l+1-Lu: only the relay node-destination node link, i.e., the R-D link, is available, and no buffer is used to store new packets;
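For illustration, the link-availability rule of cases (1a)-(1c) can be written as a small Python helper; the function name and argument names are assumptions introduced here for clarity, not part of the claimed method.

    def available_links(L_k, L, Lu):
        """Links relay R_k may use in the current slot.

        L_k: packets currently buffered at relay R_k for forwarding.
        L + 1: total buffer size; Lu: buffer demand of the relay user,
        so at most L + 1 - Lu packets can be buffered for forwarding.
        """
        links = []
        if L_k < L + 1 - Lu:   # free forwarding space left: relay may receive (S-R)
            links.append("S-R")
        if L_k > 0:            # packets waiting: relay may transmit (R-D)
            links.append("R-D")
        return links           # (1a) only S-R, (1b) both, (1c) only R-D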
First, a decision is made according to the past relay channel states and the historical buffer demand of the terminal relay user: a relay is selected and a data packet is sent to it only if the relay's buffer can still store the packet. When the k-th relay node receives a data packet sent by the source node S, the corresponding buffer occupancy increases by one packet; when the k-th relay node successfully sends a data packet to the destination node D, the corresponding buffer occupancy decreases by one packet. A relay node can send a data packet to the destination node D only after having successfully received it. It is assumed that the source node S always has a data packet to transmit to the destination node D, that the channel coefficients follow a Rayleigh distribution, remain constant within one time slot and change independently between time slots, and that the signal finally received at the destination node D is corrupted by additive white Gaussian noise with zero mean and variance δ².

In a given time slot, when a source node-relay node link is selected, the source node S transmits a single data packet to the corresponding relay R_k, which stores it in its buffer; the signal received at R_k is:

y_{S,R_k} = sqrt(P / d_{S,R_k}^α) · h_{S,R_k} · x_S + n_{R_k},

where x_S is the data signal from S, n_{R_k} is additive white Gaussian noise with variance δ², P is the transmit power, h_{S,R_k} is the channel coefficient from the source node to the relay node, d_{S,R_k} is the distance from the source node to the relay node, and α is the path loss exponent. If a relay-to-destination link is selected, a data packet is transmitted from the relay buffer to the destination, and the signal received at the destination is:

y_{R_k,D} = sqrt(P / d_{R_k,D}^α) · h_{R_k,D} · x_{R_k} + n_D,

where x_{R_k} is the data signal from R_k, n_D denotes the additive white Gaussian noise at destination node D with variance δ², h_{R_k,D} is the channel coefficient from the relay node to the destination node, and d_{R_k,D} is the distance from the relay node to the destination node. The link capacity C_{m,n} between node m and node n is:

C_{m,n} = log2(1 + P · |h_{m,n}|² / (d_{m,n}^α · δ²)),

where h_{m,n} is the channel coefficient from node m to node n, d_{m,n} is the distance from node m to node n, and δ² is the additive white Gaussian noise power.

When C_{m,n} ≤ η, the corresponding link is in outage, where η is the target data rate.
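A minimal Python sketch of this link capacity and outage test is given below, assuming the capacity expression reconstructed above; the variable and function names are illustrative.

    import math

    def link_capacity(P, h, d, alpha, noise_power):
        """C_{m,n} = log2(1 + P*|h|^2 / (d^alpha * delta^2)) for the link m -> n."""
        snr = P * abs(h) ** 2 / (d ** alpha * noise_power)
        return math.log2(1 + snr)

    def in_outage(capacity, eta):
        """The link is in outage when its capacity does not exceed the target rate eta."""
        return capacity <= eta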
The step (2) specifically refers to the following: an LSTM network is added to the deep reinforcement learning network DQN to form an LSTM-DQN network, and data of L consecutive time steps are input to the LSTM network; the network consists of a number of LSTM units, and an LSTM unit contains three gates, namely an input gate, a forget gate and an output gate.

The state space, action space and reward of the LSTM-DQN network are as follows:

State space: at time t, the observed state is o_t = (R_{t-1}, h_{S,R}, h_{R,D}), where R_{t-1} denotes the usage of the user buffer at time t-1, h_{S,R} is the channel coefficient from the source node to the relay node, and h_{R,D} is the channel coefficient from the relay node to the destination node; the state space is defined as s_t = [o_{t+1-N}, ..., o_t], where N is the number of past observations to be captured.

Action space: based on the current state s_t of the buffer-aided relay forwarding system with a limited and varying buffer, a decision must be made on which relay is selected and whether that relay receives or transmits; the environment is the buffer-aided relay network, and an action selects one link for data transmission, which is equivalent to determining m_{k,j}, j ∈ {0,1},

where k denotes the relay node index, j = 0 means the relay receives a data packet, and j = 1 means the relay transmits a data packet. If a relay network has k relay nodes there are 2k transmission links, and in each time slot either one link is selected for transmission or no link is selected, so the total number of actions is 2k+1.

Reward function: the reward is tied to the optimization objective, and throughput is taken as the reward function.
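How the observation, the action index and the reward can be encoded is sketched below in Python; the flattening of the observation into a real-valued vector (using channel magnitudes) and the action numbering are assumptions made for illustration, since the patent does not fix them.

    import numpy as np

    def make_observation(buffer_usage_prev, h_sr, h_rd):
        """o_t = (R_{t-1}, h_{S,R}, h_{R,D}) flattened into one real-valued vector."""
        return np.concatenate([np.atleast_1d(buffer_usage_prev),
                               np.abs(np.atleast_1d(h_sr)),
                               np.abs(np.atleast_1d(h_rd))])

    def decode_action(a, k):
        """Map an action index in {0, ..., 2k} to a decision.

        a == 2k means no link is selected in this slot; otherwise relay a // 2
        receives (j = 0) or transmits (j = 1) according to a % 2.
        """
        if a == 2 * k:
            return None
        return a // 2, a % 2

    # Reward: the throughput achieved by the selected link in the slot (0 if idle).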
The step (3) specifically comprises the following steps:
(3a) In the deep reinforcement learning network DQN, the learner and decision maker is called the agent, and the part it interacts with is called the environment. At time slot t the environment state is s_t; according to the current state the agent decides the next action, i.e. which link, if any, is selected for data transmission. The action for state s_t is chosen with an ε-greedy strategy, where ε ∈ (0,1) is the greedy coefficient and n is the number of training iterations; ε is initially set to 1 to obtain good exploration and gradually decreases with the number of iterations;

(3b) Once the agent selects action a_t, i.e. selects a relay node and decides whether that relay receives or transmits, it obtains a reward value and the next state. If a_t corresponds to selecting an S-R or an R-D link, the corresponding buffer length increases by 1 or decreases by 1, respectively; otherwise the buffer length remains unchanged. Meanwhile, the channel state changes independently from one time slot to the next, and the state then transitions to s_{t+1} according to the new buffer length and channel state;

(3c) The current state, the action performed, the reward value obtained after performing the action, and the next state are combined into a tuple (s_t, a_t, r_t, s_{t+1}) and stored in the experience pool;

(3d) Return to step (3a) with state s_{t+1}, repeat the process and generate further tuples, until the state reaches the terminal state and the obtained reward value reaches its maximum; the interaction loop of steps (3a)-(3d) is sketched in the code below.
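In the following minimal Python sketch, env (the buffer-aided relay environment), agent.q_values and the epsilon decay schedule are placeholders, since the patent does not fix their exact form.

    import random
    import numpy as np
    from collections import deque

    def run_interaction(env, agent, num_iterations, num_actions, pool_size=10000):
        """Steps (3a)-(3d): epsilon-greedy action selection and experience collection."""
        experience_pool = deque(maxlen=pool_size)   # stores (s_t, a_t, r_t, s_{t+1})
        epsilon = 1.0                               # start fully exploratory
        state = env.reset()
        for n in range(num_iterations):
            # (3a) epsilon-greedy choice over the 2k+1 actions
            if random.random() < epsilon:
                action = random.randrange(num_actions)
            else:
                action = int(np.argmax(agent.q_values(state)))
            # (3b) apply the action: buffer length +1/-1, channels change independently
            next_state, reward, done = env.step(action)
            # (3c) store the transition tuple in the experience pool
            experience_pool.append((state, action, reward, next_state))
            # (3d) continue from the new state until the terminal state is reached
            state = env.reset() if done else next_state
            epsilon = max(0.05, epsilon * 0.995)    # assumed decay schedule
        return experience_pool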
The key idea of the LSTM-DQN framework provided by the invention is to enable the relay user to perform effective relay forwarding under the partial state observability caused by the relay user's own buffer demand and similar factors. To achieve this, adding an LSTM network to the DQN not only preserves internal state but also aggregates state observations over time, which gives the relay-assisted communication network the ability to infer future states from its history. Specifically, data of L consecutive time steps are input to an LSTM network, which consists of a number of LSTM units. In general, an LSTM unit contains three gates: an input gate, a forget gate and an output gate. What allows LSTM to stand out from a plain RNN is the cell state, the horizontal line running through the cell in FIG. 4, which can be understood as the "memory" of the recurrent neural network over its inputs: c_t denotes this memory after time t, and the vector summarizes all input information the network has seen before time t+1. The forget gate decides which part of the long-term memory c_{t-1} is retained and which part is forgotten. The input (memory) gate decides what new information is stored in the cell state. Finally, the output gate determines the output value based on the cell state. The gate equations are listed below.
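For reference, these are the standard LSTM gate equations corresponding to the description above; they are the usual textbook formulas rather than equations reproduced from the patent, and here o_t denotes the output gate activation, distinct from the observation o_t used earlier.

    f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)            % forget gate
    i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)            % input (memory) gate
    \tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)     % candidate memory
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   % cell state update
    o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)            % output gate
    h_t = o_t \odot \tanh(c_t)                        % hidden state / unit output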
As shown in FIG. 2, the proposed buffer-aided relay forwarding system consists of one source node S, one destination node D and K relay nodes R_k, 1 ≤ k ≤ K. The relay nodes considered here are end users: the terminal's buffer is limited, and the terminal has buffering requirements of its own.
FIG. 3 shows an unrolled LSTM network: data of L consecutive time steps are input to the LSTM network, which is made up of a plurality of LSTM cells, as shown in FIG. 4.
FIG. 5 and FIG. 6 show the LSTM-DQN framework for the relay selection environment with limited and varying buffer-aided forwarding. The key idea of the proposed LSTM-DQN framework is to enable the relay user to perform effective relay forwarding under the partial state observability caused by its own buffering requirement and similar factors. A sketch of such a network is given below.
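This is a minimal PyTorch sketch of such an LSTM-DQN; the hidden size and layer arrangement are assumptions for illustration, since the patent does not fix them.

    import torch
    import torch.nn as nn

    class LSTMDQN(nn.Module):
        """LSTM over the last N observations followed by a Q-value head,
        one output per action (2k+1 actions in total)."""
        def __init__(self, obs_dim, num_actions, hidden_dim=64):
            super().__init__()
            self.lstm = nn.LSTM(input_size=obs_dim, hidden_size=hidden_dim,
                                batch_first=True)
            self.q_head = nn.Linear(hidden_dim, num_actions)

        def forward(self, obs_seq):
            # obs_seq: (batch, N, obs_dim), the stacked observations o_{t+1-N}, ..., o_t
            _, (h_n, _) = self.lstm(obs_seq)    # final hidden state summarizes the history
            return self.q_head(h_n[-1])         # Q-value for each of the 2k+1 actions

As in FIG. 6, a main network of this structure would be trained while a target network with the same structure is updated more slowly to stabilize learning.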
In summary, when the relay user's own buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, which reduces the packet loss rate; when the relay user's own buffer demand is large, the buffer the relay can allocate to assist forwarding is quite limited, and reinforcement learning jointly considers the channel state and the relay's historical buffer demand to select a suitable link for data packet transmission. The invention adds an LSTM network to the deep reinforcement learning network DQN, so that reinforcement learning better fits the scenario in which the buffer available to the end user for cooperative communication varies, and uses the historical user buffer demand together with the source node-relay node and relay node-destination node channel states as the state. The method establishes an application scenario in which the end user's buffer demand is limited and varies so that the buffer available for cooperative communication changes; when the relay user's buffer demand is small, the relay can allocate more buffer space to assist relay forwarding, and the selection of a relay node for receiving or transmitting data packets is realized.

Claims (4)

1. A relay selection method based on cache prediction is characterized in that: the method comprises the following steps in sequence:
(1) Setting the parameters of the communication environment, i.e. the buffer-aided relay forwarding system: determining the number of relay nodes, the position coordinates of the source node and of the destination node, and determining the total buffer size, the channel coefficients, the transmit power, the noise power and the target data rate;
(2) Constructing an LSTM-DQN network, and determining a state space, an action space and a reward function;
(3) The agent selects an action from the action space according to the initial state in the state space, i.e. it decides which relay node is selected in the communication environment and whether that relay receives or transmits, thereby obtaining the next state; this process is repeated continually until the maximum reward value, i.e. the maximum link capacity, is finally obtained.
2. The relay selection method based on cache prediction according to claim 1, wherein the step (1) specifically refers to the following: the buffer-aided relay forwarding system comprises a source node S, a destination node D and relay nodes R_k, 1 ≤ k ≤ K, where K is the number of relay nodes; the source node and the destination node are located in a 100 m × 100 m area, and the relay nodes are end users. It is assumed that every node has a single antenna and works in half-duplex mode, that there is no direct link between the source node and the destination node, and that communication must be completed through relay forwarding. Time is assumed to be divided into slots of equal length; in each time slot the source node S sends a data packet with fixed power P. The buffer of each relay node is limited, and the total buffer size is L+1, covering both the buffer demand of the relay user and the buffer used to assist forwarding; since the relay user's own buffer demand occupies at least one data packet, at most L packets can be buffered for assisted forwarding in each time slot.

Assuming the buffer demand of each user is Lu, the buffer available for assisted forwarding is L+1-Lu. Let L_k denote the number of data packets stored in the buffer of relay node R_k, with 0 ≤ L_k ≤ L. In each time slot, the links available to relay node R_k differ for different values of L_k:

(1a) L_k = 0: no data packet can be sent, and only the source node-relay node link, i.e. the S-R link, is available;

(1b) 0 < L_k < L+1-Lu: both the source node-relay node link, i.e. the S-R link, and the relay node-destination node link, i.e. the R-D link, can be used;

(1c) L_k = L+1-Lu: only the relay node-destination node link, i.e. the R-D link, is available, and no buffer space is left to store new data packets.

First, a decision is made according to the past relay channel states and the historical buffer demand of the terminal relay user: a relay is selected and a data packet is sent to it only if the relay's buffer can still store the packet. When the k-th relay node receives a data packet sent by the source node S, the corresponding buffer occupancy increases by one packet; when the k-th relay node successfully sends a data packet to the destination node D, the corresponding buffer occupancy decreases by one packet. A relay node can send a data packet to the destination node D only after having successfully received it. It is assumed that the source node S always has a data packet to transmit to the destination node D, that the channel coefficients follow a Rayleigh distribution, remain constant within one time slot and change independently between time slots, and that the signal finally received at the destination node D is corrupted by additive white Gaussian noise with zero mean and variance δ².

In a given time slot, when a source node-relay node link is selected, the source node S transmits a single data packet to the corresponding relay R_k, which stores it in its buffer; the signal received at R_k is:

y_{S,R_k} = sqrt(P / d_{S,R_k}^α) · h_{S,R_k} · x_S + n_{R_k},

where x_S is the data signal from S, n_{R_k} is additive white Gaussian noise with variance δ², P is the transmit power, h_{S,R_k} is the channel coefficient from the source node to the relay node, d_{S,R_k} is the distance from the source node to the relay node, and α is the path loss exponent. If a relay-to-destination link is selected, a data packet is transmitted from the relay buffer to the destination, and the signal received at the destination is:

y_{R_k,D} = sqrt(P / d_{R_k,D}^α) · h_{R_k,D} · x_{R_k} + n_D,

where x_{R_k} is the data signal from R_k, n_D denotes the additive white Gaussian noise at destination node D with variance δ², h_{R_k,D} is the channel coefficient from the relay node to the destination node, and d_{R_k,D} is the distance from the relay node to the destination node. The link capacity C_{m,n} between node m and node n is:

C_{m,n} = log2(1 + P · |h_{m,n}|² / (d_{m,n}^α · δ²)),

where h_{m,n} is the channel coefficient from node m to node n, d_{m,n} is the distance from node m to node n, and δ² is the additive white Gaussian noise power.

When C_{m,n} ≤ η, the corresponding link is in outage, where η is the target data rate.
3. The relay selection method based on cache prediction according to claim 1, wherein the step (2) specifically refers to the following: an LSTM network is added to the deep reinforcement learning network DQN to form an LSTM-DQN network, and data of L consecutive time steps are input to the LSTM network; the network consists of a number of LSTM units, and an LSTM unit contains three gates, namely an input gate, a forget gate and an output gate.

The state space, action space and reward of the LSTM-DQN network are as follows:

State space: at time t, the observed state is o_t = (R_{t-1}, h_{S,R}, h_{R,D}), where R_{t-1} denotes the usage of the user buffer at time t-1, h_{S,R} is the channel coefficient from the source node to the relay node, and h_{R,D} is the channel coefficient from the relay node to the destination node; the state space is defined as s_t = [o_{t+1-N}, ..., o_t], where N is the number of past observations to be captured.

Action space: based on the current state s_t of the buffer-aided relay forwarding system with a limited and varying buffer, a decision must be made on which relay is selected and whether that relay receives or transmits; the environment is the buffer-aided relay network, and an action selects one link for data transmission, which is equivalent to determining m_{k,j}, j ∈ {0,1}, where k denotes the relay node index, j = 0 means the relay receives a data packet, and j = 1 means the relay transmits a data packet. If a relay network has k relay nodes there are 2k transmission links, and in each time slot either one link is selected for transmission or no link is selected, so the total number of actions is 2k+1.

Reward function: the reward is tied to the optimization objective, and throughput is taken as the reward function.
4. The relay selection method based on cache prediction according to claim 1, wherein: the step (3) specifically comprises the following steps:
(3a) In the deep reinforcement learning network DQN, the learner and decision maker is called the agent, and the part it interacts with is called the environment. At time slot t the environment state is s_t; according to the current state the agent decides the next action, i.e. which link, if any, is selected for data transmission. The action for state s_t is chosen with an ε-greedy strategy, where ε ∈ (0,1) is the greedy coefficient and n is the number of training iterations; ε is initially set to 1 to obtain good exploration and gradually decreases with the number of iterations;

(3b) Once the agent selects action a_t, i.e. selects a relay node and decides whether that relay receives or transmits, it obtains a reward value and the next state. If a_t corresponds to selecting an S-R or an R-D link, the corresponding buffer length increases by 1 or decreases by 1, respectively; otherwise the buffer length remains unchanged. Meanwhile, the channel state changes independently from one time slot to the next, and the state then transitions to s_{t+1} according to the new buffer length and channel state;

(3c) The current state, the action performed, the reward value obtained after performing the action, and the next state are combined into a tuple (s_t, a_t, r_t, s_{t+1}) and stored in the experience pool;

(3d) Return to step (3a) with state s_{t+1}, repeat the process and generate further tuples, until the state reaches the terminal state and the obtained reward value reaches its maximum.
CN202310505985.2A 2023-05-08 2023-05-08 Relay selection method based on cache region prediction Pending CN116506918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310505985.2A CN116506918A (en) 2023-05-08 2023-05-08 Relay selection method based on cache region prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310505985.2A CN116506918A (en) 2023-05-08 2023-05-08 Relay selection method based on cache region prediction

Publications (1)

Publication Number Publication Date
CN116506918A true CN116506918A (en) 2023-07-28

Family

ID=87321389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310505985.2A Pending CN116506918A (en) 2023-05-08 2023-05-08 Relay selection method based on cache region prediction

Country Status (1)

Country Link
CN (1) CN116506918A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117914378A (en) * 2023-12-12 2024-04-19 深圳市物联微电子有限公司 5G repeater signal processing method and system
CN117914378B (en) * 2023-12-12 2024-06-18 深圳市物联微电子有限公司 5G repeater signal processing method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination