CN116137628A - Relay node selection method, device, equipment and computer readable storage medium - Google Patents

Relay node selection method, device, equipment and computer readable storage medium

Info

Publication number
CN116137628A
CN116137628A (Application CN202111375086.2A)
Authority
CN
China
Prior art keywords
signal
node
noise ratio
relay
relay node
Prior art date
Legal status
Pending
Application number
CN202111375086.2A
Other languages
Chinese (zh)
Inventor
杨科
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Suzhou Software Technology Co Ltd
Priority to CN202111375086.2A
Publication of CN116137628A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/104 Peer-to-peer [P2P] networks
    • H04L 67/1061 Peer-to-peer [P2P] networks using node-based peer discovery mechanisms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H04L 45/12 Shortest path evaluation
    • H04L 45/123 Evaluation of link metrics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the invention relates to the technical field of the Internet of Things and discloses a relay node selection method applied to a source node, comprising the following steps: transmitting a signal to each relay node and to the destination node, where each relay node is a relay node in the cooperative relay group; acquiring a first signal-to-noise ratio and a second signal-to-noise ratio at the destination node, where the first signal-to-noise ratio is the signal-to-noise ratio at the destination node when the source node transmits the signal directly to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio at the destination node when the source node transmits the signal to the destination node over a relay node link; and determining an optimal relay node from among the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio. In this way, the embodiments of the invention make the determination of the optimal relay node both simple and accurate.

Description

Relay node selection method, device, equipment and computer readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of the Internet of things, in particular to a relay node selection method, a device, equipment and a computer readable storage medium.
Background
At present, modern wireless Internet of Things communication networks contain large numbers of functionally simple relay devices. Some of these relay devices cannot be recycled or recharged, and replacing them manually is economically prohibitive. To save energy and avoid wasting relay equipment, one research focus in wireless cooperative Internet of Things communication systems is the relay selection problem: selecting an optimal relay node to participate in forwarding so that communication performance is guaranteed while a number of relay nodes are spared, thereby extending the service life of the relay equipment, reducing cost and saving energy. In the course of implementing the embodiments of the invention, the inventor of the present application found that the prior-art process for determining the optimal relay node is complex and has low accuracy.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide a relay node selection method, apparatus, device and computer readable storage medium, which are used to solve the problems in the prior art that the process of determining an optimal relay node is complex and of low accuracy.
According to an aspect of the embodiment of the present invention, there is provided a relay node selection method applied to a source node, the method including:
Transmitting signals to each relay node and the destination node; the relay node is any relay node in the cooperative relay group;
acquiring a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node; the first signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node directly transmits a signal to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node transmits the signal to the destination node through a link of a relay node;
and determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio.
In an alternative manner, after the signal is sent to each relay node and the destination node, the method includes: decoding the signal at each relay node; and re-encoding the signal at each relay node that decoded it successfully and sending the re-encoded signal to the destination node.
In an optional manner, re-encoding the signal at each relay node that decoded it successfully and sending it to the destination node includes: comparing the signal-to-noise ratio at each relay node after it receives the signal sent by the source node with an access threshold value to determine whether that relay node decoded successfully, thereby determining the relay nodes that decoded successfully.
In an optional manner, before determining the optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, the method further includes:
determining the average throughput according to the following formula:

R = (1/2) log₂(1 + γ_s,d + γ_i,d)

where γ_s,d is the first signal-to-noise ratio of the link over which the source node s sends the signal directly to the destination node d, and γ_i,d is the second signal-to-noise ratio of the link from the source node s to the destination node d via the i-th relay node.
In an optional manner, determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio includes: taking the set of all relay nodes as the state space set; forming the action space set from the indices of the relay nodes; determining a reward function according to the first signal-to-noise ratio and the second signal-to-noise ratio; and iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node.
In an optional manner, iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node, includes: randomly selecting one state from the state space set as the current state; determining the current action according to the probability of selecting each action in the action space in the current state; executing the current action to obtain a reward value; updating the Q-value function according to the reward value, the current state and the current action; and updating the annealing temperature and the learning rate, updating the current state to the next state, and returning to the step of determining the current action according to the probability of selecting each action in the action space in the current state, until training ends and the optimal relay node is obtained.
In an optional manner, after determining the optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, the method includes: transmitting the signal to the optimal relay node; and transmitting the signal to the destination node through the optimal relay node.
According to another aspect of the embodiment of the present invention, there is provided a relay node selection apparatus, including:
the sending module is used for sending signals to each relay node and the destination node; the relay node is any relay node in the cooperative relay group;
the acquisition module is used for acquiring a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node; the first signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node directly transmits a signal to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node transmits the signal to the destination node through a link of a relay node;
and the determining module is used for determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio.
According to another aspect of an embodiment of the present invention, there is provided a computing device including: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform the operations of the relay node selection method described above.
According to yet another aspect of an embodiment of the present invention, there is provided a computer-readable storage medium having stored therein at least one executable instruction that, when executed on a computing device, causes the computing device to perform the operations of the relay node selection method described above.
In the embodiment of the invention, a signal is sent to each relay node and to the destination node, where each relay node is a relay node in the cooperative relay group; a first signal-to-noise ratio and a second signal-to-noise ratio at the destination node are acquired; and an optimal relay node is determined from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, so that the optimal relay node can be determined quickly and accurately.
The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the specification, specific embodiments of the present invention are described below.
Drawings
The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 is a schematic flow chart of a relay node selection method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a wireless cooperative networking system to which the relay node selection method according to the embodiment of the present invention is applied;
FIG. 3 shows a schematic diagram of throughput simulation employing QL-RSA and R-RSA optimal relay node selection;
fig. 4 is a three-dimensional diagram of a probability distribution of relay node selection provided by the embodiment of the invention;
fig. 5 is a schematic structural diagram of a relay node selection device according to an embodiment of the present invention;
FIG. 6 illustrates a schematic diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein.
The Internet of Things is one of the three application scenarios of the fifth-generation mobile communication system and has penetrated into every aspect of daily life. Sensor node devices in the Internet of Things are simple, numerous and power-limited, which makes them unsuited to long-distance transmission; on their own they cannot form a fully covered Internet of Things communication system, and thus the goal of interconnecting everything cannot be achieved. With the commercial deployment of 5G, future communication systems require lower latency, faster transmission rates and higher-quality communication than existing systems. Relay cooperation in wireless communication networks is one of the key technologies for addressing these needs. Relay cooperation deploys relay stations between the original communication stations to assist them in communicating, which can increase channel capacity, extend communication distance, enlarge the coverage area and improve diversity gain, thereby meeting users' demand for high-quality communication.
Modern wireless Internet of Things communication networks contain large numbers of functionally simple relay devices. Some of these devices cannot be recycled or recharged, and replacing them manually is economically prohibitive. To save energy and avoid wasting relay equipment, one research focus in wireless cooperative Internet of Things communication systems is the relay selection problem: how to formulate a suitable selection criterion and select an optimal relay node to participate in forwarding, so that communication performance is guaranteed while a number of relay nodes are spared, thereby extending the service life of the relay equipment, reducing cost and saving energy. Existing relay selection schemes that choose which relay nodes participate in transmission can be roughly divided into the following three types:
1. Relay selection based on the maximum received signal-to-noise ratio at the destination. This is a common relay selection technique in which the relay is chosen so as to maximize the signal-to-noise ratio at the destination.
2. Relay selection scheme based on physical topology location of nodes. The basic idea of this scheme is to abstract the source-to-relay and relay-to-destination distances as "average hops" and then select the relay with the smallest "average hops" among the candidate relay nodes to assist the system in completing the communication.
3. Relay selection based on instantaneous channel state information. Its principle is to select the best link combination from a number of communication links for cooperative transmission; in theory it is the most convenient scheme and gives the best performance.
The inventor of the present application has found that all three relay selection technologies have certain drawbacks:
1. The scheme based on the maximum received signal-to-noise ratio at the destination relies on channel state statistics, so once the corresponding relay has been selected, relay selection is not performed again regardless of whether the channel changes; the selected relay node therefore does not adapt to channel variations.
2. For a relay selection scheme based on the physical topological location of the nodes, the computational complexity of the scheme can increase greatly with the number of candidate relay nodes. For the practical scene that the wireless internet of things has large-scale candidate relay nodes and meets the low-delay and high-quality communication requirements, the scheme clearly faces a great challenge.
3. For the scheme based on the instantaneous channel state information, although the optimal performance can be obtained most simply in theory, in most practical scenes, it is difficult for the relay end to estimate the instantaneous channel state information in real time, so that the scheme is not widely applied in practice.
In recent years, artificial intelligence has been studied and applied in fields such as the Internet of Things. The inventor of the present application therefore introduces reinforcement learning, a branch of artificial intelligence, into cooperative communication as a relay selection algorithm. In reinforcement learning, an agent finds an optimal strategy by continually interacting with an unknown environment, with the aim of reaching a given goal at minimum cost. Unlike supervised learning, reinforcement learning relies mainly on an environmental feedback signal (typically a scalar) to continually correct its behaviour strategy, and does not require manually defined judgment criteria or intervention. Compared with traditional relay selection techniques, adopting a reinforcement learning algorithm in the present application has the following advantages:
1. the final decision of reinforcement learning is only derived from the rewards of the environmental feedback, excessive external factor intervention is not needed, and when the channel state changes, the rewards of the environmental feedback also change, so that the reinforcement learning algorithm can adaptively select the relay nodes for cooperative communication.
2. The reward required for reinforcement learning is a scalar signal, so that the relay is not required to estimate instantaneous channel state information in real time, and the requirement on relay hardware equipment is greatly reduced.
3. The iterative rules of reinforcement learning are simple: the optimal strategy is learned by maximizing the cumulative reward. Different return values can be designed for different communication criteria to obtain the corresponding optimal strategy. No large amount of complex formula derivation is needed, the computational complexity is greatly reduced, and ideal performance can still be achieved, so the algorithm design is simple and general.
Fig. 1 shows a flowchart of a relay node selection method provided by an embodiment of the present invention, which is performed by a computing device. The computing device may be a server, a terminal, a source node in the internet of things, other computer devices, such as a personal computer, a tablet computer, etc., or other intelligent agent devices. As shown in fig. 1, the method comprises the steps of:
step 110: transmitting signals to each relay node and the destination node; the relay node is any relay node in the cooperative relay group.
Fig. 2 shows a wireless cooperative Internet of Things communication network consisting of a source node (S), a destination node (D) and m relay nodes (R). Since Internet of Things nodes are simply configured, it is assumed that each node is equipped with a single antenna, that the signals on the S-R-D and S-D paths use orthogonal channels through time-division multiple access (TDMA), and that every channel link experiences small-scale Rayleigh fading. The source node sends the signal to each relay node, and the signal is forwarded to the destination node by the relay nodes that decode it successfully. In addition, the source node also transmits the signal directly to the destination node over the channel that connects the source node directly to the destination node. Thus, after sending the signal to each relay node and the destination node, the method includes: decoding the signal at each relay node; and re-encoding the signal at each relay node that decoded it successfully and sending the re-encoded signal to the destination node. In the embodiment of the invention, the signal-to-noise ratio at each relay node after it receives the signal sent by the source node is compared with an access threshold value to determine whether that relay node decoded successfully, thereby determining the relay nodes that decoded successfully.
Specifically, the transmission of a signal is divided into two time slots. In the first time slot, the source node broadcasts the signal to the destination node and to all relay nodes; the signals received by the destination node and by the i-th relay node r_i can then be expressed as:

y_s,d = √P_s · h_s,d · x + η_s,d

y_s,i = √P_s · h_s,i · x + η_s,i

where y_s,d and y_s,i are the signals received by the destination node and by the i-th relay node respectively, P_s is the transmit power of the source node, h_s,d and h_s,i are the channel parameters of the S-D and S-r_i links respectively, η_s,d and η_s,i are additive white Gaussian noise with powers δ²_s,d and δ²_s,i respectively, and x is the signal broadcast by the source node, satisfying E{|x|²} = 1.
In the second time slot, the DF (decode-and-forward) protocol is used: if a relay node successfully decodes the signal transmitted by the source node, it re-encodes it and sends it to the destination node; if decoding fails, the relay node stays silent. A threshold value γ_th is therefore introduced and compared with the signal-to-noise ratio γ_s,i received by the relay node in the first stage to decide whether decoding succeeded. The signal y_i,d that the destination receives from relay r_i can be expressed as:

y_i,d = √P_i · h_i,d · x̂ + η_i,d

where P_i is the transmit power of the i-th relay node, h_i,d is the channel parameter of the r_i-D link, x̂ is the successfully decoded signal, and γ_s,i is the signal-to-noise ratio (SNR) at the i-th relay after it receives the signal, with γ_s,i = P_s |h_s,i|² / δ²_s,i. η_i,d is independent, identically distributed additive white Gaussian noise at the destination with power δ²_i,d. The access threshold γ_th is not specifically limited in the embodiment of the invention; a person skilled in the art sets it according to the specific scenario.
Step 120: and acquiring a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node.
In the embodiment of the invention, the first signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node directly transmits the signal to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node transmits the signal to the destination node through a link of the relay node.
Under the DF protocol, if relay node r_i is selected, the destination node combines the received signals by maximal-ratio combining, and the average throughput at the destination node can therefore be expressed as:

R = (1/2) log₂(1 + γ_s,d + γ_i,d)

where γ_s,d and γ_i,d are the first signal-to-noise ratio of the S-D link and the second signal-to-noise ratio of the r_i-D link respectively, with:

γ_s,d = P_s |h_s,d|² / δ²_s,d

γ_i,d = P_i |h_i,d|² / δ²_i,d
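The maximal-ratio-combining throughput above can be evaluated as in the short sketch below. The helper names and the numeric inputs are illustrative assumptions of this rewrite, not the patent's.

```python
import numpy as np

def link_snr(power, h, noise_var):
    """Instantaneous link SNR: P * |h|^2 / delta^2."""
    return power * abs(h) ** 2 / noise_var

def df_throughput(gamma_sd, gamma_id=0.0):
    """Destination throughput under DF with maximal-ratio combining.

    The factor 1/2 accounts for the two time slots; if the selected relay
    stayed silent, pass gamma_id = 0 so only the direct S-D branch counts.
    """
    return 0.5 * np.log2(1.0 + gamma_sd + gamma_id)

# Illustrative channel draws (assumed values):
gamma_sd = link_snr(1.0, 0.4 + 0.3j, 0.1)   # direct S-D link  -> first SNR
gamma_id = link_snr(1.0, 0.8 - 0.2j, 0.1)   # r_i-D link       -> second SNR
print(df_throughput(gamma_sd, gamma_id))
```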
step 130: and determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio.
In practical communication it is very difficult for a relay node to estimate the instantaneous channel information of every link quickly and accurately, so the present application assumes that the relay nodes do not know the instantaneous channel information. The destination node feeds its received signal-to-noise ratio back to the source node over a feedback channel, and the source node uses this feedback as the reward in reinforcement learning to guide its selection of the best relay node for cooperative transmission.
In the embodiment of the invention, the optimal node can be selected with a temporal-difference reinforcement learning algorithm or with a Monte Carlo reinforcement learning algorithm. In the Monte Carlo algorithm, the agent (the device executing the relay node selection method, such as a server or the source node in the Internet of Things) estimates the value of the current state by sampling over many complete episodes; when enough samples are taken, the state value can be computed accurately, so Monte Carlo reinforcement learning is an unbiased estimation method. However, obtaining the return of a state with the Monte Carlo method requires sampling a complete episode, during which many random states and actions are experienced. If the distribution of action rewards spreads widely, repeated sampling can make the variance of the state return very large, which a communication system cannot tolerate; moreover, when the environment has no complete state sequences, the Monte Carlo method may not find the optimal strategy at all. In another embodiment of the invention, a temporal-difference reinforcement learning algorithm is therefore used, in which the return of the current state is estimated from the next state, so the temporal-difference method is a biased estimator. Unlike the Monte Carlo method, the temporal-difference method uses only the next random state and action, so the randomness of the estimated state return is smaller, the corresponding variance is smaller than with Monte Carlo, and the optimal strategy can be found even for Markov decision processes without complete episodes.
In one embodiment of the invention, the off-policy temporal-difference reinforcement learning algorithm Q-learning (QL) is used to determine the optimal relay node from among the relay nodes. The Q-value function (i.e. the action-value function) is updated as:

Q_{t+1}(s, a) = (1 − α) · Q_t(s, a) + α · [ r + β · max_{a'} Q_t(s', a') ]    (5)

where β ∈ [0, 1) is the discount factor, α = 1/(1 + visit(s, a)) with α ∈ (0, 1] is the learning rate, and visit(s, a) is the total number of times the Q-learning algorithm has visited the state-action pair (s, a). As the number of visits (training iterations) t → ∞, every state-action pair is updated infinitely often and α approaches zero; at that point Q_t(s, a) converges to the optimal Q*(s, a), and the optimal strategy is obtained from:

π*(s) = argmax_a Q*(s, a)

Here the training time t corresponds to the communication time, so the optimal relay node is determined as the communication time keeps increasing.
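A minimal sketch of this tabular Q-learning update with the visit-count learning rate α = 1/(1 + visit(s, a)) follows; the discount factor value used here is an assumption for illustration.

```python
import numpy as np

def q_update(Q, visits, s, a, reward, s_next, discount=0.8):
    """One Q-learning step: Q(s,a) <- (1-alpha)*Q(s,a) + alpha*[r + discount*max_a' Q(s',a')]."""
    visits[s, a] += 1
    alpha = 1.0 / (1.0 + visits[s, a])        # learning rate decays with the visit count
    td_target = reward + discount * np.max(Q[s_next])
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * td_target
    return Q[s, a]

m = 10
Q = np.zeros((m, m))        # Q-value matrix: one row per state, one column per action
visits = np.zeros((m, m))   # visit counts for each state-action pair
q_update(Q, visits, s=2, a=5, reward=1.3, s_next=5)
best_action = int(np.argmax(Q[2]))   # greedy policy pi*(s) = argmax_a Q(s, a)
```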
Therefore, according to the first signal-to-noise ratio and the second signal-to-noise ratio, determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm comprises:
taking the set of all the relay nodes as a state space set;
forming an action space set from the indices of the relay nodes;
determining a reward function according to the first signal-to-noise ratio and the second signal-to-noise ratio;
and iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node.
Specifically, the source node selects a relay node according to the Softmax selection policy and transmits the signal in broadcast form in the first time slot (the selected relay node is activated, while the other relay nodes remain silent). The reward value is obtained through the feedback channel between the source node and the destination node and is used to update the Q-value matrix and guide future policy selection. Each relay node is regarded as one state of the QL system. The states and actions in the cooperative communication network are therefore defined as follows:
1. State space set (S). In the QL algorithm proposed in the present application each relay node represents one state, so the state space set S = {s_1, s_2, ..., s_m} is the set of relay nodes.
2. Action space set (A). A is formed from the indices of the relay nodes:

A = {a_1, a_2, ..., a_m}

When action a_i is executed, the system enters the next state s_i.
3. State transition function S × A → S. When the system is in state s_j and action a_k is executed, the state transition function is defined as:

f(s_j, a_k) = s_k
4. Reward function r. Its design is based on a system performance metric: the goal is to maximize the throughput at the destination node after an action is executed, and the higher the SNR obtained by the destination node, the greater its throughput. Under the DF forwarding protocol the reward function may therefore be defined as:

r = (1/2) log₂(1 + γ_s,d + γ_i,d)   if γ_s,i ≥ γ_th (the selected relay decodes successfully);
r = (1/2) log₂(1 + γ_s,d)            otherwise (the selected relay stays silent).
5. Q-value matrix Q(s_t, a_t). Whenever the agent selects an action it receives a reward and then updates the Q-table according to equation (5). At the start of training every entry of the Q-table is set to zero. In the QL algorithm the agent selects an action a_t ∈ A and then transitions to the corresponding state s_{t+1} ∈ S in order to update the Q-table. The Q-table is defined in matrix form, called the Q-value matrix, and can be written as:

Q = [ Q(s_j, a_k) ]_{j,k=1,...,m} ∈ R^{m×m}

where R^{m×m} denotes the set of m × m matrices. The system updates the Q values by continual trial and error and stores them in the Q-value matrix. (A code sketch of the reward function defined in item 4 and of this Q-value matrix is given directly below this list.)
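As noted at the end of item 5, the following sketch writes out the reward function of item 4, the state transition of item 3 and the zero-initialised Q-value matrix of item 5. The function names are this rewrite's own, and the threshold test mirrors the DF rule described earlier.

```python
import numpy as np

def reward(gamma_sd, gamma_id, gamma_si, gamma_th):
    """Reward for selecting relay r_i: the destination throughput under DF.

    If the chosen relay decoded successfully (gamma_si >= gamma_th), the relay
    branch is combined with the direct branch; otherwise the relay stays
    silent and only the direct S-D link contributes.
    """
    if gamma_si >= gamma_th:
        return 0.5 * np.log2(1.0 + gamma_sd + gamma_id)
    return 0.5 * np.log2(1.0 + gamma_sd)

def transition(s_j, a_k):
    """State transition f(s_j, a_k) = s_k: the chosen action names the next state."""
    return a_k

m = 10
Q = np.zeros((m, m))   # Q-value matrix in R^{m x m}, every entry zero at the start of training
```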
In the embodiment of the present invention, iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training is completed, so as to obtain the optimal relay node, specifically includes:
randomly selecting one state from the state space set as a current state;
determining a current action according to the probability of each action in the action space selected in the current state;
executing the current action to obtain a reward value;
updating a Q value function according to the reward value, the current state and the current action;
updating the annealing temperature and the learning rate, updating the current state to the next state, and returning to the step of determining the current action according to the probability of selecting each action in the action space in the current state, until training ends and the optimal relay node is obtained.
In order to balance "exploration" and "utilization", the present application uses a Softmax selection algorithm to select actions. The action probabilities of the Softmax selection algorithm follow the Boltzmann distribution:
P(a_i | s) = exp( Q(s, a_i) / T ) / Σ_{j=1}^{m} exp( Q(s, a_j) / T )    (8)

where P(a_i | s) is the probability that the agent selects action a_i in state s, T > 0 is the annealing temperature, and Q(s, a_i) is the value of selecting action a_i in state s. For example, if in the communication of a given time slot the source node selects the 3rd relay node to transmit the information, the state s at that moment is 3, and which node (a_i) is selected for the next transmission is determined by equation (8). As equation (8) shows, a large value of T makes the action probability distribution over the relay nodes close to uniform, so the randomness is strong and the source node can fully "explore" relay nodes whose current Q values are not yet high, in the hope of obtaining a larger return. Conversely, when T is small, the probability that an action with a higher Q value is selected increases, so the source node "utilizes" the knowledge it has learned and selects the action it believes yields the highest reward. To ensure that the Q-learning algorithm explores fully in the early stage of training and maximizes utilization in the later stage, a very large initial value of T is set, which gradually decreases to a final value T_final as the number of training iterations increases, achieving a smooth transition from "exploration" to "utilization".
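A sketch of the Softmax (Boltzmann) action selection of equation (8), together with a geometric cooling step, is given below; the decay factor and temperature floor match the simulation settings described later in this description, but the function structure is an illustrative assumption.

```python
import numpy as np

def softmax_select(q_row, T, rng):
    """Choose action a_i with probability exp(Q(s,a_i)/T) / sum_j exp(Q(s,a_j)/T)."""
    logits = q_row / T
    logits = logits - logits.max()     # subtract the max for numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum()
    return rng.choice(len(q_row), p=probs), probs

def anneal(T, T_final=0.1, decay=0.9):
    """Geometric cooling: a large T gives near-uniform (exploration) probabilities,
    a small T concentrates probability on high-Q actions (utilization)."""
    return max(T * decay, T_final)

rng = np.random.default_rng(0)
action, probs = softmax_select(np.array([0.2, 1.5, 0.3]), T=10.0, rng=rng)
```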
Specifically, as known from the definitions of action sets, state sets and immediate return values in reinforcement learning, the relay selection algorithm based on Q-learning directs the source node to select the relay node in a direction of maximizing the received signal-to-noise ratio of the destination node through the design of the return values, so that the system obtains the maximum throughput. The single relay selection algorithm based on Q-learning is as follows:
(Algorithm listing rendered as an image in the original; not reproduced here.)
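The following Python sketch assembles the pieces defined in this description (Softmax selection with an annealing temperature, the threshold-based DF reward, the deterministic state transition f(s, a) = a, and the visit-count learning rate) into a single QL-RSA training loop. It is a sketch under stated assumptions: the per-relay average channel gains, powers, noise level, episode count and discount factor are illustrative and are not the patent's simulation values.

```python
import numpy as np

def ql_rsa(m=10, episodes=5000, P=1.0, noise_var=0.1, gamma_th=1.0,
           discount=0.8, T0=1e50, T_final=0.1, decay=0.9, seed=0):
    """Sketch of a Q-learning based relay selection loop (QL-RSA)."""
    rng = np.random.default_rng(seed)
    # Illustrative average channel gains per relay (a relay with a stronger
    # relay-to-destination gain should end up being selected more often).
    gain_sr = rng.uniform(0.5, 1.5, size=m)   # S -> r_i average gain
    gain_rd = rng.uniform(0.5, 1.5, size=m)   # r_i -> D average gain

    Q = np.zeros((m, m))
    visits = np.zeros((m, m))
    T = T0
    s = int(rng.integers(m))                  # random initial state

    def rayleigh(mean_gain):
        return np.sqrt(mean_gain / 2) * (rng.normal() + 1j * rng.normal())

    for _ in range(episodes):
        # Softmax (Boltzmann) action selection over the current state's Q row.
        logits = Q[s] / T
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = int(rng.choice(m, p=probs))       # action a_k: try relay r_k

        # Environment step: draw the fading links seen during this transmission
        # and compute the reward the destination would feed back.
        g_sd = P * abs(rayleigh(1.0)) ** 2 / noise_var
        g_sa = P * abs(rayleigh(gain_sr[a])) ** 2 / noise_var
        g_ad = P * abs(rayleigh(gain_rd[a])) ** 2 / noise_var
        if g_sa >= gamma_th:                  # relay decoded: MRC of both branches
            r = 0.5 * np.log2(1.0 + g_sd + g_ad)
        else:                                 # relay silent: direct link only
            r = 0.5 * np.log2(1.0 + g_sd)

        # State transition f(s, a) = a, then the tabular Q-learning update.
        s_next = a
        visits[s, a] += 1
        alpha = 1.0 / (1.0 + visits[s, a])
        Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * (r + discount * Q[s_next].max())

        T = max(T * decay, T_final)           # anneal towards utilization
        s = s_next

    return int(np.argmax(Q.max(axis=0)))      # relay index the learned policy favours

best_relay = ql_rsa()
print("relay selected by QL-RSA:", best_relay)
```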
To demonstrate the effectiveness of the method, the performance of the Q-learning based relay selection algorithm (QL-RSA) is analysed by simulation; the data in the simulation figures are obtained from 5000 independent experiments. As a comparison baseline, the present application also simulates the performance of a random relay selection algorithm (R-RSA), in which the source node selects any relay node for transmission with equal probability. The simulation parameters are set as follows: m relay nodes are uniformly distributed in an x-y plane disc of radius r = 0.5 centred at the origin, and the source and destination nodes are located at (-0.5, 0) and (0.5, 0) respectively. The channel between two nodes is modelled as Rayleigh fading whose average power decays with distance,

h_i,j ~ CN(0, d_i,j^(−v))

where d_i,j is the distance between the two nodes and the path-loss exponent of the channel is set to v = 2.5. To ensure that all relay nodes can be fully "explored" in the early stage of Q-learning training and that only the optimal relay node is "utilized" in the later stage, the initial temperature of the annealing process is set to T = 10^50 and decreases with a negative-exponential law of factor 0.9 to the final temperature T_final, which is set to 0.1. Fig. 3 compares the performance of QL-RSA and R-RSA for m = 10 relay nodes under different access thresholds. As expected, the smaller the access threshold (hereinafter simply the threshold), the greater the throughput achieved by the system, and the better QL-RSA performs relative to R-RSA. When the threshold γ_th is 3 dB, QL-RSA performs essentially the same as R-RSA. This is because in DF mode, when the threshold requirement is too high, the relay nodes essentially never decode and forward successfully, and the system essentially always uses the source-destination path for transmission; the source node therefore obtains essentially the same return whichever action QL-RSA selects from set A, which results in essentially the same performance as R-RSA. When the threshold is smaller, the probability that the relay nodes decode and forward successfully increases, relay nodes closer to the destination node can be chosen to obtain higher system throughput, and QL-RSA identifies these relay nodes through continual interactive learning with the environment, making the optimal action selection from set A and achieving throughput clearly superior to that of R-RSA.
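The node layout and distance-dependent Rayleigh channel described for the simulation could be generated as follows. Modelling h_i,j as CN(0, d_i,j^(−v)) is an assumption consistent with the text above, and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, v = 10, 2.5                                 # number of relays, path-loss exponent

# Relays uniformly distributed in a disc of radius 0.5 centred at the origin.
radius = 0.5 * np.sqrt(rng.uniform(size=m))    # sqrt gives uniform density over the disc
angle = rng.uniform(0.0, 2.0 * np.pi, size=m)
relays = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)

src = np.array([-0.5, 0.0])                    # source node S
dst = np.array([0.5, 0.0])                     # destination node D

def channel(p, q):
    """One Rayleigh-faded channel draw whose variance decays as d^(-v)."""
    d = np.linalg.norm(p - q)
    return np.sqrt(d ** (-v) / 2.0) * (rng.normal() + 1j * rng.normal())

h_sr = np.array([channel(src, r) for r in relays])   # S -> r_i links
h_rd = np.array([channel(r, dst) for r in relays])   # r_i -> D links
```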
Fig. 4 shows a three-dimensional plot of the probability distribution of the simulated node selection. The coordinates of the relay nodes are: [(-0.089, -0.481), (0.436, 0.115), (-0.363, 0.323), (0.021, -0.040), (-0.317, 0.011), (0.294, -0.160), (-0.238, -0.148), (-0.401, -0.257), (-0.022, -0.485), (-0.106, 0.387)]. At the start of training, the annealing temperature T of the Softmax selection strategy tends to infinity, so the probability of each node being selected is close to uniform and Q-learning is in the "exploration" stage. With continual interaction with the environment, the annealing temperature decreases towards T_final; as equation (8) shows, the selection probability of each node then depends on the Q value of that node, and Q-learning is in the "utilization" stage. As the figure shows, the second relay node has the highest Q value, so as the number of training iterations keeps increasing, the probability of selecting the second relay node tends to 1 while the probability of selecting the other nodes tends to 0. Relay node 2 is therefore the optimal node at this time.
According to the first signal-to-noise ratio and the second signal-to-noise ratio, after an optimal relay node is determined from the relay nodes by adopting a reinforcement learning algorithm, the embodiment of the invention further transmits the signal to the optimal relay node, and the signal is transmitted to the destination node through the optimal relay node.
In the embodiment of the invention, a signal is sent to each relay node and to the destination node, where each relay node is a relay node in the cooperative relay group; a first signal-to-noise ratio and a second signal-to-noise ratio at the destination node are acquired; and an optimal relay node is determined from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, so that the optimal relay node can be determined quickly and accurately.
Fig. 5 shows a schematic structural diagram of a relay node selection device according to an embodiment of the present invention. As shown in fig. 5, the apparatus 300 includes: a sending module 310, an obtaining module 320 and a determining module 330.
A transmitting module 310, configured to transmit signals to each relay node and destination node; the relay node is any relay node in the cooperative relay group;
an obtaining module 320, configured to obtain a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node; the first signal-to-noise ratio is the signal-to-noise ratio of a destination node when the source node directly transmits a signal to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node transmits the signal to the destination node through a link of a relay node;
And a determining module 330, configured to determine an optimal relay node from the relay nodes by using a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio.
In an alternative manner, after the signal is sent to each relay node and the destination node, the method includes: decoding the signal at each relay node; and re-encoding the signal at each relay node that decoded it successfully and sending the re-encoded signal to the destination node.
In an optional manner, re-encoding the signal at each relay node that decoded it successfully and sending it to the destination node includes: comparing the signal-to-noise ratio at each relay node after it receives the signal sent by the source node with an access threshold value to determine whether that relay node decoded successfully, thereby determining the relay nodes that decoded successfully.
In an optional manner, before determining the optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, the method further includes:
determining the average throughput according to the following formula:

R = (1/2) log₂(1 + γ_s,d + γ_i,d)

where γ_s,d is the first signal-to-noise ratio of the link over which the source node s sends the signal directly to the destination node d, and γ_i,d is the second signal-to-noise ratio of the link from the source node s to the destination node d via the i-th relay node.
In an optional manner, determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio includes: taking the set of all relay nodes as the state space set; forming the action space set from the indices of the relay nodes; determining a reward function according to the first signal-to-noise ratio and the second signal-to-noise ratio; and iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node.
In an optional manner, iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node, includes: randomly selecting one state from the state space set as the current state; determining the current action according to the probability of selecting each action in the action space in the current state; executing the current action to obtain a reward value; updating the Q-value function according to the reward value, the current state and the current action; and updating the annealing temperature and the learning rate, updating the current state to the next state, and returning to the step of determining the current action according to the probability of selecting each action in the action space in the current state, until training ends and the optimal relay node is obtained.
In an optional manner, after determining the optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, the method includes: transmitting the signal to the optimal relay node; and transmitting the signal to the destination node through the optimal relay node.
The specific working process of the relay node selection device in the embodiment of the present invention is substantially identical to that of the above embodiment, and will not be described herein.
In the embodiment of the invention, a signal is sent to each relay node and to the destination node, where each relay node is a relay node in the cooperative relay group; a first signal-to-noise ratio and a second signal-to-noise ratio at the destination node are acquired; and an optimal relay node is determined from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, so that the optimal relay node can be determined quickly and accurately.
FIG. 6 illustrates a schematic diagram of a computing device according to an embodiment of the present invention, and the embodiment of the present invention is not limited to a specific implementation of the computing device. The computing device may be a source node in the internet of things, may be a server, or may be other computing devices.
As shown in fig. 6, the computing device may include: a processor 402, a communication interface (Communications Interface) 404, a memory 406, and a communication bus 408.
Wherein: processor 402, communication interface 404, and memory 406 communicate with each other via communication bus 408. A communication interface 404 for communicating with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute the program 410, and may specifically perform the relevant steps in the embodiment of the relay node selection method described above.
In particular, program 410 may include program code including computer-executable instructions.
The processor 402 may be a central processing unit CPU, or a specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included by the computing device may be the same type of processor, such as one or more CPUs; but may also be different types of processors such as one or more CPUs and one or more ASICs.
Memory 406 for storing programs 410. Memory 406 may comprise high-speed RAM memory or may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
Program 410 may be specifically invoked by processor 402 to cause a computing device to:
transmitting signals to each relay node and the destination node; the relay node is any relay node in the cooperative relay group;
acquiring a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node; the first signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node directly transmits a signal to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node transmits the signal to the destination node through a link of a relay node;
and determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio.
In an alternative manner, after the signal is sent to each relay node and the destination node, the method includes: decoding the signal at each relay node; and re-encoding the signal at each relay node that decoded it successfully and sending the re-encoded signal to the destination node.
In an optional manner, re-encoding the signal at each relay node that decoded it successfully and sending it to the destination node includes: comparing the signal-to-noise ratio at each relay node after it receives the signal sent by the source node with an access threshold value to determine whether that relay node decoded successfully, thereby determining the relay nodes that decoded successfully.
In an optional manner, before determining the optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, the method further includes:
determining the average throughput according to the following formula:

R = (1/2) log₂(1 + γ_s,d + γ_i,d)

where γ_s,d is the first signal-to-noise ratio of the link over which the source node s sends the signal directly to the destination node d, and γ_i,d is the second signal-to-noise ratio of the link from the source node s to the destination node d via the i-th relay node.
In an optional manner, determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio includes: taking the set of all relay nodes as the state space set; forming the action space set from the indices of the relay nodes; determining a reward function according to the first signal-to-noise ratio and the second signal-to-noise ratio; and iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node.
In an optional manner, iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node, includes: randomly selecting one state from the state space set as the current state; determining the current action according to the probability of selecting each action in the action space in the current state; executing the current action to obtain a reward value; updating the Q-value function according to the reward value, the current state and the current action; and updating the annealing temperature and the learning rate, updating the current state to the next state, and returning to the step of determining the current action according to the probability of selecting each action in the action space in the current state, until training ends and the optimal relay node is obtained.
In an optional manner, after determining the optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, the method includes: transmitting the signal to the optimal relay node; and transmitting the signal to the destination node through the optimal relay node.
The specific working process of the computing device according to the embodiment of the present invention is substantially the same as that of the foregoing embodiment, and will not be described herein.
In the embodiment of the invention, a signal is sent to each relay node and to the destination node, where each relay node is a relay node in the cooperative relay group; a first signal-to-noise ratio and a second signal-to-noise ratio at the destination node are acquired; and an optimal relay node is determined from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, so that the optimal relay node can be determined quickly and accurately.
Embodiments of the present invention provide a computer readable storage medium storing at least one executable instruction that, when executed on a computing device, causes the computing device to perform a relay node selection method according to any of the method embodiments described above.
The executable instructions may be particularly useful for causing a computing device to:
transmitting signals to each relay node and the destination node; the relay node is any relay node in the cooperative relay group;
acquiring a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node; the first signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node directly transmits a signal to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio of the destination node when the source node transmits the signal to the destination node through a link of a relay node;
and determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio.
In an alternative manner, after the signal is sent to each relay node and the destination node, the method includes: decoding the signal at each relay node; and re-encoding the signal at each relay node that decoded it successfully and sending the re-encoded signal to the destination node.
In an optional manner, re-encoding the signal at each relay node that decoded it successfully and sending it to the destination node includes: comparing the signal-to-noise ratio at each relay node after it receives the signal sent by the source node with an access threshold value to determine whether that relay node decoded successfully, thereby determining the relay nodes that decoded successfully.
In an optional manner, before determining the optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, the method further includes:
determining the average throughput according to the following formula:

R = (1/2) log₂(1 + γ_s,d + γ_i,d)

where γ_s,d is the first signal-to-noise ratio of the link over which the source node s sends the signal directly to the destination node d, and γ_i,d is the second signal-to-noise ratio of the link from the source node s to the destination node d via the i-th relay node.
In an optional manner, determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio includes: taking the set of all relay nodes as the state space set; forming the action space set from the indices of the relay nodes; determining a reward function according to the first signal-to-noise ratio and the second signal-to-noise ratio; and iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node.
In an optional manner, iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node, includes: randomly selecting one state from the state space set as the current state; determining the current action according to the probability of selecting each action in the action space in the current state; executing the current action to obtain a reward value; updating the Q-value function according to the reward value, the current state and the current action; and updating the annealing temperature and the learning rate, updating the current state to the next state, and returning to the step of determining the current action according to the probability of selecting each action in the action space in the current state, until training ends and the optimal relay node is obtained.
In an optional manner, after determining the optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, the method includes: transmitting the signal to the optimal relay node; and transmitting the signal to the destination node through the optimal relay node.
The specific working process of the instructions stored in the computer storage medium of the embodiments of the present invention, when executed on a computing device, is substantially the same as in the above embodiments and is not repeated here.
In the embodiments of the present invention, the source node sends signals to each relay node and the destination node, where each relay node belongs to the cooperative relay group; a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node are acquired; and an optimal relay node is determined from the relay nodes by means of a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio, so that the optimal relay node can be determined quickly and accurately.
The embodiment of the invention provides a relay node selection device which is used for executing the relay node selection method.
Embodiments of the present invention provide a computer program that is callable by a processor to cause a computing device to perform the relay node selection method of any of the method embodiments described above.
An embodiment of the present invention provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when run on a computer, cause the computer to perform the relay node selection method in any of the method embodiments described above.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such a system is apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided to disclose the enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims (10)

1. A relay node selection method, applied to a source node, comprising:
transmitting signals to each relay node and the destination node; the relay node is any relay node in the cooperative relay group;
acquiring a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node; the first signal-to-noise ratio is the signal-to-noise ratio at the destination node when the source node transmits a signal directly to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio at the destination node when the source node transmits the signal to the destination node through the link of a relay node;
and determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio.
2. The method of claim 1, wherein after the signal is sent to each relay node and the destination node, the method comprises:
decoding the signal by each relay node;
and re-encoding the signal by the successfully decoded relay node and then sending the re-encoded signal to the destination node.
3. The method according to claim 2, wherein re-encoding the signal by the successfully decoded relay node and then transmitting the re-encoded signal to the destination node comprises:
comparing the signal-to-noise ratio at each relay node, after it receives the signal sent by the source node, with an access threshold value to determine whether that relay node has decoded successfully, thereby determining the successfully decoded relay nodes.
4. The method of claim 1, wherein prior to determining an optimal relay node from each of the relay nodes using a reinforcement learning algorithm based on the first signal-to-noise ratio and the second signal-to-noise ratio, the method further comprises:
the average throughput is determined according to the following formula:
[Equation image in the original claims: the average throughput expressed in terms of γ_{s,d} and γ_{i,d}]
where γ_{s,d} is the first signal-to-noise ratio of the link over which the source node s sends the signal directly to the destination node d, and γ_{i,d} is the second signal-to-noise ratio of the link from the source node s to the destination node d via the i-th relay node.
5. The method according to any one of claims 1-4, wherein the determining an optimal relay node from the relay nodes using a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio comprises:
taking the set of all the relay nodes as a state space set;
taking the serial numbers of the relay nodes as an action space set;
determining a reward function according to the first signal-to-noise ratio and the second signal-to-noise ratio;
and iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function and the state transition function until training ends, to obtain the optimal relay node.
6. The method of claim 5, wherein iteratively updating the Q-value matrix according to the state space set, the action space set, the reward function, and the state transition function until training is completed, to obtain an optimal relay node, comprises:
randomly selecting one state from the state space set as a current state;
determining a current action according to the probability of selecting each action in the action space in the current state;
executing the current action to obtain a reward value;
updating a Q value function according to the reward value, the current state and the current action;
updating the annealing temperature and the learning rate, updating the current state to the next state, and returning to the step of determining the current action according to the probability of selecting each action in the action space in the current state, until training ends, to obtain the optimal relay node.
7. The method of claim 1, wherein after determining an optimal relay node from each of the relay nodes using a reinforcement learning algorithm based on the first signal-to-noise ratio and the second signal-to-noise ratio, the method comprises:
transmitting the signal to the optimal relay node;
and transmitting the signal to the destination node through the optimal relay node.
8. A relay node selection apparatus, the apparatus comprising:
the sending module is used for sending signals to each relay node and the destination node; the relay node is any relay node in the cooperative relay group;
the acquisition module is used for acquiring a first signal-to-noise ratio and a second signal-to-noise ratio of the destination node; the first signal-to-noise ratio is the signal-to-noise ratio at the destination node when the source node transmits a signal directly to the destination node, and the second signal-to-noise ratio is the signal-to-noise ratio at the destination node when the source node transmits the signal to the destination node through the link of a relay node;
and the determining module is used for determining an optimal relay node from the relay nodes by adopting a reinforcement learning algorithm according to the first signal-to-noise ratio and the second signal-to-noise ratio.
9. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the relay node selection method according to any one of claims 1-7.
10. A computer readable storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction, when executed on a computing device, causes the computing device to perform the operations of the relay node selection method according to any one of claims 1-7.
CN202111375086.2A 2021-11-17 2021-11-17 Relay node selection method, device, equipment and computer readable storage medium Pending CN116137628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111375086.2A CN116137628A (en) 2021-11-17 2021-11-17 Relay node selection method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111375086.2A CN116137628A (en) 2021-11-17 2021-11-17 Relay node selection method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116137628A true CN116137628A (en) 2023-05-19

Family

ID=86333208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111375086.2A Pending CN116137628A (en) 2021-11-17 2021-11-17 Relay node selection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116137628A (en)

Similar Documents

Publication Publication Date Title
Zhao et al. Deep reinforcement learning based mobile edge computing for intelligent Internet of Things
Chen et al. iRAF: A deep reinforcement learning approach for collaborative mobile edge computing IoT networks
Tang et al. Computational intelligence and deep learning for next-generation edge-enabled industrial IoT
Li et al. NOMA-enabled cooperative computation offloading for blockchain-empowered Internet of Things: A learning approach
Wei et al. Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor–critic deep reinforcement learning
Nguyen et al. Non-cooperative energy efficient power allocation game in D2D communication: A multi-agent deep reinforcement learning approach
Nguyen et al. Distributed deep deterministic policy gradient for power allocation control in D2D-based V2V communications
Xu et al. Outage probability performance analysis and prediction for mobile IoV networks based on ICS-BP neural network
CN112261674A (en) Performance optimization method of Internet of things scene based on mobile edge calculation and block chain collaborative enabling
Chen et al. Deep Q-Network based resource allocation for UAV-assisted Ultra-Dense Networks
Geng et al. Hierarchical reinforcement learning for relay selection and power optimization in two-hop cooperative relay network
Chen et al. Edge intelligence computing for mobile augmented reality with deep reinforcement learning approach
Sun et al. Graph-reinforcement-learning-based task offloading for multiaccess edge computing
Arroyo-Valles et al. A censoring strategy for decentralized estimation in energy-constrained adaptive diffusion networks
Sacco et al. A self-learning strategy for task offloading in UAV networks
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
Fang et al. Smart collaborative optimizations strategy for mobile edge computing based on deep reinforcement learning
CN111741520B (en) Cognitive underwater acoustic communication system power distribution method based on particle swarm
CN113923743A (en) Routing method, device, terminal and storage medium for electric power underground pipe gallery
Geng et al. Deep deterministic policy gradient for relay selection and power allocation in cooperative communication network
CN116827515A (en) Fog computing system performance optimization algorithm based on blockchain and reinforcement learning
CN116137628A (en) Relay node selection method, device, equipment and computer readable storage medium
Wu et al. Online learning to optimize transmission over an unknown gilbert-elliott channel
Dinh et al. Deep reinforcement learning-based offloading for latency minimization in 3-tier v2x networks
CN115442812A (en) Deep reinforcement learning-based Internet of things spectrum allocation optimization method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination