Disclosure of Invention
In order to effectively reduce interference and data collision in a network, improve channel utilization and system throughput, and ensure the reliability of data service transmission among nodes, the invention provides a distributed channel allocation method in a wireless multi-hop network. The method adopts a physical framework comprising at least a physical equipment layer, a computation layer and a network service layer. The physical equipment layer forms a multi-hop wireless communication network from n wireless nodes randomly deployed in the network, and each node acts as an autonomous Agent that interacts with the uncertain network environment through a local decision module. The aggregation node of the computation layer is responsible for aggregating, analyzing and processing the data collected by the other nodes in the network; this node has an edge computation function or is a dedicated edge server node, i.e., the computation tasks of the wireless nodes can be offloaded to it, and an asynchronous DRL model can be trained based on experience information collected by the nodes in a distributed manner. The multi-channel allocation problem is modeled as a POMDP problem, and distributed channel allocation is carried out using the asynchronous DRL model trained by the centralized node or edge server.
Further, the multi-channel allocation problem is modeled as a POMDP problem: in time period t, the Agent observes the current network state s and performs an action a; after performing action a, it transitions to the network state s' of the next time period with state transition probability P and obtains a corresponding reward R from the environment. The POMDP problem is expressed as:
M = <S, A, P, R, γ>;
wherein M represents the POMDP problem model; S is the state set, representing the state space; A is the action set, representing the action space, wherein an action a ∈ A is the channel number to which a node switches; P is the state transition probability; R is the reward function; and γ is the discount factor. That is, given the environment state s ∈ S, the Agent performs an action a ∈ A, the environment state migrates from s to s' (s → s'), and the Agent obtains the corresponding reward R from the environment.
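The tuple above can be sketched as a minimal data structure; class and function names here are illustrative assumptions, not prescribed by the invention:

```python
from dataclasses import dataclass

# Minimal sketch of the POMDP tuple M = <S, A, P, R, gamma> for the
# channel allocation problem. Names are illustrative, not from the patent.
@dataclass
class ChannelPOMDP:
    num_channels: int      # K: an action a in A is a channel number in [1, K]
    gamma: float = 0.9     # discount factor

    def actions(self):
        # Action space A: the channel numbers a node may switch to.
        return list(range(1, self.num_channels + 1))

    def step(self, state, action, transition_fn, reward_fn):
        # Given state s and action a, migrate s -> s' and obtain reward R.
        next_state = transition_fn(state, action)
        reward = reward_fn(state, action)
        return next_state, reward

m = ChannelPOMDP(num_channels=4)
s2, r = m.step("s", 2, lambda s, a: "s'", lambda s, a: 1.0)
```

The transition and reward functions are passed in as callables because, in the invention, both are induced by the network environment rather than known in closed form.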
Further, the environmental state s_{i,t} observed by node i in the t-th time period is expressed in terms of the channel occupation indicators S_{i,t,j}; wherein s_{i,t} represents the occupation condition of each wireless channel by the neighbor nodes of node i, namely the potential interference degree of each channel; K is the number of available channels and N is the number of nodes; S_{i,t,j} indicates the occupation of channel j by the neighbor nodes of node i in the t-th time period: S_{i,t,j} = 1 indicates that a neighbor node of node i uses channel j, and S_{i,t,j} = 0 indicates that no neighbor node of node i uses channel j; and n_{i,o} is the total number of neighbor nodes of node i.
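The local observation above can be sketched as follows; the input format (a mapping from each neighbor of node i to its current channel) is a hypothetical convenience, not part of the invention:

```python
# Sketch of node i's local observation in period t: one occupancy flag per
# channel, S_{i,t,j} = 1 if some neighbor of node i uses channel j.
# neighbor_channels maps neighbor id -> channel number (assumed input format).
def observe_state(neighbor_channels, K):
    occupancy = [0] * K
    for ch in neighbor_channels.values():
        occupancy[ch - 1] = 1          # channels are numbered 1..K
    n_io = len(neighbor_channels)      # n_{i,o}: total number of neighbors
    return occupancy, n_io

state, n_io = observe_state({"n1": 2, "n2": 2, "n3": 4}, K=4)
```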
Further, the reward R obtained from the environment when a node, after performing action a, transitions from state s to the next state s' may be expressed as R = R(s, a), wherein R(s, a) is the reward obtained after node i switches its channel to channel k in the t-th data period. The reward is determined by whether any neighbor node of node i uses channel k in the current period (distinguishing the case where no neighbor node of node i uses channel k from the opposite case) and by the neighbor successful transmission probability of node i for the t-th time period.
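The original formula for R(s, a) is not recoverable from this text, so the sketch below is only one plausible reading: a free channel earns the full reward, and a contended channel earns a reward scaled by the neighbor successful-transmission probability. Both the functional form and the names are assumptions.

```python
# Hedged sketch of the reward R(s, a): switching to a channel k that no
# neighbor occupies is rewarded most; otherwise the reward is weighted by the
# neighbor successful-transmission probability p_s and discounted by
# contention. The exact functional form is an assumption, not the patent's.
def reward(num_neighbors_on_k, p_s):
    if num_neighbors_on_k == 0:
        return 1.0                         # no neighbor uses channel k
    return p_s / (1 + num_neighbors_on_k)  # contended channel: scaled reward

r_free = reward(0, p_s=0.8)
r_busy = reward(3, p_s=0.8)
```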
Further, the asynchronous DRL model deployed in the computation layer comprises a current network, a target network, an error calculation module, an experience pool, and a decision module deployed locally on each wireless node; the network structure of the local decision module is the same as that of the current network, and its parameters are periodically acquired from the edge node. Wherein:
the target network fixes its network parameters and produces the target value function;
the current network is used to evaluate the policy, update parameters and approximate the value function;
the parameter θ of the current network is updated every time period; the parameter θ⁻ of the target network is updated every several fixed time periods and kept unchanged in between;
an experience e = <s, a, r, s'> (s, s' ∈ S, a ∈ A) is collected asynchronously by the nodes in the network from the wireless multi-hop network environment;
the error calculation module updates the parameters of the current network through the TD deviation calculated from the target network and the current network; in addition, the parameters of the current network are copied to the target network at regular intervals.
Further, the calculation of the target value function y includes:
y = R(s_t, a_t) + γ max_{a_{t+1} ∈ A} Q(s_{t+1}, a_{t+1}; θ⁻);
wherein R(s_t, a_t) is the reward obtained in the t-th time period by node i ∈ [1, N] (N being the number of nodes) after performing action a_t ∈ A in state s_t ∈ S; Q(s_{t+1}, a_{t+1}; θ⁻) (s_{t+1} ∈ S, a_{t+1} ∈ A) represents the Q value given by the target network (parameter θ⁻) when node i performs action a_{t+1} in state s_{t+1}; s_{t+1} is the state of node i in the (t+1)-th time period; a_{t+1} is the action performed by node i in the (t+1)-th time period; and max_{a_{t+1} ∈ A} Q(s_{t+1}, a_{t+1}; θ⁻) represents node i selecting, based on the target network (parameter θ⁻), the action a_{t+1} in state s_{t+1} that maximizes the corresponding Q value.
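The target value computation is a one-liner once the target network's Q values for the next state are available; here a plain list stands in for the target network's outputs (an assumption, since the invention uses a DNN approximator):

```python
# Sketch of y = R(s_t, a_t) + gamma * max_{a'} Q(s_{t+1}, a'; theta-).
# next_q_values stands in for the target network's Q(s_{t+1}, a; theta-)
# evaluated for every action a in A.
def target_value(reward, next_q_values, gamma):
    return reward + gamma * max(next_q_values)

y = target_value(reward=1.0, next_q_values=[0.2, 0.5, 0.1], gamma=0.9)
```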
Further, the error calculation module calculates the error between the current network Q(s_t, a_t; θ) and the target value y:
L(θ) = E[(y − Q(s_t, a_t; θ))²];
and updates the neural network parameters by gradient descent:
θ ← θ − α∇_θL(θ);
wherein L(θ) is the TD error function of the model; E[·] denotes the expectation over the selected mini-batch of empirical data; θ is the parameter of the current network, updated in real time; α is the learning rate; ∇_θL(θ) is the corresponding gradient; and Q(s_t, a_t; θ) represents the Q value given by the current network (parameter θ) when node i takes state s_t and performs action a_t in the t-th time period.
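The error module's update can be sketched with a tabular Q standing in for the DNN, so that the gradient step on the squared TD error is explicit (for a table entry, dL/dQ = −2(y − Q), so descending the error moves Q toward y). This is a simplification for illustration, not the patent's DNN implementation:

```python
# Mini-batch TD update: L(theta) = mean (y - Q(s,a))^2 over the batch, with
# each table entry nudged toward its target y (gradient descent on L).
def td_update(q, batch, alpha):
    # q: dict (state, action) -> value; batch: list of (state, action, y)
    loss = 0.0
    for s, a, y in batch:
        delta = y - q.get((s, a), 0.0)     # TD deviation
        loss += delta ** 2
        q[(s, a)] = q.get((s, a), 0.0) + alpha * delta
    return loss / len(batch)

q = {}
loss = td_update(q, [(0, 1, 1.0), (0, 2, 0.5)], alpha=0.5)
```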
Furthermore, the whole system time is divided into a plurality of consecutive superframes, one superframe being one time period. Each superframe comprises a beacon frame, a control period and a data transmission period; the control period adopts a fixed control channel to transmit related control information and channel allocation decisions, and the data transmission period adopts K non-overlapping channels to support interference-free parallel data transmission. In the control period, all nodes in the network switch to the control channel to monitor and send the related control information; in the data transmission period, a node with data to send switches to the channel of its parent node to transmit data based on the channel access mechanism.
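The superframe layout can be sketched as a small configuration object; the durations below are illustrative placeholders, as the invention does not fix concrete timing values:

```python
from dataclasses import dataclass

# Sketch of the superframe described above: beacon frame, a control period on
# one fixed control channel, and a data period using K non-overlapping
# channels. All durations are assumed example values.
@dataclass
class Superframe:
    beacon_ms: int = 2
    control_ms: int = 10        # all nodes on the fixed control channel
    data_ms: int = 88           # K channels, interference-free in parallel
    num_data_channels: int = 4  # K

    def duration_ms(self):
        # One superframe = one time period of the system.
        return self.beacon_ms + self.control_ms + self.data_ms

sf = Superframe()
```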
Further, in the process of performing the action a, the node adopts a channel access mechanism based on RTS/DCTS, which includes:
if node d is located at the m-th hop and node i at the adjacent (m+1)-th hop, node d is the parent node of node i; if node e is located at the m-th hop and node j at the adjacent (m+1)-th hop, node e is the parent node of node j; the four nodes work on the same channel, and the backoff values of node i and node j are both 0;
when node i sends an RTS frame to node d, node d waits for a CIFS time and returns a CTS frame;
after receiving the RTS frame of node i or the CTS frame of node d, the child nodes of node d set a corresponding NAV based on the information in the Duration field;
when node e receives the RTS frame from node i, it waits for a SIFS time and returns a CTS frame to inform its child nodes to delay data transmission during the transmission period of node i;
wherein RTS refers to request-to-send; CTS refers to clear-to-send; CIFS is the interframe space used by the destination node before returning a CTS; SIFS is the short interframe space that separates frames belonging to one dialog; and CIFS is slightly larger than SIFS.
Further, if node j is located within the communication range of node i while its parent node is not, then after node j receives the RTS frame and waits for an RIFS, node j sends its own RTS frame to its parent node e.
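The behavior of a node that overhears an RTS can be sketched as the decision rule below. This is a heavily simplified reading of the mechanism: the function name, arguments, and the use of a single boolean for "a CTS was heard within the RIFS" are all assumptions.

```python
# Hedged sketch of the RTS/DCTS rules: the RTS destination answers with a CTS
# after CIFS; an overhearing node that hears a CTS within one RIFS defers
# (sets its NAV); otherwise its parent is outside the sender's range, so it
# may transmit its own RTS in parallel (the exposed-terminal case).
def on_overheard_rts(rts_dest, my_id, heard_cts_within_rifs):
    if rts_dest == my_id:
        return "send_cts_after_cifs"   # we are the RTS destination
    if heard_cts_within_rifs:
        return "set_nav"               # a parallel send would collide: defer
    return "send_rts_to_parent"        # safe to transmit in parallel

# Node j overhears node i's RTS to node d; j's parent e is out of i's range,
# so no CTS arrives within the RIFS and j proceeds with its own RTS.
action = on_overheard_rts(rts_dest="d", my_id="j", heard_cts_within_rifs=False)
```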
The invention solves the problems of hidden terminals and exposed terminals in a high-density multi-hop wireless network, effectively avoids data collision and channel resource waste, and improves overall network performance. In addition, an asynchronous DRL model is provided to dynamically optimize the channel allocation strategy of each node in the wireless multi-hop multi-channel network, based on the node's channel access performance and channel occupation during the data transmission period. A wireless network architecture based on Mobile Edge Computing (MEC) relieves the computing and storage pressure on terminal nodes, and a framework of distributed interaction (micro-learning) and centralized training (macro-learning) is designed to train the asynchronous DRL model; therefore, the asynchronous DRL model proposed by the present invention can be implemented even on resource-constrained terminals. In addition, the invention considers the non-stationarity problem in the multi-agent scenario (MAS): by using only local neighbor information, it avoids severe dynamic changes of the network while further accelerating network convergence.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a distributed channel allocation method in a wireless multi-hop network, which adopts a physical framework comprising at least a physical equipment layer, a computation layer and a network service layer. The physical equipment layer forms a multi-hop wireless communication network from n wireless nodes randomly deployed in the network, and each node acts as an autonomous Agent that interacts with the uncertain network environment through a local decision module. The aggregation node of the computation layer is responsible for aggregating, analyzing and processing the data collected by the other nodes in the network, and has an edge computation function, i.e., the computation tasks of the wireless nodes can be offloaded to it; an asynchronous DRL model can be trained based on experience information acquired by the nodes in a distributed manner, the multi-channel allocation problem is modeled as a POMDP problem, and channel allocation is carried out with the trained asynchronous DRL model.
Example 1
The present embodiment presents a system architecture diagram, as shown in fig. 2; the system architecture includes a physical device layer, a computing layer, and a network service layer. The physical device layer is a multi-hop wireless communication network consisting of n wireless nodes randomly deployed in the network, and each node acts as an autonomous Agent that interacts with the uncertain network environment through a local decision module. The aggregation node of the computing layer is responsible for aggregating, analyzing and processing the data collected by the other nodes in the network, has an edge computing function so that the node's computing tasks can be offloaded, and can train the asynchronous DRL model based on experience information acquired by the nodes in a distributed manner.
In the data transmission process, the present embodiment performs data transmission with a superframe structure, shown in fig. 3: the system time is divided into a plurality of consecutive superframes, and each superframe includes a beacon frame, a control period and a data transmission period. The control period adopts a fixed control channel to transmit the relevant control information and channel allocation decisions; the data transmission period employs K non-overlapping channels to support interference-free parallel data transmission. Thus, during a control period, all nodes in the network switch to the control channel to listen for and transmit related control information (routing, time synchronization, channel switching, etc.); in the data transmission period, a node with data to send switches to the channel of its parent node to transmit data based on the channel access mechanism.
As shown in fig. 4, the asynchronous DRL model adopted in this embodiment uses DRL to solve the problem of dynamic multi-channel allocation in a multi-hop wireless network. The embodiment of the invention combines the function approximation capability of DQN with the asynchronous experience-sampling framework of A3C to propose an asynchronous DRL model, aiming to allocate channels to nodes reasonably so as to maximize the reliability of data transmission. The DRL model deployed on the edge server adopts a DQN framework and introduces a DNN to extract features from raw data and approximate the action value function; combined with the asynchronous training framework of A3C, it overcomes DQN's unsuitability for high-dimensional action spaces and multi-agent systems, breaks the correlation between experiences, remarkably improves the convergence speed of the network, and avoids the problem that the A3C algorithm cannot be realized on resource-limited wireless nodes.
This embodiment considers that the limited computing capability, energy and memory of wireless nodes in certain scenarios cause computing bottlenecks and low performance, limit the support of high-level applications, and make it difficult to run a compute-intensive task such as the training of the DRL model. Therefore, the embodiment of the invention adopts an edge-computing-enabled wireless network architecture and transfers the computing task of training the asynchronous DRL model to resource-rich edge nodes (sink nodes). As shown in fig. 2, the asynchronous DRL model deployed at the computation layer is composed of a current network (main), a target network (target), and an experience pool (experience replay); the edge-computing-enabled sink nodes thus complete the training and updating tasks of the model.
When the asynchronous DRL model is adopted for channel allocation, the invention combines the function approximation capability of DQN and the asynchronous interaction architecture of A3C. The distributed interaction module (micro-learning) in the asynchronous DRL model presented in fig. 4 allows the terminal nodes to asynchronously select channel resources using local observation information. In addition, a centralized training module (macro-learning) trains the asynchronous DRL model by adjusting operating parameters, thereby directing the system toward an application-specific global optimization goal (e.g., maximizing the reliability of data transmission). Each terminal node maintains a DRL prediction model to independently allocate channels. In particular, embodiments of the present invention model the multi-channel allocation problem as a POMDP problem, which consists of a five-tuple M = <S, A, P, R, γ>: state s, action a, state transition probability P, reward function R, and discount factor γ. The Agent observes the current network state s and executes action a in the control period of each time step t. Then, the system transitions to the next state according to the state transition probability and obtains the reward R_{t+1} from the environment.
State space: S = {S_1, S_2, ..., S_{2K+N}}, where K is the number of available channels and N is the number of nodes. For a particular node i in the t-th cycle, its state vector is s_{i,t}; wherein S_{i,t,j} indicates the occupancy of channel j by the neighbor nodes of node i: S_{i,t,j} = 1 indicates that a neighbor node of node i occupies channel j; otherwise, S_{i,t,j} = 0. n_{i,o} is the total number of neighbor nodes of node i.
Action space: A = {a_1, a_2, ..., a_K}, a_k ∈ A, where a_k indicates the channel number to which node i switches in the next data transmission period, a_k = ch_{i,t,k}, ch_{i,t,k} = k ∈ [1, K].
Reward function, R. When node i, in local observation state s_{i,t} in the t-th data period, performs an action and switches to channel ch_{i,t,k}, the environment returns an immediate reward value R(s, a) to the node at the end of the data transmission cycle. The reward is determined by the number of neighbor nodes of node i that use channel ch_{i,t,k} in the current data cycle (zero when no neighbor node of node i uses that channel) and by the probability of successful transmission when the node performs data transmission on ch_{i,t,k}.
The edge-computing-enabled sink node trains the DRL model in a centralized manner based on the experience information acquired asynchronously and in a distributed manner by each node in the network, and sends the updated network model parameters to the nodes; each node can acquire the latest network parameters from its parent node.
The centralized training process of the DRL model is shown in fig. 5. Two networks with the same structure but different parameters exist in the asynchronous DRL model: the current network predicts the estimated Q value using the latest parameters, and the target network predicts the target Q value using earlier parameters. In this embodiment, the state of a node is used as the input of the neural network, the different actions a node can execute serve as the classes, and the neural network predicts the value of each action executed by the node as its output, namely the Q value; for example, Q(s, a; θ) represents the value of executing action a given the input state s of the node when the parameter of the neural network is θ.
When the model is trained, some (a mini-batch of) experiences are randomly taken out of the experience pool for training, so as to break the correlation between experiences. In addition, because the experience information in the experience pool is provided by the agents in an asynchronous sampling manner, the correlation between experiences is further broken and richer experience is provided.
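The experience pool and mini-batch sampling described above can be sketched directly with standard containers; the pool capacity, batch size and experience contents below are illustrative:

```python
import random
from collections import deque

# Sketch of the experience pool: nodes asynchronously append experiences
# e = <s, a, r, s'>, and training draws a uniform random mini-batch to break
# the correlation between consecutive experiences.
pool = deque(maxlen=10000)             # bounded replay buffer
for t in range(100):                   # illustrative filler experiences
    pool.append((t, t % 4, 0.0, t + 1))  # (s, a, r, s')

random.seed(0)                         # for reproducibility in this sketch
batch = random.sample(list(pool), 8)   # uniform mini-batch
```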
As can be seen from fig. 5, the <s, a> information is used as the input of the current value network to obtain Q(s, a; θ), which is used to evaluate the current state-action value function; the s' information is used as the input of the target value network to obtain the corresponding max_{a'} Q(s', a'; θ⁻); the target value y is then calculated as:
y = R(s, a) + γ max_{a' ∈ A} Q(s', a'; θ⁻);
thus, based on y and Q(s, a; θ), the DQN error function module can further calculate:
L(θ) = E[(y − Q(s, a; θ))²];
and the current network updates its parameters based on the gradient of the error function:
θ ← θ − α∇_θL(θ);
wherein s ∈ S and a ∈ A. After a certain number of iterations, the parameters of the current value network are copied to the target value network:
θ⁻ ← θ;
and the above process is repeated until the network reaches a stable state.
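The update schedule, with its periodic parameter copy θ⁻ ← θ, can be sketched as a plain loop; the per-iteration "gradient step" and the copy interval are placeholders for illustration:

```python
# Sketch of the training schedule: the current network's parameters change
# every iteration, and every `copy_every` iterations they are copied to the
# target network (theta- <- theta). Plain lists stand in for parameters.
def train(iterations, copy_every):
    theta = [0.0]
    theta_target = list(theta)
    copies = 0
    for it in range(1, iterations + 1):
        theta[0] += 0.1                 # stands in for a gradient-descent step
        if it % copy_every == 0:
            theta_target = list(theta)  # theta- <- theta
            copies += 1
    return theta, theta_target, copies

theta, theta_target, copies = train(iterations=10, copy_every=5)
```

Keeping θ⁻ frozen between copies is what makes the target value y stable while the current network chases it.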
Although the asynchronous-DRL-based channel allocation model improves network performance by enabling multiple parallel data transmissions, the hidden and exposed terminal problems on a specific channel are further exacerbated in a high-density wireless multi-hop network scenario. Fig. 1 illustrates the hidden terminal and exposed terminal problems in a wireless multi-hop network. When node D is transmitting data to node C, node B is located outside the communication range of node D and therefore mistakenly considers the channel idle; when node B then sends data toward nodes C and A, a data collision occurs at node C, causing unnecessary retransmission and further aggravating network congestion. Furthermore, when node B1 transmits data to node A1, node B2 is within the communication range of node B1, while node B2 is outside the range of node A1 and node A2 is outside the range of node B1; node B2 nevertheless mistakenly considers the channel busy and delays its data transmission, which causes unnecessary waste of channel resources. Therefore, the embodiment of the present invention proposes to solve the hidden terminal and exposed terminal problems in the wireless multi-hop network based on the RTS/DCTS mechanism. The RTS/DCTS mechanism is further described below by way of example.
Fig. 6 is a diagram illustrating a solution to the hidden terminal problem in the wireless multi-hop network based on RTS/DCTS according to a preferred embodiment of the present invention. Wherein, nodes i and j, and nodes d and e are respectively located at m and m +1 hops (which refer to different and adjacent hop counts) and operate on the same channel. Node d is a parent node of node i and node e is a parent node of node j. Node e is also a neighbor node of node i. Assume that the backoff values of nodes i and j are both 0 at this time.
When the node i sends an RTS frame to the node d, the node d waits for a CIFS time and returns a CTS frame;
after receiving the RTS frame of the node i or the CTS frame of the node d, the child node of the node d sets a corresponding NAV based on the information in the Duration field;
when node e receives the RTS frame from node i, waits for a SIFS, and returns a CTS frame to inform its child node of delaying data transmission during the transmission of node i, thereby avoiding the hidden terminal problem.
In the channel access mechanism in the multi-hop environment, the hidden terminal problem is unavoidable, so the probability of successful transmission of node i on a particular channel k, P_s, can be calculated as a function of the transmission probability τ in the channel access slot and the following parameters: n_s, the total number of child nodes of the parent node of node i; n_a, the number of neighbor nodes of node i; and n_f, the number of neighbor nodes of the parent node of node i (excluding the child nodes of that parent node).
Referring to fig. 7, fig. 7 is a schematic diagram illustrating an example of solving the problem of exposed terminals in a wireless multi-hop network based on RTS/DCTS according to a preferred embodiment of the present invention. Wherein, nodes i and j, and nodes d and e are respectively located at m and m +1 hops (which refer to different and adjacent hop counts) and operate on the same channel. Node d is a parent node of node i and node e is a parent node of node j. Node j is also a neighbor node to node i. Assume that the backoff values of nodes i and j are both 0 at this time.
When node i sends an RTS to node d, node d waits for a CIFS time and returns a CTS frame. Since node j is within the communication range of node i, node j also receives the RTS frame; but since node j is not the destination of that RTS frame, node j does not set its NAV according to the Duration field of the RTS.
After receiving the RTS frame, node j waits for an RIFS and checks whether a CTS frame is received. Since node j's parent node e is not within the communication range of node i, node e does not return a CTS after SIFS; therefore, node j receives no CTS frame within the RIFS and sends its own RTS frame to its parent node e.
the nodes in the network execute the above processes, so that the problems of data conflict and channel resource waste caused by hidden terminals and exposed terminals in the network can be effectively solved; thus, the successful transmission probability can be rewritten as:
based on the RTS/DCTS mechanism, data collision between data links under adjacent father nodes on the same channel can be effectively avoided through SIFS and CTS; in addition, the channel access mechanism introduces RIFS interframe space to solve the problem of violent terminals in the network, thereby improving the successful transmission probability of the nodes, namely
Therefore, the channel access mechanism can improve the successful transmission probability of the nodes in the network;
in addition, P can be seen from the above formula
sAnd parameters
n
aAnd n
fDirectly related, while the parameter n
s,n
aAnd n
fCan be further optimized by optimizing the channel allocation strategy; therefore, the embodiment of the invention ensures the successful transmission probability of the node on the channel
As part of the channel allocation model reward function, to further optimize network performance.
The channel allocation and channel access mechanisms provided by the embodiment of the invention optimize channel resources at different levels: channel allocation optimizes channel resources in the frequency domain, and channel access optimizes them in the time domain. In addition, a reasonable channel allocation mechanism can further alleviate interference during channel access, while the channel access performance of the nodes can in turn further optimize the channel allocation strategy.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.