CN113840306B - Distributed wireless network access decision method based on network local information interaction - Google Patents


Info

Publication number: CN113840306B
Authority: CN (China)
Application number: CN202010591293.0A
Other versions: CN113840306A (Chinese)
Inventors: 朱磊, 范浩人, 姚昌华, 王磊, 杨健, 童玮
Assignee: Army Engineering University of PLA
Legal status: Active (granted)

Classifications

    • H04W24/02 Wireless communication networks: arrangements for optimising operational condition (supervisory, monitoring or testing arrangements)
    • G06N3/045 Neural network architectures: combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Neural network learning methods
    • Y02D30/70 Reducing energy consumption in wireless communication networks

Abstract

The invention discloses a distributed wireless network access decision method based on network local information interaction. A deep reinforcement learning framework based on a model-learning critic is designed, comprising three deep networks: a CNN-Deep Q-net that implements the access decision strategy, a deep network model-net that learns how global network information influences the link rate, and an LSTM-based link rate prediction network, prediction-net. Together, the three deep networks form a model-learning deep reinforcement learning algorithm framework for the case where the network state is only partially known. The method uses deep reinforcement learning to solve the access decision problem of complex, dynamic wireless networks.

Description

Distributed wireless network access decision method based on network local information interaction
Technical Field
The invention relates to the field of distributed wireless networks, in particular to a distributed wireless network access decision method based on network local information interaction.
Background
The problem of distributed wireless network link access is to adaptively select appropriate wireless resources for communication, according to the communication requirements, under complex and dynamically changing network conditions. Power control and channel access are the key technologies for maximising the utilisation of radio spectrum resources. Both communication quality and information transmission rate depend on the transmit power, and the transmit power also determines the energy consumed by node communication and hence the service life of the node. For battery-limited terminals such as mobile phones, portable computers and sensors, the transmit power therefore directly affects usable lifetime. A well-chosen access channel for a communication link makes maximal use of the wireless resources and improves communication quality and spectrum efficiency. The goal is therefore to jointly decide the communication channel and transmit power of each communication link while guaranteeing communication quality.
The wireless network link access problem is in essence a decision-making problem, and how to construct the access decision strategy is the key to solving it. Decision control algorithms for such problems fall mainly into two classes: decision algorithms based on a loss function, and traditional optimization algorithms.
The network access selection algorithm based on a loss function is the simplest method: the candidate strategies are ranked by their loss-function values, and the strategy with the minimum value is selected as the best scheme. The loss function takes various network metrics and parameters as inputs, but because its form is essentially fixed, it often cannot adapt to more complex dynamic networks.
In traditional optimization approaches, the decision problem and the resource allocation problem are combined into a single optimization problem; relaxation methods simplify the original problem into one of lower complexity, and a suboptimal solution is then obtained with conventional optimization methods. For channel access, spectrum sensing is an important technology, and most research optimizes the sensing process (e.g. sensing duration, sensing accuracy, or power allocation) so as to maximise network throughput or guarantee minimum network delay. For the sequential decision problem over a wireless channel, the problem is often modelled as a Markov model and decisions are made according to an associated value function. Since the accuracy of the assumed model cannot be guaranteed, the applicability of such research is greatly limited, and the high algorithmic complexity of the optimization problem makes it difficult to adapt to highly dynamic networks.
Disclosure of Invention
The invention aims to provide a distributed wireless network access decision method based on network local information interaction, applicable to distributed wireless network link access decision systems whose dynamics change in a regular way. The method solves the dynamic link access decision problem for distributed wireless networks whose state changes regularly, and improves link energy efficiency. At each time slot t, a link in the network selects its access channel and power according to the decision strategy, and exchanges information with its neighbour links to obtain the network local information for the communication period.
The technical solution that realizes the purpose of the invention is as follows: a distributed wireless network link access decision method based on a model-critic deep reinforcement learning framework, comprising the following steps:
S1: design a model-perceptron deep neural network to learn the environment model: design the neural network structure, construct the training data from the known network local information, and train the network;
S2: design the prediction-net deep neural network structure, construct the prediction-net training data using the model-perceptron deep neural network and the known network local information, and train the network;
S3: design the agent's decision reward function using the prediction deep neural network and the known network state information;
S4: design the Deep Q-net deep neural network structure, construct the Deep Q-net training data using the agent decision reward function and the known interaction information, train the network, and obtain the distributed wireless network access decision strategy from the trained network.
Compared with the prior art, the invention has the following notable advantages: (1) Network-state local information is obtained from neighbour link nodes at only a small communication cost and saved as historical information for training the three designed deep networks. (2) The method does not model reality a priori but learns a model of it, and continues to learn through interaction with the dynamic network environment; the decision strategy therefore adapts dynamically to meet the user's communication requirements, improves link energy efficiency, prolongs equipment service time, and saves energy cost.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a block diagram of a model-net network of the method of the present invention.
Fig. 3 is a diagram of the LSTM network structure of the method of the present invention.
Fig. 4 is a network structure diagram of the CNN-Deep Q-net of the method of the present invention.
Fig. 5 is a block diagram of an algorithm of the method of the present invention.
Fig. 6 is a distributed network topology of the method of the present invention.
Detailed Description
The invention makes each link in the network intelligent from the link's own point of view: the link makes adaptive decisions according to the local network information it has acquired.
The invention relates to a distributed wireless network access decision method based on network local information interaction, which comprises the following steps:
S1: save the training data required by the deep networks at each decision time slot t.
A link l_o in the distributed wireless network is the decision maker. At every decision time slot t, link l_o exchanges network link state information with its neighbour nodes; the interaction information comprises: the geographic positions of the link transceivers, the link communication channel, the transmit power of the link transmitting node, and the link communication rate vector. Together with the link's own data, this forms a coordinate vector group, a communication channel vector, a transmit-power vector and a link rate vector, and this interaction information is taken as the feature description of the network environment state, s_t. At the same time, the link makes an access decision a_t at time t. The link rate at time t, together with a_t, s_t and the reward r_{a_t} obtained from the decision, is stored as historical state information.
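The per-slot bookkeeping in step S1 can be sketched as a small history buffer. This is a minimal illustration, not the patent's implementation; the field names, the flattened state tuples, and the (channel, power) action encoding are assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Experience:
    # One decision slot t: state feature s_t (interaction vectors, flattened),
    # access decision a_t encoded as (channel index, power level),
    # decision reward r_at, and the next state s_{t+1}.
    s_t: Tuple[float, ...]
    a_t: Tuple[int, int]
    r_at: float
    s_next: Tuple[float, ...]

class HistoryBuffer:
    """Saves e_t = (s_t, a_t, r_at, s_{t+1}) for training the three deep networks."""

    def __init__(self) -> None:
        self._data: List[Experience] = []

    def __len__(self) -> int:
        return len(self._data)

    def save(self, exp: Experience) -> None:
        self._data.append(exp)

    def window(self, t: int, length: int) -> List[Experience]:
        # The most recent `length` slots ending at index t (inclusive), oldest
        # first, as needed for the length-T training windows used later.
        start = max(0, t - length + 1)
        return self._data[start:t + 1]

buf = HistoryBuffer()
for t in range(5):
    buf.save(Experience((float(t),), (t % 3, t % 2), 0.1 * t, (float(t + 1),)))
```

Any fixed-capacity or time-indexed variant works equally well; the only requirement from the text is that s_t, a_t, r_{a_t} and the achieved rate are retrievable per slot.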
S2: train the three designed deep networks using the network historical state information.
(1) For each decision time slot t in the saved history, use the neighbour-node coordinate vector group, the link communication channel vector, the transmit-power vector and the link rate vector as the input of the model-learning network model-net, and use the link rate achieved at time t as the model-net training label;
(2) Train model-net;
(3) Select the historical network state information over a period of length T from the saved data as the input of prediction-net;
(4) Convert s_t into the model-net input data format and feed it to model-net, which computes the maximum communication rate the link can achieve at time t; take this maximum rate as the prediction-net label;
(5) Train prediction-net;
(6) From the information saved at each decision time slot t, compose the training data required by CNN-Deep Q-net, e_t = (S_t, a_t, r_{a_t}, S_{t+1});
(7) From S_{t+1} in e_t and the related historical information, generate the prediction-net input data;
(8) Feed this input to prediction-net to obtain the predicted maximum rate, and obtain max Q(S_{t+1}, ·) from the reward function;
(9) Feed S_t into CNN-Deep Q-net; the label corresponding to the taken action a_t is r_{a_t} + γ·max Q(S_{t+1}, ·), where γ is the discount factor;
(10) Train CNN-Deep Q-net.
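Steps (8) and (9) combine the stored reward with the next-state value into the training label for the taken action, i.e. the standard DQN-style target. A minimal sketch, assuming the Q-values over the channel x power decision space are already available as an array; the discount factor gamma = 0.9 and all numbers are illustrative:

```python
import numpy as np

def dqn_label(r_at: float, q_next: np.ndarray, gamma: float = 0.9) -> float:
    """Training label for the action a_t taken at time t:
    r_{a_t} + gamma * max Q(S_{t+1}, .)  -- steps (8) and (9) above."""
    return r_at + gamma * float(q_next.max())

# q_next: Q-values over the decision space (channels x power levels) at S_{t+1},
# here filled with illustrative numbers rather than a real network output.
q_next = np.array([[0.2, 0.5],
                   [0.7, 0.1],
                   [0.4, 0.3]])
label = dqn_label(r_at=1.0, q_next=q_next, gamma=0.9)  # 1.0 + 0.9 * 0.7 = 1.63
```

Only the Q-value of the action actually taken is supervised with this label; the outputs for the other actions are left unchanged during training.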
S3: design the agent's decision reward function using the prediction deep neural network and the known network state information.
S4: design the Deep Q-net deep neural network structure, construct the Deep Q-net training data using the agent decision reward function and the known interaction information, train the network, and obtain the distributed wireless network access decision strategy from the trained network.
The invention is further described below with reference to the drawings.
As shown in fig. 1, the present invention provides a distributed wireless network link access decision method under partial network-state awareness, based on model-learning deep reinforcement learning, comprising the following steps:
1. At each time slot t, link l_o executes a decision a_t according to the current decision strategy. The decision strategy is: a random strategy while model-net and prediction-net are being trained; once those two networks are trained, the decision output by CNN-Deep Q-net for the obtained input data.
2. After taking decision a_t, link l_o computes the decision reward r_{a_t} from its own link communication quality.
3. The link exchanges network link information with its communication neighbour links to obtain the network local information, and saves it at every communication period Δt = t - (t-1).
4. Model-net learns the link-rate model based on the network local information; the specific network structure is shown in fig. 2. The input data are the saved local-information vectors and the label data are the link rates achieved; the network loss function measures the error between the network output and the label, and the network parameters are updated by gradient descent.
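Model-net itself is a deep network (fig. 2); as a stand-in for its training loop, the following sketch runs the same recipe, a squared-error loss with gradient updates, on a plain linear model. All data here are synthetic and every name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: each row stacks the local-information vectors
# (coordinates, channels, transmit powers of neighbours); the label is the
# link rate achieved in that slot.  Here both are synthetic.
X = rng.normal(size=(64, 8))
true_w = rng.normal(size=8)
y = X @ true_w                      # synthetic "achieved link rate" labels

w = np.zeros(8)                     # model parameters, trained from scratch
lr = 0.05
losses = []
for _ in range(200):
    pred = X @ w
    err = pred - y
    losses.append(float(np.mean(err ** 2)))   # squared-error loss
    grad = 2.0 * X.T @ err / len(X)           # gradient of the loss w.r.t. w
    w -= lr * grad                            # gradient update
```

A real model-net would replace the linear map with the multi-layer structure of fig. 2, but the loss evaluation and gradient update per batch follow the same pattern.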
5. After model-net training is completed, the learned model-net is used to train prediction-net. Prediction-net predicts, for the next communication period, the maximum link rate that link l_o can reach within its decision space. The network is an LSTM network, whose structure is shown in fig. 3.
6. Extract the saved historical data, generate training data with a time-series relation, and train prediction-net; the LSTM structure is exploited to predict the link communication rate at the next time instant. The training data are the historical data over a period of length T: each s_t is converted into the input data format required by the trained model-net, which computes the maximum link rate within the decision space at time t; the data from time t-1 to t-T+1 are then used as the input sequence, and the network loss function measures the error between the predicted and computed maximum rates.
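The time-series training data of step 6 can be sketched as a sliding-window transform over the saved history. The exact window layout is an assumption (slots t-T+1 .. t-1 as input, the slot-t value as label), and the scalar per-slot feature merely stands in for the model-net-derived maximum rate:

```python
from typing import Tuple
import numpy as np

def make_sequences(history: np.ndarray, T: int) -> Tuple[np.ndarray, np.ndarray]:
    """Turn a per-slot feature history into (input window, label) pairs.

    Each input covers the T-1 slots t-T+1 .. t-1 and the label is the
    slot-t value (standing in for the model-net-computed maximum rate)."""
    X, y = [], []
    for t in range(T - 1, len(history)):
        X.append(history[t - T + 1:t])   # slots t-T+1 .. t-1, oldest first
        y.append(history[t])             # slot t: the prediction target
    return np.stack(X), np.stack(y)

hist = np.arange(10, dtype=float).reshape(10, 1)  # 10 slots, 1 feature each
X, y = make_sequences(hist, T=4)
```

The resulting (samples, T-1, features) tensor is exactly the shape an LSTM such as the one in fig. 3 consumes.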
7. Training CNN-Deep Q-net. CNN-Deep Q-net is the decision-strategy network: the decision at each time t is obtained from the network output, and the network structure is shown in fig. 4. The training input likewise uses historical data of length T, consisting of the saved per-slot information from time t-1 to t-T+1.
8. The training labels of CNN-Deep Q-net require prediction-net to generate: the data from time t to t-T+1 are converted into prediction-net input to obtain the predicted maximum rate, from which max Q(S_{t+1}, ·) and the label r_{a_t} + γ·max Q(S_{t+1}, ·) are computed.
9. After CNN-Deep Q-net training is finished, link l_o obtains its decision strategy from the network at every time instant, thereby improving the energy efficiency of the link.
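After training, reducing the Q-network output to a concrete access decision is an argmax over the channel x power decision space. In this sketch a fixed array stands in for the CNN-Deep Q-net output; the 3 x 2 decision space is illustrative:

```python
from typing import Tuple
import numpy as np

# Hypothetical decision space: 3 channels x 2 power levels.  The trained
# CNN-Deep Q-net maps the state window to one Q-value per (channel, power)
# pair; here a fixed array stands in for that network output.
q_values = np.array([[0.10, 0.42],
                     [0.55, 0.08],
                     [0.31, 0.27]])

def greedy_decision(q: np.ndarray) -> Tuple[int, int]:
    """a_t = argmax over the channel x power decision space."""
    ch, pw = np.unravel_index(int(np.argmax(q)), q.shape)
    return int(ch), int(pw)

channel, power = greedy_decision(q_values)  # (1, 0): channel 1, lowest power
```

During the initial training phase the method instead draws (channel, power) uniformly at random, as described in step 1 above.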

Claims (1)

1. A distributed wireless network access decision method based on network local information interaction, characterised by comprising the following steps:
s1: design a model-perceptron deep neural network to learn the environment model: design the neural network structure, construct the training data from the known network local information, and train the network;
in step S1, three deep neural networks are designed to provide the decision strategy for the user, and the training data required by the deep neural networks are saved at each decision time slot t;
a link l_o in the distributed wireless network is taken as the decision maker; at every decision time slot t, link l_o exchanges network link state information with its neighbour nodes, the interaction information comprising: the geographic positions of the link transceivers, the link communication channel, the transmit power of the link transmitting node, and the link communication rate vector; together with the link's own data this forms a coordinate vector group, a communication channel vector, a transmit-power vector and a link rate vector, and this interaction information is taken as the feature description of the network environment state, s_t; at the same time the link makes an access decision a_t at time t; the link rate at time t, together with a_t, s_t and the decision reward r_{a_t}, is saved as historical state information;
s2: design the prediction-net deep neural network structure, construct the prediction-net training data using the model-perceptron deep neural network and the known network local information, and train the network;
in step S2, the three designed deep networks are trained using the network historical state information:
(1) for each decision time slot t in the saved history, take the neighbour-node coordinate vector group, the link communication channel vector, the transmit-power vector and the link rate vector as the input of the model-learning network model-net, and take the link rate achieved at time t as the model-net training label;
(2) train model-net;
(3) select the historical network state information over a period of length T from the saved data as the input of prediction-net;
(4) convert s_t into the model-net input data format and feed it to model-net, which computes the maximum communication rate the link can achieve at time t; take this maximum rate as the prediction-net label;
(5) train prediction-net;
(6) from the information saved at each decision time slot t, compose the training data required by CNN-Deep Q-net, e_t = (S_t, a_t, r_{a_t}, S_{t+1});
(7) from S_{t+1} in e_t and the related historical information, generate the prediction-net input data;
(8) feed this input to prediction-net to obtain the predicted maximum rate, and obtain max Q(S_{t+1}, ·) from the reward function;
(9) feed S_t into CNN-Deep Q-net; the label corresponding to the taken action a_t is r_{a_t} + γ·max Q(S_{t+1}, ·), where γ is the discount factor;
(10) train CNN-Deep Q-net;
s3: design the agent's decision reward function using the prediction deep neural network and the known network state information;
s4: design the Deep Q-net deep neural network structure, construct the Deep Q-net training data using the agent decision reward function and the known interaction information, train the network, and obtain the distributed wireless network access decision strategy from the trained network.
CN202010591293.0A 2020-06-24 2020-06-24 Distributed wireless network access decision method based on network local information interaction Active CN113840306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010591293.0A CN113840306B (en) 2020-06-24 2020-06-24 Distributed wireless network access decision method based on network local information interaction


Publications (2)

Publication Number Publication Date
CN113840306A CN113840306A (en) 2021-12-24
CN113840306B true CN113840306B (en) 2023-07-21

Family

ID=78964910


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning
CN110958680A (en) * 2019-12-09 2020-04-03 长江师范学院 Energy efficiency-oriented unmanned aerial vehicle cluster multi-agent deep reinforcement learning optimization method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep reinforcement learning for energy efficiency optimization in wireless networks; Haoren Fan, et al.; IEEE; full text *


Similar Documents

Publication Publication Date Title
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
CN110809306B (en) Terminal access selection method based on deep reinforcement learning
CN112383922B (en) Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN110113190A (en) Time delay optimization method is unloaded in a kind of mobile edge calculations scene
CN109936865B (en) Mobile sink path planning method based on deep reinforcement learning algorithm
Yang et al. Deep reinforcement learning based wireless network optimization: A comparative study
CN112492691A (en) Downlink NOMA power distribution method of deep certainty strategy gradient
CN111491358A (en) Adaptive modulation and power control system based on energy acquisition and optimization method
Li et al. Joint scheduling design in wireless powered MEC IoT networks aided by reconfigurable intelligent surface
Kashyap et al. Deep learning based offloading scheme for IoT networks towards green computing
CN115065678A (en) Multi-intelligent-device task unloading decision method based on deep reinforcement learning
CN110278570B (en) Wireless communication system based on artificial intelligence
Manalastas et al. Where to go next?: A realistic evaluation of AI-assisted mobility predictors for HetNets
Saraiva et al. Deep reinforcement learning for QoS-constrained resource allocation in multiservice networks
CN115065728A (en) Multi-strategy reinforcement learning-based multi-target content storage method
CN114885340A (en) Ultra-dense wireless network power distribution method based on deep transfer learning
Fowdur et al. A review of machine learning techniques for enhanced energy efficient 5G and 6G communications
CN112738849B (en) Load balancing regulation and control method applied to multi-hop environment backscatter wireless network
Ghanshala et al. Self-organizing sustainable spectrum management methodology in cognitive radio vehicular adhoc network (CRAVENET) environment: a reinforcement learning approach
Chen et al. iPAS: A deep Monte Carlo Tree Search-based intelligent pilot-power allocation scheme for massive MIMO system
Liu et al. Power allocation in ultra-dense networks through deep deterministic policy gradient
CN110661566B (en) Unmanned aerial vehicle cluster networking method and system adopting depth map embedding
CN113840306B (en) Distributed wireless network access decision method based on network local information interaction
Das et al. Reinforcement learning-based resource allocation for M2M communications over cellular networks
CN116112934A (en) End-to-end network slice resource allocation method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant