CN113840306B - Distributed wireless network access decision method based on network local information interaction - Google Patents


Info

Publication number: CN113840306B
Authority: CN (China)
Application number: CN202010591293.0A
Other versions: CN113840306A (Chinese)
Inventors: 朱磊, 范浩人, 姚昌华, 王磊, 杨健, 童玮
Assignee: Army Engineering University of PLA
Legal status: Active (granted)

Classifications

    • H04W24/02 Wireless communication networks: arrangements for optimising operational condition (supervisory, monitoring or testing arrangements)
    • G06N3/045 Neural network architectures: combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Neural network learning methods
    • Y02D30/70 Reducing energy consumption in wireless communication networks

Abstract

The invention discloses a distributed wireless network access decision method based on network local information interaction. A deep reinforcement learning framework based on a model-learning critic is designed, comprising three deep networks: a CNN-Deep Q-net that implements the access decision strategy, a deep network model-net that learns how global network information influences the link rate, and an LSTM-based link rate prediction network, prediction-net. Together, the three deep networks form a model-learning deep reinforcement learning algorithm framework for the case where the network state is only partially known. The method uses deep reinforcement learning to solve the access decision problem of complex, dynamic wireless networks.

Description

Distributed wireless network access decision method based on network local information interaction
Technical Field
The invention relates to the field of distributed wireless networks, in particular to a distributed wireless network access decision method based on network local information interaction.
Background
The problem of distributed wireless network link access is to adaptively select appropriate wireless resources for communication, according to the communication requirements, under complex and dynamically changing network conditions. Power control and channel access are the key technologies for maximising the utilisation of radio spectrum resources. Both communication quality and information transmission rate depend on the transmit power, and the transmit power also determines the energy consumed by node communication and hence the service life of the node. For battery-limited terminals such as mobile phones, portable computers and sensors, the transmit power therefore directly affects usable lifetime. A well-chosen access channel for a communication link makes maximal use of the wireless resources and improves communication quality and spectrum efficiency. The goal is therefore to jointly decide the communication channel and transmit power of each communication link while guaranteeing communication quality.
The wireless network link access problem is in essence a decision-making problem, and how to construct the access decision strategy is the key to solving it. Decision control algorithms for such problems fall mainly into two classes: decision algorithms based on a loss function, and traditional optimization algorithms.
The network access selection algorithm based on a loss function is the simplest method: the candidate strategies are ranked by their loss-function values, and the strategy with the minimum value is selected as the best scheme. The loss function takes various network metrics and parameters as inputs, but because its form is essentially fixed, it often cannot adapt to more complex dynamic networks.
In traditional optimization approaches, the decision problem and the resource allocation problem are combined into a single optimization problem; relaxation methods simplify the original problem into one of lower complexity, and a suboptimal solution is then obtained with conventional optimization methods. For channel access, spectrum sensing is an important technology, and most research optimizes the sensing process (e.g. sensing duration, sensing accuracy, or power allocation) so as to maximise network throughput or guarantee minimum network delay. For the sequential decision problem over a wireless channel, the problem is often modelled as a Markov model and decisions are made according to an associated value function. Since the accuracy of the assumed model cannot be guaranteed, the applicability of such research is greatly limited, and the high algorithmic complexity of the optimization problem makes it difficult to adapt to highly dynamic networks.
Disclosure of Invention
The invention aims to provide a distributed wireless network access decision method based on network local information interaction, applicable to distributed wireless network link access decision systems whose dynamics change in a regular way. The method solves the dynamic link access decision problem for distributed wireless networks whose state changes regularly, and improves link energy efficiency. At each time slot t, a link in the network selects its access channel and power according to the decision strategy, and exchanges information with its neighbour links to obtain the network local information for the communication period.
The technical solution that realizes the purpose of the invention is as follows: a distributed wireless network link access decision method based on a model-critic deep reinforcement learning framework, comprising the following steps:
S1: design a model-perceptron deep neural network to learn the environment model: design the neural network structure, construct the training data from the known network local information, and train the network;
S2: design the prediction-net deep neural network structure, construct the prediction-net training data using the model-perceptron deep neural network and the known network local information, and train the network;
S3: design the agent's decision reward function using the prediction deep neural network and the known network state information;
S4: design the Deep Q-net deep neural network structure, construct the Deep Q-net training data using the agent decision reward function and the known interaction information, train the network, and obtain the distributed wireless network access decision strategy from the trained network.
Compared with the prior art, the invention has the following notable advantages: (1) Network-state local information is obtained from neighbour link nodes at only a small communication cost and saved as historical information for training the three designed deep networks. (2) The method does not model reality a priori but learns a model of it, and continues to learn through interaction with the dynamic network environment; the decision strategy therefore adapts dynamically to meet the user's communication requirements, improves link energy efficiency, prolongs equipment service time, and saves energy cost.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a block diagram of a model-net network of the method of the present invention.
Fig. 3 is a diagram of the LSTM network structure of the method of the present invention.
Fig. 4 is a network structure diagram of the CNN-Deep Q-net of the method of the present invention.
Fig. 5 is a block diagram of an algorithm of the method of the present invention.
Fig. 6 is a distributed network topology of the method of the present invention.
Detailed Description
The invention makes each link in the network intelligent from the link's own point of view: the link makes adaptive decisions according to the local network information it has acquired.
The invention relates to a distributed wireless network access decision method based on network local information interaction, which comprises the following steps:
S1: save the training data required by the deep networks at each decision time slot t.
A link l_o in the distributed wireless network is the decision maker. At every decision time slot t, link l_o exchanges network link state information with its neighbour nodes; the interaction information comprises: the geographic positions of the link transceivers, the link communication channel, the transmit power of the link transmitting node, and the link communication rate vector. Together with the link's own data, this forms a coordinate vector group, a communication channel vector, a transmit-power vector and a link rate vector, and this interaction information is taken as the feature description of the network environment state, s_t. At the same time, the link makes an access decision a_t at time t. The link rate at time t, together with a_t, s_t and the reward r_{a_t} obtained from the decision, is stored as historical state information.
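The per-slot bookkeeping in step S1 can be sketched as a small history buffer. This is a minimal illustration, not the patent's implementation; the field names, the flattened state tuples, and the (channel, power) action encoding are assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Experience:
    # One decision slot t: state feature s_t (interaction vectors, flattened),
    # access decision a_t encoded as (channel index, power level),
    # decision reward r_at, and the next state s_{t+1}.
    s_t: Tuple[float, ...]
    a_t: Tuple[int, int]
    r_at: float
    s_next: Tuple[float, ...]

class HistoryBuffer:
    """Saves e_t = (s_t, a_t, r_at, s_{t+1}) for training the three deep networks."""

    def __init__(self) -> None:
        self._data: List[Experience] = []

    def __len__(self) -> int:
        return len(self._data)

    def save(self, exp: Experience) -> None:
        self._data.append(exp)

    def window(self, t: int, length: int) -> List[Experience]:
        # The most recent `length` slots ending at index t (inclusive), oldest
        # first, as needed for the length-T training windows used later.
        start = max(0, t - length + 1)
        return self._data[start:t + 1]

buf = HistoryBuffer()
for t in range(5):
    buf.save(Experience((float(t),), (t % 3, t % 2), 0.1 * t, (float(t + 1),)))
```

Any fixed-capacity or time-indexed variant works equally well; the only requirement from the text is that s_t, a_t, r_{a_t} and the achieved rate are retrievable per slot.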
S2: train the three designed deep networks using the network historical state information.
(1) For each decision time slot t in the saved history, use the neighbour-node coordinate vector group, the link communication channel vector, the transmit-power vector and the link rate vector as the input of the model-learning network model-net, and use the link rate achieved at time t as the model-net training label;
(2) Train model-net;
(3) Select the historical network state information over a period of length T from the saved data as the input of prediction-net;
(4) Convert s_t into the model-net input data format and feed it to model-net, which computes the maximum communication rate the link can achieve at time t; take this maximum rate as the prediction-net label;
(5) Train prediction-net;
(6) From the information saved at each decision time slot t, compose the training data required by CNN-Deep Q-net, e_t = (S_t, a_t, r_{a_t}, S_{t+1});
(7) From S_{t+1} in e_t and the related historical information, generate the prediction-net input data;
(8) Feed this input to prediction-net to obtain the predicted maximum rate, and obtain max Q(S_{t+1}, ·) from the reward function;
(9) Feed S_t into CNN-Deep Q-net; the label corresponding to the taken action a_t is r_{a_t} + γ·max Q(S_{t+1}, ·), where γ is the discount factor;
(10) Train CNN-Deep Q-net.
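Steps (8) and (9) combine the stored reward with the next-state value into the training label for the taken action, i.e. the standard DQN-style target. A minimal sketch, assuming the Q-values over the channel x power decision space are already available as an array; the discount factor gamma = 0.9 and all numbers are illustrative:

```python
import numpy as np

def dqn_label(r_at: float, q_next: np.ndarray, gamma: float = 0.9) -> float:
    """Training label for the action a_t taken at time t:
    r_{a_t} + gamma * max Q(S_{t+1}, .)  -- steps (8) and (9) above."""
    return r_at + gamma * float(q_next.max())

# q_next: Q-values over the decision space (channels x power levels) at S_{t+1},
# here filled with illustrative numbers rather than a real network output.
q_next = np.array([[0.2, 0.5],
                   [0.7, 0.1],
                   [0.4, 0.3]])
label = dqn_label(r_at=1.0, q_next=q_next, gamma=0.9)  # 1.0 + 0.9 * 0.7 = 1.63
```

Only the Q-value of the action actually taken is supervised with this label; the outputs for the other actions are left unchanged during training.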
S3: design the agent's decision reward function using the prediction deep neural network and the known network state information.
S4: design the Deep Q-net deep neural network structure, construct the Deep Q-net training data using the agent decision reward function and the known interaction information, train the network, and obtain the distributed wireless network access decision strategy from the trained network.
The invention is further described below with reference to the drawings.
As shown in fig. 1, the present invention provides a distributed wireless network link access decision method under partial network-state awareness, based on model-learning deep reinforcement learning, comprising the following steps:
1. At each time slot t, link l_o executes a decision a_t according to the current decision strategy. The decision strategy is: a random strategy while model-net and prediction-net are being trained; once those two networks are trained, the decision output by CNN-Deep Q-net for the obtained input data.
2. After taking decision a_t, link l_o computes the decision reward r_{a_t} from its own link communication quality.
3. The link exchanges network link information with its communication neighbour links to obtain the network local information, and saves it at every communication period Δt = t - (t-1).
4. Model-net learns the link-rate model based on the network local information; the specific network structure is shown in fig. 2. The input data are the saved local-information vectors and the label data are the link rates achieved; the network loss function measures the error between the network output and the label, and the network parameters are updated by gradient descent.
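Model-net itself is a deep network (fig. 2); as a stand-in for its training loop, the following sketch runs the same recipe, a squared-error loss with gradient updates, on a plain linear model. All data here are synthetic and every name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: each row stacks the local-information vectors
# (coordinates, channels, transmit powers of neighbours); the label is the
# link rate achieved in that slot.  Here both are synthetic.
X = rng.normal(size=(64, 8))
true_w = rng.normal(size=8)
y = X @ true_w                      # synthetic "achieved link rate" labels

w = np.zeros(8)                     # model parameters, trained from scratch
lr = 0.05
losses = []
for _ in range(200):
    pred = X @ w
    err = pred - y
    losses.append(float(np.mean(err ** 2)))   # squared-error loss
    grad = 2.0 * X.T @ err / len(X)           # gradient of the loss w.r.t. w
    w -= lr * grad                            # gradient update
```

A real model-net would replace the linear map with the multi-layer structure of fig. 2, but the loss evaluation and gradient update per batch follow the same pattern.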
5. After model-net training is completed, the learned model-net is used to train prediction-net. Prediction-net predicts, for the next communication period, the maximum link rate that link l_o can reach within its decision space. The network is an LSTM network, whose structure is shown in fig. 3.
6. Extract the saved historical data, generate training data with a time-series relation, and train prediction-net; the LSTM structure is exploited to predict the link communication rate at the next time instant. The training data are the historical data over a period of length T: each s_t is converted into the input data format required by the trained model-net, which computes the maximum link rate within the decision space at time t; the data from time t-1 to t-T+1 are then used as the input sequence, and the network loss function measures the error between the predicted and computed maximum rates.
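The time-series training data of step 6 can be sketched as a sliding-window transform over the saved history. The exact window layout is an assumption (slots t-T+1 .. t-1 as input, the slot-t value as label), and the scalar per-slot feature merely stands in for the model-net-derived maximum rate:

```python
from typing import Tuple
import numpy as np

def make_sequences(history: np.ndarray, T: int) -> Tuple[np.ndarray, np.ndarray]:
    """Turn a per-slot feature history into (input window, label) pairs.

    Each input covers the T-1 slots t-T+1 .. t-1 and the label is the
    slot-t value (standing in for the model-net-computed maximum rate)."""
    X, y = [], []
    for t in range(T - 1, len(history)):
        X.append(history[t - T + 1:t])   # slots t-T+1 .. t-1, oldest first
        y.append(history[t])             # slot t: the prediction target
    return np.stack(X), np.stack(y)

hist = np.arange(10, dtype=float).reshape(10, 1)  # 10 slots, 1 feature each
X, y = make_sequences(hist, T=4)
```

The resulting (samples, T-1, features) tensor is exactly the shape an LSTM such as the one in fig. 3 consumes.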
7. Training CNN-Deep Q-net. CNN-Deep Q-net is the decision-strategy network: the decision at each time t is obtained from the network output, and the network structure is shown in fig. 4. The training input likewise uses historical data of length T, consisting of the saved per-slot information from time t-1 to t-T+1.
8. The training labels of CNN-Deep Q-net require prediction-net to generate: the data from time t to t-T+1 are converted into prediction-net input to obtain the predicted maximum rate, from which max Q(S_{t+1}, ·) and the label r_{a_t} + γ·max Q(S_{t+1}, ·) are computed.
9. After CNN-Deep Q-net training is finished, link l_o obtains its decision strategy from the network at every time instant, thereby improving the energy efficiency of the link.
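After training, reducing the Q-network output to a concrete access decision is an argmax over the channel x power decision space. In this sketch a fixed array stands in for the CNN-Deep Q-net output; the 3 x 2 decision space is illustrative:

```python
from typing import Tuple
import numpy as np

# Hypothetical decision space: 3 channels x 2 power levels.  The trained
# CNN-Deep Q-net maps the state window to one Q-value per (channel, power)
# pair; here a fixed array stands in for that network output.
q_values = np.array([[0.10, 0.42],
                     [0.55, 0.08],
                     [0.31, 0.27]])

def greedy_decision(q: np.ndarray) -> Tuple[int, int]:
    """a_t = argmax over the channel x power decision space."""
    ch, pw = np.unravel_index(int(np.argmax(q)), q.shape)
    return int(ch), int(pw)

channel, power = greedy_decision(q_values)  # (1, 0): channel 1, lowest power
```

During the initial training phase the method instead draws (channel, power) uniformly at random, as described in step 1 above.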

Claims (1)

1. A distributed wireless network access decision method based on network local information interaction, characterised by comprising the following steps:
s1: design a model-perceptron deep neural network to learn the environment model: design the neural network structure, construct the training data from the known network local information, and train the network;
in step S1, three deep neural networks are designed to provide the decision strategy for the user, and the training data required by the deep neural networks are saved at each decision time slot t;
a link l_o in the distributed wireless network is taken as the decision maker; at every decision time slot t, link l_o exchanges network link state information with its neighbour nodes, the interaction information comprising: the geographic positions of the link transceivers, the link communication channel, the transmit power of the link transmitting node, and the link communication rate vector; together with the link's own data this forms a coordinate vector group, a communication channel vector, a transmit-power vector and a link rate vector, and this interaction information is taken as the feature description of the network environment state, s_t; at the same time the link makes an access decision a_t at time t; the link rate at time t, together with a_t, s_t and the decision reward r_{a_t}, is saved as historical state information;
s2: design the prediction-net deep neural network structure, construct the prediction-net training data using the model-perceptron deep neural network and the known network local information, and train the network;
in step S2, the three designed deep networks are trained using the network historical state information:
(1) for each decision time slot t in the saved history, take the neighbour-node coordinate vector group, the link communication channel vector, the transmit-power vector and the link rate vector as the input of the model-learning network model-net, and take the link rate achieved at time t as the model-net training label;
(2) train model-net;
(3) select the historical network state information over a period of length T from the saved data as the input of prediction-net;
(4) convert s_t into the model-net input data format and feed it to model-net, which computes the maximum communication rate the link can achieve at time t; take this maximum rate as the prediction-net label;
(5) train prediction-net;
(6) from the information saved at each decision time slot t, compose the training data required by CNN-Deep Q-net, e_t = (S_t, a_t, r_{a_t}, S_{t+1});
(7) from S_{t+1} in e_t and the related historical information, generate the prediction-net input data;
(8) feed this input to prediction-net to obtain the predicted maximum rate, and obtain max Q(S_{t+1}, ·) from the reward function;
(9) feed S_t into CNN-Deep Q-net; the label corresponding to the taken action a_t is r_{a_t} + γ·max Q(S_{t+1}, ·), where γ is the discount factor;
(10) train CNN-Deep Q-net;
s3: design the agent's decision reward function using the prediction deep neural network and the known network state information;
s4: design the Deep Q-net deep neural network structure, construct the Deep Q-net training data using the agent decision reward function and the known interaction information, train the network, and obtain the distributed wireless network access decision strategy from the trained network.
CN202010591293.0A 2020-06-24 2020-06-24 Distributed wireless network access decision method based on network local information interaction Active CN113840306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010591293.0A CN113840306B (en) 2020-06-24 2020-06-24 Distributed wireless network access decision method based on network local information interaction


Publications (2)

Publication Number Publication Date
CN113840306A CN113840306A (en) 2021-12-24
CN113840306B true CN113840306B (en) 2023-07-21

Family

ID=78964910


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning
CN110958680A (en) * 2019-12-09 2020-04-03 长江师范学院 Energy efficiency-oriented unmanned aerial vehicle cluster multi-agent deep reinforcement learning optimization method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep reinforcement learning for energy efficiency optimization in wireless networks; Haoren Fan, et al.; IEEE; full text *


Similar Documents

Publication Publication Date Title
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
CN110809306B (en) Terminal access selection method based on deep reinforcement learning
CN112383922B (en) Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN110113190A (en) Time delay optimization method is unloaded in a kind of mobile edge calculations scene
CN109936865B (en) Mobile sink path planning method based on deep reinforcement learning algorithm
Yang et al. Deep reinforcement learning based wireless network optimization: A comparative study
CN112492691A (en) Downlink NOMA power distribution method of deep certainty strategy gradient
CN111491358A (en) Adaptive modulation and power control system based on energy acquisition and optimization method
Li et al. Joint scheduling design in wireless powered MEC IoT networks aided by reconfigurable intelligent surface
Kashyap et al. Deep learning based offloading scheme for IoT networks towards green computing
CN115065678A (en) Multi-intelligent-device task unloading decision method based on deep reinforcement learning
CN110278570B (en) Wireless communication system based on artificial intelligence
Manalastas et al. Where to go next?: A realistic evaluation of AI-assisted mobility predictors for HetNets
Saraiva et al. Deep reinforcement learning for QoS-constrained resource allocation in multiservice networks
CN115065728A (en) Multi-strategy reinforcement learning-based multi-target content storage method
CN114885340A (en) Ultra-dense wireless network power distribution method based on deep transfer learning
Fowdur et al. A review of machine learning techniques for enhanced energy efficient 5G and 6G communications
CN112738849B (en) Load balancing regulation and control method applied to multi-hop environment backscatter wireless network
Ghanshala et al. Self-organizing sustainable spectrum management methodology in cognitive radio vehicular adhoc network (CRAVENET) environment: a reinforcement learning approach
Chen et al. iPAS: A deep Monte Carlo Tree Search-based intelligent pilot-power allocation scheme for massive MIMO system
Liu et al. Power allocation in ultra-dense networks through deep deterministic policy gradient
CN110661566B (en) Unmanned aerial vehicle cluster networking method and system adopting depth map embedding
CN113840306B (en) Distributed wireless network access decision method based on network local information interaction
Das et al. Reinforcement learning-based resource allocation for M2M communications over cellular networks
CN116112934A (en) End-to-end network slice resource allocation method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant