Background
As intelligent substation construction advances, substation services are becoming more complex and place higher demands on the communication network. Wireless communication technology, in particular fifth-generation mobile communication (5G), offers high reliability, low latency, large bandwidth, and wide connectivity. It can effectively meet the application requirements of complex substation scenarios and diverse services, and thereby promotes the construction of intelligent substations. In the current heterogeneous wireless network environment, meeting the service communication requirements of user terminals requires making full use of the complementary characteristics of the various networks. An access network selection method therefore needs to be studied on the basis of the characteristics of substation services and wireless networks, so that a suitable network is selected for each service, network resource utilization is optimized, and service quality is improved.
At present, research on access network selection algorithms mainly covers algorithms based on multi-attribute decision making, algorithms based on optimization strategies, and algorithms based on artificial intelligence. Multi-attribute decision algorithms jointly consider several network attributes and trade off among multiple objectives; their overall performance is balanced, but their complexity is high. Document [2] combines the analytic hierarchy process with grey relational analysis to analyze user preference and obtain attribute weights, and then designs a joint optimal selection scheme that effectively improves system throughput. Document [3], based on multi-attribute decision theory and fuzzy logic, comprehensively evaluates the weights of user quality of service (QoS), cost, and network load attributes, and the proposed network selection scheme meets multiple quality-of-service requirements. Optimization-strategy algorithms optimize a particular aspect of user service performance. Document [4] uses dynamic programming to design an access network selection scheme that maximizes the network transmission rate; document [5] designs an access network selection algorithm that maximizes the ratio of throughput to power consumption; document [6] maximizes the signal-to-interference-plus-noise ratio and determines an optimal network resource allocation strategy. However, the utility functions currently used are all built on traditional metrics and sometimes cannot meet the higher service requirements of a growing number of users.
Access network selection algorithms based on artificial intelligence are well suited to searching for the optimal strategy in complex, highly uncertain environments. Document [7] combines game theory and machine learning to design a network selection algorithm that optimizes user experience and reduces average delay. Document [8] establishes an optimization model for throughput maximization and trains on samples using deep reinforcement learning and transfer learning to obtain an optimal allocation strategy. However, current access selection strategies based on artificial intelligence mostly optimize one specific network performance metric for a single service type and ignore the diverse requirements of different services.
For the complex substation scenario, the network access resource allocation strategy must be optimized with full consideration of the diverse requirements of different services, threshold constraints, and similar factors. Because substation services are strongly real-time, outdated data leads to untimely state monitoring and thus to safety problems, so data freshness is critical to safe and stable operation [9]. In this case it is not enough to consider low delay alone; information generation, transmission, and channel state should be considered together in a joint design and optimization. To this end, the optimal access selection strategy is obtained herein using a Deep Q-Network (DQN) with the goal of minimizing the average information age. The concept of information age was proposed by Kaul et al. to measure the freshness of received information; it is the time elapsed between the generation of the most recent packet successfully received by the target node and the current time. The smaller the information age, the fresher the information. Document [10] presents and compares metrics related to the Age of Information (AoI) and summarizes related studies. Factors affecting AoI include, but are not limited to, latency, reliability, and the data update period. Compared with traditional metrics such as delay, AoI is a more comprehensive measure [11]. In recent years, many works have used AoI as the metric in research on link scheduling, data updating methods, and related topics, and it has received particular attention in multi-user wireless transmission scenarios.
Document [12] schedules data transmission links with minimization of the network AAoI (Average Age of Information) as the optimization objective. Document [13] studies the information age of a radio-frequency energy harvesting network and optimizes the online transmission strategy of the wireless network with long-term AAoI minimization as the goal. Document [14] designs the scheduling policy as a function of the weighted sum of the time-varying AoI of different sensors to minimize the system AAoI.
Existing access research mostly targets public network scenarios. Research on access network selection in substation scenarios is scarce, considers only the traditional delay metric, and does not use AoI for analysis and optimization. The focus of the study herein is therefore an access network selection algorithm that minimizes AAoI in substation scenarios. First, a heterogeneous network access selection model is established with AAoI minimization as the optimization objective. Then the DQN algorithm is used for analysis and optimization to obtain the optimal strategy for allocating network resources to different services so that the total AAoI is minimized and the freshness of information updates is improved. Finally, the reliability and effectiveness of the proposed theory and algorithm are verified by simulation.
Disclosure of Invention
The invention aims to provide a multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation business, which is characterized by comprising the following steps:
1. establishing a multi-frequency heterogeneous wireless communication network model
The multi-frequency heterogeneous wireless communication network is a heterogeneous wireless network composed of several wireless technologies, namely WLAN, 4G base stations, and 5G base stations; it provides communication and service transmission for the substation terminals;
each substation terminal is provided with a multimode interface, through which every network in the transmission service area can be selected for access; the substation terminals are assumed to be uniformly distributed in the coverage area, and the types and number of their transmission services are random; a centralized controller collects network state information, and when a substation terminal sends an access request, the centralized controller assigns a suitable network to the transmission service of each terminal according to the access network selection scheme;
the transmission services of the substation terminals carried by the heterogeneous wireless network are denoted z, z ∈ Z, Z = {1, 2, ..., Z}; the candidate networks are denoted l, l ∈ L, L = {1, 2, ..., L}; in the substation scenario, to ensure the freshness of the received information, the data of the transmission service z needs to be updated frequently; the AoI at any instant is the difference between the current time t and the generation time T(t) of the last successfully received packet:
A(t)=t-T(t) (1)
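As an illustration of formula (1), the following Python sketch evaluates the AoI at a given time from a log of successfully received packets. The function name aoi_at and the (reception time, generation time) log layout are assumptions made for this example and are not part of the invention; packets are assumed to be received in the order in which they were generated.

```python
from bisect import bisect_right


def aoi_at(t, receptions):
    """Return A(t) = t - T(t) for a log of successfully received packets.

    receptions: list of (reception_time, generation_time) tuples sorted by
    reception_time (hypothetical layout chosen for this sketch).
    T(t) is the generation time of the last packet received no later than t.
    """
    reception_times = [recv for recv, _ in receptions]
    k = bisect_right(reception_times, t)  # number of packets received by time t
    if k == 0:
        raise ValueError("no packet has been received by time t")
    _, generation_time = receptions[k - 1]
    return t - generation_time
```

Between two receptions the returned value grows linearly with t, and it drops at each new reception, which is the sawtooth behaviour the text describes.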
wireless network access selection optimization algorithm based on information age
Every network in the transmission service area can be selected for access through the multimode interface. The network is chosen by an information-age-based wireless network access selection optimization algorithm so as to effectively improve the information freshness of the services: the algorithm takes minimization of the AAoI (average age of information) of service transmission as its objective, considers the performance constraints of transmission rate, transmission delay, and bit error rate to form an AAoI-minimization optimization problem, and adopts a reinforcement learning framework to find the optimal decision; the optimization problem is solved by DQN so that a user selects an access network according to service requirements.
To ensure the freshness of the received information, the data of the transmission service z needs to be updated frequently; let A_{z,i}(t) be the AoI value of the i-th data packet at time t while updates of the transmission service z are being transmitted, which can be expressed as:
in formula (2), T_z^S(i) indicates the generation time of the i-th successfully received data of terminal service z, T_z^A(i) indicates the reception time of the i-th successfully received data of the transmission service z of the terminal, and A_{z,i}(t) represents the AoI of the i-th successfully received data of the service z at time t;
when the receiving end fails to receive the transmitted data, the generation time T_z^S(t) of the data packet sent by the sending end does not change, so the AoI value keeps increasing linearly over time;
let the probability of successful transmission be P_n, n = 1, 2, ..., N; the packet loss rate is then 1 - P_n, and formula (2) can be expressed as:
the peak AoI of terminal service z at time T_z^A(i) is:
the peak AoI represents the maximum value of the system AoI; to avoid data going without an update for a long time, the peak AoI during service transmission must not exceed the maximum AoI allowed by the service requirement, which gives the constraint:
to guarantee the freshness of the received information, the data of terminal service z needs to be updated frequently; the AAoI is used as the measure of the freshness of the data transmitted by the network, and the smaller the AAoI, the fresher the data;
the AAoI is the sum of the information ages over a period of time (S_{z,1} + S_{z,2} + ... + S_{z,i}) divided by the total length of time, where S_{z,i} represents the age of the information within a certain time interval;
let C_{z,i} and Y_{z,i} denote the random variables of the transmission interval and the system delay, respectively, for the transmission service z. Then formula (6) can be expressed as:
the AAoI, which measures the freshness of the data transmitted by the network for the transmission service z, can be expressed as:
each "E[·]" in formula (8) is the expectation operator;
the lower the global AAoI in the network, the fresher the information of the terminal services; optimizing the network AAoI improves the timeliness of data transmission.
When a bearer network is selected for a user service, the objective must be optimized on the premise that the performance indices meet the service threshold requirements, so that service transmission is effective and reliable. The corresponding constraints therefore need to be satisfied when network resources are allocated to the different services. When the transmission service z is transmitted over the network l, the signal-to-interference-plus-noise ratio (SINR) can be expressed as:
where p_{z,l} and p_{i,l} are the transmit powers of service z and service i on the network l, respectively; h_{z,l} and h_{i,l} are the channel state gains when the network l transmits service z and service i, respectively; the remaining term is the additive white Gaussian noise power of the network l, since the present invention assumes by default that the transmission noise is Gaussian. According to Shannon's formula, when the bandwidth resource allocated by the network l to the service z is B_{z,l}, the maximum information transmission rate achievable when the service z is transmitted in the network l can be defined as:
R_{z,l} = B_{z,l} log_2(1 + SINR_{z,l}) (10)
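The rate computation of formula (10) can be sketched in Python as follows. Formula (9) is not reproduced in the text, so the sinr function below assumes the common form in which the received power of service z is divided by the sum of the interfering received powers and the noise power, with h treated as a power gain; this form, the function names, and the argument layout are assumptions made for illustration only.

```python
import math


def sinr(p_z, h_z, interferers, noise_power):
    """SINR of service z on one network (assumed standard form, see lead-in).

    p_z, h_z     -- transmit power and channel gain of service z
    interferers  -- iterable of (p_i, h_i) pairs for the other services
    noise_power  -- additive white Gaussian noise power of the network
    """
    interference = sum(p_i * h_i for p_i, h_i in interferers)
    return (p_z * h_z) / (interference + noise_power)


def shannon_rate(bandwidth, sinr_value):
    """Formula (10): R = B * log2(1 + SINR), in bits per second if B is in Hz."""
    return bandwidth * math.log2(1.0 + sinr_value)
```

For example, a bandwidth of 1 MHz and an SINR of 3 give a rate of 2 Mbit/s, because log2(1 + 3) = 2.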
to ensure that the information transmission rate of the network l when transmitting the service z is not less than the minimum transmission rate R_{zmin} that the transmission service z requires of the network, the following constraint is obtained:
the transmission delay of the network l when transmitting the service z is expressed as:
where M_z is the size of the data to be transmitted for the terminal service z; c_{z,l} is a binary network selection state variable representing whether the service selects the network, c_{z,l} ∈ {0,1}. To ensure that the transmission delay of the network l in transmitting the service z is smaller than the delay threshold D_{zmax} of the transmission service z, the following constraint needs to be satisfied:
the bit error rate of the transmission service z when transmitted in the network l is expressed as:
where M_l is the modulation index of the network l. To ensure the reliability of service transmission, the access network selection needs to satisfy the bit error rate constraint, namely:
2. wireless network access selection optimization algorithm based on information age
The access selection optimization algorithm, which aims at minimizing AAoI of transmission service, effectively improves information freshness of service, realizes that a user selects an access network according to service requirements, and comprises the following steps:
2.1 optimization target modeling
Taking the performance constraints of transmission rate, transmission delay, and bit error rate into consideration, an optimization problem based on AAoI minimization is formed and solved by DQN. Combining formulas (5), (11), (13), and (15) above, the multi-user access selection optimization problem in the heterogeneous wireless network can be described as follows:
in formula (16), C_1 indicates that any service z can select only one network for access; C_2 indicates that the peak AoI of the transmission service z cannot exceed the maximum AoI allowed by the service requirement; C_3 indicates that the resources occupied by all services accessing the network l cannot exceed the available bandwidth of that network; C_4 indicates that the rate at which the service z is transmitted cannot be less than the minimum transmission rate required by that service; C_5 indicates that the total delay generated by transmitting the service z cannot exceed the maximum delay required by the service; C_6 indicates that the bit error rate generated by transmitting the service z cannot exceed the maximum bit error rate required by the service.
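The six constraints described above can be checked for a candidate allocation with a short Python sketch. Formula (16) itself is not reproduced in the text, so the function below only follows the verbal description of C_1 to C_6; the function name, the dictionary keys, and the list-of-lists layout are hypothetical choices made for this example.

```python
def violated_constraints(c, bw, capacity, metrics, limits):
    """Return the set of labels among "C1".."C6" that a candidate allocation violates.

    c[z][l]      -- 1 if service z selects network l, otherwise 0
    bw[z][l]     -- bandwidth allocated to service z on network l
    capacity[l]  -- available bandwidth of network l
    metrics[z]   -- dict with keys "peak_aoi", "rate", "delay", "ber" (hypothetical)
    limits[z]    -- dict with keys "max_peak_aoi", "min_rate", "max_delay", "max_ber"
    """
    violated = set()
    n_services = len(c)
    for z in range(n_services):
        if sum(c[z]) != 1:  # C1: exactly one network per service
            violated.add("C1")
        if metrics[z]["peak_aoi"] > limits[z]["max_peak_aoi"]:  # C2: peak AoI bound
            violated.add("C2")
        if metrics[z]["rate"] < limits[z]["min_rate"]:  # C4: minimum rate
            violated.add("C4")
        if metrics[z]["delay"] > limits[z]["max_delay"]:  # C5: maximum delay
            violated.add("C5")
        if metrics[z]["ber"] > limits[z]["max_ber"]:  # C6: maximum bit error rate
            violated.add("C6")
    for l, cap in enumerate(capacity):  # C3: bandwidth budget of each network
        used = sum(c[z][l] * bw[z][l] for z in range(n_services))
        if used > cap:
            violated.add("C3")
    return violated
```

An empty result means the allocation is feasible; a learning agent could, for instance, use a non-empty result to reject or penalize an action, although the text does not specify how infeasible actions are handled.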
2.2 DQN optimization algorithm
The AAoI-minimization problem is solved by DQN. The problem has the Markov property, that is, future decisions depend only on the current state, so it is represented as a Markov decision process (MDP); the reinforcement learning framework is defined on the MDP, inherits its Markov property, and makes decisions in a model-free manner, so the reinforcement learning framework can be applied to formula (16) to find the optimal decision; the proposed network access selection model is solved with the DQN algorithm;
Q-learning is a widely used reinforcement learning algorithm that is built around the Bellman equation and iteratively updates the value function stored in a lookup table; during the interaction between the wireless network and the services, an ε-greedy strategy is used for exploration, and otherwise the action with the largest return value is selected; for the optimization model proposed herein, the three key parts of DQN, namely the state space, the action space, and the reward function, are defined as follows:
2.2.1 State space
The state space of the system is designed based on selectable network frequency bands, the occupation condition of the network bandwidths of the frequency bands, service uploading power and service transmission power, and the state of the system in the time period t is expressed as follows:
where the bandwidth term represents the bandwidth occupation state of each network; p_{z,l} represents the transmission power of the transmission service z when it accesses the network l; h_{z,l} represents the channel state gain of the network l when the service z is transmitted.
2.2.2 action space
The decision action selects the access network of the service and then determines the size of the allocated resources; the action at time t is denoted a_t ∈ A, where A = {c_{z,l}, p_{z,l}, B_{z,l}} is the action space for solving the network access selection problem; c_{z,l} is the network selection action: given the state s_t, the agent performs action c_{z,l} to select a network for the transmission service z to access and carry out its data transmission task; after selecting the access network, the agent performs action p_{z,l} to allocate suitable transmission power to the service and action B_{z,l} to allocate bandwidth for its transmission; throughout network selection, access resource allocation, and bandwidth allocation, all constraints C_1 to C_6 are strictly observed.
2.2.3 reward function
When the service z is transmitted, the optimization objective is to minimize the AAoI of the data transmitted by the network as time goes to infinity, so the overall cumulative return is negatively correlated with the AAoI; that is, depending on the actions performed, the agent receives a positive reward if the system minimizes the AAoI and a negative reward otherwise; through exploration and updating, the cumulative reward is maximized and the optimal action is obtained; according to formula (16), the reward function of the system is:
the long-term cumulative reward of the system is defined as:
γ ∈ [0,1] is the discount rate, a factor reflecting the present value of rewards from subsequent states, and it determines the effect of future rewards under different network access selections; the smaller the value of γ, the greater the influence of the current reward on the value;
to minimize the AAoI of the network, a suitable network access selection scheme needs to be chosen to maximize the future cumulative reward:
Q_π(s, a) is the action-value function, representing the expected cumulative return of taking action a in state s, expressed as:
Q_π(s, a) = E_π[G_t | S_t = s, A_t = a] (20)
Q*(s, a) in formula (19) is the optimal value function; the information (s, a, r, s', a') is obtained recursively, and the value is updated according to formula (18):
where α ∈ [0,1] is the learning rate; because formula (17) attains the optimal value only as time t goes to infinity, which is difficult to realize in practice, DQN uses a deep neural network (DNN) as a function approximator such that Q(s, a; ω) ≈ Q(s, a); the weights ω of the DNN are trained with states and actions as inputs, the output of the trained DNN is taken as the Q value, and the loss function is minimized by training the network and iteratively updating the weights:
the overall strategy of the DQN-based network access selection optimization algorithm is to take the current state s_0 as input and, for all possible actions, output the Q value Q(s, a; ω) by adjusting the weights ω; the reward is calculated according to formula (17), and the experience (s_t, a_t, r_t, s_{t+1}) is then stored in the experience replay pool m.
The heterogeneous wireless network comprises WLAN, 4G, and 5G networks. Several typical substation services are selected: distribution automation, precise load control, transformer equipment state sensing, and high-definition integrated video monitoring. The power terminals are distributed within the coverage area of all networks, and the simulation network parameters and service requirement thresholds are set so that every network can be selected for access when a service is transmitted.
For services in the substation scenario, a heterogeneous wireless network access selection scheme is provided whose main objective is to maximize information freshness, that is, to minimize the AAoI, while satisfying the constraints on the transmission rate, transmission delay, and bit error rate of the services. The scheme uses the network AAoI to measure information freshness, converts the AAoI optimization problem into a Markov decision process, establishes a solution algorithm based on the DQN framework, and optimizes the access selection policy. Simulation results show that the access selection scheme provided by the invention effectively optimizes the network AAoI and improves the freshness of the transmitted data while guaranteeing the service transmission delay, the switching rate, and the network load balance. In future research, substation services will be classified by priority, with more urgent services assigned a higher priority, so that the real-time performance of the most urgent services is guaranteed to the greatest extent.
Detailed Description
1. System model
As shown in fig. 1, a heterogeneous wireless network composed of several wireless technologies, including WLAN, 4G base stations, and 5G base stations, is considered. The heterogeneous wireless network provides communication services for the services z (z ∈ Z, Z = {1, 2, ..., Z}) within the substation, and the candidate networks are denoted l (l ∈ L, L = {1, 2, ..., L}). It is assumed that all substation terminals have multimode interfaces and that every network in the transmission service area can be selected for access. The substation terminals are assumed to be uniformly distributed in the coverage area, and the types and number of their services are random. A centralized controller collects network state information; when a substation terminal sends an access request, the centralized controller assigns a suitable network to the service transmitted by each terminal according to the access network selection scheme.
In the substation scenario, to ensure the freshness of the received information, the data of the terminal service needs to be updated frequently. According to the definition of AoI, the AoI at any time is the difference between the current time t and the generation time T(t) of the last successfully received packet:
A(t)=t-T(t) (1)
When the receiving end fails to receive the transmitted data, the generation time T_z^S(t) of the data packet sent by the sending end does not change, so its AoI value keeps increasing linearly over time. A_{z,i}(t) denotes the AoI value of the i-th data packet at time t during update transmission and can be expressed as:
where T_z^S(i) indicates the generation time of the i-th successfully received data of service z, T_z^A(i) indicates the reception time of the i-th successfully received data of service z, and A_{z,i}(t) represents the AoI of the i-th successfully received data of service z at time t.
The probability of successful transmission is denoted P_n (n = 1, 2, ..., N), so the packet loss rate is 1 - P_n, and formula (2) can be expressed as:
The peak AoI of service z at time T_z^A(i) is:
The peak AoI represents the maximum value of the system AoI; to avoid data going without an update for a long time, the peak AoI of the service transmission must not exceed the maximum AoI allowed by the service requirement, which gives the constraint:
For all links in the network, the AAoI is generally used as the measure of the freshness of the data transmitted by the network; the smaller the AAoI, the fresher the data. The AAoI is the sum of the information ages S_{z,i} over a period of time divided by the total length of that period, i.e., the sum of the trapezoidal areas in fig. 2 (S_{z,1} + S_{z,2} + ... + S_{z,i}) divided by the length of time.
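The area-over-time definition of the AAoI can be illustrated with a short Python sketch that integrates the sawtooth age curve between consecutive receptions and divides by the observation time. The function name average_aoi and the (reception time, generation time) log layout are assumptions of this example, and the way the area is cut into pieces here (one trapezoid per interval between receptions) may differ from the partition S_{z,i} drawn in fig. 2.

```python
def average_aoi(receptions, t_end):
    """Time-average AoI from the first reception until t_end.

    receptions: list of (reception_time, generation_time) tuples sorted by
    reception_time (hypothetical layout chosen for this sketch).
    """
    total_area = 0.0
    for k, (recv, gen) in enumerate(receptions):
        # The age grows linearly until the next reception (or until t_end).
        next_recv = receptions[k + 1][0] if k + 1 < len(receptions) else t_end
        age_start = recv - gen
        age_end = next_recv - gen
        # Area of the trapezoid under the age curve on [recv, next_recv].
        total_area += 0.5 * (age_start + age_end) * (next_recv - recv)
    return total_area / (t_end - receptions[0][0])
```

With receptions at times 1, 3, and 6 of packets generated at 0.5, 2, and 5.5 and an observation end of 7, the three areas are 3.0, 7.5, and 1.0, so the average over the 6 time units is 11.5 / 6.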
Let C_{z,i} and Y_{z,i} denote the random variables of the transmission interval and the system delay, respectively, for the transmission service z. Then formula (6) can be expressed as:
the network AAoI transporting traffic z can be expressed as:
where each "E[·]" is the expectation operator.
The lower the global AAoI in the network, the fresher the information of the transmission services. The invention aims to design an access network selection scheme that optimizes the network AAoI and improves the timeliness of data transmission.
When a user selects a bearer network for a transmission service, the objective must be optimized on the premise that the performance indices meet the threshold requirements of the transmission service, so that transmission is effective and reliable. The corresponding constraints must therefore be satisfied when network resources are allocated to the different transmission services.
When the transmission traffic z is transmitted over the network l, the signal-to-interference-and-noise ratio can be expressed as:
where p_{z,l} and p_{i,l} are the transmit powers of the transmission service z and the service i on the network l, respectively; h_{z,l} and h_{i,l} are the channel state gains when the network l transmits the service z and the service i, respectively; the remaining term is the additive white Gaussian noise power of the network l, the transmission noise being Gaussian by default. When the bandwidth resource allocated by the network l to the transmission service z is B_{z,l}, the maximum information transmission rate achievable when the transmission service z is transmitted in the network l can be defined as:
R_{z,l} = B_{z,l} log_2(1 + SINR_{z,l}) (10)
To ensure that the information transmission rate of the network l when transmitting the service z is not less than the minimum transmission rate requirement R_{zmin} of the service z, the following constraint is obtained:
The transmission delay of the network l when transmitting the service z is expressed as:
where M_z is the size of the data to be transmitted for the transmission service z; c_{z,l} is a binary network selection state variable representing whether the transmission service selects the network, c_{z,l} ∈ {0,1}. To ensure that the transmission delay of the network l when transmitting the service z is less than the delay threshold D_{zmax} of the service z, the following constraint needs to be satisfied:
The bit error rate of the transmission service z when transmitted in the network l is expressed as:
where M_l is the modulation index of the network l. To ensure the reliability of service transmission, the access network selection needs to satisfy the bit error rate constraint, namely:
2 Wireless network access selection optimization algorithm based on information age
2.1 optimization target modeling
In order to effectively improve the information freshness of the service, an access selection optimization algorithm which aims at minimizing the AAoI of the transmission service is provided, and the optimization problem based on the AAoI minimization is formed by considering the performance constraint of the transmission rate, the transmission delay and the bit error rate. And solving the optimization problem through the DQN to realize that a user selects an access network according to service requirements.
Combining constraint formulas (5), (11), (13), and (15) above, the multi-user access selection optimization problem in heterogeneous wireless networks can be described as:
C_1 indicates that any transmission service z can select only one network for access; C_2 indicates that the peak AoI of the transmission service z cannot exceed the maximum AoI allowed by the service requirement; C_3 indicates that the resources occupied by all services accessing the network l cannot exceed the available bandwidth of that network; C_4 indicates that the rate of the transmission service z cannot be less than the minimum transmission rate required by the transmission service; C_5 indicates that the total delay generated by the transmission service z cannot exceed the maximum delay required by the transmission service; C_6 indicates that the bit error rate generated by the transmission service z cannot exceed the maximum bit error rate required by the service.
2.2 DQN-based optimization algorithm
The optimization problem presented by the present invention has the Markov property, i.e., future decisions depend only on the current state, so it is represented as a Markov decision process (MDP). The reinforcement learning framework is defined on the MDP, inherits its Markov property, and makes decisions in a model-free manner, so the reinforcement learning framework can be applied to formula (16) to find the optimal decision. Q-learning is a widely used reinforcement learning algorithm that is built around the Bellman equation and iteratively updates the value function stored in a lookup table. However, when the state-action space is too large, the lookup table of Q-learning cannot practically cover every state-action pair. DQN combines the decision-making capability of Q-learning with the strong data analysis capability of deep neural networks, which resolves the dimensional explosion caused by a large state space in Q-learning and effectively improves training stability. The proposed network access selection model is therefore solved with the DQN algorithm to find the optimal resource allocation strategy. During the interaction between the wireless network and the services, an ε-greedy strategy is used for exploration, and otherwise the action with the largest return value is selected. For the optimization model presented herein, the three key parts of DQN, namely the state space, the action space, and the reward function, are defined as follows:
2.2.1 State space
The state space of the system is designed based on selectable network frequency bands, the occupation condition of the network bandwidths of the frequency bands, service uploading power and service transmission power, and the state of the system in the time period t is expressed as follows:
where l (l ∈ L) represents a selectable network frequency band when a service in the system is accessed; the bandwidth term represents the bandwidth occupation state of each network; p_{z,l} represents the transmission power of the service z when accessing the network l; h_{z,l} represents the channel state gain of the network l when the service z is transmitted.
2.2.2 action space
The decision action selects the access network of the service and then determines the size of the allocated resources; the action at time t is denoted a_t ∈ A, where A = {c_{z,l}, p_{z,l}, B_{z,l}} is the action space for solving the network access selection problem. Here c_{z,l} is the network selection action: given the state s_t, the agent performs action c_{z,l} to select a network for the service z to access and carry out its data transmission task; after selecting the access network, the agent performs action p_{z,l} to allocate suitable transmission power to the service and action B_{z,l} to allocate bandwidth for its transmission. Throughout network selection, access resource allocation, and bandwidth allocation, all constraints C_1 to C_6 are strictly observed.
2.2.3 Reward function
The optimization objective of the present invention is to minimize the AAoI as time tends to infinity, so the overall cumulative return is inversely related to the AAoI. That is, depending on the action performed, the agent receives a positive reward if the action minimizes the system AAoI and a negative reward otherwise. The cumulative reward is maximized through exploration and updating so as to reach the optimal operation. According to equation (16), the reward function of the system is:
The long-term cumulative reward of the system is defined as:
$\gamma \in [0,1]$ is the discount rate, a factor that reflects the present value of rewards from subsequent states and determines the effect of future rewards resulting from the selection of different access networks. The smaller the value of $\gamma$, the greater the weight of the current reward in the value.
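The role of the discount rate can be illustrated with a minimal Python sketch. It assumes the conventional discounted sum over a finite reward sequence, G = r_0 + γ r_1 + γ² r_2 + ...; the function name `discounted_return` and the sample rewards are illustrative and are not taken from the original text.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence (assumed standard form)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards through the sequence: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical per-step rewards
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, discounted_return(rewards, gamma))
```

With γ = 0 only the current reward contributes, whereas with γ = 1 every future reward counts as much as the current one, which matches the statement that a smaller γ gives the current reward more weight.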
To minimize the AAoI of the network, an appropriate network access selection scheme must be chosen to maximize the future cumulative reward:
$Q^{\pi}(s,a)$ is the action-value function, representing the expected cumulative return of taking action $a$ in state $s$, and is expressed as:
$Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]$ (20)
$Q^{*}(s,a)$ is the optimal value function; it is typically obtained recursively from the information $(s, a, r, s', a')$ and updated according to equation (18):
where $\alpha \in [0,1]$ is the learning rate. Since equation (17) attains the optimal value only as time t tends to infinity, which is difficult to realize in practice, DQN uses a deep neural network (DNN) as a function approximator such that $Q(s,a;\omega) \approx Q(s,a)$. The weights $\omega$ of the DNN are trained with states and actions as inputs, and after training the DNN output is taken as the Q value. The loss function is minimized by training the network and updating the weights in an iterative process:
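For orientation, the textbook forms of the tabular Q-learning update and of the DQN loss are sketched below in LaTeX. They are given under the assumption that the method follows the standard formulation; the target-network weights $\omega^{-}$ are an assumed notation that does not appear in the original text.

```latex
% Tabular Q-learning update with learning rate \alpha and discount rate \gamma
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

% DQN loss: squared temporal-difference error; \omega^{-} denotes
% (assumed) periodically copied target-network weights
L(\omega) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s',a';\omega^{-}) - Q(s,a;\omega) \right)^{2} \right]
```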
The invention provides a DQN-based network access selection optimization algorithm. The overall strategy takes the current state $s_0$ as input and, by adjusting the weights $\omega$, outputs the Q values $Q(s,a;\omega)$ corresponding to all possible actions. To ensure that the agent can trade off between exploring the unknown environment and exploiting learned knowledge, the agent chooses actions according to an ε-greedy policy. The primary operation of the agent is to select the appropriate access network. Depending on the selected network, the agent performs the next action to allocate resources. After all actions are completed, the agent transitions to a new state $s_{t+1}$ and calculates the reward according to equation (17); the experience $(s_t, a_t, r_t, s_{t+1})$ is then stored in the experience replay pool m.
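The action selection and experience storage steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `epsilon_greedy` and `ReplayPool`, the fixed pool capacity and the uniform mini-batch sampling are assumptions, and the Q values are supplied as a plain list instead of coming from a trained DNN.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


class ReplayPool:
    """Fixed-capacity pool of (s_t, a_t, r_t, s_next) experiences (assumed design)."""

    def __init__(self, capacity):
        # A bounded deque discards the oldest experience once it is full.
        self._buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, capped at the current pool size.
        size = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = ReplayPool(capacity=3)
    q_values = [0.1, 0.7, 0.2]  # hypothetical Q values for three candidate networks
    for t in range(5):
        action = epsilon_greedy(q_values, epsilon=0.1, rng=rng)
        pool.store(t, action, -1.0, t + 1)  # placeholder reward
    print(len(pool), pool.sample(2, rng=rng))
```

Setting `epsilon` to 0 makes the agent purely greedy, while 1 makes it purely exploratory; intermediate values give the trade-off described above.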
3. Simulation experiment and result analysis
3.1 Simulation parameter settings
The heterogeneous wireless network in the simulation scenario comprises WLAN, 4G and 5G networks. Four typical types of substation service are selected: distribution automation, accurate load control, substation equipment state sensing and high-definition comprehensive video monitoring. The power terminals are placed within the coverage area of all networks, so any network can be selected for access when a service is transmitted.
With reference to the national power grid standards on substation service communication requirements, the delay requirements of the four types of service (distribution automation, accurate load control, substation equipment state sensing and high-definition comprehensive video monitoring) are set to 12 ms, 50 ms, 500 ms and 200 ms respectively; the transmission rates are set to 2 Mbps, 5 Mbps, 0.1 Mbps and 4 Mbps; the bit error rates are set to $10^{-4}$, $10^{-4}$, $10^{-2}$ and $10^{-2}$; the data transmission interval is set to 0.1 ms, 3 ms, 100 ms.
The 5G (700M), 5G (2.6G), 5G (3.5G), 5G (4.9G), 4G and WLAN networks are parameterized as follows: the bandwidths are 30 MHz, 20 MHz, 30 MHz, 25 MHz, 20 MHz and 10 MHz respectively; the delays are 25 ms, 12.5 ms, 5 ms, 3 ms, 30 ms and 500 ms respectively; the transmission rates are 60 Mbps, 80 Mbps, 240 Mbps, 375 Mbps, 12.5 Mbps and 9 Mbps respectively; the transmitting power is 23 dB, 26 dB, 43 dB and 20 dB respectively; and the bit error rates are $10^{-3}$, $10^{-4}$, $10^{-4}$, $10^{-4}$, $10^{-3}$ and $10^{-2}$ respectively.
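To make these settings concrete, the Python sketch below lists, for each service, the networks whose nominal delay, rate and bit error rate satisfy the service requirements. It is only an illustration: comparing a service requirement directly with a network's nominal figures is a simplifying assumption, the constraints actually used by the method are not reproduced here, and the dictionary and function names are hypothetical.

```python
# Service requirements: (maximum delay in ms, minimum rate in Mbps, maximum bit error rate)
SERVICES = {
    "distribution automation": (12, 2, 1e-4),
    "accurate load control": (50, 5, 1e-4),
    "equipment state sensing": (500, 0.1, 1e-2),
    "HD video monitoring": (200, 4, 1e-2),
}

# Network parameters: (delay in ms, rate in Mbps, bit error rate)
NETWORKS = {
    "5G (700M)": (25, 60, 1e-3),
    "5G (2.6G)": (12.5, 80, 1e-4),
    "5G (3.5G)": (5, 240, 1e-4),
    "5G (4.9G)": (3, 375, 1e-4),
    "4G": (30, 12.5, 1e-3),
    "WLAN": (500, 9, 1e-2),
}


def feasible_networks(service):
    """Networks whose nominal figures meet all three requirements of `service`."""
    max_delay, min_rate, max_ber = SERVICES[service]
    return [
        name
        for name, (delay, rate, ber) in NETWORKS.items()
        if delay <= max_delay and rate >= min_rate and ber <= max_ber
    ]


if __name__ == "__main__":
    for service in SERVICES:
        print(service, "->", feasible_networks(service))
```

Under this simplified check, distribution automation is limited to the two higher 5G bands by its 12 ms delay requirement, while equipment state sensing can use any of the six networks.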
3.2 Analysis of simulation results
The simulation results are analyzed with reference to the accompanying drawings.
Fig. 3 compares the network AAoI of the different algorithms in the heterogeneous wireless network scenario. As the number of users increases, the network AAoI of all three algorithms tends to grow: because the load capacity of the network is limited, the AoI of each network increases with the number of accessing services, and the overall network AAoI rises accordingly. However, the network AAoI of the proposed algorithm is clearly smaller than that of the AHP algorithm and the delay minimization algorithm, which indicates that the access scheme provided by the invention effectively reduces the network AAoI and improves information freshness.
Fig. 4 shows the service transmission delay under the different algorithms. The delay curve of the proposed algorithm lies between those of the delay minimization algorithm and the AHP algorithm: it is better than the AHP algorithm and worse than the delay minimization algorithm. Because the total amount of network resources is limited, the transmission delay of all three algorithms increases as more services access the network. The AHP algorithm mainly considers fairness of resource allocation and is weak at optimizing any single performance metric, so its transmission delay grows fastest among the three simulated algorithms. The delay minimization algorithm mainly targets optimal delay performance, so its delay grows most slowly.
Fig. 5 compares network load balancing under the three algorithms. Load balancing is described by the variance of the network load rates: the variance is negatively correlated with load balance, so a smaller variance means better balance. Because the networks differ in performance, the differences in the number of services accessing each network widen as the number of accessing services increases, and the variance of the load rates increases. The proposed algorithm is better than the delay minimization algorithm but worse than the AHP algorithm. Although the proposed algorithm also takes load balancing into account, it mainly considers the effect of network load on the service AAoI. The AHP algorithm weighs several attributes together and therefore achieves better balance, while the delay minimization algorithm mainly optimizes delay performance and gives little consideration to load balancing.
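The load-balancing metric can be sketched in Python as follows. The sketch assumes that the load rate of a network is its used resource divided by its capacity and that the population variance is meant; neither detail is stated in the text, and the function name and sample figures are hypothetical.

```python
from statistics import pvariance


def load_rate_variance(used, capacity):
    """Population variance of the per-network load rates used[i] / capacity[i]."""
    if len(used) != len(capacity):
        raise ValueError("used and capacity must have the same length")
    rates = [u / c for u, c in zip(used, capacity)]
    return pvariance(rates)


if __name__ == "__main__":
    capacity = [30, 20, 30]  # hypothetical capacities, e.g. bandwidth in MHz
    print(load_rate_variance([15, 10, 15], capacity))  # every network half loaded
    print(load_rate_variance([6, 10, 24], capacity))   # load rates 0.2, 0.5, 0.8
```

The evenly loaded case yields a variance of zero, and the uneven case yields a larger value, in line with the statement that a smaller variance indicates better balance.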
Fig. 6 compares the cumulative handover rates under the three algorithms. The cumulative handover rate is the ratio of the cumulative number of handovers to the total number of services currently accessing the network, tracked as the number of services increases. Switching networks at a proper time effectively reduces the network load rate and improves transmission performance, but switching too frequently easily causes a ping-pong effect and degrades the quality of service. The cumulative handover rate of the proposed algorithm is higher than that of the delay minimization algorithm, which optimizes a single performance metric, but lower than that of the AHP algorithm, which focuses on balance.
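A minimal Python sketch of this metric is given below. It assumes that handovers are counted per arriving service and that the rate is re-evaluated after each arrival; the function name and the sample counts are hypothetical.

```python
from itertools import accumulate


def cumulative_handover_rates(handovers_per_arrival):
    """Cumulative handovers divided by the number of services accessed so far.

    handovers_per_arrival[i] is the number of handovers triggered when
    service i + 1 accesses the network (an assumed bookkeeping convention).
    """
    totals = accumulate(handovers_per_arrival)
    return [total / count for count, total in enumerate(totals, start=1)]


if __name__ == "__main__":
    # Hypothetical trace: the second and fourth arrivals each trigger one handover.
    print(cumulative_handover_rates([0, 1, 0, 1]))
```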
For services in the substation scenario, the invention provides a heterogeneous wireless network access selection scheme whose main objective is to minimize the AAoI, i.e. to maximize information freshness, while satisfying the service requirements on transmission rate, transmission delay and bit error rate. The scheme uses the network AAoI to measure information freshness, converts the AAoI optimization problem into a Markov decision process, and establishes a solution algorithm based on a DQN framework to optimize the access selection policy. Simulation results show that the proposed access selection scheme maintains service transmission delay, handover rate and network load balance while effectively optimizing the network AAoI and improving the freshness of the transmitted data. In future research, substation services will be classified by priority, with more urgent services assigned a higher priority, so that the real-time performance of highly urgent services is guaranteed to the greatest extent.