CN115052325A

CN115052325A - Multi-frequency heterogeneous wireless communication network access selection algorithm suitable for transformer substation service

Info

Publication number: CN115052325A
Application number: CN202210635044.6A
Authority: CN
Inventors: 韩东升; 岳栩彤
Original assignee: North China Electric Power University
Current assignee: North China Electric Power University
Priority date: 2022-06-07
Filing date: 2022-06-07
Publication date: 2022-09-13
Anticipated expiration: 2042-06-07
Also published as: CN115052325B

Abstract

The invention discloses a multi-frequency heterogeneous wireless communication network access selection algorithm suitable for transformer substation services, and belongs to the technical field of wireless network communication for intelligent transformer substations. For services in a transformer substation scenario, the invention provides a heterogeneous wireless network access selection scheme whose main objective is to minimize the age of information, and thereby maximize information freshness, while satisfying the services' requirements on transmission rate, transmission delay and bit error rate. In this scheme, information freshness is measured by the network average age of information (AAoI), and a solution algorithm based on the DQN framework is established to optimize the access selection policy. Simulation results show that the proposed access selection scheme effectively reduces the network AAoI and improves the freshness of transmitted data while maintaining service transmission delay, switching rate and network load balance. Future research will consider dividing transformer substation services by priority, assigning higher priority to more urgent services so as to guarantee their real-time performance.

Description

Multi-frequency heterogeneous wireless communication network access selection algorithm suitable for transformer substation service

Technical Field

The invention belongs to the technical field of wireless network communication of intelligent substations, and particularly relates to a multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services.

Background

As the construction of intelligent transformer substations continues to deepen, substation services are becoming more and more complex, which places new and higher demands on the network. Wireless communication technology, especially 5th generation mobile communication (5G), offers high reliability, low delay, large bandwidth and wide connectivity; it can effectively meet the application requirements of complex substation scenarios and diverse services, and can better promote the construction of intelligent transformer substations. To meet the service communication requirements of user terminals and make full use of the complementary characteristics of the various networks in the current heterogeneous wireless network environment, an access network selection method needs to be studied based on the characteristics of substation services and wireless networks, so that appropriate networks are selected for different services, network resource utilization is optimized, and service quality is improved.

At present, research on access network selection algorithms mainly covers algorithms based on multi-attribute decision making, algorithms based on optimization strategies, and algorithms based on artificial intelligence. Access network selection algorithms based on multi-attribute decision making comprehensively consider multiple network attributes and balance and coordinate multiple objectives; their overall performance is balanced, but their complexity is higher. Document [2] combines the analytic hierarchy process with grey relational analysis to analyze user preferences and obtain attribute weights, and then designs a joint optimal selection scheme that effectively improves system throughput. Document [3] comprehensively evaluates the weights of user Quality of Service (QoS), cost and network load attributes based on multi-attribute decision theory and fuzzy logic theory, and the proposed network selection scheme satisfies multiple QoS requirements. Access network selection algorithms based on optimization strategies can optimize a particular aspect of user service performance. Document [4] uses dynamic programming theory to design an access network selection scheme that maximizes the network transmission rate; document [5] designs an access network selection algorithm that maximizes the ratio of throughput to power consumption; document [6] determines an optimal network resource allocation strategy that maximizes the signal to interference plus noise ratio. However, the utility functions used so far are all based on traditional metrics, which sometimes cannot meet the higher service requirements of more users.
Access network selection algorithms based on artificial intelligence help find the optimal strategy in complex environments with high uncertainty. Document [7] combines game theory and machine learning to design a network selection algorithm that optimizes user experience and reduces average delay. Document [8] establishes an optimization model that maximizes throughput, and trains on samples using deep reinforcement learning and transfer learning to obtain an optimal allocation strategy. However, most current access selection strategies based on artificial intelligence optimize one specific network performance metric under a single service type and ignore the diversified requirements of different services.

For the complex scenarios of a transformer substation, the network access resource allocation strategy needs to be optimized by comprehensively considering the diversified demands of different services, threshold constraints and the like. Since most substation services are real-time, stale data leads to untimely state monitoring and hence to safety problems, so data freshness is crucial to safe and stable operation ^[9]. Considering only low latency is not sufficient here; joint design and optimization should take information generation, transmission and channel state into account together. For this reason, the optimal access selection strategy is obtained with a Deep Q-Network (DQN), with the goal of minimizing the average age of information. The concept of age of information (AoI) was proposed by Kaul et al. to measure how fresh a received message is; it is expressed as the difference between the current time and the generation time of the most recent packet successfully received by the target node. The smaller the age of information, the higher the information freshness. Document [10] introduces and compares AoI-related metrics and summarizes the related research. Factors affecting AoI include, but are not limited to, latency, reliability and data update period, so AoI is more comprehensive than conventional measures such as latency ^[11]. In recent years, many studies have used AoI as a metric, from the perspectives of link scheduling, data update methods and so on, and it has attracted particular attention in multi-user wireless transmission scenarios.
Document [12] schedules data transmission links with minimization of the network average age of information (AAoI) as the optimization target; document [13] studies the age of information in radio frequency energy harvesting networks and optimizes the online transmission strategy of the wireless network with the goal of minimizing long-term AAoI; document [14] designs a scheduling strategy as a function of the time-varying weighted sum of the AoI of different sensors, so as to minimize the system AAoI.

Existing access research is mostly directed at public network scenarios. Research on access network selection in the substation scenario is scarce, considers only the traditional delay metric, and does not use AoI for analysis and optimization. The invention therefore focuses on an access network selection algorithm that minimizes AAoI in the substation scenario. A heterogeneous network access selection model is first established with AAoI minimization as the optimization target; the DQN algorithm is then used for analysis and optimization to obtain the optimal strategy for allocating network resources to different services while minimizing total AAoI, improving the freshness of information updates; finally, the reliability and effectiveness of the proposed theory and algorithm are verified through simulation.

Disclosure of Invention

The invention aims to provide a multi-frequency heterogeneous wireless communication network access selection algorithm suitable for transformer substation services, which is characterized by comprising the following steps:

1. establishing multi-frequency heterogeneous wireless communication network model

The multi-frequency heterogeneous wireless communication network is a heterogeneous wireless network formed by multiple wireless technologies consisting of a WLAN (wireless local area network), a 4G base station and a 5G base station; communication and transmission service is provided for the transformer substation terminal;

the terminals of the transformer substation are all provided with multimode interfaces, through which any network in the transmission service area can be accessed; the substation terminals are uniformly distributed in the coverage area, and the types and number of transmission services of each terminal are random. An integrated controller collects network state information, and when a substation terminal sends an access request, the integrated controller allocates a suitable network to the transmission service of each terminal according to the access network selection scheme;

the transmission services of the substation terminals served by the heterogeneous wireless network are denoted by $z$, $z \in \mathcal{Z}$, $\mathcal{Z} = \{1, 2, \ldots, Z\}$; the candidate networks are denoted by $l$, $l \in \mathcal{L}$, $\mathcal{L} = \{1, 2, \ldots, L\}$; in a substation scenario, the data of transmission service $z$ needs to be updated frequently to keep the received information fresh, as measured by AoI; the AoI at any time is the difference between the current time $t$ and the generation time $T(t)$ of the most recently successfully received packet, expressed as:

A(t)＝t-T(t) (1)
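Equation (1) can be illustrated with a short sketch. The function name `aoi_at`, the representation of updates as `(generation_time, reception_time)` pairs, and the `initial_age` parameter are assumptions made for illustration only; they are not part of the invention.

```python
from bisect import bisect_right


def aoi_at(t, updates, initial_age=0.0):
    """Return A(t) = t - T(t) for updates given as (generation_time, reception_time).

    T(t) is the generation time of the freshest packet received by time t.
    Before any packet has arrived, the age is assumed to grow from initial_age.
    """
    received = sorted(updates, key=lambda u: u[1])      # order by reception time
    reception_times = [r for _, r in received]
    k = bisect_right(reception_times, t)                # packets received by time t
    if k == 0:
        return initial_age + t
    # Take the freshest packet among those received (guards against reordering).
    latest_generation = max(g for g, _ in received[:k])
    return t - latest_generation


updates = [(0.0, 1.0), (2.0, 2.5), (4.0, 6.0)]
print(aoi_at(5.0, updates))  # age of the packet generated at t = 2.0
```

Between receptions the returned age grows linearly with `t`, and it drops each time a fresher packet arrives.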

information age-based wireless network access selection optimization algorithm

All networks in the service transmission area can be accessed through the multimode interface, and access selection is solved with a wireless network access selection optimization algorithm based on the age of information. To effectively improve the information freshness of the services, the access selection optimization algorithm takes minimization of the AAoI of service transmission as its target and accounts for the performance constraints on transmission rate, transmission delay and bit error rate, forming an optimization problem based on AAoI minimization, for which a reinforcement learning framework is used to find the optimal decision; the optimization problem is solved through DQN, so that the user selects an access network according to service requirements.

The data of transmission service $z$ needs to be updated frequently to keep the received information fresh. Let $A_{z,i}(t)$ denote the AoI at time $t$ of the $i$-th data packet sent while transmission service $z$ transmits updates; it may be expressed as:

in formula (2),

denotes the generation time of the $i$-th successfully received data packet of terminal service $z$,

denotes the reception time of the $i$-th successfully received data packet of terminal service $z$, and $A_{z,i}(t)$ denotes the AoI at time $t$ of the $i$-th successfully received data packet of service $z$;
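One form of formula (2) that is consistent with equation (1) is sketched below. The symbols $g_{z,i}$ (generation time) and $r_{z,i}$ (reception time) are assumed here for illustration and are not taken from the original notation.

```latex
A_{z,i}(t) = t - g_{z,i}, \qquad r_{z,i} \le t < r_{z,i+1}
```

Under this reading, the age drops to $r_{z,i} - g_{z,i}$ at each successful reception and then grows with slope 1 until the next one.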

when the receiving end fails to receive the transmitted data, the generation time used in formula (2) remains that of the last successfully received data packet, so the AoI value keeps increasing linearly with time;
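The linear growth of AoI under reception failures can be sketched with a slotted model. The model is an assumption made for illustration (one fresh packet is sent per slot and a received packet has age `delay` on arrival), and the name `aoi_trace` is hypothetical.

```python
def aoi_trace(successes, delay=1, initial_age=0):
    """Slot-by-slot AoI for a source that sends one fresh packet in every slot.

    successes: iterable of booleans, True if the packet sent in that slot is received.
    delay: age (in slots) of a packet at the moment it is received.
    """
    age = initial_age
    trace = []
    for received in successes:
        # Success: age resets to the delivery delay. Failure: age grows by one slot.
        age = delay if received else age + 1
        trace.append(age)
    return trace


print(aoi_trace([True, False, False, True, False]))
```

Each failed slot adds one slot of age, and the largest value in the trace is the peak age reached before the next successful reception.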

let the probability of successful transmission be $P_n$ ($n = 1, 2, \ldots, N$); the packet loss rate is then $1 - P_n$, and formula (2) can be expressed as:

the peak AoI of terminal service $z$ at time

is:

peak AoI represents the maximum value of system AoI; in order to avoid the situation that data is not updated for a long time, the peak value AoI during service transmission is ensured not to exceed the AoI maximum value of the service requirement; the constraint is thus obtained:

to keep the received information fresh, the data of terminal service $z$ needs to be updated frequently; AAoI is used as the measure of the freshness of network transmission data, and the smaller the AAoI, the fresher the data;

the AAoI is the sum of the information ages ($S_{z,1} + S_{z,2} + \cdots + S_{z,i}$) over a period of time divided by the total length of that period, where $S_{z,i}$ denotes the information age accumulated within a certain time interval;

let $C_{z,i}$ and $Y_{z,i}$ denote the random variables corresponding to the state transmission interval and the system delay of transmission service $z$, respectively. Equation (6) can then be expressed as:

then when the time t tends to infinity, the freshness AAoI of the network transmission data of the transmission traffic z can be expressed as:

in formula (8), $E[\cdot]$ is the expectation operator;

the lower the overall AAoI in the network, the higher the information freshness degree of the terminal service; and optimizing the AAoI of the network to improve the timeliness of data transmission.

When selecting a bearer network for a user service, target optimization must be carried out on the premise that the performance indicators meet the service threshold requirements, so that service transmission is effective and reliable. Therefore, when network resources are allocated to different services, the corresponding constraints must each be satisfied. When transmission service $z$ is transmitted over network $l$, the signal to interference plus noise ratio (SINR) can be expressed as:

where $p_{z,l}$ and $p_{i,l}$ are the transmit powers of service $z$ and service $i$ on network $l$, respectively; $h_{z,l}$ and $h_{i,l}$ are the channel state gains of network $l$ when transmitting service $z$ and service $i$, respectively;

the invention assumes that the transmission noise is Gaussian, characterized by the additive white Gaussian noise power of network $l$. According to the Shannon formula, when the bandwidth resource allocated to service $z$ by network $l$ is $B_{z,l}$, the maximum information transmission rate that service $z$ can achieve in network $l$ is defined as:

$R_{z,l} = B_{z,l} \log_2 (1 + \mathrm{SINR}_{z,l})$ (10)
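The SINR and the rate in equation (10) can be computed as in the sketch below. The interference model (every other service on the same network contributes $p_{i,l} h_{i,l}$ to the denominator) is an assumption inferred from the symbol definitions above, and the function names are hypothetical.

```python
import math


def sinr(z, powers, gains, noise_power):
    """SINR of service z on one network (all quantities in linear units).

    powers[i], gains[i]: transmit power and channel gain of service i on this network.
    noise_power: additive white Gaussian noise power of the network.
    """
    signal = powers[z] * gains[z]
    interference = sum(
        p * h for i, (p, h) in enumerate(zip(powers, gains)) if i != z
    )
    return signal / (interference + noise_power)


def shannon_rate(bandwidth_hz, sinr_linear):
    """Equation (10): R = B * log2(1 + SINR), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)


gamma_0 = sinr(0, powers=[3.0, 1.0], gains=[2.0, 1.0], noise_power=1.0)
print(gamma_0, shannon_rate(1e6, gamma_0))
```

With these example numbers the SINR is 6 / (1 + 1) = 3, so a 1 MHz allocation yields 1e6 * log2(4) = 2e6 bit/s.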

to ensure that the information transmission rate of network $l$ when transmitting service $z$ meets the minimum transmission rate requirement $R_{z\min}$ of transmission service $z$, the following constraint is obtained:

the transmission delay generated by the network l when transmitting the service z is expressed as:

where $M_z$ is the size of the data to be transmitted by terminal service $z$; $c_{z,l}$ is a binary network selection state variable representing the selection state of the service with respect to the network, $c_{z,l} \in \{0, 1\}$. To ensure that the transmission delay of network $l$ when transmitting service $z$ does not exceed the delay threshold $D_{z\max}$ of transmission service $z$, the following constraint must be satisfied:

the error rate of the transmission service z during transmission in the network l is represented as:

where $M_l$ is the modulation index of network $l$. To ensure the reliability of service transmission, access network selection must satisfy the bit error rate constraint, that is:

2. wireless network access selection optimization algorithm based on information age

The access selection optimization algorithm, which aims to minimize the AAoI of the transmission services so as to effectively improve service information freshness and let the user select an access network according to service requirements, comprises the following steps:

2.1 optimization target modeling

Considering the performance constraints on transmission rate, transmission delay and bit error rate, an optimization problem based on AAoI minimization is formed and solved through DQN. Combining the above equations (5), (11), (13) and (15), the multi-user access selection optimization problem in the heterogeneous wireless network can be described as follows:

in formula (16), $C_1$ indicates that any service $z$ can select only one network to access; $C_2$ indicates that the peak AoI of transmission service $z$ cannot exceed the maximum AoI required by the service; $C_3$ indicates that the resources occupied by all services accessing network $l$ cannot exceed the available bandwidth of the network; $C_4$ indicates that the rate at which service $z$ is transmitted cannot be less than the minimum transmission rate required by the service; $C_5$ indicates that the total delay generated in transmitting service $z$ cannot exceed the maximum delay required by the service; $C_6$ indicates that the bit error rate generated in transmitting service $z$ cannot exceed the maximum bit error rate required by the service.
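The six constraints can be checked mechanically for a candidate allocation. The sketch below is illustrative only: the `ServiceAllocation` record, its field names and the function `violated_constraints` are hypothetical, and the per-service metrics (peak AoI, rate, delay, bit error rate) are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class ServiceAllocation:
    """Hypothetical record of one service's allocation and resulting metrics."""
    selection: list    # c_{z,l} for each network l, entries 0 or 1
    bandwidth: float   # B_{z,l} allocated on the selected network
    peak_aoi: float
    rate: float
    delay: float
    ber: float
    max_aoi: float     # service thresholds
    min_rate: float
    max_delay: float
    max_ber: float


def violated_constraints(allocations, network_bandwidth):
    """Return the sorted labels (C1..C6) of the constraints that are violated."""
    violated = set()
    used = [0.0] * len(network_bandwidth)
    for a in allocations:
        if sum(a.selection) != 1 or any(c not in (0, 1) for c in a.selection):
            violated.add("C1")   # exactly one network per service
            continue             # no well-defined network for this service
        used[a.selection.index(1)] += a.bandwidth
        if a.peak_aoi > a.max_aoi:
            violated.add("C2")
        if a.rate < a.min_rate:
            violated.add("C4")
        if a.delay > a.max_delay:
            violated.add("C5")
        if a.ber > a.max_ber:
            violated.add("C6")
    if any(u > cap for u, cap in zip(used, network_bandwidth)):
        violated.add("C3")       # per-network bandwidth budget exceeded
    return sorted(violated)
```

A learning agent could use such a check to mask infeasible actions or to penalize them in the reward; which of the two the invention uses is not stated here.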

2.2 DQN optimization algorithm

The above AAoI-minimization optimization problem is Markovian, i.e. future decisions depend only on the current state, and it is therefore represented by a Markov decision process (MDP); the reinforcement learning framework is defined on the basis of the MDP and inherits its Markov property, and the decision is model-free, so the optimal decision for formula (16) can be found with the reinforcement learning framework; the proposed network access selection model is solved with the DQN algorithm;

the Q-learning algorithm is a widely used reinforcement learning algorithm that takes the Bellman equation as its core and iteratively updates the value function in tabular form; during the interaction between the wireless network and the services, an ε-greedy strategy is adopted for exploration, and the action with the largest return value is selected; in conjunction with the optimization model presented herein, the three key components of DQN, namely the state space, the action space and the reward function, are defined as follows:

2.2.1 State space

The state space of the system is designed based on the selectable network frequency bands, the occupation condition of the network bandwidth of each frequency band, the service uploading power and the service transmission power, and the system state in a time period t is expressed as follows:

where

denotes the bandwidth occupation state of each network; $p_{z,l}$ denotes the transmission power when transmission service $z$ accesses network $l$; $h_{z,l}$ denotes the channel state gain of network $l$ when transmitting service $z$.

2.2.2 Action space

The service access network is selected, and the size of the allocated resources is then determined, with strict regard to all constraints $C_1$ to $C_6$; the action at time $t$ is expressed as $a_t \in A$, where $A = \{c_{z,l}, p_{z,l}, B_{z,l}\}$ is the action space for solving the network access selection problem; here $c_{z,l}$ is the network selection action: given state $s_t$, the agent performs action $c_{z,l}$ to select the network $l$ that transmission service $z$ accesses to carry out its data transmission task; after the access network is selected, the agent performs action $p_{z,l}$ to allocate suitable transmission power to the service and action $B_{z,l}$ to allocate bandwidth for the service's transmission.
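For a DQN with a finite set of outputs, the joint action of network choice, power and bandwidth is often enumerated over discrete levels. The sketch below assumes such a discretization, which the text does not specify; `build_action_space` and the example levels are hypothetical.

```python
from itertools import product


def build_action_space(num_networks, power_levels, bandwidth_levels):
    """Enumerate discrete actions (l, p, B): network index, transmit power, bandwidth."""
    return list(product(range(num_networks), power_levels, bandwidth_levels))


# Example: 3 candidate networks, 2 power levels (W), 2 bandwidth levels (Hz).
actions = build_action_space(3, [0.1, 0.2], [5e6, 10e6])
print(len(actions), actions[0])
```

Each tuple corresponds to one output unit of the Q-network, so the size of the action space is the product of the three level counts.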

2.2.3 reward function

The optimization goal for the freshness of the network transmission data of transmission service $z$ is to minimize AAoI as time tends to infinity, so the overall cumulative reward is inversely related to AAoI; that is, depending on the action performed, the agent receives a positive reward if the system minimizes AAoI and a negative reward otherwise; cumulative reward maximization is achieved through exploration and updating, which yields the optimal action; according to equation (16), the reward function of the system is:

the long-term cumulative reward of the system is defined as:

$\gamma \in [0, 1]$ is the discount rate, an influence factor reflecting how much the rewards of subsequent states contribute to the current value, and the effect of future rewards is determined according to the selective access of different networks; the smaller the value of $\gamma$, the greater the influence of the current reward on the value;
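The role of the discount rate can be seen in a small sketch of a discounted cumulative reward over a finite reward sequence. The indexing convention (the return starts with the current reward) and the name `discounted_return` are assumptions made for illustration.

```python
def discounted_return(rewards, gamma):
    """G = r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite list of rewards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g   # fold the sequence from the last reward backwards
    return g


print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

With `gamma = 0` only the current reward counts, while with `gamma = 1` all future rewards weigh as much as the current one.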

to minimize the AAoI of the network, it is desirable to select an appropriate network access selection scheme to maximize the future cumulative reward:

$Q_\pi(s, a)$ is the action-value function, representing the expected cumulative reward of taking action $a$ in state $s$, expressed as:

$Q_\pi(s, a) = E_\pi(G_t \mid S_t = s, A_t = a)$ (20)

in formula (19), $Q^*(s, a)$ is the optimal value function; the information $(s, a, r, s', a')$ is obtained recursively, and the value is updated according to equation (18):

where $\alpha \in [0, 1]$ is the learning rate; equation (17) attains its optimum only as time $t$ tends to infinity, which is difficult to achieve in practice, so DQN uses a deep neural network (DNN) as a function approximator, $Q(s, a; \omega) \approx Q^*(s, a)$, and trains the weights $\omega$ of the DNN: the states and actions are taken as the input of the DNN, the output of the trained DNN is taken as the Q-value, and the loss function is minimized by training the network and iteratively updating the weights:
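A loss of this kind is commonly written as the mean squared temporal-difference error. The form below is the standard DQN loss, given as an assumption rather than as the invention's own formula; the target $y$ is an illustrative symbol, and many implementations evaluate it with separate, periodically synchronized target-network weights instead of $\omega$.

```latex
L(\omega) = \mathbb{E}\left[ \big( y - Q(s, a; \omega) \big)^{2} \right],
\qquad
y = r + \gamma \max_{a'} Q(s', a'; \omega)
```

Minimizing $L(\omega)$ by gradient descent moves $Q(s, a; \omega)$ toward the one-step Bellman target $y$.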

the overall strategy of the DQN-based network access selection optimization algorithm is to take the current state $s_0$ as input and, by adjusting the weights $\omega$, output the Q-values $Q(s, a; \omega)$ corresponding to all possible actions; the reward is then calculated according to equation (17), and the experience $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay pool $m$.
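An experience replay pool of this kind can be sketched as a fixed-capacity buffer. The class name `ReplayPool`, the capacity-based eviction and the uniform mini-batch sampling are assumptions based on common DQN practice; the text only states that experiences are stored.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity pool holding (s_t, a_t, r_t, s_{t+1}) experience tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest experience once capacity is reached.
        self._buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw a uniform random mini-batch without replacement."""
        return rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


pool = ReplayPool(capacity=3)
for step in range(5):
    pool.store(step, 0, -1.0, step + 1)
print(len(pool), pool.sample(2))
```

Sampling from the pool rather than learning from consecutive transitions reduces the correlation between training samples.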

The heterogeneous wireless network comprises a WLAN (wireless local area network), a 4G network and a 5G network. Several typical substation services are selected: distribution automation, precise load control, substation equipment state sensing and high-definition integrated video monitoring. The power terminals are distributed within the coverage areas of all the networks, the simulation network parameters and service requirement thresholds are set, and every network can be selected for access during service transmission.

The beneficial effects of the invention are as follows: for services in the transformer substation scenario, a heterogeneous wireless network access selection scheme is provided whose main objective is to minimize the age of information, and thereby maximize information freshness, subject to the services' requirements on transmission rate, transmission delay and bit error rate. In this scheme, information freshness is measured by the network AAoI, the AAoI optimization problem is converted into a Markov decision process, and a solution algorithm based on the DQN framework is established to optimize the access selection policy. Simulation results show that the proposed access selection scheme effectively reduces the network AAoI and improves the freshness of transmitted data while maintaining service transmission delay, switching rate and network load balance. Future research will consider classifying substation services by priority, assigning higher priority to more urgent services so that their real-time performance is guaranteed to the greatest extent.

Drawings

FIG. 1 Schematic diagram of the network scenario model

FIG. 2 AoI variation over time

FIG. 3 Network AAoI

FIG. 4 Total network transmission delay

FIG. 5 Network load rate variance

FIG. 6 Cumulative service switching rate

Detailed Description

1. System model

As shown in FIG. 1, consider a heterogeneous wireless network formed by multiple wireless technologies, including a WLAN, 4G base stations and 5G base stations. The heterogeneous wireless network provides communication services for the services $z$ ($z \in \mathcal{Z}$, $\mathcal{Z} = \{1, 2, \ldots, Z\}$) in the transformer substation, and the candidate networks are denoted by $l$ ($l \in \mathcal{L}$, $\mathcal{L} = \{1, 2, \ldots, L\}$). It is assumed that all substation terminals have multimode interfaces and can selectively access any network in the transmission service area. The substation terminals are uniformly distributed in the coverage area, and the service types and number of services of each terminal are random. A centralized controller collects network state information, and when a substation terminal sends an access request, the centralized controller allocates a suitable network to the service transmitted by each terminal according to the access network selection scheme.

In a substation scenario, the data of terminal services needs to be updated frequently to keep the received information fresh. By the definition of AoI, the AoI at any time is the difference between the current time $t$ and the generation time $T(t)$ of the most recently successfully received packet, expressed as:

A(t)＝t-T(t) (1)

When the receiving end fails to receive the transmitted data, the generation time of the last successfully received data packet remains unchanged, so the AoI value keeps increasing linearly with time. Let $A_{z,i}(t)$ denote the AoI at time $t$ of the $i$-th data packet sent while service $z$ transmits updates; it can be expressed as:

where

denotes the generation time of the $i$-th successfully received data packet of service $z$,

denotes the reception time of the $i$-th successfully received data packet of service $z$, and $A_{z,i}(t)$ denotes the AoI at time $t$ of the $i$-th successfully received data packet of service $z$.

Let the probability of successful transmission be $P_n$ ($n = 1, 2, \ldots, N$); the packet loss rate is then $1 - P_n$, and equation (2) can be expressed as:

The peak AoI of service $z$ at time

is:

peak AoI represents the maximum value of system AoI; to avoid long-term data non-update, it is ensured that the peak AoI during traffic transmission does not exceed the AoI maximum value of the traffic demand. The constraint is thus obtained:

For all links in the network, AAoI is generally used as the measure of the freshness of the data transmitted by the network; the smaller the AAoI, the fresher the data. AAoI is the sum of the information ages $S_{z,i}$ over a period of time divided by the total length of that period, i.e. the sum of the trapezoidal areas in FIG. 2 ($S_{z,1} + S_{z,2} + \cdots + S_{z,i}$) divided by the length of time.
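The area-based definition can be illustrated by integrating the AoI sawtooth between receptions. The sketch below assumes in-order delivery and represents updates as `(generation_time, reception_time)` pairs; the name `average_aoi` is hypothetical, and the trapezoids it sums are taken per reception interval, which may partition the area differently from the $S_{z,i}$ of FIG. 2 while giving the same total.

```python
def average_aoi(updates):
    """Time-average AoI over the span from the first to the last reception.

    updates: list of (generation_time, reception_time) pairs, delivered in order.
    """
    if len(updates) < 2:
        raise ValueError("need at least two received updates")
    total_area = 0.0
    for (g, r), (_, r_next) in zip(updates, updates[1:]):
        # Between r and r_next the age grows linearly from (r - g) to (r_next - g),
        # so the area under the curve on this interval is a trapezoid.
        total_area += 0.5 * ((r - g) + (r_next - g)) * (r_next - r)
    horizon = updates[-1][1] - updates[0][1]
    return total_area / horizon


print(average_aoi([(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]))
```

In this example each packet takes one time unit to arrive and a new one arrives every two units, so the age oscillates between 1 and 3 and averages 2.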

Let $C_{z,i}$ and $Y_{z,i}$ denote the random variables corresponding to the state transmission interval and the system delay of transmission service $z$, respectively. Equation (6) can then be expressed as:

then when the time t goes to infinity, the network AAoI carrying the traffic z can be expressed as:

where $E[\cdot]$ is the expectation operator.

The lower the overall AAoI in the network, the higher the information freshness of the transmission services. The invention aims to design an access network selection scheme that optimizes the network AAoI so as to improve the timeliness of data transmission.

When a user selects a bearer network for a transmission service, in order to ensure effective and reliable transmission of the transmission service, target optimization must be performed on the premise that a performance index meets a transmission service threshold requirement. So that corresponding constraint conditions need to be satisfied when allocating network resources for different transmission services.

When transmission service $z$ is transmitted on network $l$, the signal to interference plus noise ratio (SINR) can be expressed as:

where $p_{z,l}$ and $p_{i,l}$ are the transmit powers of service $z$ and service $i$ on network $l$, respectively; $h_{z,l}$ and $h_{i,l}$ are the channel state gains of network $l$ when transmitting service $z$ and service $i$, respectively;

the transmission noise is assumed to be Gaussian, characterized by the additive white Gaussian noise power of network $l$. When the bandwidth resource allocated by network $l$ to transmission service $z$ is $B_{z,l}$, the maximum information transmission rate achievable when transmission service $z$ is transmitted in network $l$ is defined as:

$R_{z,l} = B_{z,l} \log_2 (1 + \mathrm{SINR}_{z,l})$ (10)

to ensure that the information transmission rate of network $l$ when transmitting service $z$ meets the minimum transmission rate requirement $R_{z\min}$ of service $z$, the following constraint is obtained:

where $M_z$ is the size of the data to be transmitted by transmission service $z$; $c_{z,l}$ is a binary network selection state variable representing the selection state of the transmission service with respect to the network, $c_{z,l} \in \{0, 1\}$. To ensure that the transmission delay of network $l$ when transmitting service $z$ does not exceed the delay threshold $D_{z\max}$ of transmission service $z$, the following constraint must be satisfied:

the bit error rate of the transmission service z during transmission in the network l is represented as:

where $M_l$ is the modulation index of network $l$. To ensure the reliability of service transmission, access network selection must satisfy the bit error rate constraint, that is:

2 wireless network access selection optimization algorithm based on information age

2.1 optimization target modeling

In order to effectively improve the information freshness of the service, an access selection optimization algorithm aiming at minimizing the AAoI of the transmission service is provided, and the performance constraints of the transmission rate, the transmission delay and the error rate are considered to form an optimization problem based on the AAoI minimization. And solving the optimization problem through the DQN to realize that the user selects an access network according to the service requirement.

Combining the above constraint equations (5), (11), (13), and (15), the multi-user access selection optimization problem under the heterogeneous wireless network can be described as follows:

$C_1$ indicates that any transmission service $z$ can select only one network to access; $C_2$ indicates that the peak AoI of transmission service $z$ cannot exceed the maximum AoI required by the service; $C_3$ indicates that the resources occupied by all services accessing network $l$ cannot exceed the available bandwidth of the network; $C_4$ indicates that the rate of transmission service $z$ cannot be less than the minimum transmission rate required by the service; $C_5$ indicates that the total delay generated by transmission service $z$ cannot exceed the maximum delay required by the service; $C_6$ indicates that the bit error rate generated by transmission service $z$ cannot exceed the maximum bit error rate required by the service.

2.2 DQN-based optimization Algorithm

The optimization problem proposed by the present invention has the Markov property, i.e. future decisions depend only on the current state, so it is represented as a Markov Decision Process (MDP). The reinforcement learning framework is defined on the MDP and inherits the Markov property, and the decision is model-free, so the optimal decision for equation (16) can be found with reinforcement learning. Q-learning is a widely used reinforcement learning algorithm that takes the Bellman equation as its core and iteratively updates the value function in tabular form. However, when the state-action space is too large, the Q-table is difficult to traverse and update at every step. DQN combines the decision-making capability of Q-learning with the strong data-analysis capability of a deep neural network; it can resolve the dimension explosion caused by a large state space in Q-learning and effectively improve training stability. The DQN algorithm is used herein to solve the proposed network access selection model and find the optimal resource allocation strategy. During the interaction between the wireless network and the service, an ε-greedy strategy is used for exploration, and the action with the maximum return value is selected. In conjunction with the optimization model presented herein, the three key components of DQN, namely the state space, the action space and the reward function, are defined as follows:

2.2.1 State space

The state space of the system is designed based on the selectable network frequency band, the network bandwidth occupation condition of each frequency band, the service uploading power and the service transmission power, and the system state in the time period t is expressed as:

where l (l ∈ L) denotes a selectable network frequency band when a service in the system is accessed; the bandwidth term denotes the bandwidth occupation state of each network; p_{z,l} denotes the transmission power of service z when accessing network l; and h_{z,l} denotes the channel state gain of network l when transmitting service z.

2.2.2 Action space

The decision action selects the access network for the service and then determines the amount of resources to allocate. The action at time t is expressed as a_t ∈ A, where A = {c_{z,l}, p_{z,l}, B_{z,l}} is the action space for solving the network access selection problem. Here c_{z,l} is the network selection action: based on a given state s_t, the agent performs action c_{z,l} to select a network l for service z to access and carry out its data transmission task. After selecting an access network, the agent performs action p_{z,l} to allocate suitable transmission power to the service and action B_{z,l} to allocate bandwidth for the transmission. All constraints C₁-C₆ are strictly observed during network selection, access resource allocation and bandwidth allocation.

2.2.3 reward function

The optimization objective of the present invention is to minimize the AAoI as time tends to infinity, so the overall cumulative return is negatively correlated with the AAoI. That is, depending on the action performed, a positive reward is obtained if the system minimizes the AAoI, and a negative reward is obtained otherwise. The cumulative reward is maximized through exploration and updating, thereby obtaining the optimal operation. According to equation (16), the reward function of the system is:

The long-term cumulative reward of the system is defined as:

where γ ∈ [0,1] is the discount rate, a factor reflecting the influence of subsequent-state rewards on the current value; the effect of future rewards is determined by the selective access of different networks. The smaller the value of γ, the greater the influence of the current reward on the value.
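The role of the discount rate can be illustrated with a short Python sketch. It assumes the usual discounted sum G_t = r_t + γ r_{t+1} + γ² r_{t+2} + ..., which may differ in indexing from the definition used here; `discounted_return` is a hypothetical helper name.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted cumulative reward of a finite reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(discounted_return(rewards, 0.5))  # later rewards count for less
    print(discounted_return(rewards, 0.0))  # only the current reward counts
```

With γ = 0 the return reduces to the immediate reward, which matches the statement that a smaller γ gives the current reward more weight.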

To minimize the AAoI of the network, a suitable network access selection scheme needs to be selected to maximize the future cumulative reward:

Q_π(s, a) = E_π(G_t | S_t = s, A_t = a) (20)

Q*(s, a) is the optimal value function. The tuple (s, a, r, s', a') is typically obtained recursively, and the value is updated according to equation (18):

where α ∈ [0,1] is the learning rate. Since equation (17) attains the optimal value only as time t tends to infinity, which is difficult to achieve in practice, DQN uses a deep neural network (DNN) as a function approximator so that Q(s, a; ω) ≈ Q*(s, a). The weights ω of the DNN are trained with the states and actions as the DNN input, and the output of the trained DNN is taken as the Q value. The loss function is minimized by training the network and iteratively updating the weights:
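As a minimal sketch, the tabular update and the loss can be written in their common textbook forms: Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') - Q(s,a)] and a mean squared temporal-difference error. Using these standard forms is an assumption of this sketch, and the function names are hypothetical; the neural network that would produce the predicted Q values is left out.

```python
from typing import List, Sequence


def q_learning_update(q_table: List[List[float]], s: int, a: int, r: float,
                      s_next: int, alpha: float, gamma: float) -> None:
    """One in-place tabular Q-learning step with learning rate alpha."""
    td_target = r + gamma * max(q_table[s_next])
    q_table[s][a] += alpha * (td_target - q_table[s][a])


def td_loss(q_pred: Sequence[float], rewards: Sequence[float],
            q_next_max: Sequence[float], gamma: float) -> float:
    """Mean squared TD error over a batch.

    q_pred[i] plays the role of Q(s_i, a_i; w) and q_next_max[i] the role
    of max over a' of Q(s'_i, a'; w) for the same sample.
    """
    errors = [(r + gamma * qn - qp) ** 2
              for qp, r, qn in zip(q_pred, rewards, q_next_max)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    q = [[0.0, 0.0], [2.0, 0.0]]
    q_learning_update(q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
    print(q[0][1])
    print(td_loss([1.0, 2.0], [1.0, 1.0], [0.0, 1.0], gamma=0.9))
```

In a DQN the table is replaced by the network output, and the gradient of this loss with respect to the weights drives the update.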

The overall strategy of the DQN-based network access selection optimization algorithm proposed by the invention is as follows. The current state s₀ is taken as the input, and the Q values Q(s, a; ω) corresponding to all possible actions are output by adjusting the weights ω. To ensure that the agent can make a trade-off between exploring the unknown environment and exploiting learned knowledge, the agent chooses an action according to an ε-greedy policy. The agent's primary action is to select the appropriate access network; its next action is to allocate resources according to the selected network. After all actions are completed, the agent transitions to a new state s_{t+1} and the reward is calculated according to equation (17); the experience (s_t, a_t, r_t, s_{t+1}) is then stored in the experience replay pool m.
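The ε-greedy choice and the experience replay pool described above can be sketched in plain Python as follows. The class name `ReplayPool`, the fixed capacity, and uniform random sampling are assumptions of this sketch rather than details given in the text.

```python
import random
from collections import deque
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng=random) -> int:
    """Return a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayPool:
    """Fixed-size pool of (s_t, a_t, r_t, s_{t+1}) experiences."""

    def __init__(self, capacity: int):
        # A bounded deque drops the oldest experience when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next) -> None:
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int, rng=random) -> list:
        """Draw up to batch_size distinct experiences uniformly at random."""
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)


if __name__ == "__main__":
    pool = ReplayPool(capacity=1000)
    action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1)
    pool.store("s0", action, 1.0, "s1")
    print(action, pool.sample(4))
```

Sampling past experiences at random breaks the correlation between consecutive transitions, which is one reason DQN training tends to be more stable than updating on each transition as it arrives.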

3. Simulation experiment and result analysis

3.1 simulation parameter settings

The heterogeneous wireless network in the simulation scenario comprises WLAN, 4G and 5G networks, and four typical substation services are selected: distribution automation, accurate load control, substation equipment state perception and high-definition comprehensive video monitoring. Power terminals are distributed in all network coverage areas, so that every network can be selected for access during service transmission.

The communication requirements of the substation services are set with reference to the national grid standard. For the four services of distribution automation, accurate load control, substation equipment state perception and high-definition comprehensive video monitoring, the delay requirements are 12 ms, 50 ms, 500 ms and 200 ms respectively; the transmission rates are 2 Mbps, 5 Mbps, 0.1 Mbps and 4 Mbps; the bit error rates are 10^-4, 10^-4, 10^-2 and 10^-2; and the data transmission intervals are set to 0.1 ms, 3 ms, 100 ms.

The parameters of the 5G (700M), 5G (2.6G), 5G (3.5G), 5G (4.9G), 4G and WLAN networks are set as follows: the bandwidths are 30 MHz, 20 MHz, 30 MHz, 25 MHz, 20 MHz and 10 MHz respectively; the delays are 25 ms, 12.5 ms, 5 ms, 3 ms, 30 ms and 500 ms; the transmission rates are 60 Mbps, 80 Mbps, 240 Mbps, 375 Mbps, 12.5 Mbps and 9 Mbps; the transmit powers are 23 dB, 26 dB, 43 dB and 20 dB; and the bit error rates are 10^-3, 10^-4, 10^-4, 10^-4, 10^-3 and 10^-2.
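To show how these settings interact, the following Python sketch filters the networks that meet each service's delay, rate and bit error rate thresholds. Comparing the nominal network parameters directly against the service thresholds is a simplification assumed here (the actual scheme evaluates the constraints on allocated resources), and the dictionary and function names are hypothetical.

```python
# Network parameters from the simulation settings:
# (delay in ms, transmission rate in Mbps, bit error rate).
NETWORKS = {
    "5G (700M)": (25.0, 60.0, 1e-3),
    "5G (2.6G)": (12.5, 80.0, 1e-4),
    "5G (3.5G)": (5.0, 240.0, 1e-4),
    "5G (4.9G)": (3.0, 375.0, 1e-4),
    "4G": (30.0, 12.5, 1e-3),
    "WLAN": (500.0, 9.0, 1e-2),
}

# Service requirements:
# (maximum delay in ms, minimum rate in Mbps, maximum bit error rate).
SERVICES = {
    "distribution automation": (12.0, 2.0, 1e-4),
    "accurate load control": (50.0, 5.0, 1e-4),
    "equipment state perception": (500.0, 0.1, 1e-2),
    "video monitoring": (200.0, 4.0, 1e-2),
}


def feasible_networks(service: str) -> list:
    """Networks whose nominal parameters satisfy all three thresholds."""
    d_max, r_min, ber_max = SERVICES[service]
    return [
        name
        for name, (delay, rate, ber) in NETWORKS.items()
        if delay <= d_max and rate >= r_min and ber <= ber_max
    ]


if __name__ == "__main__":
    for service in SERVICES:
        print(service, "->", feasible_networks(service))
```

Under this simplified check, only the 3.5 GHz and 4.9 GHz 5G bands pass for distribution automation, while equipment state perception can use any of the six networks.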

3.2 simulation results analysis

The simulation results are analyzed below with reference to the accompanying drawings.

Fig. 3 compares the network AAoI of different algorithms in the heterogeneous wireless network scenario. As the number of users increases, the network AAoI of all three algorithms tends to increase: the load capacity of the network is limited, so the AoI grows with the number of accessing services and the overall network AAoI rises. However, the network AAoI of the algorithm of the present invention is significantly smaller than that of the AHP algorithm and the delay minimization algorithm, which shows that the proposed access scheme can effectively reduce the network AAoI and improve information freshness.

Fig. 4 analyzes the service transmission delay under the different algorithms. The delay curve of the algorithm of the present invention lies between those of the delay minimization algorithm and the AHP algorithm: it is better than the AHP algorithm and worse than the delay minimization algorithm. Limited by the total amount of network resources, the transmission delay of all three algorithms eventually increases as the number of services accessing the network increases. The AHP algorithm mainly considers the fairness of resource allocation and is weak at optimizing any single performance metric, so its transmission delay grows fastest among the three simulated algorithms. The delay minimization algorithm mainly pursues optimal delay performance, so its delay grows slowest.

Fig. 5 compares the network load balancing under the three algorithms. Network load balancing is described by the variance of the network load rate. The variance is negatively correlated with load balance: the smaller the variance, the better the balance. Because the networks differ in performance, the difference in the number of services accessing each network grows as the number of accessing services increases, and the variance of the network load rate increases accordingly. The load balance of the proposed algorithm is better than that of the delay minimization algorithm but worse than that of the AHP algorithm. Although the proposed algorithm also considers load balancing, it mainly considers the influence of network load on the service AAoI, whereas the AHP algorithm weighs multiple attributes together and therefore balances better; the delay minimization algorithm, which mainly optimizes delay, gives insufficient consideration to network load balancing.
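The load-balance metric can be sketched as follows. The sketch assumes that the load rate of a network is its occupied bandwidth divided by its total bandwidth and that the population variance is used; neither detail is stated explicitly here, and `load_rate_variance` is a hypothetical name.

```python
from statistics import pvariance
from typing import Sequence


def load_rate_variance(occupied_mhz: Sequence[float],
                       total_mhz: Sequence[float]) -> float:
    """Variance of per-network load rates; smaller means better balance."""
    if len(occupied_mhz) != len(total_mhz) or not total_mhz:
        raise ValueError("need one occupied value per network")
    # Load rate of each network = occupied bandwidth / total bandwidth.
    rates = [used / total for used, total in zip(occupied_mhz, total_mhz)]
    return pvariance(rates)


if __name__ == "__main__":
    totals = [30.0, 20.0, 30.0]
    print(load_rate_variance([15.0, 10.0, 15.0], totals))  # evenly loaded
    print(load_rate_variance([30.0, 0.0, 0.0], totals))    # one network saturated
```

Normalizing by each network's own bandwidth keeps a wide band from looking overloaded merely because it carries more traffic in absolute terms.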

Fig. 6 compares the cumulative switching rates of the three algorithms. The cumulative switching rate is the ratio of the total number of service switches accumulated so far to the total number of services currently accessing the network, plotted as the number of services increases. Switching networks at the right time can effectively reduce the network load rate and improve transmission performance, but switching too frequently easily causes a ping-pong effect and degrades the quality of service. The cumulative switching rate of the proposed algorithm is higher than that of the delay minimization algorithm, which optimizes a single performance metric, but lower than that of the AHP algorithm, which focuses on balance.

For services in the substation scenario, the invention provides a heterogeneous wireless network access selection scheme whose main objective is to minimize the AAoI, and thereby maximize information freshness, while guaranteeing the transmission rate, transmission delay and bit error rate requirements of the services. In this scheme, information freshness is measured by the network AAoI, the AAoI optimization problem is converted into a Markov decision process, and a solution algorithm based on the DQN framework is established to optimize the access selection policy. Simulation results show that the proposed access selection scheme can effectively optimize the network AAoI and improve the freshness of transmitted data while maintaining the service transmission delay, the switching rate and the network load balance. Future research will consider classifying the priority of substation services and assigning higher priority to more urgent services, so as to guarantee their real-time performance to the greatest extent.

Claims

1. A multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services is characterized by comprising the following steps:

(1) establishing multi-frequency heterogeneous wireless communication network model

the terminals of the transformer substation are all provided with multimode interfaces, and all networks in a transmission service area can be accessed from the multimode interfaces; assuming that the transformer substation terminals are uniformly distributed in a coverage area, and the types and the number of transmission services of the terminals are random; the method comprises the steps that an integrated controller is adopted to collect network state information, and when a substation terminal sends an access request, the integrated controller distributes a proper network to a transmission service of each terminal according to an access network selection scheme;

A(t)＝t-T(t) (1)

(2) wireless network access selection optimization algorithm based on information age

The heterogeneous wireless network comprises a WLAN (wireless local area network), a 4G network and a 5G network, and several typical services in a transformer substation are selected: the method comprises the steps of distribution automation, accurate load control, substation equipment state sensing and high-definition comprehensive video monitoring, ensures that power terminals are distributed in all network coverage areas, and sets simulation network parameters and service requirement parameter thresholds, so that all networks can be selectively accessed during service transmission;

2. The multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services according to claim 1, wherein, to ensure the freshness of the received information, the data of transmission service z needs to be updated frequently; let A_{z,i}(t) denote the AoI value, at time t, of the i-th data packet in the update transmission process of transmission service z, which may be expressed as:

in formula (2),

when the receiving end fails to receive the transmitted data, the generation time T_z^S(t) of the data packet sent by the sending end remains unchanged, and its AoI value continues to increase linearly over time;

the peak AoI of terminal service z at the given time instant is:

3. The multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services according to claim 2, wherein, to guarantee the freshness of the received information, the data of transmission service z needs to be updated frequently; the AAoI is used as the measure of the freshness of network transmission data, and the smaller the AAoI, the higher the freshness of the data.

4. The multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services according to claim 3, wherein the AAoI is the sum of the information ages over a period of time, (S_{z,1} + S_{z,2} + … + S_{z,i}), divided by the total length of that period, where S_{z,i} denotes the information age within a given time interval;

denote

C_{z,i} and Y_{z,i} are the random variables of the state transmission interval and the system delay of service z, respectively; equation (6) can then be expressed as:

then when the time t goes to infinity, the freshness AAoI of the network transmission data of the terminal traffic z can be expressed as:

where E[·] in formula (8) denotes the expectation operator;

the lower the overall AAoI in the network, the higher the information freshness degree of the terminal service; the AAoI of the network is optimized to improve the timeliness of data transmission;

when selecting a bearer network for a user service, in order to ensure the effectiveness and reliability of service transmission, target optimization must be performed on the premise that a performance index meets a service threshold requirement; therefore, when network resources are allocated for different services to ensure that the service transmission is effective and reliable, corresponding constraint conditions need to be met respectively;

where the noise term is the additive white Gaussian noise power of network l, the transmission noise being assumed Gaussian by default. According to the Shannon formula, when the bandwidth resource allocated to service z by network l is B_{z,l}, the maximum information transmission rate achievable for service z in network l can be defined as:

R_{z,l} = B_{z,l} log₂(1 + SINR_{z,l}) (10)

to ensure that the information transmission rate of network l when transmitting terminal service z is greater than the minimum transmission rate R_{z,min} that terminal service z requires of the network, the following constraint is obtained:

where M_z is the size of the data to be transmitted for terminal service z, and c_{z,l} ∈ {0,1} is a binary network selection state variable representing the selection state of the service with respect to the network; to ensure that the transmission delay of network l when transmitting terminal service z is less than the delay threshold D_{z,max} of terminal service z, the following constraint must be satisfied:

the error rate of the traffic z when transmitted in the network l is expressed as:

5. The multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services according to claim 1, wherein the access selection optimization algorithm, which aims to minimize the AAoI of service transmission so as to effectively improve the information freshness of the service and enables a user to select an access network according to service requirements, comprises:

1) optimizing object modeling

in formula (16), C₁ indicates that any service z can select only one network for access; C₂ indicates that the peak AoI of transmission service z cannot exceed the maximum AoI required by the service; C₃ indicates that the resources occupied by all services accessing network l cannot exceed the available bandwidth of the network; C₄ indicates that the rate of transmission service z cannot be lower than the minimum transmission rate required by the service; C₅ indicates that the total delay of transmission service z cannot exceed the maximum delay required by the service; C₆ indicates that the bit error rate of transmission service z cannot exceed the maximum bit error rate required by the service;

2) DQN optimization algorithm

The above AAoI minimization optimization problem is solved by DQN. The optimization problem has the Markov property, i.e. future decisions depend only on the current state, and is therefore represented as a Markov decision process (MDP); the reinforcement learning framework is defined on the MDP and inherits the Markov property, and the decision is model-free, so the optimal decision for formula (16) can be found with the reinforcement learning framework; the proposed network access selection model is solved using the DQN algorithm;

the Q-learning algorithm is a widely used reinforcement learning algorithm that takes the Bellman equation as its core and iteratively updates the value function in tabular form; in the process of interaction between the wireless network and the transmission service, an ε-greedy strategy is adopted for exploration, and the action with the maximum return value is selected; in conjunction with the optimization model presented herein, three key components of the DQN are defined: a state space, an action space, and a reward function.

6. The multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services, according to claim 5, wherein the state space is designed based on selectable network frequency bands, network bandwidth occupation conditions of each frequency band, transmission service upload power and transmission power, and the system state at time period t is represented as:

wherein ,

7. The multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services according to claim 5, wherein the action space strictly observes all constraints C₁-C₆ in selecting the service access network and then determining the amount of allocated resources; the action at time t is expressed as a_t ∈ A, where A = {c_{z,l}, p_{z,l}, B_{z,l}} is the action space for solving the network access selection problem; wherein c_{z,l} is the network selection action: based on a given state s_t, the agent performs action c_{z,l} to select a network l for transmission service z to access and execute its data transmission task; after selecting an access network, in the course of network selection, access resource allocation and bandwidth allocation, the agent performs action p_{z,l} to allocate suitable transmission power to the transmission service and performs action B_{z,l} to allocate bandwidth for the transmission.

8. The multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services according to claim 5, wherein, for the reward function, the optimization objective is to minimize the freshness measure AAoI of the network transmission data of transmission service z as time tends to infinity; the agent obtains a positive reward when the AAoI is minimized and a negative reward otherwise; the cumulative reward is maximized through exploration and updating, thereby obtaining the optimal operation; according to equation (16), the reward function of the system is:

the long-term cumulative reward of the system is defined as:

where γ ∈ [0,1] is the discount rate, a factor reflecting the influence of subsequent-state rewards on the current value, and the effect of future rewards is determined by the selective access of different networks; the smaller the value of γ, the greater the influence of the current reward on the value;

Q_π(s, a) = E_π(G_t | S_t = s, A_t = a) (20)

wherein α ∈ [0,1] is the learning rate; since equation (17) attains the optimal value only as time t tends to infinity, which is difficult to achieve in practice, DQN uses a deep neural network (DNN) as a function approximator so that Q(s, a; ω) ≈ Q*(s, a), trains the weights ω of the DNN with the states and actions as the DNN input, takes the output of the trained DNN as the Q value, and minimizes the loss function by training the network and iteratively updating the weights:

9. The multi-frequency heterogeneous wireless communication network access selection algorithm suitable for substation services according to claim 5, wherein the overall strategy of the DQN network access selection optimization algorithm is to take the current state s₀ as the input and to output, by adjusting the weights ω, the Q values Q(s, a; ω) corresponding to all possible actions; the reward is calculated according to equation (17), and the experience (s_t, a_t, r_t, s_{t+1}) is then stored in the experience replay pool m.