CN117042050A - Multi-user intelligent data unloading method based on distributed hybrid heterogeneous decision - Google Patents


Info

Publication number
CN117042050A
CN117042050A (application CN202311036967.0A)
Authority
CN
China
Prior art keywords
user
network
mobile
base station
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311036967.0A
Other languages
Chinese (zh)
Inventor
徐煜华
刘松仪
李国鑫
徐逸凡
张晓凯
辜方林
马文峰
陈韬亦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Army Engineering University of PLA filed Critical Army Engineering University of PLA
Priority to CN202311036967.0A priority Critical patent/CN117042050A/en
Publication of CN117042050A publication Critical patent/CN117042050A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/08Load balancing or load distribution
    • H04W28/09Management thereof
    • H04W28/0958Management thereof based on metrics or performance parameters
    • H04W28/0967Quality of Service [QoS] parameters
    • H04W28/0975Quality of Service [QoS] parameters for reducing delays
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/08Load balancing or load distribution
    • H04W28/09Management thereof
    • H04W28/0917Management thereof based on the energy state of entities
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application provides a multi-user intelligent data offloading method based on distributed hybrid heterogeneous decisions, which comprises the following steps: performing multidimensional resource-management modeling of the mobile-offloading-oriented network according to a given multi-user edge computing model; building a multi-user mobile data offloading neural-network model based on distributed hybrid heterogeneous decisions from the modeling information of the multi-user network scenario; initializing the observation matrices and networks of the base station and each user; training the high-level network and the low-level network of the neural-network model until a preset number of rounds is reached; and having each mobile user explore the environment with the trained model while updating its observation state and the channel state. The model provided by the application is complete and has a clear physical meaning, and the proposed multi-user intelligent data offloading method based on distributed hybrid heterogeneous decisions solves it effectively.

Description

Multi-user intelligent data unloading method based on distributed hybrid heterogeneous decision
Technical Field
The application relates to the technical field of wireless communication, and in particular to a multi-user intelligent data offloading method based on distributed hybrid heterogeneous decisions.
Background
Edge computing helps improve the computation and communication efficiency of mobile devices. However, the resources of mobile devices are severely limited and cannot meet the demands of increasingly resource-hungry applications. By offloading data to nearby high-performance equipment, mobile edge computing (MEC) can satisfy delay-sensitive tasks well; it has become a research hotspot in recent years and is widely applied in scenarios such as the Internet of Things and the Internet of Vehicles. In these scenarios, however, mobile user equipment typically consists of small sensor nodes, which have limited computational resources, small memory, constrained data transmission and low battery capacity. Moreover, given the existence of external interference attacks and the openness of the wireless channel, the data offloading process among multiple users in a multi-access MEC network is prone to collisions or jamming attacks. This poses a serious transmission-security challenge for mobile devices. The problem of MEC resource optimization in a dynamic, unknown interference environment is therefore increasingly important.
In addition, considering the network dynamics and uncertainty caused by user mobility and the delay sensitivity of different task requirements, the strongly coupled multi-user problem must be solved quickly. Conventional game-theoretic methods (ref.: H. Gao, W. Li, R. A. Banez, Z. Han, and H. V. Poor, "Mean Field Evolutionary Dynamics in Dense-User Multi-Access Edge Computing Systems," IEEE Transactions on Wireless Communications, vol. 19, no. 12, pp. 7825-7835, Dec. 2020) struggle to solve it quickly, because environmental prior information such as the task-arrival distribution and the time-varying channel conditions is hard to obtain. Intelligent learning algorithms such as reinforcement learning can make real-time, online resource-allocation decisions by exploring the dynamic unknown environment, and thus address this problem (refs.: X. Liu et al., "A heterogeneous information fusion deep reinforcement learning for intelligent frequency selection of HF communication," China Communications, vol. 15, no. 9, pp. 73-84, Sept. 2018; H. Peng and X. Shen, "Multi-Agent Reinforcement Learning Based Resource Management in MEC- and UAV-Assisted Vehicular Networks," IEEE Journal on Selected Areas in Communications, vol. 39, no. 1, pp. 131-141, Jan. 2021). However, most existing reinforcement-learning (RL) based resource-allocation schemes focus on either discrete or continuous mobile-edge-computing decisions; the hybrid decision space arising from the joint optimization of communication and computing resources is rarely considered. Furthermore, when traditional multi-user intelligent methods explore a hybrid action space, their performance is severely limited, their exploration is slow, and sparse rewards arise easily.
Therefore, research into new multi-user intelligent learning algorithms is urgently needed to solve the above-mentioned problems.
Disclosure of Invention
The application provides a multi-user intelligent data offloading method based on distributed hybrid heterogeneous decisions, which addresses the technical problems that traditional multi-user intelligent algorithms exploring a hybrid action space suffer severely limited performance and slow exploration, and easily induce sparse rewards.
The application provides a multi-user intelligent data offloading method based on distributed hybrid heterogeneous decisions, which comprises the following steps:
step 1, performing multidimensional resource-management modeling of the mobile offloading network according to a given multi-user edge computing model;
step 2, building a multi-user mobile data offloading neural-network model based on distributed hybrid heterogeneous decisions from the modeling information of the multi-user network scenario;
step 3, initializing the observation matrices and networks of the base station and each user;
step 4, training the high-level network and the low-level network of the neural-network model respectively until a preset number of rounds is reached;
and step 5, having each mobile user explore the environment with the trained model and update its observation state and the channel state.
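The five steps above can be sketched as a single control loop. This is a hedged outline only; the function and variable names are illustrative stand-ins, not taken from the patent:

```python
import random

def run_offloading_pipeline(num_users=3, train_rounds=5, explore_rounds=2, seed=0):
    """Illustrative skeleton of steps 1-5: modeling, network setup,
    initialization, training until a preset round count, then continued
    exploration with the trained model. All internals are placeholders."""
    rng = random.Random(seed)
    # Steps 1-2: scenario modeling and neural-network setup (placeholders)
    model = {"users": num_users, "channels": 4}
    # Step 3: zero the observation matrices of the users
    observations = [[0.0] * model["channels"] for _ in range(num_users)]
    # Step 4: train high-level (discrete channel) and low-level (continuous
    # offload ratio) decisions for a preset number of rounds
    history = []
    for _ in range(train_rounds):
        actions = [(rng.randrange(model["channels"]), rng.random())
                   for _ in range(num_users)]
        history.append(actions)
    # Step 5: keep exploring with the trained model, updating observations
    for _ in range(explore_rounds):
        for i in range(num_users):
            ch, ratio = history[-1][i]
            observations[i][ch] = ratio  # update user observation state
    return observations, history

obs, hist = run_offloading_pipeline()
```

The skeleton only fixes the control flow; the actual decision logic is supplied by the hierarchical networks described below in the disclosure.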
Optionally, step 1 performs multidimensional resource-management modeling of the mobile offloading network according to the given multi-user edge computing model, including:
Step 1.1, modeling the edge computing scenario in the mobile offloading network:
the model comprises a base station, edge computing equipment, several mobile users and several jammers; the mobile users need to perform edge computation for their tasks, and during edge computation the jammers interfere with the users' data offloading;
each edge computing server is connected to one base station, and each base station serves a group of randomly distributed mobile users; several tasks with different requirements are available for a mobile user to process, and each user executes one type of task; each mobile user periodically generates computing tasks with different QoS and computing-resource requirements; if a task must be offloaded to the edge computing equipment, the user first sends a resource-access request to the target base station; after receiving access permission and the resource-allocation result from that base station, the user offloads the task over the allocated spectrum resources to the edge computing equipment of the associated base station; a user located in the overlapping area between base stations can offload tasks to only one of them;
the set model rules are as follows: for mobile user i, each task uses a single set of three tuples Composed of->The data size of the calculation task is in units of bits; />Representing the number of CPU cycles required for a computing task, +.>Giving the maximum tolerable delay time of the calculation task; when the t-th calculation task is generated or arrived at a certain mobile user terminal, the mobile user system must determine whether the task needs to be partially, completely unloaded to the base station or calculated locally only; o (o) i,m (t) indicates whether the base station m is connected with the mobile terminal user at the moment t; when o i,m When (t) =1, the mobile terminal user is connected with the base station m; on the contrary, o i,m (t)=0;
the selected access channel of mobile user i (i.e., the selected base station and access frequency point), the offloading ratio, the transmission bandwidth and the transmission power are denoted $c_i$, $\alpha_i\ (\alpha_i\in[0,1])$, $B_i\ (B_i\in[B_{\min},B_{\max}])$ and $P_i\ (P_i\in[P_{\min},P_{\max}])$, respectively; when the user offloads data, if the selected frequency is severely jammed the offloading fails and the user must switch frequency; when the current channel is good, the terminal can offload the task data by adjusting the offloading ratio, the bandwidth and the transmission power;
in the process of data offloading, each user has a maximum CPU frequency; for local computing, users must ensure that their computing power can complete local tasks within the maximum tolerable delay; the local computation delay $T_i^{l}(t)$, i.e., the time required to process the t-th computing task locally, is defined as:
$$T_i^{l}(t)=\frac{(1-\alpha_i)\,C_i(t)}{f_i^{l}}$$
wherein $f_i^{l}$ is the computing power of the terminal itself and $f_i^{\mathrm{MEC}}$ is the computing power allocated to the user by the edge computing device (MEC server); the delay of mobile user i is constrained by its maximum tolerable delay, $T_i^{l}(t)\le T_i^{\max}(t)$; if the delay of the offloading mode exceeds the delay of the local computing mode, the local computation amount should be reduced by increasing the offloading ratio so as to meet the time constraint; the energy consumption of the t-th computing task under local computation, $E_i^{l}(t)$, is:
$$E_i^{l}(t)=\kappa\,(f_i^{l})^{2}\,(1-\alpha_i)\,C_i(t)$$
wherein $\kappa$ is the effective capacitance switching parameter of the user, an energy-efficiency coefficient; the weighted cost of local computation, $U_i^{l}(t)$, is defined as:
$$U_i^{l}(t)=\gamma\,T_i^{l}(t)+\bar{\gamma}\,E_i^{l}(t)$$
wherein $\gamma$ and $\bar{\gamma}$ are the delay weight coefficient and the energy-consumption weight coefficient, respectively;
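The local-computation cost model above can be illustrated in a few lines of Python. This is a minimal sketch using the standard MEC formulation; the default coefficient values are illustrative assumptions, not values from the patent:

```python
def local_delay(cycles, alpha, f_local):
    """Delay of computing the fraction (1 - alpha) of the task locally;
    f_local is the user's CPU frequency in cycles/s (standard MEC form,
    assumed here since the patent's formula images are not reproduced)."""
    return (1.0 - alpha) * cycles / f_local

def local_energy(cycles, alpha, f_local, kappa=1e-27):
    """Dynamic-power energy model; kappa is the effective capacitance
    switching parameter (energy-efficiency coefficient)."""
    return kappa * (f_local ** 2) * (1.0 - alpha) * cycles

def local_cost(cycles, alpha, f_local, gamma=0.5, gamma_e=0.5, kappa=1e-27):
    """Weighted local cost: gamma weights delay, gamma_e weights energy."""
    return (gamma * local_delay(cycles, alpha, f_local)
            + gamma_e * local_energy(cycles, alpha, f_local, kappa))
```

For example, a task of 1e9 cycles computed fully locally on a 1 GHz CPU takes one second, and offloading everything (alpha = 1) drives the local delay and energy to zero.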
for the data offloading mode, each terminal may offload tasks to the base station over a wireless channel; the delay of a computing task processed on the edge computing equipment comprises the wireless transmission of the uploaded data, the edge computation itself, and the download transmission of the result; for the t-th task-arrival event, the offloading transmission rate of the user, $R_{i,m}(t)$, is:
$$R_{i,m}(t)=B_i\log_2\!\left(1+\frac{P_i\,g_{i,m}(t)}{N_0 B_i+I_m(t)}\right)$$
wherein $g_{i,m}(t)$ is the channel gain, $N_0$ the noise power spectral density, $I_m(t)$ the interference power, and $\lambda$ the demodulation threshold of the data received by the base station (transmission succeeds only when the received signal quality reaches $\lambda$); for the t-th computing task, the total offloading delay of the user is expressed as:
$$T_i^{\mathrm{off}}(t)=\frac{\alpha_i\,D_i(t)}{R_{i,m}(t)}+\frac{\alpha_i\,C_i(t)}{f_i^{\mathrm{MEC}}}$$
the energy consumption when task t arrives, $E_i^{\mathrm{off}}(t)$, is:
$$E_i^{\mathrm{off}}(t)=P_i\,\frac{\alpha_i\,D_i(t)}{R_{i,m}(t)}$$
the edge computation cost of a single user, $U_i^{\mathrm{off}}(t)$, is defined as:
$$U_i^{\mathrm{off}}(t)=\gamma\,T_i^{\mathrm{off}}(t)+\bar{\gamma}\,E_i^{\mathrm{off}}(t)$$
wherein $\gamma$ and $\bar{\gamma}$ are the delay weight coefficient and the energy-consumption weight coefficient, respectively;
while meeting each application's requirements, the long-term weighted sum of data-computation costs is minimized; the total data-computation cost of the mobile network is:
$$U(t)=\sum_{i=1}^{N}\Big(U_i^{l}(t)+U_i^{\mathrm{off}}(t)\Big)$$
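The offloading-side quantities above (rate, delay, energy, cost) can be sketched as follows. The Shannon-type rate and the upload-plus-edge-compute delay are the standard MEC forms assumed here; all parameter values in the example are illustrative:

```python
import math

def offload_rate(bandwidth_hz, power_w, channel_gain, noise_w):
    """Shannon-type uplink rate in bits/s (assumed form; the patent's
    rate formula image is not reproduced in the source)."""
    return bandwidth_hz * math.log2(1.0 + power_w * channel_gain / noise_w)

def offload_delay(task_bits, cycles, alpha, rate_bps, f_mec):
    """Upload delay plus edge-computation delay for the offloaded fraction
    alpha; the result-download phase is ignored, as in the text."""
    return alpha * task_bits / rate_bps + alpha * cycles / f_mec

def offload_energy(task_bits, alpha, rate_bps, power_w):
    """Transmit energy = transmit power x upload time."""
    return power_w * alpha * task_bits / rate_bps

def offload_cost(delay, energy, gamma=0.5, gamma_e=0.5):
    """Weighted edge-computation cost of a single user."""
    return gamma * delay + gamma_e * energy
```

With a 1 MHz bandwidth and unit SNR the rate is exactly the bandwidth, which makes the delay and energy terms easy to check by hand.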
step 1.2, describing the model and building the architecture with a decentralized partially observable Markov decision process:
the state space, the local observation information, the action space and the reward function are defined respectively:
state space: describes the environment configuration, including the user's movement characteristics (position and orientation), the externally perceived spectrum information, and the time-varying tasks generated by the user terminal; $x_i(t)$ and $y_i(t)$ are the position coordinates of user i, $P_{N(t),m}(t)$ is the part of the spectrum information observed by the user, and $v_{N(t)}(t)$ is the moving speed and direction of the user in the current time slot; the computing-task information of user i includes the task type, the data volume and similar information; for the wireless-network resource-management problem of the MEC server, the environment state $S_m(t)$ is expressed as:
$$S_m(t)=\big\{x_i(t),\,y_i(t),\,P_{N(t),m}(t),\,v_{N(t)}(t),\,W_i(t)\big\}$$
local observation information: $O_{i,m}(t)$ denotes the observation space of a mobile user; the observation of each user in time slot t is part of the current global state information; the observation of a single user in the communication network is limited, so a mobile user cannot acquire global information, and its local observation in time slot t consists of two parts: one part is the external environment information currently observed by the mobile user at time t, such as the local channel state and position information; the other part is the task type and the data volume the user needs to transmit; the observation of the mobile user in time slot t is $O_{i,m}(t)$:
action space: $A(t)$ is the joint action selected in time slot t, and $A_{i,m}$ is the action selected by user i in time slot t; the action decision $a_{i,m}$ comprises the channel-access decision $a_{i,m}^{\mathrm{ch}}$ and the data-offloading decision $a_{i,m}^{\mathrm{off}}$, i.e., $a_{i,m}=\big(a_{i,m}^{\mathrm{ch}},a_{i,m}^{\mathrm{off}}\big)$;
reward function: $R$ denotes the reward for performing an action in environment state $S$; $r_t=R(S_t,A_t)$ is the instantaneous reward when the user executes an action in state $S_t$; the channel-access reward of a user grants a defined reward value $r_m$ when the quality of the selected channel reaches the channel-quality threshold $\beta_{th}$ and the channel is not occupied by the jammer, and subtracts a channel-switching factor c when the selected channel differs from the channel strategy of the last time slot; here the user selects from its set of available channels, while the jammer applies its channel-intervention strategy over the interfered channels; the reward for the user's continuous offloading decision is the negative of the offloading cost, reduced further by the data-offloading failure loss $C_{sp}$ when offloading fails; the overall reward of a single user is:
$$r_i(t)=\mu_i\big(r_i^{\mathrm{ch}}(t)+r_i^{\mathrm{off}}(t)\big)$$
wherein $\mu_i$ is the return weight of the user, used to guarantee fairness among users.
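The reward structure above can be sketched as follows. This is a hedged illustration of the described case logic (threshold reward, switching penalty, failure loss, fairness weight); the default numeric values are assumptions:

```python
def channel_access_reward(quality, chosen, prev, jammed,
                          r_m=1.0, c=0.2, beta_th=0.5):
    """Channel-access reward: base value r_m when the chosen channel
    clears the quality threshold beta_th and is not jammed, minus a
    switching cost c when the channel changes between slots."""
    reward = r_m if (quality >= beta_th and chosen not in jammed) else 0.0
    if chosen != prev:
        reward -= c  # channel-switching factor
    return reward

def offload_reward(cost, failed, c_sp=5.0):
    """Continuous-offloading reward: negative weighted cost, with an
    extra failure loss C_sp when data offloading fails."""
    return -cost - (c_sp if failed else 0.0)

def total_reward(r_ch, r_off, mu=1.0):
    """Per-user return weight mu enforces fairness across users."""
    return mu * (r_ch + r_off)
```

For instance, staying on a good unjammed channel earns the full reward, while switching to it earns the reward minus the switching cost.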
Optionally, step 2, building the multi-user mobile data offloading neural-network model based on distributed hybrid heterogeneous decisions, includes:
a double-layer deep Q-network structure is set up, comprising a high-level network that coordinates joint discrete-action learning and several low-level networks that learn coordinated continuous-parameter strategies; the high-level and low-level networks are trained separately, and both follow the centralized-training, distributed-execution paradigm;
the high-level network for coordinated joint discrete-action learning is the upper-layer architecture for centralized multi-user discrete channel access; it adopts an A2C network deployed at the base station to decide the channels, mining relevant data whose per-user channel signatures are weak while reducing information entropy; the state matrix is downsampled in a pooling layer, and the channel-access strategy of each user is output downward; the low-level network is the architecture for each user's distributed continuous data-offloading decision: a distributed PPO network deployed on the mobile user maps the state and each discrete action to the corresponding continuous parameters, finding the continuous data-offloading decision under the optimal discrete channel access;
the proposed network seamlessly integrates the network structure of the A2C algorithm with the proximal policy optimization algorithm; for each user, the gradients of its actor and critic neural networks are uploaded to the base station in an asynchronous update mode; the base station fuses the network gradients collected from the different user terminals and distributes the fused network parameters to all users, realizing the integration and sharing of the actor and critic gradients;
in the hierarchical neural network, a state-encoding network is shared between the high-level and low-level networks.
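The two-level decision structure above can be illustrated with lightweight stand-ins. The patent deploys an A2C network at the base station and PPO networks at the users; here an epsilon-greedy table and a clipping map stand in for them purely to show the discrete-over-continuous interface, so every class and parameter below is an illustrative assumption:

```python
import random

class HighLevelChannelPolicy:
    """Stand-in for the upper-layer discrete channel-access learner
    (an A2C network at the base station in the patent)."""
    def __init__(self, num_channels, epsilon=0.1, seed=0):
        self.q = [0.0] * num_channels   # value estimate per channel
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        # epsilon-greedy over discrete channel indices
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

class LowLevelOffloadPolicy:
    """Stand-in for the per-user lower-layer continuous policy (a PPO
    network in the patent); maps a discrete channel choice to continuous
    offload parameters (ratio, bandwidth, power) clipped to valid ranges."""
    def __init__(self, b_range=(1e5, 1e6), p_range=(0.1, 1.0)):
        self.b_range, self.p_range = b_range, p_range

    def act(self, channel, raw_ratio, raw_b, raw_p):
        clip = lambda x, lo, hi: max(lo, min(hi, x))
        return (channel,
                clip(raw_ratio, 0.0, 1.0),
                clip(raw_b, *self.b_range),
                clip(raw_p, *self.p_range))
```

The hybrid action is thus a (discrete channel, continuous parameters) tuple, which is exactly the joint decision space the double-layer structure targets.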
Optionally, step 3, initializing the observation matrices and networks of the base station and each user, includes:
initializing the neural networks: the size of each user's experience replay pool L is set to $\phi$; the network parameter $\theta$ is initialized to a random value; the target Q-network parameter is set to $\theta^{-}=\theta$; the experience replay pool L is filled by random exploration; the learning parameters are set;
the user initializes the observation matrix: all matrices $O_{i,m}$ are zeroed; each user's actor-network parameters $\theta_u$ and critic-network parameters $\theta_\nu$ are initialized to random values; a step counter i = 0 and a global shared counter H = 0 are initialized, together with the global actor gradient $\theta_{u'}$ and global critic parameters $\theta_{\nu'}$; the maximum number of iterations per network update is $i_{\max}$; the cumulative gradients are set to zero, $d\theta_\nu \leftarrow 0$ and $d\theta_u \leftarrow 0$, and the round counter is initialized to w = 0.
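The initialization step above can be sketched compactly. The vector sizes and ranges are illustrative assumptions; only the structure (zeroed observations, random actor/critic parameters, a bounded replay pool of capacity phi, zeroed counters and gradients) follows the text:

```python
from collections import deque
import random

def initialize_user(num_channels, pool_size=1000, seed=0):
    """Step-3 sketch: zeroed observation matrix, randomly initialized
    actor/critic parameter vectors, an empty replay pool of capacity
    phi = pool_size, and zeroed counters and cumulative gradients."""
    rng = random.Random(seed)
    observation = [0.0] * num_channels                       # O_{i,m} zeroed
    theta_u = [rng.uniform(-0.1, 0.1) for _ in range(8)]     # actor params
    theta_v = [rng.uniform(-0.1, 0.1) for _ in range(8)]     # critic params
    replay_pool = deque(maxlen=pool_size)                    # pool L, |L| <= phi
    counters = {"step_i": 0, "global_H": 0, "round_w": 0}
    grads = {"d_theta_u": [0.0] * 8, "d_theta_v": [0.0] * 8}
    return observation, theta_u, theta_v, replay_pool, counters, grads
```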
Optionally, training the high-level network and the low-level network of the neural-network model respectively until the preset number of rounds is reached includes:
step 4.1, while $w \le w_m$, in each round with $i-i_s \ne i_{\max}$: the current step counter is updated ($i \leftarrow i+1$); each user perceives the external environment $O_{i,m}(t)$;
the discrete channel-access decision $a_{i,m}^{\mathrm{ch}}$ is explored with a greedy algorithm, while the data-offloading decision $a_{i,m}^{\mathrm{off}}$ initially performs random exploration; each mobile user selects the action $a_{i,m}=\pi_i(O_{i,m})$; immediately after executing the edge-computing decision $a_{i,m}$, the user obtains the current reward $r_{i,m}=R(O_{i,m},a_{i,m})$; the user receives the reward $r_{i,m}$ of the current strategy $a_{i,m}$, obtains the corresponding input state, and observes the new state $O_{i,m}(t+1)$; by collecting information, the base station lets each user share its channel-access action with the other users; when a user finds a good data-offloading decision, the action is reported to the base station, which updates the higher-layer channel-access policy again; at the same time, the decision experience $a_{i,m}$ and the full record $(O_{i,m},a_{i,m},r_{i,m},W_t)$ are transmitted to the MEC server, and the user observation state and the channel state of the next time slot are updated synchronously;
step 4.2, when $i-i_s=i_{\max}$, the actor and critic networks of the designed network model are updated: the actor-network parameters $\theta_u$ and critic-network parameters $\theta_\nu$ are updated, the gradients of all mobile users are synchronized, $\theta_u=\theta_{u'}$ and $\theta_\nu=\theta_{\nu'}$, and the global shared counter is updated, $H \leftarrow H+1$; with the updated network parameters $\theta_u$ and $\theta_\nu$, the users continue to explore the environment; samples are trained according to the reward of the current strategy and the environment state at the next moment, and the training experience $\langle O_{i,m}(t),a_{i,m}(t),r_{i,m}(t),O_{i,m}(t+1),W_t\rangle$ is stored in the experience replay pool L; small batches are randomly sampled from the replay pool to train and update the neural-network parameters, realizing the fitting of the neural network; the high-level network and the low-level network are trained respectively;
based on the data offloading and the channel conditions, the policy-function update expression is defined as follows:
$$L^{\mathrm{CLIP}}(\theta)=\mathbb{E}_t\Big[\min\big(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big]$$
wherein $r_t(\theta)=\pi_\theta(a_t\mid s_t)\,/\,\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)$ is the probability ratio, $\hat{A}_t$ is the advantage estimate, and $\epsilon$ is a hyper-parameter.
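The clipped policy update can be checked numerically with a tiny helper. This is the standard PPO clipped-surrogate form for a single sample (the expectation over a batch is omitted for clarity):

```python
def ppo_clipped_objective(pi_new, pi_old, advantage, epsilon=0.2):
    """Single-sample clipped surrogate:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A), with probability ratio
    r = pi_new / pi_old; epsilon is the clip hyper-parameter."""
    r = pi_new / pi_old
    r_clipped = max(1.0 - epsilon, min(1.0 + epsilon, r))
    return min(r * advantage, r_clipped * advantage)
```

A ratio of 3 with positive advantage is clipped down to 1.2, while a ratio of 0.2 with negative advantage is clipped up to 0.8, limiting how far a single update can move the policy.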
Optionally, step 5, in which the mobile user explores the environment according to the trained model and updates the user observation state and the channel state, includes:
when $w>w_m$: the base station transmits all fitted network-parameter information directly to the users, and each user loads the trained model; the mobile users continue to explore the environment according to the trained model, and each user fine-tunes its own actor-network parameters $\theta_u$ and critic-network parameters $\theta_\nu$ according to the environment, adapting to channel variations.
Compared with the prior art, the application has the following notable advantages. (1) Unlike traditional game-theoretic algorithms and multi-user learning methods, this is a multi-user anti-jamming intelligent data offloading method: the A2C algorithm structure and the PPO algorithm are introduced to build the proposed architecture, and a double-layer multi-user reinforcement-learning structure is designed in which the upper layer outputs the discrete channel-access strategy and the lower layer outputs the continuous data-offloading decision; multidimensional resource management in the multi-user network is thereby realized and the resource utilization is further improved. Moreover, the algorithm fits the constructed neural network using only the partial external state information perceived by the user, so the system overhead of the whole network can be greatly reduced while the data-computation task is completed. (2) The model is complete and has a clear physical meaning, and the proposed multi-user intelligent data offloading method based on distributed hybrid heterogeneous decisions solves it effectively. (3) The method copes effectively with external dynamic jamming and internal network interference, and describes the multi-user anti-jamming mobile data offloading scenario model in a mobile network well.
Drawings
FIG. 1 is a system model diagram of the edge computing model in the mobile offloading network in an embodiment of the application.
Fig. 2 is an overall block diagram of multi-user intelligent data offloading based on distributed hybrid heterogeneous decisions in an embodiment of the application.
FIG. 3 is a structural model diagram of multi-user intelligent data offloading based on distributed hybrid heterogeneous decisions in an embodiment of the application.
Fig. 4 is a graph comparing the anti-jamming performance of the intelligent offloading algorithms under dynamic random-period jamming in embodiment 1 of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Embodiments of the present application will be described first with reference to the accompanying drawings.
The application provides a multi-user intelligent data offloading method based on distributed hybrid heterogeneous decisions, which comprises the following steps:
step 1, performing multidimensional resource-management modeling of the mobile offloading network according to a given multi-user edge computing model;
specifically, step 1 performs multidimensional resource-management modeling of the mobile offloading network according to the given multi-user edge computing model, including:
step 1.1, modeling the edge computing scenario in the mobile offloading network: an edge computing model is established according to the external information perceived and acquired by each independent user and the user's decision dimensions, specifically comprising:
the model comprises a base station, edge computing equipment, several mobile users and several jammers; the mobile users need to perform edge computation for their tasks, and during edge computation the jammers interfere with the users' data offloading;
each edge computing server is connected to one base station, and each base station serves a group of randomly distributed mobile users; several tasks with different requirements are available for a mobile user to process, and each user executes one type of task; each mobile user periodically generates computing tasks with different QoS and computing-resource requirements; if a task must be offloaded to the edge computing equipment, the user first sends a resource-access request to the target base station; after receiving access permission and the resource-allocation result from that base station, the user offloads the task over the allocated spectrum resources to the edge computing equipment of the associated base station; here, parallel task division across multiple base stations is not considered, and a user in the overlapping area between base stations can offload tasks to only one of them;
the model rules are set as follows: for mobile user i, each task is described by a triple $W_i(t)=\big(D_i(t),\,C_i(t),\,T_i^{\max}(t)\big)$, where $D_i(t)$ is the data size of the computing task in bits, $C_i(t)$ is the number of CPU cycles the task requires, and $T_i^{\max}(t)$ is its maximum tolerable delay; when the t-th computing task is generated at or arrives at a mobile user terminal, the system must decide, according to its own relevant characteristics, whether the task is partially offloaded, fully offloaded to the base station, or computed locally; $o_{i,m}(t)$ indicates whether base station m is connected to the mobile user at time t: when $o_{i,m}(t)=1$ the mobile user is connected to base station m, and otherwise $o_{i,m}(t)=0$;
the selected access channel of mobile user i (i.e., the selected base station and access frequency point), the offloading ratio, the transmission bandwidth and the transmission power are denoted $c_i$, $\alpha_i\ (\alpha_i\in[0,1])$, $B_i\ (B_i\in[B_{\min},B_{\max}])$ and $P_i\ (P_i\in[P_{\min},P_{\max}])$, respectively; when the user offloads data, if the selected frequency is severely jammed the offloading fails and the user must switch frequency; when the current channel is good, the terminal can offload the task data by adjusting the offloading ratio, the bandwidth and the transmission power;
in the process of data offloading, each user has a maximum CPU frequency because the computing resources of the communication equipment are limited; for local computing, users must ensure that their computing power can complete local tasks within the maximum tolerable delay; the local computation delay $T_i^{l}(t)$, i.e., the time required to process the t-th computing task locally, is defined as:
$$T_i^{l}(t)=\frac{(1-\alpha_i)\,C_i(t)}{f_i^{l}}$$
wherein $f_i^{l}$ is the computing power of the terminal itself and $f_i^{\mathrm{MEC}}$ is the computing power allocated to the user by the edge computing device (MEC server); the delay of mobile user i is constrained by its maximum tolerable delay, $T_i^{l}(t)\le T_i^{\max}(t)$; if the delay of the offloading mode exceeds the delay of the local computing mode, the local computation amount should be reduced by increasing the offloading ratio so as to meet the time constraint; the energy consumption of the t-th computing task under local computation, $E_i^{l}(t)$, is:
$$E_i^{l}(t)=\kappa\,(f_i^{l})^{2}\,(1-\alpha_i)\,C_i(t)$$
wherein $\kappa$ is the effective capacitance switching parameter of the user, an energy-efficiency coefficient related to the CPU architecture; furthermore, the cost of a local computing task depends mainly on the local computation delay and the local energy consumption, and the weighted cost of local computation, $U_i^{l}(t)$, is defined as:
$$U_i^{l}(t)=\gamma\,T_i^{l}(t)+\bar{\gamma}\,E_i^{l}(t)$$
wherein $\gamma$ and $\bar{\gamma}$ are the delay weight coefficient and the energy-consumption weight coefficient, respectively;
For the data offloading mode, each terminal may offload tasks to the base station over a wireless channel; the delay of a computing task processed through the edge computing equipment comprises the wireless transmission of the uploaded data, the edge computation itself, and the download transmission of the result; since the data size of the computation result after task processing is small, the process of the user obtaining the result from the base station is ignored by default. A reasonable access base station is selected for each user based on the channel state, the radio communication resources, and the computing resources of the mobile offload network, and the allocated radio and computing resources are determined (channel selection and data offloading decisions). Thus, for the t-th computing task arrival event, the user's offload transmission rate R_{i,m}(t) is:

where λ is the demodulation threshold of the data received by the base station; for the t-th computing task, the total offload delay of the user, comprising the upload and edge-execution stages, is expressed as:

T_{i,m}^off(t) = α_i d_i(t) / R_{i,m}(t) + α_i c_i(t) / f_i^MEC

the energy consumption E_{i,m}^off(t) when task t arrives is:

E_{i,m}^off(t) = P_i α_i d_i(t) / R_{i,m}(t)

the edge computing cost C_{i,m}^off(t) of a single user is defined as:

C_{i,m}^off(t) = γ T_{i,m}^off(t) + γ_E E_{i,m}^off(t)

where γ and γ_E are the delay weight coefficient and the energy consumption weight coefficient, respectively;
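As a hedged sketch only, the edge-computing cost above can be illustrated in Python under the common assumption of a Shannon-type upload rate and transmit-energy accounting; `offload_cost` and all parameter names are hypothetical, and the rate model is an assumption rather than the patent's exact R_{i,m}(t):

```python
import math

def offload_cost(d_bits, cycles, alpha, bandwidth, power, gain,
                 noise_psd, f_mec, gamma_t, gamma_e):
    """Weighted edge-computing cost of offloading a fraction alpha of a task:
    upload delay + edge execution delay, plus transmit energy."""
    # Assumed Shannon-type rate; the patent's R_{i,m}(t) may differ.
    rate = bandwidth * math.log2(1.0 + power * gain / (noise_psd * bandwidth))
    t_up = alpha * d_bits / rate        # wireless upload delay
    t_exec = alpha * cycles / f_mec     # execution delay on the MEC server
    energy = power * t_up               # transmit energy during upload
    return gamma_t * (t_up + t_exec) + gamma_e * energy
```

Note that the cost scales linearly with the offload ratio alpha, which is what allows the continuous offloading decision to trade local against edge cost.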
In addition, in order to avoid the system overhead caused by frequent information exchange between the users and the base station, a multidimensional resource management scheme based on a distributed strategy is considered; while meeting the requirements of each application, the long-term weighted sum of the data computation costs is minimized; the total cost of mobile network data computation is as follows:
Step 1.2, describing the model and building the architecture using a decentralized partially observable Markov decision process:
the state space, the local observation information, the action space, and the reward function are defined respectively:
State space: the environment configuration is described, including the user's movement characteristics (position, orientation), the externally perceived spectrum information, and the time-varying tasks generated by the user terminal; x_i(t) and y_i(t) are defined as the position coordinates of user i, P_{N(t),m}(t) as the part of the spectrum information observed by the user, and v_{N(t)}(t) as the moving speed and moving direction of the user in the current time slot; the computing task information of user i includes the task type, the data volume, and similar information; for the wireless network resource management problem of the MEC server, the environment state S_m(t) is expressed as:

Local observation information: O_{i,m}(t) denotes the observation space of the mobile users; the observation information of each user in time slot t is part of the current global state information; the single-user observation information in the communication network is limited, so a mobile user cannot acquire global information, and the local observation of a mobile user in time slot t consists of two parts: one part is the external environment information, such as the local channel state and position information, currently observed by the mobile user at time t; the other part is the task type and the data volume the user needs to transmit; the considered area is within the coverage of the base station, and the observation of the mobile user in time slot t is O_{i,m}(t):

Action space: A(t) is defined as the action selected by the user in time slot t, and A_{i,m} as the action selected by user i in time slot t; the action decision a_{i,m} includes the channel access decision and the data offloading decision;

Reward function: R represents the reward for performing an action in environment state S; r_t = R(S_t, A_t) is the instant reward when the user performs the action in state S_t; to minimize the overhead, corresponding reward functions are designed: the first, for the channel access function, encourages multiple users to transmit successfully at the same time and penalizes the non-transmission behavior caused by collisions or bad channel conditions; the other is a fairness-based individual reward function, which guarantees fairness among users by setting different weights; thus, the channel access reward function of a user is as follows:

where r_m is a defined reward value, mainly related to channel quality; c is a channel switching factor; β_th denotes the channel quality threshold; the remaining quantities are the user's selectable channel set, the jamming strategy on the user's channel, the set of interfered channels, and the channel strategy selected in the previous slot, respectively; the reward function for the user's continuous offloading decision is as follows:

where C_sp is the failure loss of data offloading; the overall reward function of a single user is:

where μ_i is the return weight of the user, used to guarantee user fairness.
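The reward shaping described above can be sketched as follows; this is an illustrative reading of the text, with hypothetical names (`access_reward`, `total_reward`) and simplified conditions standing in for the patent's exact reward expressions:

```python
def access_reward(success, quality, quality_th, switched, r_m, c_switch):
    """Channel-access reward: r_m on a successful transmission above the
    channel-quality threshold, minus a switching cost c if the channel
    changed; a penalty for collision or bad channel conditions."""
    if success and quality >= quality_th:
        return r_m - (c_switch if switched else 0.0)
    return -r_m

def total_reward(mu_i, r_access, offload_cost, offload_failed, c_sp):
    """Fairness-weighted overall reward: access reward plus the (negative)
    offloading cost, with an extra loss C_sp if offloading failed."""
    r_offload = -offload_cost - (c_sp if offload_failed else 0.0)
    return mu_i * (r_access + r_offload)
```

The switching-cost term discourages frequent channel hopping, and the weight μ_i lets disadvantaged users be compensated for fairness.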
Step 2: according to the modeling information of the multi-user network scenario, a multi-user mobile data offloading neural network model based on distributed hybrid heterogeneous decisions is set up.
Specifically, step 2 includes:
A communication system structure and a neural network model structure in which multi-user hierarchical learning solves the hybrid decision space are designed to address the sparse reward problem, specifically as follows:

hierarchical learning theory is introduced, and a hybrid heterogeneous hierarchical decision communication system structure consisting of multiple layers of action classifications is determined: the initially built neural network is a double-layer deep Q network structure with the ability to detect interference patterns and predict channel state changes, set up as a high-level network for coordinated joint discrete action learning and a plurality of low-level networks for learning continuous parameter coordination strategies; the high-level network and the low-level networks are trained separately, and both follow the centralized-training, distributed-execution paradigm; the designed hierarchical neural network architecture based on the actor-critic (AC) network model comprises the following two algorithm network architecture modules:

the high-level network for coordinated joint discrete action learning is the upper-layer network architecture for multi-user centralized discrete channel access; it adopts an A2C network deployed at the base station to decide the channel, and is used to mine relevant data information in which the user channel characteristics are not obvious while reducing the information entropy; the state matrix is downsampled in a pooling layer, and the channel access strategy of each user is output downward; the low-level network is the network architecture for each user's distributed continuous data offloading decision; a distributed PPO network is deployed on each mobile user, which maps the state and each discrete action to the corresponding continuous parameters and finds the continuous user data offloading decision under the optimal discrete user channel access;

the network architecture consists of a high-level network for coordinated joint discrete actions and a plurality of low-level networks that learn continuous actions; the high-level network and the low-level networks are trained separately; the model comprises a plurality of parallel sub-networks and has a global critic network for updating the strategy parameters of the sub-networks; the proposed algorithm network structure seamlessly integrates the network structure of the A2C algorithm (handling the discrete action space) and the proximal policy optimization (PPO) algorithm (handling the continuous action space); for each user, the gradients of its actor and critic neural networks are uploaded to the base station in an asynchronous update mode; the base station fuses the network gradients collected from the different user terminals and issues the fused network parameters to all users, thereby realizing the integration and sharing of the actor and critic neural network gradients;

in the hierarchical neural network architecture, a state-encoding network is shared between the high-level and low-level networks; the proposed hierarchical architecture aims to reduce the information exchange between user agents in the low-level network and to enable them to update their networks independently based on local observations.
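The shared-encoder hierarchy can be sketched, in heavily simplified form, as three pure-Python functions: a state encoder shared by both levels, a softmax head for the discrete channel choice (the A2C side), and a sigmoid head for the continuous offload ratio conditioned on that choice (the PPO side); all names and the toy linear layers are illustrative assumptions, not the patent's actual networks:

```python
import math

def shared_encoder(obs, w_enc):
    """State-encoding layer shared by the high- and low-level networks."""
    return [math.tanh(sum(o * w for o, w in zip(obs, col))) for col in w_enc]

def high_level_policy(h, w_hi):
    """Discrete channel-access head (A2C-style softmax over channels)."""
    logits = [sum(x * w for x, w in zip(h, col)) for col in w_hi]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def low_level_policy(h, channel, w_lo):
    """Continuous offload-ratio head (PPO-style), conditioned on the
    discrete channel chosen by the high-level network."""
    x = h + [float(channel)]
    z = sum(xi * wi for xi, wi in zip(x, w_lo))
    return 1.0 / (1.0 + math.exp(-z))   # offload ratio in (0, 1)
```

The key structural point is that the low-level head receives both the shared encoding and the high-level discrete choice, so the continuous decision is always made under a given channel access action.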
Step 3: initialization.
Specifically, step 3 includes:
Initializing the neural network: the size of the experience replay pool L in each user is set to φ; the network parameter θ is initialized to a random value; the target Q-network parameter is θ⁻ = θ; the experience replay pool L is filled using random exploration; the learning parameters are set;

the user initializes the observation matrix: all matrices O_{i,m} are zeroed; the per-user actor network parameters θ_u and critic network parameters θ_v are initialized to random values to explore the actions of the training phase; a step counter i = 0 and a global shared counter H = 0 are initialized, along with the global agent actor gradient θ_u′ and critic network parameters θ_v′; the maximum number of iterations of each network update is i_max; the cumulative gradients are set to zero: dθ_v ← 0 and dθ_u ← 0; the round number is initialized to w = 0.
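A minimal sketch of the per-user initialisation above (hypothetical names; the replay pool is a bounded deque of size φ, and the parameter vectors stand in for θ_u, θ_v and their cumulative gradients):

```python
from collections import deque
import random

def init_agent(pool_size, obs_dim):
    """Per-user initialisation: zeroed observation matrix, bounded replay
    pool L of size phi, step/round counters, and zeroed cumulative gradients."""
    return {
        "replay": deque(maxlen=pool_size),                        # pool L
        "obs": [0.0] * obs_dim,                                   # O_{i,m}
        "theta_u": [random.gauss(0, 1) for _ in range(obs_dim)],  # actor
        "theta_v": [random.gauss(0, 1) for _ in range(obs_dim)],  # critic
        "d_theta_u": [0.0] * obs_dim,      # cumulative actor gradient
        "d_theta_v": [0.0] * obs_dim,      # cumulative critic gradient
        "step": 0, "shared": 0, "round": 0,  # counters i, H, w
    }
```

Using a bounded deque means the oldest experiences are discarded automatically once the pool reaches its capacity φ.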
Step 4: before the preset number of rounds is reached, the high-level network and the low-level network in the neural network model are trained respectively.
Specifically, step 4 includes:
Step 4.1, when w ≤ w_m, in each round, when i − i_s ≠ i_max: the current step counter is updated (i ← i + 1); each user perceives the external environment O_{i,m}(t);

the channel access exploration uses a greedy algorithm for the discrete decision, while the data offloading decision initially performs random exploration; each mobile user selects the action a_{i,m} = π_i(O_{i,m}); immediately after performing the edge computing decision a_{i,m}, the user obtains the current reward r_{i,m} = R(O_{i,m}, a_{i,m}); the user receives the reward r_{i,m} of the current strategy a_{i,m}, obtains the corresponding input state, and obtains the new state O_{i,m}(t+1); the base station makes each user share its channel access action information with the other users by collecting information; when a user finds a good data offloading decision, these actions are reported to the base station, which again updates the higher-layer network channel access policy; at the same time, the decision experience information a_{i,m}, together with the full record (O_{i,m}, a_{i,m}, r_{i,m}, W_t), is transmitted to the MEC server, and the user observation state and the channel state of the next time slot are updated synchronously;

Step 4.2, when i − i_s = i_max: the designed actor and critic networks based on the AC model are updated, wherein the actor network parameters θ_u and critic network parameters θ_v are updated, the gradients of all mobile users are synchronized as θ_u = θ_u′ and θ_v = θ_v′, and the global shared counter is updated as H ← H + 1; the environment continues to be explored according to the current (updated) network model parameters θ_u and θ_v; sample training is carried out according to the reward of the current strategy and the environment state at the next moment; the training experience ⟨O_{i,m}(t), a_{i,m}(t), r_{i,m}(t), O_{i,m}(t+1), W_t⟩ is stored in the experience replay pool L, and small batches of samples are randomly selected from the replay pool for training and updating the neural network parameters to realize the fitting of the neural network; the high-level network and the low-level network are trained respectively;
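The mini-batch sampling step in Step 4.2 can be sketched as follows (illustrative only; `sample_minibatch` is a hypothetical helper, and each stored entry mirrors the ⟨O_t, a_t, r_t, O_{t+1}, W_t⟩ tuple):

```python
import random

def sample_minibatch(replay_pool, batch_size, rng=None):
    """Uniformly sample a small training batch from the experience replay
    pool; each entry is a tuple (O_t, a_t, r_t, O_{t+1}, W_t)."""
    rng = rng or random.Random(0)
    k = min(batch_size, len(replay_pool))
    return rng.sample(list(replay_pool), k)
```

Uniform sampling breaks the temporal correlation of consecutive experiences, which is what makes replay-based fitting of the networks stable.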
Based on data offloading and channel conditions, the policy function update expression is defined as follows:

where r_i(θ) is the probability ratio and ε is a hyper-parameter.
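The clipped surrogate used by proximal policy optimization, to which the probability ratio r_i(θ) and hyper-parameter ε above refer, can be written out as follows (a standard PPO-clip sketch under stated assumptions, not the patent's exact update):

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped PPO surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A),
    where r = exp(logp_new - logp_old) is the probability ratio."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

The clipping removes the incentive to move the new policy more than a factor of 1 ± ε away from the old one in a single update, which is what keeps the low-level continuous offloading policy updates stable.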
Step 5: the mobile users explore the environment according to the trained model and update the user observation state and the channel state, comprising:

when w > w_m, return: the base station directly transmits all network fitting parameter information to the users, and each user loads the trained model; the mobile users continue to explore the environment according to the trained model, and each user fine-tunes its own actor network parameters θ_u and critic network parameters θ_v according to the environment to adapt to channel variations.
The invention provides an edge computing model in a mobile offloading network and a multi-user mobile data offloading method based on distributed hybrid heterogeneous decisions, aiming to provide a scheme for solving the multi-user anti-jamming data offloading problem in a mobile offloading network. The invention combines ideas such as Markov games, multi-user reinforcement learning, and multi-domain anti-jamming; takes the spectrum waterfall diagram of the receiving end and the channel gain information as the input state for learning; and considers the channel switching and bandwidth adjustment overhead in the instant reward, thereby avoiding frequent channel switching and bandwidth adjustment. At the same time, an LSTM network is introduced to redesign the neural network; real-time information such as frequency, time, and bandwidth is then input into the designed neural network structure, and the time-frequency-domain joint decision adapted to the current spectrum environment is obtained through the iteration of the algorithm. Simulation results show that the method has strong effectiveness and adaptability in a dynamic jamming environment, and its superiority is verified in comparison with other anti-jamming methods.
The drawings provided by the embodiments of the present application are described below:

Fig. 1 is a diagram of the multi-user anti-jamming data offloading model in a mobile offloading network. In the model, the mobile offloading network comprises base stations, edge computing equipment, a plurality of mobile users, and a plurality of jammers; the mobile users need to perform task computation through data offloading, and the jammers interfere with the data offloading process of the mobile users. Each edge computing server is connected to a base station, and each base station serves a set of randomly distributed mobile users. There are a variety of task types with different requirements for the mobile users to choose to process, and each user can perform one type of task. Each mobile user periodically generates computing tasks with different QoS requirements and computing resource requirements.

Fig. 2 is the overall block diagram of multi-user intelligent data offloading based on distributed hybrid heterogeneous decisions in the present application. The action space has a hierarchical structure with multiple layers of action classifications; the action decision structure can have more than two layers, and the lower structure can consist of continuous actions. The communication architecture based on hierarchical learning presented herein is mainly applied in the field of wireless communication. In a wireless communication scenario, a communication channel with a good channel state is the basis for ensuring inter-user communication. Thus, the high-level network decision is defined herein as a joint discrete spectrum access policy that selects the preferred channel, and the low-level network as a multi-dimensional continuous decision that adjusts the offloading policy.

Fig. 3 is a diagram of the algorithm structure model of the proposed method.
Example 1
The embodiment of the invention is described in detail below. The system simulation adopts the Python language and is based on the TensorFlow deep learning framework; the parameter settings do not affect generality. The embodiment verifies the effectiveness of the proposed model and method. The external environment parameters are set to three jamming modes: double-sweep jamming, comb jamming, and sweep jamming; jamming switching is carried out according to a random dynamic pattern, and the sweep speed is set to 100 MHz/s. The comb jamming is divided into 3 fixed frequency bands: 0-8 MHz, 16-24 MHz, and 32-40 MHz. The simulation adopts 5 mobile users and two base stations; the working frequency domains of the base stations do not overlap, but the power influence ranges of the base stations partially overlap. Each mobile user moves at a constant speed in a direction away from or toward the base station, with the speed set to 1 m/s during this period. The number of channels available in the system is 10. The maximum transmission bandwidth of channel transmission is 4 MHz; the users and the jammers confront each other in the 40 MHz frequency band, and a user can switch the channel, bandwidth, power, and offloading ratio in each time slot. Both the communication signal and the jamming signal employ raised cosine waveforms, and the demodulation threshold th is set to 2 dB for all frequencies. The normalized noise power N0 of the signal environment follows a Gaussian distribution with mean μ = 1 and variance σ = 0.2. A user task generation model is introduced to generate computing tasks of 200 Kb-600 Kb (uniformly distributed), and the task arrival process follows a Poisson distribution. The user computing power (number of CPU cycles) is 1.5×10^8 cycles/s.

The computing power of the edge computing device assigned to each user by the base station is 1×10^9 cycles/s. The communication coverage of the base station is 400 m. The jamming influence range of the jammer is 1 km. The user's computing energy efficiency coefficient is:
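The embodiment's task-generation model (Poisson arrivals, uniform 200 Kb-600 Kb sizes) can be sketched with the standard library only; `generate_tasks` and the Knuth-style Poisson sampler are illustrative assumptions:

```python
import math
import random

def generate_tasks(n_slots, arrival_rate, seed=0):
    """Per-slot task generation: Poisson arrival counts (Knuth's method)
    and uniformly distributed data sizes in [200, 600] Kb (in bits)."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplicative method: count uniform draws until the
        # running product falls below exp(-lambda).
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    tasks = []
    for _ in range(n_slots):
        count = poisson(arrival_rate)
        tasks.append([rng.uniform(200e3, 600e3) for _ in range(count)])
    return tasks
```

Each slot then feeds zero or more (size, cycles, deadline) tasks into the offloading decision loop; only the size dimension is generated here.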
Parameter DQN method: the parameter DQN method uses a centralized controller to issue decisions for all users, and its data offloading policy mode is the same as that of the proposed method; because the decision dimension of this method is complex, a binary offloading decision model is adopted to simplify the decision dimension and improve the performance of the parameter DQN algorithm.

Distributed DQN offloading method: the third scheme is the distributed DQN algorithm; since DDQN cannot make continuous decisions, the binary offloading model decision is employed, and the distributed DQN algorithm discretizes the choice of bandwidth and power.

Random offloading method: the Random algorithm randomly selects the access channel, the transmission power, and the data offloading bandwidth.
Fig. 4 is a graph comparing the performance of the intelligent offloading methods against dynamic random-period jamming in Embodiment 1 of the present invention. The effectiveness and practicality of the proposed method can be seen from Fig. 4: energy-efficient scheduling of communication resources and data offloading is achieved.

As can be seen from Fig. 4, the proposed method has the lowest energy consumption and the fastest convergence speed in the iterative process, and can make targeted dynamic decision adjustments to the external environment of each independent user. This is because the proposed method first performs a preliminary network fit on the centrally uploaded data; the preliminarily fitted network parameters are then sent to each UE network for network training according to the channel environment of each user, which improves the pertinence and effectiveness of the method.

The method reasonably allocates spectrum resources and computing resources in the communication network by mining the transmission probability and the potential data arrival distribution rules of the dynamic jamming environment. It uses a hierarchical learning architecture to prioritize the multidimensional decisions that affect channel access and data offloading, and optimizes the data offloading problem on the premise of selecting the optimal access channel, thereby further improving the method performance and the computation success rate while reducing the system cost. The multi-user anti-jamming data offloading model based on hybrid heterogeneous decisions fully considers the dynamic change of the jamming environment, channel fading, multi-user access conflicts, and the complex and huge decision space in the anti-jamming data offloading problem, and has more practical significance than traditional models; the multi-user mobile data offloading method based on distributed hybrid heterogeneous decisions not only guarantees the autonomy and collaboration of the nodes under limited device energy and computing capacity, but also realizes a lightweight algorithm design, effectively solving the data offloading problem in a dynamic jamming environment.
The embodiments of the present application described above do not limit the scope of the present application.

Claims (6)

1. A multi-user intelligent data offloading method based on distributed hybrid heterogeneous decision, the method comprising:
step 1, carrying out multidimensional resource management modeling in the mobile offloading network according to the set multi-purpose edge computing model;
step 2, setting a multi-user mobile data offloading neural network model based on distributed hybrid heterogeneous decisions according to the modeling information of the multi-user network scenario;

step 3, initializing the observation matrices and the networks of the base station and each user;

step 4, training the high-level network and the low-level network in the neural network model respectively, before the preset number of rounds is reached;

and step 5, the mobile users exploring the environment according to the trained model, and updating the user observation state and the channel state.
2. The method of claim 1, wherein step 1, carrying out multidimensional resource management modeling in the mobile offloading network based on the set multi-purpose edge computing model, comprises:
step 1.1, modeling an edge computing scenario in a mobile offload network:
the model comprises a base station, edge computing equipment, a plurality of mobile users, and a plurality of jammers; the mobile users need to perform task computation through edge computing, and during the edge computing process the jammers interfere with the data offloading process of the mobile users;

each edge computing server is connected with one base station, and each base station serves a group of randomly distributed mobile users; there are a plurality of task types with different requirements for the mobile users to select for processing, and each user can execute one type of task; each mobile user periodically generates computing tasks with different QoS requirements and computing resource requirements; if a task needs to be offloaded to the edge computing equipment, the user first sends a resource access request to the base station to which it is to be offloaded; after receiving the access permission and the resource allocation result of the corresponding base station, the user offloads, on the allocated spectrum resources, to the edge computing equipment of the associated base station; users located in the overlapping area between base stations can only offload tasks to one of the base stations when performing task offloading;

the set model rules are as follows: for mobile user i, each task is described by a triple (d_i(t), c_i(t), τ_i(t)), where d_i(t) is the data size of the computing task in bits, c_i(t) represents the number of CPU cycles required by the computing task, and τ_i(t) gives the maximum tolerable delay of the computing task; when the t-th computing task is generated at or arrives at a mobile user terminal, the mobile user system must determine whether the task needs to be partially or completely offloaded to the base station, or computed locally only; o_{i,m}(t) indicates whether base station m is connected with the mobile terminal user at time t; when o_{i,m}(t) = 1, the mobile terminal user is connected with base station m; otherwise, o_{i,m}(t) = 0;

the access channel selected by mobile user i (i.e. the selected base station), the access frequency point, the offloading ratio α_i (α_i ∈ [0,1]), the transmission bandwidth B_i (B_i ∈ [B_min, B_max]), and the transmission power P_i (P_i ∈ [P_min, P_max]) are defined respectively; when the user performs data offloading, if the selected frequency is severely jammed, the offloading fails and the user must switch frequency; when the current channel is good, the terminal can realize the offloading of the task data by adjusting the offloading ratio, the bandwidth, and the transmission power;
in the process of data offloading, each user has a maximum CPU frequency; for local computing, users must ensure that their computing power can complete local tasks within the maximum tolerable delay; the local computation delay T_i^loc(t), the time required for user i to process the t-th computing task locally, is defined as:

T_i^loc(t) = c_i(t) / f_i^loc

where f_i^loc denotes the computing power of the terminal and f_i^MEC denotes the computing power allocated to the user by the edge computing device (MEC server); the maximum tolerable delay τ_i(t) of mobile user i imposes the constraint T_i^loc(t) ≤ τ_i(t); if the latency of the offload mode exceeds the latency of the local computation mode, the local computation amount must be reduced by increasing the offload ratio to meet the time constraint; the energy consumption E_i^loc(t) of computing the t-th task locally is:

E_i^loc(t) = κ_i (f_i^loc)^2 c_i(t)

where κ_i is the effective capacitance switching parameter of the user, i.e. the computing energy efficiency coefficient; the weighted cost C_i^loc(t) of local computing is defined as:

C_i^loc(t) = γ T_i^loc(t) + γ_E E_i^loc(t)

where γ and γ_E are the delay weight coefficient and the energy consumption weight coefficient, respectively;
for the data offloading mode, each terminal may offload tasks to the base station over a wireless channel; the delay of a computing task processed through the edge computing equipment comprises the wireless transmission of the uploaded data, the edge computation itself, and the download transmission of the result; for the t-th computing task arrival event, the user's offload transmission rate R_{i,m}(t) is:

where λ is the demodulation threshold of the data received by the base station; for the t-th computing task, the total offload delay of the user is expressed as:

T_{i,m}^off(t) = α_i d_i(t) / R_{i,m}(t) + α_i c_i(t) / f_i^MEC

the energy consumption E_{i,m}^off(t) when task t arrives is:

E_{i,m}^off(t) = P_i α_i d_i(t) / R_{i,m}(t)

the edge computing cost C_{i,m}^off(t) of a single user is defined as:

C_{i,m}^off(t) = γ T_{i,m}^off(t) + γ_E E_{i,m}^off(t)

where γ and γ_E are the delay weight coefficient and the energy consumption weight coefficient, respectively;

while meeting the requirements of each application, the long-term weighted sum of the data computation costs is minimized; the total cost of mobile network data computation is as follows:
step 1.2, describing the model and building the architecture using a decentralized partially observable Markov decision process:

the state space, the local observation information, the action space, and the reward function are defined respectively:

state space: the environment configuration is described, including the user's movement characteristics (position, orientation), the externally perceived spectrum information, and the time-varying tasks generated by the user terminal; x_i(t) and y_i(t) are defined as the position coordinates of user i, P_{N(t),m}(t) as the part of the spectrum information observed by the user, and v_{N(t)}(t) as the moving speed and moving direction of the user in the current time slot; the computing task information of user i includes the task type, the data volume, and similar information; for the wireless network resource management problem of the MEC server, the environment state S_m(t) is expressed as:

local observation information: O_{i,m}(t) denotes the observation space of the mobile users; the observation information of each user in time slot t is part of the current global state information; the single-user observation information in the communication network is limited, so a mobile user cannot acquire global information, and the local observation of a mobile user in time slot t consists of two parts: one part is the external environment information, such as the local channel state and position information, observed by the mobile user at time t; the other part is the task type and the data volume the user needs to transmit; the observation of the mobile user in time slot t is O_{i,m}(t):

action space: A(t) is defined as the action selected by the user in time slot t, and A_{i,m} as the action selected by user i in time slot t; the action decision a_{i,m} includes the channel access decision and the data offloading decision;

reward function: R represents the reward for performing an action in environment state S; r_t = R(S_t, A_t) is the instant reward when the user performs the action in state S_t; the channel access reward function of a user is as follows:

where r_m is a defined reward value, c is a channel switching factor, and β_th denotes the channel quality threshold; the remaining quantities are the user's selectable channel set, the jamming strategy on the user's channel, the set of interfered channels, and the channel strategy selected in the previous slot, respectively; the reward function for the user's continuous offloading decision is as follows:

where C_sp is the failure loss of data offloading; the overall reward function of a single user is:

where μ_i is the return weight of the user, used to guarantee user fairness.
3. The method of claim 1, wherein step 2, setting the multi-user mobile data offloading neural network model based on distributed hybrid heterogeneous decisions, comprises:

the double-layer deep Q network structure is set up to comprise a high-level network for coordinated joint discrete action learning and a plurality of low-level networks for learning continuous parameter coordination strategies; the high-level network and the low-level networks are trained separately, and both follow the centralized-training, distributed-execution paradigm;

the high-level network for coordinated joint discrete action learning is the upper-layer network architecture for multi-user centralized discrete channel access; it adopts an A2C network deployed at the base station to decide the channel, and is used to mine relevant data information in which the user channel characteristics are not obvious while reducing the information entropy; the state matrix is downsampled in a pooling layer, and the channel access strategy of each user is output downward; the low-level network is the network architecture for each user's distributed continuous data offloading decision; a distributed PPO network structure is deployed on each mobile user, which maps the state and each discrete action to the corresponding continuous parameters and finds the continuous user data offloading decision under the optimal discrete user channel access;

the proposed network seamlessly integrates the network structure of the A2C algorithm and the proximal policy optimization algorithm; for each user, the gradients of its actor and critic neural networks are uploaded to the base station in an asynchronous update mode; the base station fuses the network gradients collected from the different user terminals and issues the fused network parameters to all users, thereby realizing the integration and sharing of the actor and critic neural network gradients;

in the hierarchical neural network, a state-encoding network is shared between the high-level and low-level networks.
4. The method according to claim 1, characterized in that step 3, the initializing process for the observation matrix and the network of the base station and the respective users, comprises:
initializing a neural network: setting the size of all experience playback pools L in each user to phi; initializing a random value by a network parameter theta; the target Q-network parameter is θ - =θ; filling the experience playback pool L using random exploration; setting learning parameters;
each user initializes its observation matrix: all observation matrices O_i,m are zeroed; each user's actor network parameters θ_u and critic network parameters θ_ν are initialized to random values; a step counter i = 0 and a global shared counter H = 0 are initialized, along with the global agent's actor gradient θ_u′ and critic network parameters θ_ν′; the maximum number of iterations per network update is i_max; the cumulative gradients are set to zero, dθ_ν ← 0 and dθ_u ← 0; and the round number is initialized to w = 0.
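A minimal Python sketch of this step-3 initialization follows; the dictionary layout, tuple contents, and dimensions are illustrative assumptions — only the quantities themselves (pool size φ, counters i, H, w, zeroed cumulative gradients, randomly initialized θ_u and θ_ν) come from the claim.

```python
import random

def init_training_state(phi=100, obs_dim=4, n_params=8, seed=0):
    """Initialize replay pool L, observation matrix, counters and
    randomly initialized actor/critic parameters (step-3 sketch)."""
    rng = random.Random(seed)
    state = {
        "replay_pool": [],                    # experience replay pool L
        "phi": phi,                           # pool capacity
        "O": [0.0] * obs_dim,                 # observation matrix zeroed
        "theta_u": [rng.uniform(-1, 1) for _ in range(n_params)],  # actor
        "theta_v": [rng.uniform(-1, 1) for _ in range(n_params)],  # critic
        "d_theta_u": [0.0] * n_params,        # cumulative gradients <- 0
        "d_theta_v": [0.0] * n_params,
        "i": 0, "H": 0, "w": 0,               # step / global / round counters
    }
    # fill the replay pool by random exploration: (state, action, reward)
    while len(state["replay_pool"]) < phi:
        state["replay_pool"].append((rng.random(), rng.randrange(3), rng.random()))
    return state
```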
5. The method of claim 4, wherein training the upper-layer network and the lower-layer network of the neural network model respectively, until the preset number of rounds is reached, comprises:
Step 4.1: while w ≤ w_m, in each round, while i − i_s ≠ i_max: update the step counter (i ← i + 1); each user observes the external environment O_i,m(t);
the discrete channel-access decision and the continuous data-offloading decision are explored using a greedy algorithm; random exploration is performed initially, and each mobile user selects an action a_i,m = π_i(O_i,m); immediately after performing the edge computing decision a_i,m, the user obtains the current reward r_i,m = R(O_i,m, a_i,m); the user receives the reward r_i,m of the current strategy a_i,m, obtains the corresponding input state, and observes the new state O_i,m(t+1); the base station makes each user share its channel-access action information with the other users through the collected information; when a user finds a good data-offloading decision, these actions are reported to the base station, which in turn updates the upper-layer network's channel access strategy; meanwhile, the decision experience a_i,m and the complete information (O_i,m, a_i,m, r_i,m, W_t) are transmitted to the MEC server, and the user observation state and the channel state of the next time slot are updated synchronously;
Step 4.2: when φ = i − 1 and i_max iterations are reached, the actor network and the critic network of the designed network model are updated: the actor network parameters θ_u and the critic network parameters θ_ν are updated and synchronized across all mobile users, θ_u = θ_u′ and θ_ν = θ_ν′, and the global shared counter is incremented, H ← H + 1; exploration of the environment continues with the current (updated) network parameters θ_u and θ_ν; samples are trained according to the reward of the current strategy and the environment state at the next moment: the training experience ⟨O_i,m(t), a_i,m(t), r_i,m(t), O_i,m(t+1), W_t⟩ is stored in the experience replay pool L, and mini-batches are randomly drawn from the replay pool to train and update the neural network parameters, realizing the fitting of the neural network; the upper-layer network and the lower-layer network are trained respectively;
based on data offloading and channel conditions, the policy function update expression is defined as follows:

L^CLIP(θ) = Ê_t[ min( r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t ) ]

wherein r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) is the probability ratio, Â_t is the estimated advantage, and ε is a hyper-parameter.
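The clipped policy update can be evaluated as in the pure-Python sketch below — the standard PPO clipped surrogate, assuming the advantage estimates Â_t have already been computed by the critic.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate over a batch:
    L^CLIP = mean_t[ min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t) ]."""
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = max(1.0 - epsilon, min(r, 1.0 + epsilon))  # clip r_t
        total += min(r * adv, clipped * adv)                 # pessimistic bound
    return total / len(ratios)
```

For a positive advantage, a ratio of 1.5 is clipped to 1.2, so the objective stops rewarding further policy movement; for a negative advantage the more pessimistic un-clipped term is kept, which is what makes PPO updates conservative.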
6. The method of claim 5, wherein in step 5 the mobile user explores the environment based on the trained model and updates the user observation state and the channel state, comprising:
when w > w_m, return: the base station directly transmits all fitted network parameter information to the users, and each user loads the trained model; the mobile user continues to explore the environment according to the trained model, and each user fine-tunes its own actor network parameters θ_u and critic network parameters θ_ν according to the environment, so as to adapt to channel variations.
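The greedy exploration used during training (step 4.1) and the continued exploration after model loading can be sketched as an ε-greedy channel choice; the ε value and the Q-value table are illustrative assumptions.

```python
import random

def epsilon_greedy_channel(q_values, epsilon, rng):
    """Pick a channel: explore uniformly with probability epsilon,
    otherwise choose the channel with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda c: q_values[c])

rng = random.Random(42)
# During fine-tuning, a small epsilon keeps mostly exploiting the
# trained model while still adapting to channel variations.
choices = [epsilon_greedy_channel([0.1, 0.4, 0.9], 0.05, rng) for _ in range(20)]
```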
CN202311036967.0A 2023-08-17 2023-08-17 Multi-user intelligent data unloading method based on distributed hybrid heterogeneous decision Pending CN117042050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311036967.0A CN117042050A (en) 2023-08-17 2023-08-17 Multi-user intelligent data unloading method based on distributed hybrid heterogeneous decision

Publications (1)

Publication Number Publication Date
CN117042050A true CN117042050A (en) 2023-11-10

Family

ID=88638823

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117240610A (en) * 2023-11-13 2023-12-15 傲拓科技股份有限公司 PLC module operation data transmission method and system based on data encryption
CN117240610B (en) * 2023-11-13 2024-01-23 傲拓科技股份有限公司 PLC module operation data transmission method and system based on data encryption
CN117354759A (en) * 2023-12-06 2024-01-05 吉林大学 Task unloading and charging scheduling combined optimization method for multi-unmanned aerial vehicle auxiliary MEC
CN117354759B (en) * 2023-12-06 2024-03-19 吉林大学 Task unloading and charging scheduling combined optimization method for multi-unmanned aerial vehicle auxiliary MEC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination