CN111465032B - Task offloading method and system based on A3C algorithm in multi-wireless body area network environment - Google Patents

Task offloading method and system based on A3C algorithm in multi-wireless body area network environment

Info

Publication number
CN111465032B
Authority
CN
China
Prior art keywords
task
network
decision
classifier
body area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010221507.5A
Other languages
Chinese (zh)
Other versions
CN111465032A (en)
Inventor
王力立
张戈
奚思遥
肖强
黄成
单梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202010221507.5A priority Critical patent/CN111465032B/en
Publication of CN111465032A publication Critical patent/CN111465032A/en
Application granted granted Critical
Publication of CN111465032B publication Critical patent/CN111465032B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 - Network planning, e.g. coverage or traffic planning tools; network deployment, e.g. resource partitioning or cells structures
    • H04W16/22 - Traffic simulation tools or models
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 - Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02 - Power saving arrangements
    • H04W52/0209 - Power saving arrangements in terminal devices
    • H04W52/0212 - Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave
    • H04W52/0216 - Power saving arrangements in terminal devices managed by the network, using a pre-established activity schedule, e.g. traffic indication frame
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 - Network topologies
    • H04W84/18 - Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a task offloading method and system based on the A3C algorithm in a multi-wireless body area network environment. The method comprises the following steps: determining the network architecture of the multi-wireless body area network and initializing the network parameters; training a task classifier with sampled physiological data to obtain a stable classifier model; training the network resource allocation problem with the A3C deep reinforcement learning algorithm to obtain a converged decision network; and performing task offloading according to the obtained models: at each moment, tasks are first classified with the classifier model, and then user channel access and edge-server computing resource allocation are carried out according to the decision network. The method improves the delay and energy consumption performance of task offloading in multi-wireless body area networks, and can be widely applied to practical body area network scenarios such as telemedicine and health monitoring.

Description

Task offloading method and system based on A3C algorithm in multi-wireless body area network environment
Technical Field
The invention belongs to the field of wireless communication networks, and particularly relates to a task offloading method and system based on the A3C algorithm in a multi-wireless body area network environment.
Background
A wireless body area network is a wireless sensor network that takes the human body as its monitoring object. Because the human body moves, inter-network interference is likely to occur among multiple body area networks, and how to collect and manage data across multiple networks is an important direction of body area network research. Current research shows that body area networks are characterized by mobility, computation-intensive tasks and low-latency requirements, and that task offloading can be assisted by edge computing: a base station equipped with an edge server is placed at the edge of the networks to collect and process tasks in a unified way. Because the monitored objects give body area networks particularly strict requirements on delay and energy consumption, a reasonable task offloading method must be designed to guarantee low-delay, low-energy data transmission.
In existing research on data transmission between multiple body area networks and a data center, most algorithms are designed for generic communication networks, and no attempt has been made to exploit the data characteristics and user characteristics specific to body area networks. In fact, the physiological data monitored by a body area network carry important practical meaning, and the movement trajectories of body area network users have their own regularities. Existing offloading methods do not take these characteristics into account and therefore often fail to meet the stringent delay and energy requirements of wireless body area networks.
Disclosure of Invention
The invention aims to provide a task offloading method and system for the multi-wireless body area network environment, so that the task state and the movement characteristics of each user are fully considered when the system offloads tasks, thereby achieving lower system delay and energy consumption.
The technical solution for realizing the purpose of the invention is as follows: a task offloading method based on an A3C algorithm in a multi-wireless body area network environment comprises the following steps:
step 1, constructing a network architecture of a plurality of wireless body area networks, and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network;
and step 4, performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Further, regarding the network architecture of the wireless body area networks in step 1, the network parameters include the user set, the base station set, the RGMM mobility model parameters of each user, the base station positions l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task class β_d ∈ {0,1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
Further, the classifier training in step 2 to obtain the user task classifier specifically comprises:
step 2-1, estimating a stable interval for each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and the lower limit x_low of its stable interval are
x_up = x̄ + t_{α,n-1} · s_x / √n
x_low = x̄ − t_{α,n-1} · s_x / √n
where x̄ and s_x are the sample mean and standard deviation of x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
step 2-2, adding a label to the corresponding physiological data samples for each physiological feature, specifically: samples inside the stable interval are labeled 0, representing a normal task, and samples outside the stable interval are labeled 1, representing an urgent task;
step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training to obtain the task classifier, i.e., a model that takes a piece of data as input and outputs its task class.
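A minimal Python sketch of steps 2-1 to 2-3 is given below. It assumes the stable interval is the t-distribution interval estimate x̄ ± t_{α,n-1}·s_x/√n described above; the heart-rate-like sample data, the significance level α and the RBF kernel choice are illustrative assumptions, not values fixed by the patent.

```python
# Sketch of steps 2-1 to 2-3: stable interval, labeling, and SVM task classifier.
import numpy as np
from scipy import stats
from sklearn.svm import SVC

def stable_interval(samples, alpha=0.05):
    """Step 2-1: lower/upper limits of the stable interval of one physiological feature."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    mean, std = x.mean(), x.std(ddof=1)
    t_coef = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t-distribution coefficient
    half = t_coef * std / np.sqrt(n)
    return mean - half, mean + half                 # (x_low, x_up)

def label_samples(samples, x_low, x_up):
    """Step 2-2: label 0 = normal task (inside interval), 1 = urgent task (outside)."""
    x = np.asarray(samples, dtype=float)
    return ((x < x_low) | (x > x_up)).astype(int)

# Illustrative heart-rate-like data (beats per minute), not from the patent.
rng = np.random.default_rng(1)
heart_rate = rng.normal(75, 8, size=500)
low, up = stable_interval(heart_rate)
labels = label_samples(heart_rate, low, up)

# Step 2-3: train a support vector machine task classifier on (feature, label) pairs.
clf = SVC(kernel="rbf").fit(heart_rate.reshape(-1, 1), labels)
print("task class of a 120 bpm sample:", clf.predict([[120.0]])[0])
```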
Further, in step 3, the training of the resource allocation problem during task offloading by using the A3C algorithm includes the following specific steps:
step 3-1, converting the resource allocation problem into a Markov decision problem; the Markov decision problem model, i.e., the decision network, specifically comprises the state s_t, the action a_t and the reward value r_t:
the state s_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data and respectively denote the data volume of the task and the task class label, the third item l_d(t) is the location state of user d, and the fourth item E_d(t) is the energy state;
the action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s and f_{d,s} denotes the computing resources allocated by base station s to user d;
the reward value r_t is set to the system benefit, i.e., the sum over all users of the per-user benefit K_d, where K_d weighs the relative reduction of delay and energy consumption of user d against a static allocation baseline: t_static and e_static denote the delay and energy consumption under the static allocation method, t_d and e_d denote the delay and total energy consumption of user d's task completion, and the weight factors of delay and energy consumption each lie in [0,1] and sum to 1.
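The following small Python sketch shows one way the reward of step 3-1 can be computed. Consistent with the benefit analysis reported for the embodiment (a per-user benefit of 0.35 corresponding to a 35% improvement), it assumes r_t sums per-user benefits K_d defined as the weighted relative reduction of delay and energy against the static baseline; the symbols rho_t and rho_e and this exact algebraic form are assumptions, not the patent's published formula.

```python
# Sketch of the assumed step 3-1 reward: r_t = sum over users of benefit K_d.
def user_benefit(t_d, e_d, t_static, e_static, rho_t=0.5, rho_e=0.5):
    """K_d: weighted relative improvement of user d over the static method (rho_t + rho_e = 1)."""
    return rho_t * (t_static - t_d) / t_static + rho_e * (e_static - e_d) / e_static

def reward(delays, energies, static_delays, static_energies, rho_t=0.5, rho_e=0.5):
    """r_t: system benefit summed over all users."""
    return sum(
        user_benefit(t_d, e_d, t_s, e_s, rho_t, rho_e)
        for t_d, e_d, t_s, e_s in zip(delays, energies, static_delays, static_energies)
    )

# Example: a user whose delay and energy both drop by 35% contributes K_d = 0.35,
# matching the per-user benefit figure quoted in the embodiment.
print(user_benefit(t_d=0.65, e_d=0.65, t_static=1.0, e_static=1.0))
```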
step 3-2, training the decision network, specifically comprising: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A(s_t, a_t) = Q(s_t, a_t) − V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor used in computing the discounted returns, and π_ω is the decision offloading policy;
the decision network parameters are then updated iteratively until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function.
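As a compact illustration of the step 3-2 update, the following self-contained Python sketch uses a linear softmax policy, a one-step advantage estimate A = r + γV(s') − V(s) and the policy-gradient ascent step ∇_θ log π(a|s)·A. The network sizes, the single-step return and the synchronous single-worker setting are simplifying assumptions; the patent's A3C trains a deep decision network with asynchronous workers.

```python
# Minimal actor-critic update sketch (assumed dimensions and one-step returns).
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NUM_ACTIONS, GAMMA, LR = 4, 3, 0.99, 0.001

theta = rng.normal(scale=0.1, size=(STATE_DIM, NUM_ACTIONS))  # actor (decision network)
w_v = rng.normal(scale=0.1, size=STATE_DIM)                    # critic (value function)

def policy(s):
    logits = s @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def value(s):
    return float(s @ w_v)

def update(s, a, r, s_next):
    global theta, w_v
    advantage = r + GAMMA * value(s_next) - value(s)            # A(s_t, a_t) ~ Q - V
    p = policy(s)
    grad_log_pi = np.outer(s, np.eye(NUM_ACTIONS)[a] - p)       # d/dtheta log pi(a|s)
    theta += LR * advantage * grad_log_pi                       # actor: policy-gradient ascent
    w_v += LR * advantage * s                                   # critic: TD(0)-style update
    return advantage

# One illustrative transition (s_t, a_t, r_t, s_{t+1}).
s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
a = int(rng.choice(NUM_ACTIONS, p=policy(s)))
print("advantage:", update(s, a, r=1.0, s_next=s_next))
```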
Further, in step 4, task offloading of the multi-wireless body area network is performed according to the obtained task classifier and decision network. The specific process is: at each moment, the trained task classifier first classifies the arriving tasks; the state of the multi-body-area-network system, incorporating the classification results, is then input into the decision network, which outputs which base station each user channel accesses and how the computing resources of the base stations are allocated.
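A schematic Python sketch of this step-4 run-time loop is shown below. The names task_classifier, decision_network, read_sensor, build_state and transmit are hypothetical placeholders standing in for the trained models and the system interfaces; the patent does not define such an API.

```python
# Schematic step-4 loop: classify each user's task, then let the decision network
# choose the base station and edge computing resources for every user.
def offload_once(users, task_classifier, decision_network, read_sensor, build_state, transmit):
    # 1) Classify every user's current task as normal (0) or urgent (1).
    task_classes = {d: int(task_classifier.predict([read_sensor(d)])[0]) for d in users}
    # 2) Assemble the system state {b_d(t), beta_d(t), l_d(t), E_d(t)} for all users.
    state = build_state(users, task_classes)
    # 3) The decision network maps the state to an offloading action per user.
    actions = decision_network(state)
    # 4) Execute the offloading decisions.
    for d in users:
        base_station, cpu_share = actions[d]
        transmit(user=d, bs=base_station, cpu_share=cpu_share)
    return actions

if __name__ == "__main__":
    # Tiny stand-in components so the loop can be executed end to end.
    class StubClassifier:
        def predict(self, x):                       # urgent if the reading exceeds 100 (illustrative)
            return [1 if x[0][0] > 100 else 0]

    users = [0, 1]
    actions = offload_once(
        users,
        task_classifier=StubClassifier(),
        decision_network=lambda state: {d: (0, 0.5) for d in state["classes"]},
        read_sensor=lambda d: [72.0 + 40 * d],
        build_state=lambda us, cls: {"classes": cls},
        transmit=lambda user, bs, cpu_share: None,
    )
    print(actions)
```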
A task offloading system based on an A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of a user, training the classifier according to the data and obtaining a task classifier;
the decision network generation module is used for training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network;
and the task offloading module is used for performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Compared with the prior art, the invention has the following remarkable advantages: 1) the data characteristics of the wireless body area network and the movement characteristics of the users are considered jointly, which reduces the delay and energy consumption of system task offloading; 2) the A3C algorithm based on deep reinforcement learning is adopted to optimize the task offloading process of the multi-wireless body area network, so the system can offload tasks intelligently, autonomously and dynamically even when the system environment is unknown.
The invention is described in further detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart of a task offloading method based on an A3C algorithm in a multi-wireless body area network environment in one embodiment.
FIG. 2 is a flow diagram of training a task classifier in one embodiment.
Fig. 3 is a diagram of a multi-wireless body area network architecture in one embodiment.
FIG. 4 is a graph of training benefit variation for the A3C algorithm in one embodiment.
FIG. 5 is a graph of training benefit variation based on a greedy algorithm in one embodiment.
Detailed Description
In one embodiment, in conjunction with fig. 1, there is provided a task offloading method based on an A3C algorithm in a multi-wireless body area network environment, the method comprising the steps of:
step 1, constructing a network architecture of a plurality of wireless body area networks, and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network;
and step 4, performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Further, in one embodiment, regarding the network architecture of the plurality of wireless body area networks in step 1, the network parameters include the user set, the base station set, the RGMM mobility model parameters of each user, the base station positions l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task class β_d ∈ {0,1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
Further, in one embodiment, with reference to fig. 2, training the classifier in step 2 to obtain a user task classifier specifically includes:
step 2-1, estimating a stable interval for each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and the lower limit x_low of its stable interval are
x_up = x̄ + t_{α,n-1} · s_x / √n
x_low = x̄ − t_{α,n-1} · s_x / √n
where x̄ and s_x are the sample mean and standard deviation of x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
step 2-2, adding a label to the corresponding physiological data samples for each physiological feature, specifically: samples inside the stable interval are labeled 0, representing a normal task, and samples outside the stable interval are labeled 1, representing an urgent task;
step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training to obtain the task classifier, i.e., a model that takes a piece of data as input and outputs its task class.
Further, in one embodiment, in step 3, the resource allocation problem during task offloading is trained by using an A3C algorithm, so as to obtain a decision network, and the specific process includes:
step 3-1, converting the resource allocation problem into a Markov decision problem; the Markov decision problem model, i.e., the decision network, specifically comprises the state s_t, the action a_t and the reward value r_t:
the state s_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data and respectively denote the data volume of the task and the task class label, the third item l_d(t) is the location state of user d, and the fourth item E_d(t) is the energy state;
the action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s and f_{d,s} denotes the computing resources allocated by base station s to user d;
the reward value r_t is set to the system benefit, i.e., the sum over all users of the per-user benefit K_d, where K_d weighs the relative reduction of delay and energy consumption of user d against a static allocation baseline: t_static and e_static denote the delay and energy consumption under the static allocation method, t_d and e_d denote the delay and total energy consumption of user d's task completion, and the weight factors of delay and energy consumption each lie in [0,1] and sum to 1.
step 3-2, training the decision network, specifically comprising: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A(s_t, a_t) = Q(s_t, a_t) − V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor used in computing the discounted returns, and π_ω is the decision offloading policy;
the decision network parameters are then updated iteratively until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function.
Further, in one embodiment, in step 4, task offloading of the multi-wireless body area network is performed according to the obtained task classifier and decision network. The specific process is: at each moment, the trained task classifier first classifies the arriving tasks; the state of the multi-body-area-network system, incorporating the classification results, is then input into the decision network, which outputs which base station each user channel accesses and how the computing resources of the base stations are allocated.
A task offloading system based on an A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of a user, training the classifier according to the data and obtaining a task classifier;
the decision network generation module is used for training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network;
and the task offloading module is used for performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Further, in one embodiment, the task classifier generating module includes:
a stable interval setting unit for estimating a stable interval of each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and the lower limit x_low of its stable interval are
x_up = x̄ + t_{α,n-1} · s_x / √n
x_low = x̄ − t_{α,n-1} · s_x / √n
where x̄ and s_x are the sample mean and standard deviation of x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
a task labeling unit for adding a label to the corresponding physiological data samples for each physiological feature, specifically: samples inside the stable interval are labeled 0, representing a normal task, and samples outside the stable interval are labeled 1, representing an urgent task;
and a classifier training unit for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training to obtain the task classifier, i.e., a model that takes a piece of data as input and outputs its task class.
Further, in one embodiment, the decision network generation module includes:
the decision network construction unit is configured to convert the resource allocation problem into a markov decision problem, where the markov decision problem model, i.e., the decision network specifically includes: state S t Action a t And prize value r t
State S t Set to { b d (t),β d (t),l d (t),E d (t) } wherein the first two items b d (t)、β d (t) two quantities related to task data, which respectively represent the data quantity of the task and a task category mark; third item l d (t) is the location status of user d; fourth item E d (t) is an energy state;
action a t Set to alpha d,s E {0,1} and f d,s ,α d,s Indicating whether or not to offload the task of user d to base station s, f d,s Indicating the computing resources allocated to user d by base station s,
Figure BDA0002426245780000076
will award value r t The method comprises the following steps:
Figure BDA0002426245780000071
/>
wherein K is d To be systematic benefit, t static And e static Respectively representing time delay and energy consumption under a static allocation method, t d And e d Respectively representing the time and total energy consumption of user d's task completion,
Figure BDA0002426245780000072
the weight factors of time delay and energy consumption respectively meet
Figure BDA0002426245780000073
And->
Figure BDA0002426245780000074
the decision network training unit is used for training the decision network, specifically: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A(s_t, a_t) = Q(s_t, a_t) − V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor used in computing the discounted returns, and π_ω is the decision offloading policy;
the decision network parameters are then updated iteratively until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function.
In one embodiment, as a specific example, the present invention is further described and verified, and the specific contents include:
First, a multi-wireless body area network system is established according to the architecture of fig. 3, and the network parameters are initialized. Then, according to the collected human physiological data, the stable intervals are calculated, the data labels are added, and the classifier of step 2 is trained. On this basis, the A3C-based task offloading method is trained.
In this embodiment, the state s_t, the action a_t and the reward r_t of the task offloading problem are modeled according to step 3-1 above. Because body area networks targeting health monitoring have stricter requirements on delay, the weight factors of delay and energy consumption in step 3-1 are set so that delay receives the larger weight. The decision network is then trained with the A3C algorithm according to step 3-2, with the algorithm parameters set as follows: discount factor γ = 0.99 and learning rate 0.001.
During the training phase, after each task offloading round is completed, the system state vector s_t is computed and input into the decision network, which outputs the offloading decision for the next moment; the task is offloaded accordingly, the resulting delay and energy consumption are fed back to the decision network as reward values, the values are recorded, the advantage function A(s_t, a_t) is computed, and the decision network parameters are updated, until the average reward converges.
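The training procedure just described can be summarized by the following schematic Python loop. The stub environment and the RandomAgent stand-in are illustrative assumptions: they are placeholders for the multi-WBAN simulator and the A3C decision network (which would use the discount factor 0.99 and learning rate 0.001 stated above), neither of which is given as code in the patent.

```python
# Schematic episodic training loop: act, receive reward, update, until the
# moving-average reward stops changing.
import collections
import numpy as np

def train(env_reset, env_step, agent, max_episodes=3000, window=200, tol=1e-3):
    """Run offloading episodes until the moving-average reward converges."""
    recent, last_avg = collections.deque(maxlen=window), None
    for _ in range(max_episodes):
        state, done, total = env_reset(), False, 0.0
        while not done:
            action = agent.act(state)                           # offloading decision a_t
            state_next, reward, done = env_step(state, action)  # delay/energy fed back as r_t
            agent.update(state, action, reward, state_next)     # advantage + gradient step
            state, total = state_next, total + reward
        recent.append(total)
        avg = float(np.mean(recent))
        if last_avg is not None and len(recent) == window and abs(avg - last_avg) < tol:
            break                                               # average reward has converged
        last_avg = avg
    return avg

class RandomAgent:
    """Trivial stand-in agent so the loop runs; a real A3C actor-critic replaces this."""
    def act(self, state):
        return np.random.randint(2)
    def update(self, *args):
        pass

# One-step stub environment: reward 1 if the 'right' base station is picked, else 0.
print(train(env_reset=lambda: 0,
            env_step=lambda s, a: (0, float(a == 1), True),
            agent=RandomAgent(),
            max_episodes=500))
```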
Fig. 4 and fig. 5 show the system delay and energy consumption benefit obtained with the A3C-based offloading method of this embodiment (A3C-based Offloading and Joint Resource Allocation, AOJRA) and with a conventional offloading method, respectively. The conventional offloading method is an offloading method based on a greedy strategy (Greedy Offloading and Joint Resource Allocation, GOJRA).
In fig. 4, the system benefit of the AOJRA method starts around 0.8 at the beginning of training, improves rapidly as training proceeds, and stabilizes around 7 after roughly 2000 of the 3000 training episodes. According to the definition of the system benefit function in step 3-1, a benefit value of 7 represents a total delay-and-energy benefit of 7 relative to the static allocation baseline (the SORA method). Since the embodiment has 20 users, this total benefit averages 0.35 per user; that is, compared with the SORA method, the AOJRA method of the invention improves the delay and energy consumption performance of each user by 35% on average. A similar analysis shows that the GOJRA method in fig. 5 improves the delay and energy consumption performance of each user by 29% on average compared with the SORA method.
Compared with the traditional GOJRA method, the AOJRA method improves user delay and energy consumption performance further: it considers not only the channel gain at the time of offloading but also the interference among users transmitting data simultaneously, and it effectively avoids the extra delay and energy consumption caused by network congestion and base-station computing-resource shortage when a large number of users select the same base station for data transmission at the same time.
In summary, by taking into account the data characteristics of the wireless body area network and the movement characteristics of the users, the method reduces the delay and energy consumption of system task offloading. The invention improves the ability of wireless body area networks to serve people's daily life and health, and can be widely applied to practical body area network scenarios such as telemedicine and health monitoring.

Claims (3)

1. The task offloading method based on the A3C algorithm in the multi-wireless body area network environment is characterized by comprising the following steps:
step 1, constructing a network architecture of a plurality of wireless body area networks, and initializing network parameters; the network parameters comprise the user set, the base station set, the RGMM mobility model parameters of each user, the base station positions l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task class β_d ∈ {0,1}, the task offloading energy consumption e_d, and the task offloading delay t_d;
Step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier; the training classifier is used for obtaining a task classifier, and the specific process comprises the following steps:
step 2-1, estimating a stable interval of each physiological characteristic by using t-distribution; for a certain physiological characteristic x, the upper limit x of the plateau section thereof up And a lower limit x low The method comprises the following steps of:
Figure FDA0004112113840000013
Figure FDA0004112113840000014
in the method, in the process of the invention,
Figure FDA0004112113840000015
sum s x Respectively, the mean value and standard deviation corresponding to x, the number of physiological data samples corresponding to the physiological characteristic x is n, and t α,n-1 Representing the t-distribution coefficient when the sample size is n;
step 2-2, adding a label to the corresponding physiological data sample aiming at each physiological characteristic, wherein the method specifically comprises the following steps: adding a label 0 to the physiological data sample in the stable interval to represent a normal task; adding a label 1 to the physiological data sample outside the stable interval to represent an urgent task;
step 2-3, inputting the physiological data sample processed in the step 2-2 into a support vector machine classifier for training to obtain a task classifier, namely inputting one type of data and outputting the task class;
step 3, training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network; the specific process comprises:
step 3-1, converting the resource allocation problem into a Markov decision problem; the Markov decision problem model, i.e., the decision network, specifically comprises the state s_t, the action a_t and the reward value r_t:
the state s_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data and respectively denote the data volume of the task and the task class label, the third item l_d(t) is the location state of user d, and the fourth item E_d(t) is the energy state;
the action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s and f_{d,s} denotes the computing resources allocated by base station s to user d;
the reward value r_t is set to the system benefit, i.e., the sum over all users of the per-user benefit K_d, where K_d weighs the relative reduction of delay and energy consumption of user d against a static allocation baseline: t_static and e_static denote the delay and energy consumption under the static allocation method, t_d and e_d denote the task offloading delay and task offloading energy consumption of user d, and the weight factors of delay and energy consumption each lie in [0,1] and sum to 1;
step 3-2, training the decision network, specifically comprising: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A^{π_ω}(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A^{π_ω}(s_t, a_t) = Q^{π_ω}(s_t, a_t) − V^{π_ω}(s_t)
where Q^{π_ω}(s_t, a_t) is the Q-value function under the decision offloading policy π_ω, V^{π_ω}(s_t) is the value function under the decision offloading policy, A^{π_ω}(s_t, a_t) is the advantage function under the decision offloading policy, and γ is the discount factor used in computing the discounted returns;
the decision network parameters are iteratively updated until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A^{π_ω}(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function;
and step 4, performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
2. The task offloading method based on the A3C algorithm in a multi-wireless body area network environment according to claim 1, wherein in step 4, task offloading of the multi-wireless body area network is performed according to the obtained task classifier and decision network, and the specific process comprises: at each moment, the trained task classifier first classifies the tasks; the state of the multi-body-area-network system, incorporating the classification results, is then input into the decision network, which outputs which base station each user channel accesses and how the computing resources of the base stations are allocated.
3. A task offloading system based on an A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters; the network parameters comprise the user set, the base station set, the RGMM mobility model parameters of each user, the base station positions l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task class β_d ∈ {0,1}, the task offloading energy consumption e_d, and the task offloading delay t_d;
The task classifier generating module is used for acquiring physiological data of a user, training the classifier according to the data and obtaining a task classifier; the task classifier generation module includes:
a stable interval setting unit for estimating a stable interval of each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and the lower limit x_low of its stable interval are
x_up = x̄ + t_{α,n-1} · s_x / √n
x_low = x̄ − t_{α,n-1} · s_x / √n
where x̄ and s_x are the sample mean and standard deviation of x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
a task labeling unit for adding a label to the corresponding physiological data samples for each physiological feature, specifically: samples inside the stable interval are labeled 0, representing a normal task, and samples outside the stable interval are labeled 1, representing an urgent task;
a classifier training unit for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training to obtain the task classifier, i.e., a model that takes a piece of data as input and outputs its task class;
the decision network generation module is used for training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network; the decision network generation module comprises:
the decision network construction unit is configured to convert the resource allocation problem into a markov decision problem, where the markov decision problem model, i.e., the decision network specifically includes: state S t Action a t And prize value r t
State S t Set to { b d (t),β d (t),l d (t),E d (t) } wherein the first two items b d (t)、β d (t) two quantities related to task data, which respectively represent the data quantity of the task and a task category mark; third item l d (t) is the location status of user d; fourth item E d (t) is an energy state;
action a t Set to alpha d,s E {0,1} and f d,s ,α d,s Indicating whether or not to offload the task of user d to base station s, f d,s Indicating the computing resources allocated to user d by base station s,
Figure FDA0004112113840000041
will award value r t The method comprises the following steps:
Figure FDA0004112113840000042
wherein K is d To be systematic benefit, t static And e static Respectively representing time delay and energy consumption under a static allocation method, t d And e d Respectively representing the time and total energy consumption of user d's task completion,
Figure FDA0004112113840000043
the weight factors of time delay and energy consumption respectively meet
Figure FDA0004112113840000044
And->
Figure FDA0004112113840000045
the decision network training unit is used for training the decision network, specifically: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A^{π_ω}(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A^{π_ω}(s_t, a_t) = Q^{π_ω}(s_t, a_t) − V^{π_ω}(s_t)
where Q^{π_ω}(s_t, a_t) is the Q-value function under the decision offloading policy π_ω, V^{π_ω}(s_t) is the value function under the decision offloading policy, A^{π_ω}(s_t, a_t) is the advantage function under the decision offloading policy, and γ is the discount factor used in computing the discounted returns;
the decision network parameters are iteratively updated until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A^{π_ω}(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function;
and the task offloading module is used for performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
CN202010221507.5A 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment Active CN111465032B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010221507.5A CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010221507.5A CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Publications (2)

Publication Number Publication Date
CN111465032A CN111465032A (en) 2020-07-28
CN111465032B true CN111465032B (en) 2023-04-21

Family

ID=71680230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010221507.5A Active CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Country Status (1)

Country Link
CN (1) CN111465032B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241295A (en) * 2020-10-28 2021-01-19 深圳供电局有限公司 Cloud edge cooperative computing unloading method and system based on deep reinforcement learning
CN113645637B (en) * 2021-07-12 2022-09-16 中山大学 Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109219101B (en) * 2018-09-21 2021-09-10 南京理工大学 Route establishing method based on quadratic moving average prediction method in wireless body area network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN111465032A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
Xu et al. Edge intelligence: Empowering intelligence to the edge of network
US10802992B2 (en) Combining CPU and special accelerator for implementing an artificial neural network
CN111465032B (en) Task unloading method and system based on A3C algorithm in multi-wireless body area network environment
Imteaj et al. Federated learning for resource-constrained iot devices: Panoramas and state of the art
US8473432B2 (en) Issue resolution in expert networks
CN114219097B (en) Federal learning training and predicting method and system based on heterogeneous resources
Hung Adaptive Fuzzy-GARCH model applied to forecasting the volatility of stock markets using particle swarm optimization
CN113284142B (en) Image detection method, image detection device, computer-readable storage medium and computer equipment
CN111026548A (en) Power communication equipment test resource scheduling method for reverse deep reinforcement learning
CN111079780A (en) Training method of space map convolution network, electronic device and storage medium
CN112835715B (en) Method and device for determining task unloading strategy of unmanned aerial vehicle based on reinforcement learning
CN106793031A (en) Based on the smart mobile phone energy consumption optimization method for gathering competing excellent algorithm
CN115809147B (en) Multi-edge collaborative cache scheduling optimization method, system and model training method
CN111291618A (en) Labeling method, device, server and storage medium
Bebortta et al. DeepMist: Toward deep learning assisted mist computing framework for managing healthcare big data
Lei et al. An improved variable neighborhood search for parallel drone scheduling traveling salesman problem
CN114595396A (en) Sequence recommendation method and system based on federal learning
Abbas et al. Meta-heuristic-based offloading task optimization in mobile edge computing
Chen et al. One for all: Traffic prediction at heterogeneous 5g edge with data-efficient transfer learning
CN110175708B (en) Model and method for predicting food materials in online increment mode
CN109754075B (en) Scheduling method, device, storage medium and device for wireless sensor network node
CN111401551A (en) Weak supervision self-learning method based on reinforcement learning
US12002202B2 (en) Meta-learning for cardiac MRI segmentation
Jaiswal et al. Analyze Classification Act of Data Mining Schemes
Zhou et al. Computing Offloading Based on TD3 Algorithm in Cache-Assisted Vehicular NOMA–MEC Networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant