CN111465032B - Task offloading method and system based on A3C algorithm in multi-wireless body area network environment - Google Patents

Task offloading method and system based on A3C algorithm in multi-wireless body area network environment

Info

Publication number
CN111465032B
Authority
CN
China
Prior art keywords
task
network
decision
classifier
body area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010221507.5A
Other languages
Chinese (zh)
Other versions
CN111465032A (en)
Inventor
王力立
张戈
奚思遥
肖强
黄成
单梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202010221507.5A priority Critical patent/CN111465032B/en
Publication of CN111465032A publication Critical patent/CN111465032A/en
Application granted granted Critical
Publication of CN111465032B publication Critical patent/CN111465032B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 - Network planning, e.g. coverage or traffic planning tools; network deployment, e.g. resource partitioning or cells structures
    • H04W16/22 - Traffic simulation tools or models
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 - Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02 - Power saving arrangements
    • H04W52/0209 - Power saving arrangements in terminal devices
    • H04W52/0212 - Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave
    • H04W52/0216 - Power saving arrangements in terminal devices managed by the network, using a pre-established activity schedule, e.g. traffic indication frame
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 - Network topologies
    • H04W84/18 - Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a task offloading method and system based on the A3C algorithm in a multi-wireless body area network environment. The method comprises the following steps: determining the network architecture of the multi-wireless body area network and initializing the network parameters; training a task classifier with sampled physiological data to obtain a stable classifier model; training the network resource allocation problem with the A3C deep reinforcement learning algorithm to obtain a converged decision network; and performing task offloading according to the obtained models: at each moment, tasks are first classified with the classifier model, and then user channel access and edge-server computing resource allocation are carried out according to the decision network. The method improves the delay and energy consumption performance of task offloading in multi-wireless body area networks, and can be widely applied to practical body area network scenarios such as telemedicine and health monitoring.

Description

Task offloading method and system based on A3C algorithm in multi-wireless body area network environment
Technical Field
The invention belongs to the field of wireless communication networks, and particularly relates to a task offloading method and system based on the A3C algorithm in a multi-wireless body area network environment.
Background
A wireless body area network is a wireless sensor network that takes the human body as its monitoring object. Because the human body moves, inter-network interference is likely to occur among multiple body area networks, and how to collect and manage data across multiple networks is an important direction of body area network research. Current research shows that body area networks are characterized by mobility, computation-intensive tasks and low-latency requirements, and that task offloading can be assisted by edge computing: a base station equipped with an edge server is placed at the edge of the networks to collect and process tasks in a unified way. Because the monitored objects give body area networks particularly strict requirements on delay and energy consumption, a reasonable task offloading method must be designed to guarantee low-delay, low-energy data transmission.
In existing research on data transmission between multiple body area networks and a data center, most algorithms are designed for generic communication networks, and no attempt has been made to exploit the data characteristics and user characteristics specific to body area networks. In fact, the physiological data monitored by a body area network carry important practical meaning, and the movement trajectories of body area network users have their own regularities. Existing offloading methods do not take these characteristics into account and therefore often fail to meet the stringent delay and energy requirements of wireless body area networks.
Disclosure of Invention
The invention aims to provide a task offloading method and system for the multi-wireless body area network environment, so that the task state and the movement characteristics of each user are fully considered when the system offloads tasks, thereby achieving lower system delay and energy consumption.
The technical solution for realizing the purpose of the invention is as follows: a task offloading method based on an A3C algorithm in a multi-wireless body area network environment comprises the following steps:
step 1, constructing a network architecture of a plurality of wireless body area networks, and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network;
and step 4, performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Further, regarding the network architecture of the wireless body area networks in step 1, the network parameters include the user set, the base station set, the RGMM mobility model parameters of each user, the base station positions l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task class β_d ∈ {0,1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
Further, the classifier training in step 2 to obtain the user task classifier specifically comprises:
step 2-1, estimating a stable interval for each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and the lower limit x_low of its stable interval are
x_up = x̄ + t_{α,n-1} · s_x / √n
x_low = x̄ − t_{α,n-1} · s_x / √n
where x̄ and s_x are the sample mean and standard deviation of x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
step 2-2, adding a label to the corresponding physiological data samples for each physiological feature, specifically: samples inside the stable interval are labeled 0, representing a normal task, and samples outside the stable interval are labeled 1, representing an urgent task;
step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training to obtain the task classifier, i.e., a model that takes a piece of data as input and outputs its task class.
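A minimal Python sketch of steps 2-1 to 2-3 is given below. It assumes the stable interval is the t-distribution interval estimate x̄ ± t_{α,n-1}·s_x/√n described above; the heart-rate-like sample data, the significance level α and the RBF kernel choice are illustrative assumptions, not values fixed by the patent.

```python
# Sketch of steps 2-1 to 2-3: stable interval, labeling, and SVM task classifier.
import numpy as np
from scipy import stats
from sklearn.svm import SVC

def stable_interval(samples, alpha=0.05):
    """Step 2-1: lower/upper limits of the stable interval of one physiological feature."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    mean, std = x.mean(), x.std(ddof=1)
    t_coef = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t-distribution coefficient
    half = t_coef * std / np.sqrt(n)
    return mean - half, mean + half                 # (x_low, x_up)

def label_samples(samples, x_low, x_up):
    """Step 2-2: label 0 = normal task (inside interval), 1 = urgent task (outside)."""
    x = np.asarray(samples, dtype=float)
    return ((x < x_low) | (x > x_up)).astype(int)

# Illustrative heart-rate-like data (beats per minute), not from the patent.
rng = np.random.default_rng(1)
heart_rate = rng.normal(75, 8, size=500)
low, up = stable_interval(heart_rate)
labels = label_samples(heart_rate, low, up)

# Step 2-3: train a support vector machine task classifier on (feature, label) pairs.
clf = SVC(kernel="rbf").fit(heart_rate.reshape(-1, 1), labels)
print("task class of a 120 bpm sample:", clf.predict([[120.0]])[0])
```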
Further, in step 3, the training of the resource allocation problem during task offloading by using the A3C algorithm includes the following specific steps:
step 3-1, converting the resource allocation problem into a Markov decision problem; the Markov decision problem model, i.e., the decision network, specifically comprises the state s_t, the action a_t and the reward value r_t:
the state s_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data and respectively denote the data volume of the task and the task class label, the third item l_d(t) is the location state of user d, and the fourth item E_d(t) is the energy state;
the action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s and f_{d,s} denotes the computing resources allocated by base station s to user d;
the reward value r_t is set to the system benefit, i.e., the sum over all users of the per-user benefit K_d, where K_d weighs the relative reduction of delay and energy consumption of user d against a static allocation baseline: t_static and e_static denote the delay and energy consumption under the static allocation method, t_d and e_d denote the delay and total energy consumption of user d's task completion, and the weight factors of delay and energy consumption each lie in [0,1] and sum to 1.
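The following small Python sketch shows one way the reward of step 3-1 can be computed. Consistent with the benefit analysis reported for the embodiment (a per-user benefit of 0.35 corresponding to a 35% improvement), it assumes r_t sums per-user benefits K_d defined as the weighted relative reduction of delay and energy against the static baseline; the symbols rho_t and rho_e and this exact algebraic form are assumptions, not the patent's published formula.

```python
# Sketch of the assumed step 3-1 reward: r_t = sum over users of benefit K_d.
def user_benefit(t_d, e_d, t_static, e_static, rho_t=0.5, rho_e=0.5):
    """K_d: weighted relative improvement of user d over the static method (rho_t + rho_e = 1)."""
    return rho_t * (t_static - t_d) / t_static + rho_e * (e_static - e_d) / e_static

def reward(delays, energies, static_delays, static_energies, rho_t=0.5, rho_e=0.5):
    """r_t: system benefit summed over all users."""
    return sum(
        user_benefit(t_d, e_d, t_s, e_s, rho_t, rho_e)
        for t_d, e_d, t_s, e_s in zip(delays, energies, static_delays, static_energies)
    )

# Example: a user whose delay and energy both drop by 35% contributes K_d = 0.35,
# matching the per-user benefit figure quoted in the embodiment.
print(user_benefit(t_d=0.65, e_d=0.65, t_static=1.0, e_static=1.0))
```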
step 3-2, training the decision network, specifically comprising: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A(s_t, a_t) = Q(s_t, a_t) − V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor used in computing the discounted returns, and π_ω is the decision offloading policy;
the decision network parameters are then updated iteratively until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function.
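As a compact illustration of the step 3-2 update, the following self-contained Python sketch uses a linear softmax policy, a one-step advantage estimate A = r + γV(s') − V(s) and the policy-gradient ascent step ∇_θ log π(a|s)·A. The network sizes, the single-step return and the synchronous single-worker setting are simplifying assumptions; the patent's A3C trains a deep decision network with asynchronous workers.

```python
# Minimal actor-critic update sketch (assumed dimensions and one-step returns).
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NUM_ACTIONS, GAMMA, LR = 4, 3, 0.99, 0.001

theta = rng.normal(scale=0.1, size=(STATE_DIM, NUM_ACTIONS))  # actor (decision network)
w_v = rng.normal(scale=0.1, size=STATE_DIM)                    # critic (value function)

def policy(s):
    logits = s @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def value(s):
    return float(s @ w_v)

def update(s, a, r, s_next):
    global theta, w_v
    advantage = r + GAMMA * value(s_next) - value(s)            # A(s_t, a_t) ~ Q - V
    p = policy(s)
    grad_log_pi = np.outer(s, np.eye(NUM_ACTIONS)[a] - p)       # d/dtheta log pi(a|s)
    theta += LR * advantage * grad_log_pi                       # actor: policy-gradient ascent
    w_v += LR * advantage * s                                   # critic: TD(0)-style update
    return advantage

# One illustrative transition (s_t, a_t, r_t, s_{t+1}).
s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
a = int(rng.choice(NUM_ACTIONS, p=policy(s)))
print("advantage:", update(s, a, r=1.0, s_next=s_next))
```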
Further, in step 4, task offloading of the multi-wireless body area network is performed according to the obtained task classifier and decision network. The specific process is: at each moment, the trained task classifier first classifies the arriving tasks; the state of the multi-body-area-network system, incorporating the classification results, is then input into the decision network, which outputs which base station each user channel accesses and how the computing resources of the base stations are allocated.
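A schematic Python sketch of this step-4 run-time loop is shown below. The names task_classifier, decision_network, read_sensor, build_state and transmit are hypothetical placeholders standing in for the trained models and the system interfaces; the patent does not define such an API.

```python
# Schematic step-4 loop: classify each user's task, then let the decision network
# choose the base station and edge computing resources for every user.
def offload_once(users, task_classifier, decision_network, read_sensor, build_state, transmit):
    # 1) Classify every user's current task as normal (0) or urgent (1).
    task_classes = {d: int(task_classifier.predict([read_sensor(d)])[0]) for d in users}
    # 2) Assemble the system state {b_d(t), beta_d(t), l_d(t), E_d(t)} for all users.
    state = build_state(users, task_classes)
    # 3) The decision network maps the state to an offloading action per user.
    actions = decision_network(state)
    # 4) Execute the offloading decisions.
    for d in users:
        base_station, cpu_share = actions[d]
        transmit(user=d, bs=base_station, cpu_share=cpu_share)
    return actions

if __name__ == "__main__":
    # Tiny stand-in components so the loop can be executed end to end.
    class StubClassifier:
        def predict(self, x):                       # urgent if the reading exceeds 100 (illustrative)
            return [1 if x[0][0] > 100 else 0]

    users = [0, 1]
    actions = offload_once(
        users,
        task_classifier=StubClassifier(),
        decision_network=lambda state: {d: (0, 0.5) for d in state["classes"]},
        read_sensor=lambda d: [72.0 + 40 * d],
        build_state=lambda us, cls: {"classes": cls},
        transmit=lambda user, bs, cpu_share: None,
    )
    print(actions)
```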
A task offloading system based on an A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of a user, training the classifier according to the data and obtaining a task classifier;
the decision network generation module is used for training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network;
and the task offloading module is used for performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Compared with the prior art, the invention has the following remarkable advantages: 1) the data characteristics of the wireless body area network and the movement characteristics of the users are considered jointly, which reduces the delay and energy consumption of system task offloading; 2) the A3C algorithm based on deep reinforcement learning is adopted to optimize the task offloading process of the multi-wireless body area network, so the system can offload tasks intelligently, autonomously and dynamically even when the system environment is unknown.
The invention is described in further detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart of a task offloading method based on an A3C algorithm in a multi-wireless body area network environment in one embodiment.
FIG. 2 is a flow diagram of training a task classifier in one embodiment.
Fig. 3 is a diagram of a multi-wireless body area network architecture in one embodiment.
FIG. 4 is a graph of training benefit variation for the A3C algorithm in one embodiment.
FIG. 5 is a graph of training benefit variation based on a greedy algorithm in one embodiment.
Detailed Description
In one embodiment, in conjunction with fig. 1, there is provided a task offloading method based on an A3C algorithm in a multi-wireless body area network environment, the method comprising the steps of:
step 1, constructing a network architecture of a plurality of wireless body area networks, and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network;
and step 4, performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Further, in one embodiment, regarding the network architecture of the plurality of wireless body area networks in step 1, the network parameters include the user set, the base station set, the RGMM mobility model parameters of each user, the base station positions l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task class β_d ∈ {0,1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
Further, in one embodiment, with reference to fig. 2, training the classifier in step 2 to obtain a user task classifier specifically includes:
step 2-1, estimating a stable interval for each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and the lower limit x_low of its stable interval are
x_up = x̄ + t_{α,n-1} · s_x / √n
x_low = x̄ − t_{α,n-1} · s_x / √n
where x̄ and s_x are the sample mean and standard deviation of x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
step 2-2, adding a label to the corresponding physiological data samples for each physiological feature, specifically: samples inside the stable interval are labeled 0, representing a normal task, and samples outside the stable interval are labeled 1, representing an urgent task;
step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training to obtain the task classifier, i.e., a model that takes a piece of data as input and outputs its task class.
Further, in one embodiment, in step 3, the resource allocation problem during task offloading is trained by using an A3C algorithm, so as to obtain a decision network, and the specific process includes:
step 3-1, converting the resource allocation problem into a Markov decision problem; the Markov decision problem model, i.e., the decision network, specifically comprises the state s_t, the action a_t and the reward value r_t:
the state s_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data and respectively denote the data volume of the task and the task class label, the third item l_d(t) is the location state of user d, and the fourth item E_d(t) is the energy state;
the action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s and f_{d,s} denotes the computing resources allocated by base station s to user d;
the reward value r_t is set to the system benefit, i.e., the sum over all users of the per-user benefit K_d, where K_d weighs the relative reduction of delay and energy consumption of user d against a static allocation baseline: t_static and e_static denote the delay and energy consumption under the static allocation method, t_d and e_d denote the delay and total energy consumption of user d's task completion, and the weight factors of delay and energy consumption each lie in [0,1] and sum to 1.
step 3-2, training the decision network, specifically comprising: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A(s_t, a_t) = Q(s_t, a_t) − V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor used in computing the discounted returns, and π_ω is the decision offloading policy;
the decision network parameters are then updated iteratively until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function.
Further, in one embodiment, in step 4, task offloading of the multi-wireless body area network is performed according to the obtained task classifier and decision network. The specific process is: at each moment, the trained task classifier first classifies the arriving tasks; the state of the multi-body-area-network system, incorporating the classification results, is then input into the decision network, which outputs which base station each user channel accesses and how the computing resources of the base stations are allocated.
A task offloading system based on an A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of a user, training the classifier according to the data and obtaining a task classifier;
the decision network generation module is used for training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network;
and the task offloading module is used for performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Further, in one embodiment, the task classifier generating module includes:
a stable interval setting unit for estimating a stable interval of each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and the lower limit x_low of its stable interval are
x_up = x̄ + t_{α,n-1} · s_x / √n
x_low = x̄ − t_{α,n-1} · s_x / √n
where x̄ and s_x are the sample mean and standard deviation of x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
a task labeling unit for adding a label to the corresponding physiological data samples for each physiological feature, specifically: samples inside the stable interval are labeled 0, representing a normal task, and samples outside the stable interval are labeled 1, representing an urgent task;
and a classifier training unit for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training to obtain the task classifier, i.e., a model that takes a piece of data as input and outputs its task class.
Further, in one embodiment, the decision network generation module includes:
the decision network construction unit is configured to convert the resource allocation problem into a markov decision problem, where the markov decision problem model, i.e., the decision network specifically includes: state S t Action a t And prize value r t
State S t Set to { b d (t),β d (t),l d (t),E d (t) } wherein the first two items b d (t)、β d (t) two quantities related to task data, which respectively represent the data quantity of the task and a task category mark; third item l d (t) is the location status of user d; fourth item E d (t) is an energy state;
action a t Set to alpha d,s E {0,1} and f d,s ,α d,s Indicating whether or not to offload the task of user d to base station s, f d,s Indicating the computing resources allocated to user d by base station s,
Figure BDA0002426245780000076
will award value r t The method comprises the following steps:
Figure BDA0002426245780000071
/>
wherein K is d To be systematic benefit, t static And e static Respectively representing time delay and energy consumption under a static allocation method, t d And e d Respectively representing the time and total energy consumption of user d's task completion,
Figure BDA0002426245780000072
the weight factors of time delay and energy consumption respectively meet
Figure BDA0002426245780000073
And->
Figure BDA0002426245780000074
the decision network training unit is used for training the decision network, specifically: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A(s_t, a_t) = Q(s_t, a_t) − V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor used in computing the discounted returns, and π_ω is the decision offloading policy;
the decision network parameters are then updated iteratively until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function.
In one embodiment, as a specific example, the present invention is further described and verified, and the specific contents include:
First, a multi-wireless body area network system is established according to the architecture of fig. 3, and the network parameters are initialized. Then, according to the collected human physiological data, the stable intervals are calculated, the data labels are added, and the classifier of step 2 is trained. On this basis, the A3C-based task offloading method is trained.
In this embodiment, the state s_t, the action a_t and the reward r_t of the task offloading problem are modeled according to step 3-1 above. Because body area networks targeting health monitoring have stricter requirements on delay, the weight factors of delay and energy consumption in step 3-1 are set so that delay receives the larger weight. The decision network is then trained with the A3C algorithm according to step 3-2, with the algorithm parameters set as follows: discount factor γ = 0.99 and learning rate 0.001.
During the training phase, after each task offloading round is completed, the system state vector s_t is computed and input into the decision network, which outputs the offloading decision for the next moment; the task is offloaded accordingly, the resulting delay and energy consumption are fed back to the decision network as reward values, the values are recorded, the advantage function A(s_t, a_t) is computed, and the decision network parameters are updated, until the average reward converges.
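The training procedure just described can be summarized by the following schematic Python loop. The stub environment and the RandomAgent stand-in are illustrative assumptions: they are placeholders for the multi-WBAN simulator and the A3C decision network (which would use the discount factor 0.99 and learning rate 0.001 stated above), neither of which is given as code in the patent.

```python
# Schematic episodic training loop: act, receive reward, update, until the
# moving-average reward stops changing.
import collections
import numpy as np

def train(env_reset, env_step, agent, max_episodes=3000, window=200, tol=1e-3):
    """Run offloading episodes until the moving-average reward converges."""
    recent, last_avg = collections.deque(maxlen=window), None
    for _ in range(max_episodes):
        state, done, total = env_reset(), False, 0.0
        while not done:
            action = agent.act(state)                           # offloading decision a_t
            state_next, reward, done = env_step(state, action)  # delay/energy fed back as r_t
            agent.update(state, action, reward, state_next)     # advantage + gradient step
            state, total = state_next, total + reward
        recent.append(total)
        avg = float(np.mean(recent))
        if last_avg is not None and len(recent) == window and abs(avg - last_avg) < tol:
            break                                               # average reward has converged
        last_avg = avg
    return avg

class RandomAgent:
    """Trivial stand-in agent so the loop runs; a real A3C actor-critic replaces this."""
    def act(self, state):
        return np.random.randint(2)
    def update(self, *args):
        pass

# One-step stub environment: reward 1 if the 'right' base station is picked, else 0.
print(train(env_reset=lambda: 0,
            env_step=lambda s, a: (0, float(a == 1), True),
            agent=RandomAgent(),
            max_episodes=500))
```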
Fig. 4 and fig. 5 show the system delay and energy consumption benefit obtained with the A3C-based offloading method of this embodiment (A3C-based Offloading and Joint Resource Allocation, AOJRA) and with a conventional offloading method, respectively. The conventional offloading method is an offloading method based on a greedy strategy (Greedy Offloading and Joint Resource Allocation, GOJRA).
In fig. 4, the system benefit of the AOJRA method starts around 0.8 at the beginning of training, improves rapidly as training proceeds, and stabilizes around 7 after roughly 2000 of the 3000 training episodes. According to the definition of the system benefit function in step 3-1, a benefit value of 7 represents a total delay-and-energy benefit of 7 relative to the static allocation baseline (the SORA method). Since the embodiment has 20 users, this total benefit averages 0.35 per user; that is, compared with the SORA method, the AOJRA method of the invention improves the delay and energy consumption performance of each user by 35% on average. A similar analysis shows that the GOJRA method in fig. 5 improves the delay and energy consumption performance of each user by 29% on average compared with the SORA method.
Compared with the traditional GOJRA method, the AOJRA method improves user delay and energy consumption performance further: it considers not only the channel gain at the time of offloading but also the interference among users transmitting data simultaneously, and it effectively avoids the extra delay and energy consumption caused by network congestion and base-station computing-resource shortage when a large number of users select the same base station for data transmission at the same time.
In summary, by taking into account the data characteristics of the wireless body area network and the movement characteristics of the users, the method reduces the delay and energy consumption of system task offloading. The invention improves the ability of wireless body area networks to serve people's daily life and health, and can be widely applied to practical body area network scenarios such as telemedicine and health monitoring.

Claims (3)

1. The task offloading method based on the A3C algorithm in the multi-wireless body area network environment is characterized by comprising the following steps:
step 1, constructing a network architecture of a plurality of wireless body area networks, and initializing network parameters; the network parameters comprise the user set, the base station set, the RGMM mobility model parameters of each user, the base station positions l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task class β_d ∈ {0,1}, the task offloading energy consumption e_d, and the task offloading delay t_d;
Step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier; the training classifier is used for obtaining a task classifier, and the specific process comprises the following steps:
step 2-1, estimating a stable interval of each physiological characteristic by using t-distribution; for a certain physiological characteristic x, the upper limit x of the plateau section thereof up And a lower limit x low The method comprises the following steps of:
Figure FDA0004112113840000013
Figure FDA0004112113840000014
in the method, in the process of the invention,
Figure FDA0004112113840000015
sum s x Respectively, the mean value and standard deviation corresponding to x, the number of physiological data samples corresponding to the physiological characteristic x is n, and t α,n-1 Representing the t-distribution coefficient when the sample size is n;
step 2-2, adding a label to the corresponding physiological data sample aiming at each physiological characteristic, wherein the method specifically comprises the following steps: adding a label 0 to the physiological data sample in the stable interval to represent a normal task; adding a label 1 to the physiological data sample outside the stable interval to represent an urgent task;
step 2-3, inputting the physiological data sample processed in the step 2-2 into a support vector machine classifier for training to obtain a task classifier, namely inputting one type of data and outputting the task class;
step 3, training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network; the specific process comprises:
step 3-1, converting the resource allocation problem into a Markov decision problem; the Markov decision problem model, i.e., the decision network, specifically comprises the state s_t, the action a_t and the reward value r_t:
the state s_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data and respectively denote the data volume of the task and the task class label, the third item l_d(t) is the location state of user d, and the fourth item E_d(t) is the energy state;
the action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s and f_{d,s} denotes the computing resources allocated by base station s to user d;
the reward value r_t is set to the system benefit, i.e., the sum over all users of the per-user benefit K_d, where K_d weighs the relative reduction of delay and energy consumption of user d against a static allocation baseline: t_static and e_static denote the delay and energy consumption under the static allocation method, t_d and e_d denote the task offloading delay and task offloading energy consumption of user d, and the weight factors of delay and energy consumption each lie in [0,1] and sum to 1;
step 3-2, training the decision network, specifically comprising: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A^{π_ω}(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A^{π_ω}(s_t, a_t) = Q^{π_ω}(s_t, a_t) − V^{π_ω}(s_t)
where Q^{π_ω}(s_t, a_t) is the Q-value function under the decision offloading policy π_ω, V^{π_ω}(s_t) is the value function under the decision offloading policy, A^{π_ω}(s_t, a_t) is the advantage function under the decision offloading policy, and γ is the discount factor used in computing the discounted returns;
the decision network parameters are iteratively updated until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A^{π_ω}(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function;
and step 4, performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
2. The task offloading method based on the A3C algorithm in a multi-wireless body area network environment according to claim 1, wherein in step 4, task offloading of the multi-wireless body area network is performed according to the obtained task classifier and decision network, and the specific process comprises: at each moment, the trained task classifier first classifies the tasks; the state of the multi-body-area-network system, incorporating the classification results, is then input into the decision network, which outputs which base station each user channel accesses and how the computing resources of the base stations are allocated.
3. A task offloading system based on an A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters; the network parameters comprise the user set, the base station set, the RGMM mobility model parameters of each user, the base station positions l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task class β_d ∈ {0,1}, the task offloading energy consumption e_d, and the task offloading delay t_d;
The task classifier generating module is used for acquiring physiological data of a user, training the classifier according to the data and obtaining a task classifier; the task classifier generation module includes:
a stable interval setting unit for estimating a stable interval of each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and the lower limit x_low of its stable interval are
x_up = x̄ + t_{α,n-1} · s_x / √n
x_low = x̄ − t_{α,n-1} · s_x / √n
where x̄ and s_x are the sample mean and standard deviation of x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
a task labeling unit for adding a label to the corresponding physiological data samples for each physiological feature, specifically: samples inside the stable interval are labeled 0, representing a normal task, and samples outside the stable interval are labeled 1, representing an urgent task;
a classifier training unit for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training to obtain the task classifier, i.e., a model that takes a piece of data as input and outputs its task class;
the decision network generation module is used for training the resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network; the decision network generation module comprises:
the decision network construction unit is configured to convert the resource allocation problem into a markov decision problem, where the markov decision problem model, i.e., the decision network specifically includes: state S t Action a t And prize value r t
State S t Set to { b d (t),β d (t),l d (t),E d (t) } wherein the first two items b d (t)、β d (t) two quantities related to task data, which respectively represent the data quantity of the task and a task category mark; third item l d (t) is the location status of user d; fourth item E d (t) is an energy state;
action a t Set to alpha d,s E {0,1} and f d,s ,α d,s Indicating whether or not to offload the task of user d to base station s, f d,s Indicating the computing resources allocated to user d by base station s,
Figure FDA0004112113840000041
will award value r t The method comprises the following steps:
Figure FDA0004112113840000042
wherein K is d To be systematic benefit, t static And e static Respectively representing time delay and energy consumption under a static allocation method, t d And e d Respectively representing the time and total energy consumption of user d's task completion,
Figure FDA0004112113840000043
the weight factors of time delay and energy consumption respectively meet
Figure FDA0004112113840000044
And->
Figure FDA0004112113840000045
the decision network training unit is used for training the decision network, specifically: based on the determined state s_t, the decision network determines the action a_t in this state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); an advantage function A^{π_ω}(s_t, a_t), representing the advantage of action a_t in state s_t, is defined as
A^{π_ω}(s_t, a_t) = Q^{π_ω}(s_t, a_t) − V^{π_ω}(s_t)
where Q^{π_ω}(s_t, a_t) is the Q-value function under the decision offloading policy π_ω, V^{π_ω}(s_t) is the value function under the decision offloading policy, A^{π_ω}(s_t, a_t) is the advantage function under the decision offloading policy, and γ is the discount factor used in computing the discounted returns;
the decision network parameters are iteratively updated until the reward function of the decision network converges, the iterative update following the policy gradient
∇_θ J(θ) = E[ ∇_θ log π_ω(s_t, a_t) · A^{π_ω}(s_t, a_t) ]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network (the parameterization of π_ω), E is the expectation (mean) operator, ∇_θ is the gradient operator, and J(θ) is the cost function;
and the task offloading module is used for performing task offloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
CN202010221507.5A 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment Active CN111465032B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010221507.5A CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010221507.5A CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Publications (2)

Publication Number Publication Date
CN111465032A CN111465032A (en) 2020-07-28
CN111465032B true CN111465032B (en) 2023-04-21

Family

ID=71680230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010221507.5A Active CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Country Status (1)

Country Link
CN (1) CN111465032B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241295A (en) * 2020-10-28 2021-01-19 深圳供电局有限公司 Cloud edge cooperative computing unloading method and system based on deep reinforcement learning
CN113645637B (en) * 2021-07-12 2022-09-16 中山大学 Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109219101B (en) * 2018-09-21 2021-09-10 南京理工大学 Route establishing method based on quadratic moving average prediction method in wireless body area network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN111465032A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
Xu et al. Edge intelligence: Empowering intelligence to the edge of network
US10802992B2 (en) Combining CPU and special accelerator for implementing an artificial neural network
CN111465032B (en) Task unloading method and system based on A3C algorithm in multi-wireless body area network environment
Imteaj et al. Federated learning for resource-constrained iot devices: Panoramas and state of the art
US8473432B2 (en) Issue resolution in expert networks
CN114219097B (en) Federal learning training and predicting method and system based on heterogeneous resources
Hung Adaptive Fuzzy-GARCH model applied to forecasting the volatility of stock markets using particle swarm optimization
CN113284142B (en) Image detection method, image detection device, computer-readable storage medium and computer equipment
CN111026548A (en) Power communication equipment test resource scheduling method for reverse deep reinforcement learning
CN111079780A (en) Training method of space map convolution network, electronic device and storage medium
CN112835715B (en) Method and device for determining task unloading strategy of unmanned aerial vehicle based on reinforcement learning
CN106793031A (en) Based on the smart mobile phone energy consumption optimization method for gathering competing excellent algorithm
CN115809147B (en) Multi-edge collaborative cache scheduling optimization method, system and model training method
CN111291618A (en) Labeling method, device, server and storage medium
Bebortta et al. DeepMist: Toward deep learning assisted mist computing framework for managing healthcare big data
Lei et al. An improved variable neighborhood search for parallel drone scheduling traveling salesman problem
CN114595396A (en) Sequence recommendation method and system based on federal learning
Abbas et al. Meta-heuristic-based offloading task optimization in mobile edge computing
Chen et al. One for all: Traffic prediction at heterogeneous 5g edge with data-efficient transfer learning
CN110175708B (en) Model and method for predicting food materials in online increment mode
CN109754075B (en) Scheduling method, device, storage medium and device for wireless sensor network node
CN111401551A (en) Weak supervision self-learning method based on reinforcement learning
US12002202B2 (en) Meta-learning for cardiac MRI segmentation
Jaiswal et al. Analyze Classification Act of Data Mining Schemes
Zhou et al. Computing Offloading Based on TD3 Algorithm in Cache-Assisted Vehicular NOMA–MEC Networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant