CN111465032B - Task unloading method and system based on A3C algorithm in multi-wireless body area network environment - Google Patents
- Publication number
- CN111465032B CN111465032B CN202010221507.5A CN202010221507A CN111465032B CN 111465032 B CN111465032 B CN 111465032B CN 202010221507 A CN202010221507 A CN 202010221507A CN 111465032 B CN111465032 B CN 111465032B
- Authority
- CN
- China
- Prior art keywords
- task
- network
- decision
- classifier
- body area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/22—Traffic simulation tools or models
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/02—Power saving arrangements
- H04W52/0209—Power saving arrangements in terminal devices
- H04W52/0212—Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave
- H04W52/0216—Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave using a pre-established activity schedule, e.g. traffic indication frame
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W84/00—Network topologies
- H04W84/18—Self-organising networks, e.g. ad-hoc networks or sensor networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a task offloading method and system based on the A3C algorithm in a multi-wireless-body-area-network environment. The method comprises the following steps: determining the network architecture of the multiple wireless body area networks and initializing the network parameters; training a task classifier on sampled physiological data to obtain a stable classifier model; training a decision network for the network resource-allocation problem with the deep-reinforcement-learning-based A3C algorithm until it converges; and performing task offloading with the obtained models: at each moment, tasks are first classified with the classifier model, and then user channel access and edge-server computing-resource allocation are decided by the decision network. The method improves the delay and energy-consumption performance of task offloading in multi-wireless-body-area-network scenarios and can be widely applied in practical body-area-network applications such as telemedicine and health monitoring.
Description
Technical Field
The invention belongs to the field of wireless communication networks, and particularly relates to a task offloading method and system based on the A3C algorithm in a multi-wireless-body-area-network environment.
Background
A wireless body area network is a wireless sensor network that takes the human body as its monitoring object. Because the human body is mobile, inter-network interference readily occurs among multiple body area networks, and how to collect and manage data across multiple networks is an important direction of body-area-network research. Current research shows that body area networks are characterized by mobility, high computational demand and low-latency requirements, and that task offloading can be assisted by edge computing: a base station equipped with an edge server is placed at the edge of the multiple networks to collect and process tasks in a unified way. Because body area networks, whose monitored objects are special, impose strict requirements on delay and energy consumption, a reasonable task offloading method must be designed to guarantee low delay and low energy consumption in data transmission.
In existing research on data transmission between multiple body area networks and a data center, most algorithms are studied in the context of generalized communication networks; no attempt has been made to conduct targeted research that combines the data characteristics and user characteristics of body area networks. In fact, the physiological data monitored by a body area network has great practical significance, and the movement trajectory of a body-area-network user has its own characteristics. Existing offloading methods do not take these characteristics into account and therefore often fail to meet the stringent latency and energy requirements of wireless body area networks.
Disclosure of Invention
The invention aims to provide a task offloading method and system in a multi-wireless-body-area-network environment, so that the task state and movement characteristics of the users are fully considered during task offloading, achieving lower system delay and energy consumption.
The technical solution for realizing the purpose of the invention is as follows: a task offloading method based on an A3C algorithm in a multi-wireless body area network environment comprises the following steps:
Step 1, construct the network architecture of multiple wireless body area networks and initialize the network parameters;
Step 2, collect physiological data of the users and train a classifier on these data to obtain a task classifier;
Step 3, train the resource-allocation problem during task offloading with the A3C algorithm to obtain a decision network;
and step 4, perform task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network.
Further, regarding the network architecture of the multiple wireless body area networks in step 1, the network parameters include the user set, the base-station set, the RGMM movement-model parameters of the users, the base-station positions l_s = (x_s, y_s), the channel gain h_{d,s}(t), the data transfer rate R_{d,s}, the task class β_d ∈ {0,1}, the task-offloading energy consumption e_d, and the task-offloading delay t_d.
Further, the specific process of training the classifier in step 2 to obtain the user task classifier includes:
Step 2-1, estimate a stable interval for each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and lower limit x_low of its stable interval are:

x_up = x̄ + t_{α,n-1}·s_x/√n,  x_low = x̄ − t_{α,n-1}·s_x/√n

where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
Step 2-2, add labels to the corresponding physiological data samples for each physiological feature, specifically: a physiological data sample inside the stable interval is labeled 0, representing a normal task; a sample outside the stable interval is labeled 1, representing an urgent task.
Step 2-3, input the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining the task classifier: it takes a piece of data as input and outputs the task class.
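The three sub-steps above can be sketched in Python. This is an illustrative sketch, not the patented implementation: the heart-rate data is synthetic, and since the source renders the interval formula only as an image, the standard t confidence interval x̄ ± t_{α,n−1}·s_x/√n is assumed:

```python
import numpy as np
from scipy import stats
from sklearn.svm import SVC

def stable_interval(samples, alpha=0.05):
    """Step 2-1: t-distribution stable interval for one physiological feature."""
    n = len(samples)
    x_bar, s_x = samples.mean(), samples.std(ddof=1)
    t_coef = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t-distribution coefficient
    half = t_coef * s_x / np.sqrt(n)
    return x_bar - half, x_bar + half               # (x_low, x_up)

def label_samples(samples, x_low, x_up):
    """Step 2-2: 0 = normal task (inside interval), 1 = urgent task (outside)."""
    return ((samples < x_low) | (samples > x_up)).astype(int)

def train_task_classifier(features, labels):
    """Step 2-3: train an SVM task classifier on the labelled data."""
    clf = SVC(kernel="rbf")
    clf.fit(features, labels)
    return clf

rng = np.random.default_rng(0)
heart_rate = rng.normal(75, 8, size=500)   # hypothetical physiological feature
x_low, x_up = stable_interval(heart_rate)
labels = label_samples(heart_rate, x_low, x_up)
clf = train_task_classifier(heart_rate.reshape(-1, 1), labels)
```

The trained `clf` then plays the role of the task classifier used in step 4.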
Further, in step 3, the resource-allocation problem during task offloading is trained with the A3C algorithm, with the following specific steps:
Step 3-1, convert the resource-allocation problem into a Markov decision problem; the Markov decision model, i.e. the decision network, specifically comprises: the state S_t, the action a_t and the reward value r_t;
The state S_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data, representing the task's data volume and the task-class mark respectively; the third item l_d(t) is the location state of user d; and the fourth item E_d(t) is the energy state;
The action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} denotes the computing resources allocated to user d by base station s;
The reward value r_t is set to:

r_t = Σ_d K_d,  with  K_d = w_t·(t_static − t_d)/t_static + w_e·(e_static − e_d)/e_static

where K_d is the system benefit of user d, t_static and e_static denote the delay and energy consumption under a static allocation method, t_d and e_d denote the completion time and total energy consumption of user d's task, and w_t and w_e are the weight factors of delay and energy consumption, satisfying w_t + w_e = 1 and w_t, w_e ∈ [0, 1];
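As a hedged numeric check of this reward structure (the exact formula appears only as an image in the source, so the normalized-improvement form of K_d below is an assumption, chosen to be consistent with the per-user benefit reported later in the embodiment):

```python
def user_benefit(t_d, e_d, t_static, e_static, w_t=0.5, w_e=0.5):
    """Assumed per-user benefit K_d: weighted relative improvement in
    delay and energy over the static allocation baseline."""
    assert abs(w_t + w_e - 1.0) < 1e-9  # weights must sum to 1
    return (w_t * (t_static - t_d) / t_static
            + w_e * (e_static - e_d) / e_static)

def reward(delays, energies, t_static, e_static):
    """Reward r_t: total system benefit summed over all users."""
    return sum(user_benefit(t_d, e_d, t_static, e_static)
               for t_d, e_d in zip(delays, energies))

# 20 users, each with 35% lower delay and energy than the static baseline:
r = reward([0.65] * 20, [0.65] * 20, 1.0, 1.0)
```

Under this assumed form, a stabilized benefit near 7 with 20 users corresponds to an average per-user improvement of 35%, matching the scale reported for the embodiment.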
Step 3-2, train the decision network, specifically: based on the determined state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station, and then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); a dominance (advantage) function A(s_t, a_t) is defined to represent the advantage of action a_t in state s_t:

A(s_t, a_t) = Q(s_t, a_t) − V(s_t) = r_t + γ·V(s_{t+1}) − V(s_t)

where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the decision offloading policy;
Iteratively update the decision-network parameters until the reward function of the decision network converges; the iterative update formula is:

θ ← θ + η·∇_θ J(θ),  with  ∇_θ J(θ) = E[∇_θ log π_θ(s_t, a_t) · A(s_t, a_t)]

where π_θ(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, η is the learning rate, E is the mean function, ∇_θ is the gradient operator, and J(θ) is the cost function.
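A single advantage-actor-critic update of this kind can be sketched for a linear softmax policy as follows (an illustrative sketch only: the state encoding, dimensions and learning rate are hypothetical, and A3C's asynchronous parallel workers are omitted):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def a3c_step(s, a, r, s_next, theta, v_w, gamma=0.99, lr=0.01):
    """One actor-critic update using the advantage r + γ·V(s') − V(s)."""
    adv = r + gamma * (v_w @ s_next) - v_w @ s
    probs = softmax(s @ theta)
    grad_log = -np.outer(s, probs)       # ∇θ log π(a|s) for a softmax-linear policy
    grad_log[:, a] += s
    theta = theta + lr * adv * grad_log  # actor: gradient ascent on E[log π · A]
    v_w = v_w + lr * adv * s             # critic: move V(s) toward the TD target
    return theta, v_w, adv

rng = np.random.default_rng(1)
theta = rng.normal(size=(4, 2))          # policy parameters: 4-dim state, 2 actions
v_w = np.zeros(4)                        # value-function parameters
s = np.array([1.0, 0.5, 0.0, 1.0])
s_next = np.array([0.8, 0.6, 0.1, 1.0])
theta, v_w, adv = a3c_step(s, a=0, r=1.0, s_next=s_next, theta=theta, v_w=v_w)
```

With the critic initialized to zero, the first advantage estimate reduces to the immediate reward, after which the critic update moves V(s) toward that target.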
Further, in step 4, task offloading of the multiple wireless body area networks is performed according to the obtained task classifier and decision network. The specific process is: at each moment, tasks are first classified with the trained task classifier; then, according to the classification results, the state of the multi-body-area-network system is input into the decision network, which outputs the base station each user's channel accesses and the base stations' computing-resource allocation.
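The per-moment procedure can be sketched as follows; the classifier and decision network here are hypothetical stand-ins for the trained SVM and A3C models:

```python
import numpy as np

def offload_step(sample, other_state, classifier, decision_net):
    """Step 4 at one moment: classify the task, then let the decision
    network choose the access base station and computing-resource share."""
    task_class = classifier(sample)                  # 0 = normal, 1 = urgent
    state = np.concatenate(([float(task_class)], other_state))
    base_station, cpu_share = decision_net(state)
    return task_class, base_station, cpu_share

# Hypothetical stand-ins for the trained models:
classifier = lambda x: int(abs(x - 75.0) > 10.0)     # toy threshold "classifier"
decision_net = lambda s: (int(s[0]), 0.5)            # urgent tasks go to base station 1

task_class, bs, share = offload_step(92.0, np.array([3.0, 0.2, 0.9]),
                                     classifier, decision_net)
```

The key point is the ordering: classification runs first, and its output becomes part of the state fed to the decision network.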
A task offloading system based on the A3C algorithm in a multi-wireless-body-area-network environment, the system comprising:
the network construction module, used for constructing the network architecture of multiple wireless body area networks and initializing the network parameters;
the task classifier generation module, used for collecting physiological data of the users and training a classifier on these data to obtain a task classifier;
the decision network generation module, used for training the resource-allocation problem during task offloading with the A3C algorithm to obtain a decision network;
and the task offloading module, used for performing task offloading of the multiple wireless body area networks according to the obtained task classifier and the decision network.
Compared with the prior art, the invention has notable advantages: 1) the data characteristics of the wireless body area network and the movement characteristics of the users are considered comprehensively, reducing the delay and energy consumption of system task offloading; 2) the deep-reinforcement-learning-based A3C algorithm is adopted to optimize the task-offloading process of the multiple wireless body area networks, enabling intelligent, autonomous and dynamic offloading even when the system environment is unknown.
The invention is described in further detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart of the task offloading method based on the A3C algorithm in a multi-wireless-body-area-network environment in one embodiment.
FIG. 2 is a flow diagram of training a task classifier in one embodiment.
Fig. 3 is a diagram of a multi-wireless body area network architecture in one embodiment.
FIG. 4 is a graph of training benefit variation for the A3C algorithm in one embodiment.
FIG. 5 is a graph of training benefit variation based on a greedy algorithm in one embodiment.
Detailed Description
In one embodiment, in conjunction with fig. 1, there is provided a task offloading method based on an A3C algorithm in a multi-wireless body area network environment, the method comprising the steps of:
Step 1, construct the network architecture of multiple wireless body area networks and initialize the network parameters;
Step 2, collect physiological data of the users and train a classifier on these data to obtain a task classifier;
Step 3, train the resource-allocation problem during task offloading with the A3C algorithm to obtain a decision network;
and step 4, perform task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network.
Further, in one embodiment, regarding the network architecture of the multiple wireless body area networks in step 1, the network parameters include the user set, the base-station set, the RGMM movement-model parameters of the users, the base-station positions l_s = (x_s, y_s), the channel gain h_{d,s}(t), the data transfer rate R_{d,s}, the task class β_d ∈ {0,1}, the task-offloading energy consumption e_d, and the task-offloading delay t_d.
Further, in one embodiment, in combination with fig. 2, training the classifier in step 2, a user task classifier is obtained, which specifically includes:
Step 2-1, estimate a stable interval for each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and lower limit x_low of its stable interval are:

x_up = x̄ + t_{α,n-1}·s_x/√n,  x_low = x̄ − t_{α,n-1}·s_x/√n

where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
Step 2-2, add labels to the corresponding physiological data samples for each physiological feature, specifically: a physiological data sample inside the stable interval is labeled 0, representing a normal task; a sample outside the stable interval is labeled 1, representing an urgent task.
Step 2-3, input the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining the task classifier: it takes a piece of data as input and outputs the task class.
Further, in one embodiment, in step 3, the resource allocation problem during task offloading is trained by using an A3C algorithm, so as to obtain a decision network, and the specific process includes:
Step 3-1, converting the resource-allocation problem into a Markov decision problem, wherein the Markov decision model, i.e. the decision network, specifically comprises: the state S_t, the action a_t and the reward value r_t;
The state S_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data, representing the task's data volume and the task-class mark respectively; the third item l_d(t) is the location state of user d; and the fourth item E_d(t) is the energy state;
The action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} denotes the computing resources allocated to user d by base station s;
The reward value r_t is set to:

r_t = Σ_d K_d,  with  K_d = w_t·(t_static − t_d)/t_static + w_e·(e_static − e_d)/e_static

where K_d is the system benefit of user d, t_static and e_static denote the delay and energy consumption under a static allocation method, t_d and e_d denote the completion time and total energy consumption of user d's task, and w_t and w_e are the weight factors of delay and energy consumption, satisfying w_t + w_e = 1 and w_t, w_e ∈ [0, 1];
Step 3-2, train the decision network, specifically: based on the determined state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station, and then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); a dominance (advantage) function A(s_t, a_t) is defined to represent the advantage of action a_t in state s_t:

A(s_t, a_t) = Q(s_t, a_t) − V(s_t) = r_t + γ·V(s_{t+1}) − V(s_t)

where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the decision offloading policy;
Iteratively update the decision-network parameters until the reward function of the decision network converges; the iterative update formula is:

θ ← θ + η·∇_θ J(θ),  with  ∇_θ J(θ) = E[∇_θ log π_θ(s_t, a_t) · A(s_t, a_t)]

where π_θ(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, η is the learning rate, E is the mean function, ∇_θ is the gradient operator, and J(θ) is the cost function.
Further, in one embodiment, in step 4, task offloading of the multiple wireless body area networks is performed according to the obtained task classifier and decision network. The specific process is: at each moment, tasks are first classified with the trained task classifier; then, according to the classification results, the state of the multi-body-area-network system is input into the decision network, which outputs the base station each user's channel accesses and the base stations' computing-resource allocation.
A task offloading system based on the A3C algorithm in a multi-wireless-body-area-network environment, the system comprising:
the network construction module, used for constructing the network architecture of multiple wireless body area networks and initializing the network parameters;
the task classifier generation module, used for collecting physiological data of the users and training a classifier on these data to obtain a task classifier;
the decision network generation module, used for training the resource-allocation problem during task offloading with the A3C algorithm to obtain a decision network;
and the task offloading module, used for performing task offloading of the multiple wireless body area networks according to the obtained task classifier and the decision network.
Further, in one embodiment, the task classifier generating module includes:
the stable interval setting unit, used for estimating a stable interval for each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and lower limit x_low of its stable interval are:

x_up = x̄ + t_{α,n-1}·s_x/√n,  x_low = x̄ − t_{α,n-1}·s_x/√n

where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
the task labeling unit is used for adding labels for the corresponding physiological data samples aiming at each physiological characteristic, and specifically comprises the following steps: adding a label 0 to the physiological data sample in the stable interval to represent a normal task; and adding a label 1 to the physiological data samples outside the stable interval to represent an urgent task.
The classifier training unit is used for inputting the physiological data samples processed by the task labeling unit into the support vector machine classifier for training to obtain a task classifier, namely inputting one type of data and outputting the task class.
Further, in one embodiment, the decision network generation module includes:
the decision network construction unit, configured to convert the resource-allocation problem into a Markov decision problem, where the Markov decision model, i.e. the decision network, specifically comprises: the state S_t, the action a_t and the reward value r_t;
The state S_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data, representing the task's data volume and the task-class mark respectively; the third item l_d(t) is the location state of user d; and the fourth item E_d(t) is the energy state;
The action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} denotes the computing resources allocated to user d by base station s;
The reward value r_t is set to:

r_t = Σ_d K_d,  with  K_d = w_t·(t_static − t_d)/t_static + w_e·(e_static − e_d)/e_static

where K_d is the system benefit of user d, t_static and e_static denote the delay and energy consumption under a static allocation method, t_d and e_d denote the completion time and total energy consumption of user d's task, and w_t and w_e are the weight factors of delay and energy consumption, satisfying w_t + w_e = 1 and w_t, w_e ∈ [0, 1];
the decision network training unit, used for training the decision network, specifically: based on the determined state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station, and then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); a dominance (advantage) function A(s_t, a_t) is defined to represent the advantage of action a_t in state s_t:

A(s_t, a_t) = Q(s_t, a_t) − V(s_t) = r_t + γ·V(s_{t+1}) − V(s_t)

where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the decision offloading policy;
the decision-network parameters are updated iteratively until the reward function of the decision network converges; the iterative update formula is:

θ ← θ + η·∇_θ J(θ),  with  ∇_θ J(θ) = E[∇_θ log π_θ(s_t, a_t) · A(s_t, a_t)]

where π_θ(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, η is the learning rate, E is the mean function, ∇_θ is the gradient operator, and J(θ) is the cost function.
In one embodiment, as a specific example, the present invention is further described and verified, and the specific contents include:
First, a multi-wireless-body-area-network system is established according to the architecture of fig. 3 and the network parameters are initialized. Then, using the collected human physiological data, the stable-interval calculation, data labeling and classifier training of step 2 are carried out. On these data sets, the A3C-based task offloading method is then trained.
The state s_t, action a_t and reward r_t of the task-offloading problem in the embodiment are modeled according to step 3-1 above; since body area networks aimed at health monitoring have stricter delay requirements, the delay weight factor in step 3-1 is set larger than the energy-consumption weight factor. The decision network is then trained with the A3C algorithm according to step 3-2, with the algorithm parameters set as follows: discount factor γ = 0.99 and learning rate 0.001.
During the training phase, after each task offload is completed, the system's state vector s_t is input into the decision network, which outputs the offloading decision for the next moment; the task is offloaded accordingly, the resulting delay and energy consumption are fed back to the decision network as a reward value, the advantage function A(s_t, a_t) is computed from the recorded values, and the decision-network parameters are updated, until the average reward converges.
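The training phase can be sketched end-to-end on a toy problem (illustrative only: the environment, state encoding and learning rate are hypothetical; only γ = 0.99 is taken from the embodiment):

```python
import numpy as np

rng = np.random.default_rng(2)

def env_step(action):
    """Hypothetical offloading environment: action 1 (a better base-station
    choice) yields a higher delay/energy benefit than action 0."""
    return 1.0 if action == 1 else 0.2

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

state = np.ones(3)                   # fixed toy state encoding
theta = np.zeros((3, 2))             # actor (decision network) parameters
v_w = np.zeros(3)                    # critic parameters
gamma, lr = 0.99, 0.05               # γ as in the embodiment; lr is a toy value
rewards = []
for _ in range(2000):                # iterate until the average reward stabilizes
    probs = softmax(state @ theta)
    action = rng.choice(2, p=probs)
    r = env_step(action)
    adv = r + gamma * (v_w @ state) - v_w @ state   # advantage r + γV(s') − V(s)
    grad = -np.outer(state, probs)
    grad[:, action] += state         # ∇θ log π(a|s) for the softmax policy
    theta += lr * adv * grad         # actor update
    v_w += lr * adv * state          # critic update
    rewards.append(r)
```

The running average of `rewards` plays the role of the training-benefit curves in fig. 4 and fig. 5: it rises as the policy learns to prefer the higher-benefit action.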
Fig. 4 and fig. 5 show the system delay and energy-consumption benefit obtained with the A3C-based offloading method of this embodiment (A3C-based Offloading and Joint Resource Allocation, AOJRA) and with a conventional offloading method, respectively. The conventional method is a greedy offloading method (Greedy Offloading and Joint Resource Allocation, GOJRA).
In fig. 4, the system benefit of the AOJRA method starts around 0.8 at the beginning of the 3000 training episodes and improves rapidly with continued training, stabilizing around 7 after about 2000 episodes. According to the definition of the system benefit function in step 3-1, a benefit value of 7 represents a total delay and energy-consumption benefit of 7 relative to the static allocation (SORA) method. Since the embodiment has 20 users, this total benefit averages 0.35 per user; that is, compared with the SORA method, the AOJRA method improves the delay and energy-consumption performance of each user by 35% on average. A similar analysis of fig. 5 shows that the GOJRA method improves the delay and energy-consumption performance of each user by 29% on average relative to the SORA method.
The AOJRA method improves user delay and energy-consumption performance more than the traditional GOJRA method because it considers not only the influence of channel gain during task offloading but also the interference among users transmitting data simultaneously; it thus effectively avoids the extra delay and energy consumption caused by network congestion and base-station computing-resource shortage when many users select the same base station for data transmission at the same time.
In summary, the method reduces the delay and energy consumption of system task offloading while taking the data characteristics of the wireless body area network and the movement characteristics of the users into account. The invention can improve the ability of wireless body area networks to serve human life and can be widely applied in practical body-area-network scenarios such as telemedicine and health monitoring.
Claims (3)
1. A task offloading method based on the A3C algorithm in a multi-wireless-body-area-network environment, characterized by comprising the following steps:
Step 1, constructing the network architecture of multiple wireless body area networks and initializing the network parameters; the network architecture of the wireless body area networks comprises the user set, the base-station set, the RGMM movement-model parameters of the users, the base-station positions l_s = (x_s, y_s), the channel gain h_{d,s}(t), the data transfer rate R_{d,s}, the task class β_d ∈ {0,1}, the task-offloading energy consumption e_d, and the task-offloading delay t_d;
Step 2, collecting physiological data of the users and training a classifier on these data to obtain a task classifier; the specific process of training the classifier to obtain the task classifier comprises:
Step 2-1, estimating a stable interval for each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and lower limit x_low of its stable interval are:

x_up = x̄ + t_{α,n-1}·s_x/√n,  x_low = x̄ − t_{α,n-1}·s_x/√n

where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples of feature x, and t_{α,n-1} is the t-distribution coefficient for sample size n;
Step 2-2, adding labels to the corresponding physiological data samples for each physiological feature, specifically: a physiological data sample inside the stable interval is labeled 0, representing a normal task; a sample outside the stable interval is labeled 1, representing an urgent task;
Step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining the task classifier: it takes a piece of data as input and outputs the task class;
Step 3, training the resource-allocation problem during task offloading with the A3C algorithm to obtain a decision network; the specific process comprises the following steps:
Step 3-1, converting the resource-allocation problem into a Markov decision problem, wherein the Markov decision model, i.e. the decision network, specifically comprises: the state S_t, the action a_t and the reward value r_t;
The state S_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are quantities related to the task data, representing the task's data volume and the task-class mark respectively; the third item l_d(t) is the location state of user d; and the fourth item E_d(t) is the energy state;
The action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} denotes the computing resources allocated to user d by base station s;
The reward value r_t is set to:

r_t = Σ_d K_d,  with  K_d = w_t·(t_static − t_d)/t_static + w_e·(e_static − e_d)/e_static

where K_d is the system benefit of user d, t_static and e_static denote the delay and energy consumption under a static allocation method, t_d and e_d denote the task-offloading delay and task-offloading energy consumption of user d respectively, and w_t and w_e are the weight factors of delay and energy consumption, satisfying w_t + w_e = 1 and w_t, w_e ∈ [0, 1];
Step 3-2, training the decision network, specifically: based on the determined state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station, and then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); a dominance (advantage) function A(s_t, a_t) is defined to represent the advantage of action a_t in state s_t:

A^{π_ω}(s_t, a_t) = Q^{π_ω}(s_t, a_t) − V^{π_ω}(s_t) = r_t + γ·V^{π_ω}(s_{t+1}) − V^{π_ω}(s_t)

where Q^{π_ω}(s_t, a_t) is the Q-value function under the decision offloading policy π_ω, V^{π_ω}(s_t) is the value function under π_ω, A^{π_ω}(s_t, a_t) is the advantage function under π_ω, and γ is the discount factor;
iteratively updating the decision-network parameters until the reward function of the decision network converges, wherein the iterative update formula is:

θ ← θ + η·∇_θ J(θ),  with  ∇_θ J(θ) = E[∇_θ log π_θ(s_t, a_t) · A(s_t, a_t)]

where π_θ(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, η is the learning rate, E is the mean function, ∇_θ is the gradient operator, and J(θ) is the cost function;
and step 4, performing task offloading of the multiple wireless body area networks according to the obtained task classifier and the decision network.
2. The task offloading method based on the A3C algorithm in a multi-wireless-body-area-network environment according to claim 1, wherein in step 4, task offloading of the multiple wireless body area networks is performed according to the obtained task classifier and the decision network, the specific process comprising: at each moment, tasks are first classified with the trained task classifier; then, according to the classification results, the state of the multi-body-area-network system is input into the decision network, which outputs the base station each user's channel accesses and the base stations' computing-resource allocation.
3. A task offloading system based on an A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module, used for constructing the network architecture of multiple wireless body area networks and initializing the network parameters; the network architecture of the wireless body area networks comprises the user set, the base-station set, the RGMM movement-model parameters of the users, the base-station positions l_s = (x_s, y_s), the channel gain h_{d,s}(t), the data transfer rate R_{d,s}, the task class β_d ∈ {0,1}, the task-offloading energy consumption e_d, and the task-offloading delay t_d;
The task classifier generation module is used for acquiring physiological data of a user and training a classifier on this data to obtain the task classifier; the task classifier generation module includes:
a stable interval setting unit for estimating the stable interval of each physiological feature using the t-distribution; for a physiological feature x, the upper limit x_up and lower limit x_low of its stable interval are:
x_up = x̄ + t_{α,n−1} · s_x / √n,  x_low = x̄ − t_{α,n−1} · s_x / √n
where x̄ and s_x are, respectively, the mean and standard deviation of x, n is the number of physiological data samples for feature x, and t_{α,n−1} is the t-distribution coefficient for sample size n;
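The stable interval can be computed directly from a batch of samples. A minimal sketch using `scipy.stats.t` for the t-coefficient; the confidence level α = 0.05, the √n form of the half-width, and the heart-rate numbers are assumptions for illustration, not values from the patent:

```python
import math
import statistics
from scipy.stats import t as t_dist

def stable_interval(samples, alpha=0.05):
    """Two-sided t interval (x_low, x_up) for one physiological feature."""
    n = len(samples)
    mean = statistics.fmean(samples)
    s_x = statistics.stdev(samples)            # sample standard deviation
    t_coef = t_dist.ppf(1 - alpha / 2, n - 1)  # t_{alpha, n-1}
    half = t_coef * s_x / math.sqrt(n)
    return mean - half, mean + half

# Example: resting heart-rate samples (made-up numbers)
hr = [62, 65, 61, 63, 64, 66, 62, 63]
low, up = stable_interval(hr)
```

Samples that later fall outside `(low, up)` would be flagged as urgent by the labeling step that follows.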
the task labeling unit is used for adding a label to the corresponding physiological data samples for each physiological feature, specifically: a label 0 is added to physiological data samples inside the stable interval, representing a normal task; a label 1 is added to physiological data samples outside the stable interval, representing an urgent task;
the classifier training unit is used for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training, yielding the task classifier: given a piece of data as input, it outputs the task category of that data;
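The labeling and training steps can be sketched together with `scikit-learn`. The interval bounds, the feature values, and the RBF kernel choice below are illustrative assumptions; the patent only specifies that a support vector machine is trained on the labeled samples:

```python
import numpy as np
from sklearn.svm import SVC

# Assumed stable interval for one feature (e.g. heart rate), computed earlier
x_low, x_up = 61.9, 64.6

# Raw physiological samples; label 0 = normal (inside interval), 1 = urgent
samples = np.array([[62.0], [63.5], [70.2], [64.0], [55.1], [63.0], [75.0], [62.5]])
labels = np.array([0 if x_low <= v <= x_up else 1 for v in samples.ravel()])

clf = SVC(kernel="rbf")   # task classifier: data in, task category out
clf.fit(samples, labels)

pred_normal = int(clf.predict([[63.0]])[0])
pred_urgent = int(clf.predict([[80.0]])[0])
```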
the decision network generation module is used for solving the resource allocation problem during task offloading by training with the A3C algorithm, obtaining the decision network; the decision network generation module comprises:
the decision network construction unit, configured to convert the resource allocation problem into a Markov decision problem; the Markov decision problem model, i.e., the decision network, specifically comprises the state s_t, the action a_t, and the reward value r_t;
the state s_t is set to {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two items b_d(t) and β_d(t) are task-related quantities denoting, respectively, the data volume of the task and the task category flag; the third item l_d(t) is the location state of user d; and the fourth item E_d(t) is the energy state;
the action a_t is set to α_{d,s} ∈ {0,1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} denotes the computing resources allocated by base station s to user d;
the reward value r_t is defined in terms of the system benefit K_d, where t_static and e_static denote, respectively, the delay and energy consumption under the static allocation method; t_d and e_d denote, respectively, the completion time and total energy consumption of user d's task; and the weight factors of delay and energy consumption satisfy the corresponding normalization constraints;
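The reward compares dynamic offloading against the static-allocation baseline. The exact formula appears only in the patent's figures, so the weighted relative-saving form below, with assumed weight symbols `w_t` and `w_e` summing to 1, is merely one plausible instantiation:

```python
def reward(t_d, e_d, t_static, e_static, w_t=0.5, w_e=0.5):
    """Assumed reward shape: weighted relative savings in delay and energy
    versus the static allocation baseline (requires w_t + w_e = 1)."""
    assert abs(w_t + w_e - 1.0) < 1e-9
    return w_t * (t_static - t_d) / t_static + w_e * (e_static - e_d) / e_static

# Offloading halves delay and cuts energy by 25% relative to static allocation
r_good = reward(t_d=0.5, e_d=0.75, t_static=1.0, e_static=1.0)
# Offloading that is slower and costlier than static yields a negative reward
r_bad = reward(t_d=1.2, e_d=1.1, t_static=1.0, e_static=1.0)
```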
The decision network training unit is used for training the decision network, specifically: given the current state s_t, the decision network determines the action a_t in that state, i.e., the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); a dominance (advantage) function A(s_t, a_t) is defined to represent the advantage of action a_t in state s_t:
A^{π_ω}(s_t, a_t) = Q^{π_ω}(s_t, a_t) − V^{π_ω}(s_t), with Q^{π_ω}(s_t, a_t) = r_t + γ V^{π_ω}(s_{t+1})
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the decision offloading policy; Q^{π_ω}(s_t, a_t), V^{π_ω}(s_t), and A^{π_ω}(s_t, a_t) are, respectively, the Q-value, value, and advantage functions under the decision offloading policy;
iteratively updating the decision network parameters until the reward function of the decision network converges, where the iterative update is a gradient ascent on J(θ) with ∇_θ J(θ) = E[∇_θ log π_w(s_t, a_t) · A(s_t, a_t)];
in the formula, π_w(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is the parameter of the decision network, E[·] is the expectation (mean) operator, ∇ is the gradient operator, and J(θ) is the cost function;
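The advantage computation and the policy-gradient update above can be sketched for a tabular softmax policy. This is a single-worker simplification of A3C (the asynchronous workers and the neural networks are omitted), and all sizes, rewards, and learning rates are illustrative:

```python
import numpy as np

n_states, n_actions = 4, 3
theta = np.zeros((n_states, n_actions))   # decision-network parameters (tabular)
V = np.zeros(n_states)                    # critic's value estimates V(s)
gamma, lr = 0.9, 0.1

def pi(s):
    """Softmax policy pi_theta(a | s)."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def update(s, a, r, s_next):
    """One actor-critic step: A = r + gamma*V(s') - V(s),
    then theta += lr * grad(log pi) * A, per the claim's iterative update."""
    advantage = r + gamma * V[s_next] - V[s]
    grad_log = -pi(s)                     # d/dtheta of log-softmax ...
    grad_log[a] += 1.0                    # ... is (one-hot(a) - pi)
    theta[s] += lr * grad_log * advantage
    V[s] += lr * advantage                # critic moves toward the TD target

before = pi(0)[1]
for _ in range(50):
    update(s=0, a=1, r=1.0, s_next=1)     # action 1 is repeatedly rewarded
after = pi(0)[1]
```

Repeatedly rewarding action 1 in state 0 raises its probability under the softmax policy, which is exactly the effect of a positive advantage in the update formula.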
and the task unloading module is used for carrying out task unloading of the multi-wireless body area network according to the obtained task classifier and the decision network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010221507.5A CN111465032B (en) | 2020-03-26 | 2020-03-26 | Task unloading method and system based on A3C algorithm in multi-wireless body area network environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111465032A CN111465032A (en) | 2020-07-28 |
CN111465032B true CN111465032B (en) | 2023-04-21 |
Family
ID=71680230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010221507.5A Active CN111465032B (en) | 2020-03-26 | 2020-03-26 | Task unloading method and system based on A3C algorithm in multi-wireless body area network environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111465032B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112241295A (en) * | 2020-10-28 | 2021-01-19 | 深圳供电局有限公司 | Cloud edge cooperative computing unloading method and system based on deep reinforcement learning |
CN113645637B (en) * | 2021-07-12 | 2022-09-16 | 中山大学 | Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110557769A (en) * | 2019-09-12 | 2019-12-10 | 南京邮电大学 | C-RAN calculation unloading and resource allocation method based on deep reinforcement learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109219101B (en) * | 2018-09-21 | 2021-09-10 | 南京理工大学 | Route establishing method based on quadratic moving average prediction method in wireless body area network |
- 2020-03-26: application CN202010221507.5A filed in CN; granted as patent CN111465032B (status: Active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110557769A (en) * | 2019-09-12 | 2019-12-10 | 南京邮电大学 | C-RAN calculation unloading and resource allocation method based on deep reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN111465032A (en) | 2020-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xu et al. | Edge intelligence: Empowering intelligence to the edge of network | |
US10802992B2 (en) | Combining CPU and special accelerator for implementing an artificial neural network | |
CN111465032B (en) | Task unloading method and system based on A3C algorithm in multi-wireless body area network environment | |
Imteaj et al. | Federated learning for resource-constrained iot devices: Panoramas and state of the art | |
US8473432B2 (en) | Issue resolution in expert networks | |
CN114219097B (en) | Federal learning training and predicting method and system based on heterogeneous resources | |
Hung | Adaptive Fuzzy-GARCH model applied to forecasting the volatility of stock markets using particle swarm optimization | |
CN113284142B (en) | Image detection method, image detection device, computer-readable storage medium and computer equipment | |
CN111026548A (en) | Power communication equipment test resource scheduling method for reverse deep reinforcement learning | |
CN111079780A (en) | Training method of space map convolution network, electronic device and storage medium | |
CN112835715B (en) | Method and device for determining task unloading strategy of unmanned aerial vehicle based on reinforcement learning | |
CN106793031A (en) | Based on the smart mobile phone energy consumption optimization method for gathering competing excellent algorithm | |
CN115809147B (en) | Multi-edge collaborative cache scheduling optimization method, system and model training method | |
CN111291618A (en) | Labeling method, device, server and storage medium | |
Bebortta et al. | DeepMist: Toward deep learning assisted mist computing framework for managing healthcare big data | |
Lei et al. | An improved variable neighborhood search for parallel drone scheduling traveling salesman problem | |
CN114595396A (en) | Sequence recommendation method and system based on federal learning | |
Abbas et al. | Meta-heuristic-based offloading task optimization in mobile edge computing | |
Chen et al. | One for all: Traffic prediction at heterogeneous 5g edge with data-efficient transfer learning | |
CN110175708B (en) | Model and method for predicting food materials in online increment mode | |
CN109754075B (en) | Scheduling method, device, storage medium and device for wireless sensor network node | |
CN111401551A (en) | Weak supervision self-learning method based on reinforcement learning | |
US12002202B2 (en) | Meta-learning for cardiac MRI segmentation | |
Jaiswal et al. | Analyze Classification Act of Data Mining Schemes | |
Zhou et al. | Computing Offloading Based on TD3 Algorithm in Cache-Assisted Vehicular NOMA–MEC Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |