CN111465032A - Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Info

Publication number
CN111465032A
Authority
CN
China
Prior art keywords
task
network
classifier
body area
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010221507.5A
Other languages
Chinese (zh)
Other versions
CN111465032B (en)
Inventor
王力立
张戈
奚思遥
肖强
黄成
单梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202010221507.5A priority Critical patent/CN111465032B/en
Publication of CN111465032A publication Critical patent/CN111465032A/en
Application granted granted Critical
Publication of CN111465032B publication Critical patent/CN111465032B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/22 Traffic simulation tools or models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W 52/02 Power saving arrangements
    • H04W 52/0209 Power saving arrangements in terminal devices
    • H04W 52/0212 Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave
    • H04W 52/0216 Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave, using a pre-established activity schedule, e.g. traffic indication frame
    • H
    • H04
    • H04W
    • H04W 84/00 Network topologies
    • H04W 84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a task offloading method and system based on the A3C algorithm in a multi-wireless body area network environment. The method comprises the following steps: determining the network architecture of the multiple wireless body area networks and initializing the network parameters; training a task classifier with sampled physiological data to obtain a stable classifier model; training the network resource allocation problem with the deep-reinforcement-learning A3C algorithm to obtain a convergent decision network; and performing task offloading according to the obtained models: at each moment, tasks are first classified with the classifier model, and then user channel access and edge-server computing resource allocation are decided by the decision network. The method improves the delay and energy-consumption performance of task offloading across multiple wireless body area networks, and can be widely applied to practical body area network scenarios such as telemedicine and health monitoring.

Description

Task unloading method and system based on A3C algorithm in multi-wireless body area network environment
Technical Field
The invention belongs to the field of wireless communication networks, and particularly relates to a task unloading method and system based on an A3C algorithm in a multi-wireless body area network environment.
Background
A wireless body area network is a wireless sensor network that takes the human body as its monitoring object. Because the human body is mobile, inter-network interference arises easily among multiple body area networks, and how to collect and manage data across these networks is an important research direction. Current research shows that body area networks are characterized by mobility, computation-intensive tasks, and low-latency requirements, and that task offloading can be assisted by edge computing: base stations equipped with edge servers are placed at the edge of multiple networks to collect and process tasks in a unified way. Because the body area network monitoring a given subject has strict requirements on delay and energy consumption, a reasonable task offloading method must be designed to guarantee low-delay, low-energy data transmission.
In existing research on data transmission between multiple body area networks and a data center, most algorithms are designed for generic communication networks and make no targeted use of the data characteristics and user characteristics of body area networks. In practice, however, the physiological data monitored by a body area network carries important practical meaning, and the movement trajectory of a body area network user has its own characteristics. Because existing offloading methods ignore these characteristics, they cannot meet the strict delay and energy-consumption requirements of wireless body area networks.
Disclosure of Invention
The invention aims to provide a task offloading method and system for a multi-wireless body area network environment, so that the task state and the mobility characteristics of each user are fully considered when the system offloads tasks, thereby achieving lower system delay and energy consumption.
The technical solution for realizing the purpose of the invention is as follows: a method for task offloading based on A3C algorithm in a multi-wireless body area network environment, the method comprising the steps of:
step 1, constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task unloading by using an A3C algorithm to obtain a decision network;
step 4, performing task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network.
Further, in the network architecture of the multiple wireless body area networks in step 1, the network parameters include the user set D, the base station set S, the RGMM mobility model parameters of each user, the base station locations l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task categories β_d ∈ {0, 1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
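To make step 1 concrete, the following is a minimal parameter-initialization sketch in Python. The user count, base station count, deployment area, and channel model are illustrative assumptions (the embodiment below only mentions 20 users); the patent does not prescribe a particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes; the embodiment below mentions 20 users, the rest is illustrative.
NUM_USERS = 20        # |D|: number of body area network users d
NUM_STATIONS = 4      # |S|: number of edge base stations s
AREA_SIZE = 100.0     # side length of the deployment area in metres (assumed)

params = {
    # base station locations l_s = (x_s, y_s)
    "bs_locations": rng.uniform(0.0, AREA_SIZE, size=(NUM_STATIONS, 2)),
    # initial user locations; in the method these evolve under the RGMM mobility model
    "user_locations": rng.uniform(0.0, AREA_SIZE, size=(NUM_USERS, 2)),
    # channel gains h_{d,s}(t); a simple Rayleigh-fading placeholder
    "channel_gain": rng.rayleigh(scale=1.0, size=(NUM_USERS, NUM_STATIONS)),
    # task categories beta_d in {0, 1}, filled in later by the task classifier of step 2
    "task_category": np.zeros(NUM_USERS, dtype=int),
}
print({k: v.shape for k, v in params.items()})
```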
Further, training the classifier in step 2 to obtain the user task classifier comprises the following specific steps:
Step 2-1, estimating a stationary interval for each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n.
Step 2-2, adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task.
Step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
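As an illustration of steps 2-1 to 2-3, the sketch below estimates the stationary interval of one physiological feature with the t-distribution, labels samples, and trains a support vector machine classifier. It is a minimal sketch assuming a one-dimensional feature (e.g. heart rate), a significance level of 0.05, synthetic data, and the interval form mean ± t_{α,n-1}·s_x; none of these specifics are fixed by the patent.

```python
import numpy as np
from scipy import stats
from sklearn.svm import SVC

def stationary_interval(samples: np.ndarray, alpha: float = 0.05):
    """Step 2-1: t-distribution stationary interval of one physiological feature
    (assumed form: mean +/- t_{alpha,n-1} * s_x)."""
    n = len(samples)
    mean, s_x = samples.mean(), samples.std(ddof=1)
    t_coeff = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)
    return mean - t_coeff * s_x, mean + t_coeff * s_x

def label_samples(samples: np.ndarray, low: float, up: float) -> np.ndarray:
    """Step 2-2: label 0 = normal task (inside the interval), 1 = emergency task (outside)."""
    return ((samples < low) | (samples > up)).astype(int)

# Synthetic heart-rate readings: mostly resting values plus a short abnormal episode.
rng = np.random.default_rng(1)
readings = np.concatenate([rng.normal(72.0, 3.0, 200), rng.normal(115.0, 5.0, 20)])

low, up = stationary_interval(readings[:200])     # interval estimated from the normal data
labels = label_samples(readings, low, up)

# Step 2-3: train the SVM task classifier on (feature, label) pairs.
clf = SVC(kernel="rbf").fit(readings.reshape(-1, 1), labels)
print(np.round([low, up], 1), clf.predict([[70.0], [120.0]]))   # classify two new readings
```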
Further, in step 3, the resource allocation problem during task offloading is trained with the A3C algorithm, and the specific process includes:
Step 3-1, converting the resource allocation problem into a Markov decision problem. The Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
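As a small illustration of the reward of step 3-1, the sketch below computes r_t under the reading given above, i.e. each user's benefit K_d is the weighted relative improvement in delay and energy over the static-allocation baseline; the weight values and the numbers are assumptions for illustration only.

```python
def user_benefit(t_d, e_d, t_static, e_static, w_t=0.7, w_e=0.3):
    """K_d: weighted relative improvement of user d over the static allocation baseline."""
    return w_t * (t_static - t_d) / t_static + w_e * (e_static - e_d) / e_static

def reward(delays, energies, t_static, e_static):
    """r_t: system benefit, summed over all users d."""
    return sum(user_benefit(t, e, t_static, e_static) for t, e in zip(delays, energies))

# Example: 3 users, each 30% faster and 20% cheaper than the static baseline.
print(reward(delays=[0.7] * 3, energies=[0.8] * 3, t_static=1.0, e_static=1.0))
# 3 * (0.7 * 0.3 + 0.3 * 0.2) = 0.81
```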
Step 3-2, training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t). The advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
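The following is a compact sketch of the step 3-2 update (advantage estimation plus a policy-gradient step) written with PyTorch. The network sizes, the discretized action space, the single-trajectory update, and the synthetic data are assumptions for illustration; a full A3C implementation additionally runs several asynchronous workers that accumulate such gradients into a shared decision network.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    """Decision network: shared trunk, policy head (actor) and value head (critic)."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.policy_head = nn.Linear(64, num_actions)   # pi_omega(a_t | s_t)
        self.value_head = nn.Linear(64, 1)               # V(s_t)

    def forward(self, state):
        h = self.trunk(state)
        return torch.softmax(self.policy_head(h), dim=-1), self.value_head(h).squeeze(-1)

def a3c_update(net, optimizer, states, actions, rewards, gamma=0.99):
    """One update from an experience sequence (s_t, a_t, r_t):
    advantage A(s_t, a_t) = discounted return - V(s_t); loss mixes policy and value terms."""
    returns, g = [], 0.0
    for r in reversed(rewards):                  # discounted returns as a Q-value estimate
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns, dtype=torch.float32)

    probs, values = net(states)
    log_prob = Categorical(probs).log_prob(actions)
    advantage = returns - values                  # A(s_t, a_t)

    policy_loss = -(log_prob * advantage.detach()).mean()   # ascent on E[grad log pi * A]
    value_loss = advantage.pow(2).mean()
    optimizer.zero_grad()
    (policy_loss + 0.5 * value_loss).backward()
    optimizer.step()

# Illustrative use: 4-dimensional state {b_d, beta_d, l_d, E_d}, 8 candidate actions,
# discount factor and learning rate as in the embodiment below (0.99, 0.001).
net = ActorCritic(state_dim=4, num_actions=8)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
a3c_update(net, opt,
           states=torch.randn(16, 4),
           actions=torch.randint(0, 8, (16,)),
           rewards=[0.3] * 16)
```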
Further, in step 4, task offloading of the multiple wireless body area networks is performed according to the obtained task classifier and decision network; the specific process is: at each moment, tasks are classified with the trained task classifier, the state of the multi-body-area-network system is then fed into the decision network according to the classification result, and the network outputs the user channel access (i.e. which base station each user accesses) and the base station computing resource allocation.
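Putting the pieces of step 4 together, a per-moment offloading step could look like the hypothetical sketch below. It assumes a trained classifier clf and a trained decision network net with the interfaces used in the two sketches above, and a hand-built 4-element state vector whose second entry holds β_d(t); all of these are illustrative assumptions rather than details fixed by the patent.

```python
import torch

def offload_step(physio_sample, state_vector, clf, net):
    """Step 4: classify the task, then query the decision network for the offloading action."""
    task_class = int(clf.predict([[physio_sample]])[0])   # 0 = normal, 1 = emergency (beta_d)
    state_vector[1] = float(task_class)                   # write beta_d(t) into the state
    with torch.no_grad():
        probs, _ = net(torch.tensor(state_vector, dtype=torch.float32))
        action = int(torch.argmax(probs))                 # greedy choice at execution time
    return task_class, action  # the action index encodes the chosen base station / resource share

# e.g.: offload_step(95.0, [0.5, 0.0, 0.2, 0.8], clf, net)
```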
A task offloading system based on A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of the user and training a classifier according to the data to obtain a task classifier;
the decision network generation module is used for training the resource allocation problem during task unloading by utilizing an A3C algorithm to obtain a decision network;
and the task unloading module is used for unloading the tasks of the multi-wireless body area network according to the obtained task classifier and the decision network.
Compared with the prior art, the invention has the following notable advantages: 1) the data characteristics of the wireless body area networks and the mobility characteristics of the users are considered jointly, reducing the delay and energy consumption of system task offloading; 2) the A3C algorithm based on deep reinforcement learning is used to optimize the task offloading process of the multiple wireless body area networks, so the system can offload dynamically, intelligently, and autonomously even when the system environment is unknown.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
Fig. 1 is a flow diagram of a method for task offloading based on the A3C algorithm in a multi-wireless body area network environment, under an embodiment.
FIG. 2 is a flow diagram of training a task classifier in one embodiment.
Figure 3 is a diagram of a multi-wireless body area network architecture in one embodiment.
FIG. 4 is a graph of training benefit variation of the A3C algorithm in one embodiment.
FIG. 5 is a graph of variation in training benefit based on a greedy algorithm in one embodiment.
Detailed Description
In one embodiment, in conjunction with fig. 1, there is provided a method for task offloading based on A3C algorithm in a multi-wireless body area network environment, the method comprising the steps of:
step 1, constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task unloading by using an A3C algorithm to obtain a decision network;
step 4, performing task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network.
Further, in one embodiment, the network parameters of the network architecture of the multiple wireless body area networks in step 1 include the user set D, the base station set S, the RGMM mobility model parameters of each user, the base station locations l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task categories β_d ∈ {0, 1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
Further, in one embodiment, with reference to fig. 2, the training of the classifier in step 2 to obtain the user task classifier includes:
Step 2-1, estimating a stationary interval for each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n.
Step 2-2, adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task.
Step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
Further, in one embodiment, the resource allocation problem during task offloading is trained by using an A3C algorithm in step 3 to obtain a decision network, and the specific process includes:
Step 3-1, converting the resource allocation problem into a Markov decision problem. The Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
Step 3-2, training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t). The advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
Further, in one embodiment, task offloading of the multiple wireless body area networks is performed according to the obtained task classifier and decision network in step 4; the specific process is: at each moment, tasks are classified with the trained task classifier, the state of the multi-body-area-network system is then fed into the decision network according to the classification result, and the network outputs the user channel access (i.e. which base station each user accesses) and the base station computing resource allocation.
A task offloading system based on A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of the user and training a classifier according to the data to obtain a task classifier;
the decision network generation module is used for training the resource allocation problem during task unloading by utilizing an A3C algorithm to obtain a decision network;
and the task unloading module is used for unloading the tasks of the multi-wireless body area network according to the obtained task classifier and the decision network.
Further, in one embodiment, the task classifier generating module includes:
A stationary interval setting unit, for estimating the stationary interval of each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n.
A task labeling unit, for adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task.
A classifier training unit, for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
Further, in one embodiment, the decision network generating module includes:
A decision network construction unit, configured to convert the resource allocation problem into a Markov decision problem, where the Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
A decision network training unit, for training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); the advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
In one embodiment, as a specific example, the present invention is further explained and verified, and the specific contents include:
First, a multi-wireless body area network system is established according to the architecture of Fig. 3, and the network parameters are initialized. Then, from the collected human physiological data, the stationary intervals are computed, the data labels are added, and the classifier of step 2 is trained. On this basis, the A3C-based task offloading method is trained.
According to step 3-1, the state s_t, action a_t, and reward r_t of the task offloading problem in this embodiment are modeled. Because delay requirements are stricter for a body area network aimed at health monitoring, the delay weight factor in step 3-1 is set larger than the energy-consumption weight factor.
The decision network is then trained according to step 3-2 using the A3C algorithm. Parameters in the algorithm are set as: the discount factor γ is 0.99, and the learning rate is 0.001.
In the training phase, after each offloading round finishes, the system state vector s_t is computed and fed into the decision network, which outputs the offloading decision for the next moment; the resulting delay and energy consumption are fed back to the decision network as reward values, these values are recorded, the advantage function A(s_t, a_t) is computed, and the decision network parameters are updated until the average reward converges.
Fig. 4 and Fig. 5 show the change in system delay and energy-consumption benefit when this embodiment adopts, respectively, the A3C-based offloading method (A3C-based Offloading and Joint Resource Allocation, AOJRA) and the conventional offloading method; the conventional method is an offloading method based on a greedy strategy (GOJRA).
In Fig. 4, over 3000 training cycles, the system benefit of the AOJRA method starts at around 0.8, rises rapidly as training continues, and stabilizes at around 7 after roughly 2000 training cycles. According to the definition of the system benefit function in step 3-1, a benefit value of 7 means that the total delay-and-energy benefit relative to the SORA method is 7. Since the embodiment has 20 users, the total benefit averaged over the users is 0.35, which means that, compared with the SORA method, the AOJRA method of the present invention improves the delay and energy-consumption performance of each user by 35% on average. A similar analysis shows that the GOJRA method of Fig. 5 improves the delay and energy-consumption performance of each user by 29% on average over the SORA method.
Compared with the traditional GOJRA method, the AOJRA method improves the users' delay and energy-consumption performance further: it considers not only the influence of the channel gain during task offloading but also the mutual interference among users transmitting data at the same time, and it can effectively avoid the network congestion that arises when a large number of users select the same base station for data transmission in the same period, as well as the increases in delay and energy consumption caused by a shortage of base station computing resources.
In conclusion, the method reduces the delay and energy consumption of system task offloading while taking into account the data characteristics of wireless body area networks and the mobility characteristics of users. The invention improves the ability of wireless body area networks to serve human life more promptly, and can be widely applied to practical body area network scenarios such as telemedicine and health monitoring.

Claims (8)

1. A task offloading method based on the A3C algorithm in a multi-wireless body area network environment, characterized by comprising the following steps:
step 1, constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
step 2, collecting physiological data of a user, training a classifier according to the data, and obtaining a task classifier;
step 3, training the resource allocation problem during task unloading by using an A3C algorithm to obtain a decision network;
step 4, performing task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network.
2. The method of claim 1, wherein the network parameters of the network architecture of the plurality of wireless body area networks of step 1 include the user set D, the base station set S, the RGMM mobility model parameters of each user, the base station locations l_s = (x_s, y_s), the channel gains h_{d,s}(t), the data transmission rates R_{d,s}, the task categories β_d ∈ {0, 1}, the task offloading energy consumption e_d, and the task offloading delay t_d.
3. The method for task offloading based on A3C algorithm in a multi-wireless body area network environment according to claim 1 or 2, wherein the step 2 of training the classifier to obtain the user task classifier comprises the following specific steps:
Step 2-1, estimating a stationary interval for each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n;
Step 2-2, adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task;
Step 2-3, inputting the physiological data samples processed in step 2-2 into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
4. The method for task offloading based on A3C algorithm in a multi-radio body area network environment according to claim 3, wherein the step 3 trains a resource allocation problem during task offloading by using an A3C algorithm to obtain a decision network, and the specific process includes:
Step 3-1, converting the resource allocation problem into a Markov decision problem. The Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
Step 3-2, training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); the advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
5. The method for task offloading based on the A3C algorithm in a multi-wireless body area network environment according to claim 4, wherein the step 4 of task offloading of the multiple wireless body area networks according to the obtained task classifier and decision network comprises: at each moment, classifying tasks with the trained task classifier, feeding the state of the multi-body-area-network system into the decision network according to the classification result, and outputting from the network the user channel access (i.e. which base station each user accesses) and the base station computing resource allocation.
6. A task offloading system based on A3C algorithm in a multi-wireless body area network environment, the system comprising:
the network construction module is used for constructing network architectures of a plurality of wireless body area networks and initializing network parameters;
the task classifier generating module is used for acquiring physiological data of the user and training a classifier according to the data to obtain a task classifier;
the decision network generation module is used for training the resource allocation problem during task unloading by utilizing an A3C algorithm to obtain a decision network;
and the task unloading module is used for unloading the tasks of the multi-wireless body area network according to the obtained task classifier and the decision network.
7. The system of claim 6, wherein the task classifier generation module comprises:
A stationary interval setting unit, for estimating the stationary interval of each physiological characteristic using the t-distribution; for a physiological characteristic x, the upper limit x_up and the lower limit x_low of the stationary interval are, respectively:
x_up = x̄ + t_{α,n-1} · s_x
x_low = x̄ - t_{α,n-1} · s_x
where x̄ and s_x are the mean and standard deviation corresponding to x, n is the number of physiological data samples corresponding to the physiological characteristic x, and t_{α,n-1} is the t-distribution coefficient for a sample size of n;
A task labeling unit, for adding a label to each physiological data sample according to its physiological characteristics, specifically: a physiological data sample inside the stationary interval is given label 0, representing a normal task; a physiological data sample outside the stationary interval is given label 1, representing an emergency task;
A classifier training unit, for inputting the physiological data samples processed by the task labeling unit into a support vector machine classifier for training, obtaining a task classifier that takes a data sample as input and outputs its task category.
8. The system for task offloading based on the A3C algorithm in a multi-wireless body area network environment of claim 7, wherein the decision network generation module comprises:
A decision network construction unit, configured to convert the resource allocation problem into a Markov decision problem, where the Markov decision problem model, i.e. the decision network, specifically comprises the state S_t, the action a_t, and the reward value r_t.
The state S_t is set as {b_d(t), β_d(t), l_d(t), E_d(t)}, where the first two terms b_d(t) and β_d(t) are quantities related to the task data, representing the data volume of the task and the task category flag, respectively; the third term l_d(t) is the location state of user d; and the fourth term E_d(t) is the energy state.
The action a_t is set as α_{d,s} ∈ {0, 1} and f_{d,s}, where α_{d,s} indicates whether the task of user d is offloaded to base station s, and f_{d,s} represents the computing resources allocated by base station s to user d.
The reward value r_t is set as:
r_t = Σ_d K_d,  with  K_d = ω_t · (t_static - t_d) / t_static + ω_e · (e_static - e_d) / e_static
where K_d is the benefit of the system, t_static and e_static are the delay and energy consumption under the static allocation method, t_d and e_d are the time for user d to complete its task and the corresponding total energy consumption, and ω_t and ω_e are the weight factors of delay and energy consumption, with ω_t + ω_e = 1.
A decision network training unit, for training the decision network, which specifically comprises: for a given state s_t, the decision network determines the action a_t in this state, i.e. the base station each user should access and the computing resources allocated by that base station; the system then enters a new state and obtains the reward r_t, yielding an experience sequence (s_t, a_t, r_t); the advantage function A(s_t, a_t) is defined to represent how good action a_t is in state s_t:
A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
where Q(s_t, a_t) is the Q-value function, V(s_t) is the value function, γ is the discount factor, and π_ω is the offloading decision policy;
the decision network parameters are updated iteratively until the reward function of the decision network converges, the iterative update formula being:
θ ← θ + E[∇_ω log π_ω(s_t, a_t) · A(s_t, a_t)]
where π_ω(s_t, a_t) denotes the probability of selecting action a_t in state s_t, θ is a parameter of the decision network, E is the mean (expectation) operator, and ∇_ω is the gradient operator.
CN202010221507.5A 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment Active CN111465032B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010221507.5A CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010221507.5A CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Publications (2)

Publication Number Publication Date
CN111465032A true CN111465032A (en) 2020-07-28
CN111465032B CN111465032B (en) 2023-04-21

Family

ID=71680230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010221507.5A Active CN111465032B (en) 2020-03-26 2020-03-26 Task unloading method and system based on A3C algorithm in multi-wireless body area network environment

Country Status (1)

Country Link
CN (1) CN111465032B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241295A (en) * 2020-10-28 2021-01-19 深圳供电局有限公司 Cloud edge cooperative computing unloading method and system based on deep reinforcement learning
CN113645637A (en) * 2021-07-12 2021-11-12 中山大学 Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109219101A (en) * 2018-09-21 2019-01-15 南京理工大学 Method for routing foundation based on Double moving average predicted method in wireless body area network
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109219101A (en) * 2018-09-21 2019-01-15 南京理工大学 Method for routing foundation based on Double moving average predicted method in wireless body area network
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241295A (en) * 2020-10-28 2021-01-19 深圳供电局有限公司 Cloud edge cooperative computing unloading method and system based on deep reinforcement learning
CN113645637A (en) * 2021-07-12 2021-11-12 中山大学 Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111465032B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
Yu et al. Computation offloading for mobile edge computing: A deep learning approach
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
CN109302463B (en) Self-organizing cloud architecture and optimization method and system for edge computing
CN112995913B (en) Unmanned aerial vehicle track, user association and resource allocation joint optimization method
CN110928654B (en) Distributed online task unloading scheduling method in edge computing system
CN108924936B (en) Resource allocation method of unmanned aerial vehicle-assisted wireless charging edge computing network
CN107682443A (en) Joint considers the efficient discharging method of the mobile edge calculations system-computed task of delay and energy expenditure
CN114219097B (en) Federal learning training and predicting method and system based on heterogeneous resources
CN113286329B (en) Communication and computing resource joint optimization method based on mobile edge computing
CN112835715B (en) Method and device for determining task unloading strategy of unmanned aerial vehicle based on reinforcement learning
CN114065963A (en) Computing task unloading method based on deep reinforcement learning in power Internet of things
CN111465032A (en) Task unloading method and system based on A3C algorithm in multi-wireless body area network environment
Zhou et al. Computation bits maximization in UAV-assisted MEC networks with fairness constraint
CN113286317B (en) Task scheduling method based on wireless energy supply edge network
CN112988285B (en) Task unloading method and device, electronic equipment and storage medium
Muslim et al. Reinforcement learning based offloading framework for computation service in the edge cloud and core cloud
CN106793031A (en) Based on the smart mobile phone energy consumption optimization method for gathering competing excellent algorithm
CN111026548A (en) Power communication equipment test resource scheduling method for reverse deep reinforcement learning
CN114567895A (en) Method for realizing intelligent cooperation strategy of MEC server cluster
WO2022242468A1 (en) Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium
CN115473896A (en) Electric power internet of things unloading strategy and resource configuration optimization method based on DQN algorithm
Chen et al. Augmented deep reinforcement learning for online energy minimization of wireless powered mobile edge computing
Bouzidi et al. HADAS: Hardware-aware dynamic neural architecture search for edge performance scaling
CN113821346B (en) Edge computing unloading and resource management method based on deep reinforcement learning
Chen et al. Traffic prediction-assisted federated deep reinforcement learning for service migration in digital twins-enabled MEC networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant