CN112822701A - Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene - Google Patents
- Publication number: CN112822701A
- Application number: CN202011638611.0A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
- H04W24/06—Testing, supervising or monitoring using simulated traffic
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario. By comprehensively analyzing the execution characteristics of the deep neural network model segmentation technique in an edge computing environment, the method models the joint optimization of deep neural network model segmentation and computing resource allocation on an edge server as a nonlinear integer programming problem, and further provides an iterative alternating optimization algorithm based on dynamic step-size adjustment. The algorithm efficiently obtains the optimal solution of the problem in polynomial time and is strongly robust to various external influences in real deployment scenarios.
Description
Technical Field
The invention relates to the technical fields of deep learning, edge computing, and distributed computing, and in particular to a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario.
Background
With the gradual rollout of 5G and the continuing development of technologies such as mobile artificial intelligence (AI) and the Internet of Things (IoT), the number of devices at the network edge has grown explosively. Meanwhile, terminal devices at the network edge are gradually shifting from being pure consumers of intelligent applications to nodes that act as both consumers and producers, continuously generating massive real-time data during operation. However, the conventional mobile cloud computing approach is limited by the transmission bandwidth of the backbone network and the high transmission delay caused by long physical distances, and struggles to meet the real-time requirements of new mobile applications. Moreover, uploading raw data to a cloud server inevitably raises user concerns about potential privacy disclosure. To address these problems, Mobile Edge Computing (MEC) provides users with high-bandwidth, low-latency, privacy-preserving data processing and caching services by deploying resources at the network edge, closer to users. Owing to its superior performance on computation-intensive and delay-sensitive tasks, mobile edge computing is considered one of the most promising enablers of mobile artificial intelligence.
For intelligent applications under mobile edge computing, practical deployment currently follows two main ideas: the first is to compress the model, through techniques such as model pruning and weight quantization, until the mobile terminal can bear its load; the second is to deploy part of the model's computation tasks onto other devices using distributed deployment techniques, reducing the terminal device's computation load, energy consumption, and other costs.
The first idea optimizes the model itself, which has the advantage of requiring no additional device support, but it faces several hard problems: first, the accuracy of a compressed model is generally difficult to guarantee theoretically, and not all models are suitable for compression; second, structured weight pruning cannot be applied to all models, while general unstructured weight pruning hinders high-performance parallel optimization at the hardware level.
The second idea disassembles the model structure and deploys it across multiple devices, making full use of external computing resources. However, existing approaches all take a single-user perspective, typically assuming that resources on the corresponding cloud or edge server are static and fixed. Real deployment scenarios usually involve multiple users contending for limited resources, where the distributed deployment challenge is more complex, and existing methods fail to provide a low-overhead solution.
Disclosure of Invention
The invention provides a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario to overcome at least one defect of the prior art, achieving efficient, low-delay inference of deep learning models on mobile terminal devices.
In order to solve the technical problems, the invention adopts the technical scheme that: a multi-user deep neural network model segmentation and resource allocation optimization method under an edge computing scene comprises the following steps:
S1, deep neural network model segmentation modeling step: defining a logic layer comprising a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logic layer as the minimum segmentation unit;
S2, resource allocation and deep neural network model segmentation decision modeling in the edge computing multi-user environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and modeling the problem as a nonlinear integer programming problem;
S3, solving the user response delay optimization problem: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network model on the edge server according to the obtained solution.
Further, the step S1 specifically includes: abstracting parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logic layers, and thereby abstracting the deep neural network model into a computation graph of sequentially connected logic layers. The deep neural network model deployed on device i consists of k_i logic layers connected in sequence; an integer variable s_i indicates that the model is split after the s_i-th logic layer, with segmentation decision s_i ∈ {0, 1, 2, ..., k_i}; the computation loads of the neural network before and after the split point are represented by corresponding variables.
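As a concrete illustration of the computation-graph abstraction and the quantities attached to a split decision, here is a minimal sketch; the `LogicLayer` fields and the `split_loads` helper are assumptions for illustration, not the patent's code:

```python
from dataclasses import dataclass

@dataclass
class LogicLayer:
    name: str
    flops: float        # computation load of this logic layer
    output_bytes: int   # size of the intermediate result it emits

def split_loads(layers, s):
    """Given a chain of k logic layers and a split decision s in {0..k},
    return (local_flops, edge_flops, intermediate_bytes): the computation
    executed on the device (layers 1..s), the computation offloaded to the
    edge server (layers s+1..k), and the intermediate result transmitted."""
    local = sum(l.flops for l in layers[:s])
    edge = sum(l.flops for l in layers[s:])
    inter = layers[s - 1].output_bytes if 0 < s < len(layers) else 0
    return local, edge, inter

# Example: 4 logic layers, split after the 2nd (s = 2).
chain = [LogicLayer("conv1", 2e8, 800_000),
         LogicLayer("block2", 5e8, 400_000),
         LogicLayer("block3", 5e8, 200_000),
         LogicLayer("fc", 1e8, 4_000)]
local, edge, inter = split_loads(chain, 2)  # local=7e8, edge=6e8, inter=400000
```

With s = k_i the entire model runs locally and nothing is transmitted; with s = 0 the sketch reports zero transfer as a simplification, since in that case the raw input rather than a layer output would be uploaded.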
Further, the step S2 specifically includes:
S21, modeling the computing power partitioning of the edge server: the computing power of the minimum allocatable computing resource unit (MCRU) is denoted C_min; β denotes the total number of MCRUs on the edge server, and f_i the number of MCRUs allocated to each user i; naturally, Σ_{i∈N} f_i ≤ β;
S22, modeling the execution delay of the part of the model executed locally on the device:
S23, modeling the computation delay of the neural network portion offloaded to the edge server:
where θ is a unit step function, and its expression is:
γ is an approximate mapping fitted from real data, representing the multiple of the computing capacity C_min that the f_i allocated computing resource units actually achieve;
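As described, γ is fitted from real measurements rather than assumed linear (contention among co-located containers typically makes the effective speedup sub-linear in the number of MCRUs). A minimal sketch of such a data-driven fit, using piecewise-linear interpolation over hypothetical benchmark pairs:

```python
def fit_gamma(measured):
    """measured: sorted list of (mcru_count, effective_speedup) pairs from
    benchmarking. Returns gamma(f): a piecewise-linear interpolant giving
    the multiple of C_min actually achieved by f MCRUs."""
    def gamma(f):
        if f <= measured[0][0]:
            return measured[0][1]
        for (x0, y0), (x1, y1) in zip(measured, measured[1:]):
            if f <= x1:
                return y0 + (y1 - y0) * (f - x0) / (x1 - x0)
        return measured[-1][1]  # saturate beyond the last measurement
    return gamma

# Hypothetical measurements: speedup grows sub-linearly with allocated MCRUs.
gamma = fit_gamma([(1, 1.0), (10, 9.2), (20, 16.5), (40, 28.0)])
gamma(10)  # ≈ 9.2 (at a measured point)
gamma(15)  # interpolated between 9.2 and 16.5
```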
S24, modeling the transmission delay of the intermediate result:
where the numerator denotes the size of the intermediate result that user equipment i needs to transmit at split point s_i, and the denominator denotes the uplink bandwidth of user equipment i;
S25, modeling the transmission delay of returning the final result:
where the denominator denotes the downlink bandwidth of user equipment i;
S26, deriving and modeling the total inference delay of a single user equipment's segmented deep neural network model:
Combining the per-substep delay models (1), (2), (3) and (4) from steps S22, S23, S24 and S25, the total delay for the user equipment to perform inference with the segmented deep neural network model is:
S27, modeling the global delay minimization over the multi-user equipment:
where constraint (7) states that the total resources of the edge server are limited, constraint (8) states that the segmentation decision must not exceed the total number of logic layers, and when the edge server allocates no computing resources to user equipment i (f_i = 0), s_i = k_i must hold, i.e., all computing tasks are executed locally; in constraint (9), N denotes the set of natural numbers, and f_i and s_i are both non-negative integers.
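The formula images (1)-(9) did not survive extraction; the block below is a plausible LaTeX reconstruction under assumed notation. All symbols here (c_i^l, c_i^e, C_i, d_i, o_i, B_i) are assumptions consistent with the surrounding definitions, and the min-max objective is inferred from the algorithm's move-resources-to-the-slowest-device heuristic:

```latex
% Assumed notation: C_i = local computing capacity of device i;
% c_i^l(s_i), c_i^e(s_i) = computation load before / after the split point;
% d_i(s_i) = intermediate-result size; o_i = final-output size;
% B_i^{up}, B_i^{dn} = uplink / downlink bandwidth; \theta = unit step.
\begin{align}
T_i^{loc}    &= \frac{c_i^l(s_i)}{C_i} \tag{1} \\
T_i^{edge}   &= \theta(f_i)\,\frac{c_i^e(s_i)}{\gamma(f_i)\,C_{min}} \tag{2} \\
T_i^{up}     &= \frac{d_i(s_i)}{B_i^{up}} \tag{3} \\
T_i^{dn}     &= \frac{o_i}{B_i^{dn}} \tag{4} \\
T_i(s_i,f_i) &= T_i^{loc} + T_i^{edge} + T_i^{up} + T_i^{dn} \tag{5} \\
\min_{S,F}\; & \max_{i\in\mathcal{N}}\; T_i(s_i, f_i) \tag{6} \\
\text{s.t.}\; & \sum_{i\in\mathcal{N}} f_i \le \beta \tag{7} \\
             & s_i \le k_i, \quad f_i = 0 \Rightarrow s_i = k_i \tag{8} \\
             & f_i,\, s_i \in \mathbb{N} \tag{9}
\end{align}
```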
Further, the step S3 specifically includes:
S31, generating an initial feasible solution vector (S, F): where S represents the neural network segmentation decisions of all user equipment, and F the amount of computing resources the edge server allocates to each user equipment i;
S32, setting a decreasing coefficient p and an adjustment step size τ: the decreasing coefficient p is a manually set hyperparameter; an exponent q is computed from the total computing resource quantity β of the edge server, and the adjustment step size takes values from the list [p^q, p^(q-1), ..., p^2, p^1, 1];
S33, adjusting the feasible solution vector (S, F) in turn according to each value in the adjustment step-size list;
S34, traversing the adjustment step-size list; for each value τ:
starting from the current solution (S, F), attempt to transfer τ resources from other devices to the device with the longest delay; if a better locally optimal solution (S', F') is produced, keep it and record it as (S, F);
when the adjustment step size reaches τ = 1, the final solution (S, F) obtained is the global optimal solution.
Further, p is 2.
Further, the step S33 specifically includes:
S331, according to the current solution (S, F), traverse all user equipment i and compute, by formula (5), each user equipment's optimal computation delay under its allocation f_i; find the user equipment k with the longest computation delay and record that delay;
S332, traverse all unmarked user equipment according to the adjustment step size τ, and compute the optimal computation delay user equipment j would have after losing τ resources; if that delay would not be lower than the recorded longest delay, mark the user equipment; otherwise, transfer τ resources from user equipment j to k, obtaining a new solution vector (S, F);
S333, repeat step S332 until all user equipment are marked.
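The iterative procedure of steps S31-S333 can be sketched as follows. This is a simplification under stated assumptions (notably, a `delay(i, f)` oracle that already minimizes over the split decision, and an even initial allocation), not the patent's exact algorithm:

```python
def alternating_optimize(n_users, beta, delay, p=2):
    """Sketch of the iterative alternating optimization of steps S31-S34.
    `delay(i, f)` is assumed to return user i's optimal inference delay
    when allocated f MCRUs (already minimized over its split decision s_i).
    Returns the per-user MCRU allocation F."""
    # S31: an initial feasible allocation, splitting the MCRUs evenly.
    F = [beta // n_users] * n_users
    F[0] += beta - sum(F)

    # S32: decreasing-coefficient step list [p^q, ..., p^2, p, 1],
    # with q the largest exponent such that p^q < beta.
    q = 0
    while p ** (q + 1) < beta:
        q += 1
    steps = [p ** e for e in range(q, 0, -1)] + [1]

    # S33/S34: for each step size tau, repeatedly move tau MCRUs from a
    # donor to the currently slowest device (S331-S333), as long as the
    # donor stays strictly below the current worst delay.
    for tau in steps:
        while True:
            k = max(range(n_users), key=lambda i: delay(i, F[i]))  # S331
            worst = delay(k, F[k])
            transferred = False
            for j in range(n_users):                               # S332
                if j == k or F[j] < tau:
                    continue
                if delay(j, F[j] - tau) < worst:
                    F[j] -= tau          # transfer tau MCRUs from j to k
                    F[k] += tau
                    transferred = True
                    break
            if not transferred:          # S333: all candidates are marked
                break
    return F
```

For instance, with two devices whose delay behaves like a weight over (f + 1) with weights 10 and 1 and β = 8, the procedure shifts all MCRUs to the heavier device, returning the allocation [8, 0].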
A multi-user deep neural network model segmentation and resource allocation optimization system in an edge computing scenario, comprising:
a deep neural network model segmentation modeling module: used for defining a logic layer comprising a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logic layer as the minimum segmentation unit;
the resource allocation and deep neural network model segmentation decision modeling module under the edge computing multi-user environment comprises: the method is used for fitting and estimating the computation time delay of the deep neural network model segmentation by using a heuristic method under the multi-user environment, and modeling the problem as a nonlinear integer programming problem;
a user response delay optimization problem solving module: used for solving the problem modeled by the resource allocation and deep neural network model segmentation decision modeling module with an iterative alternating optimization algorithm, and deploying the deep neural network model on the edge server according to the obtained solution.
Further, the deep neural network model segmentation modeling module specifically: abstracts parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logic layers, thereby abstracting the deep neural network model into a computation graph of sequentially connected logic layers; the deep neural network model deployed on device i consists of k_i logic layers connected in sequence; an integer variable s_i indicates that the model is split after the s_i-th logic layer, with segmentation decision s_i ∈ {0, 1, 2, ..., k_i}; the computation loads of the neural network before and after the split point are represented by corresponding variables.
Further, the resource allocation and deep neural network model segmentation decision modeling module in the edge computing multi-user environment specifically includes:
an edge server computing power partitioning modeling module, which denotes the computing power of the minimum allocatable computing resource unit (MCRU) as C_min, the total number of MCRUs on the edge server as β, and the number of MCRUs allocated to each user i as f_i; naturally, Σ_{i∈N} f_i ≤ β;
The equipment local execution delay modeling module is used for modeling to obtain:
the device is unloaded to the modeling module of the neural network part calculation time delay on the edge server, and is used for modeling to obtain:
where θ is a unit step function, and its expression is:
γ is an approximate mapping fitted from real data, representing the multiple of the computing capacity C_min that the f_i allocated computing resource units actually achieve;
a modeling module for the intermediate-result transmission delay, used to model:
where the numerator denotes the size of the intermediate result that user equipment i needs to transmit at split point s_i, and the denominator denotes the uplink bandwidth of user equipment i;
a modeling module for the transmission delay of returning the final result, used to model:
where the denominator denotes the downlink bandwidth of user equipment i;
a modeling module for deriving the total inference delay of a single user equipment's segmented deep neural network model, used for combining the per-substep delay models (1), (2), (3) and (4) from steps S22, S23, S24 and S25 to obtain the total inference delay of the user equipment's segmented deep neural network model as:
modeling for global latency minimization of a multi-user device for modeling yielding:
wherein equation (7) represents the total resources of the edge serverThe number is limited, equation (8) indicates that the slicing decision must be smaller than the total number of logical layers, and when the edge server does not allocate computing resources to user equipment i (f)i0), must have si=kiThat is, all computing tasks are performed locally; in formula (9), N represents a natural number set, fiAnd siAre all non-negative integers.
Further, the user response delay optimization problem solving module includes:
generate initial feasible solution vector (S, F) module: whereinRepresenting the neural network segmentation decisions of all the user equipment and the amount of computing resources distributed to each user equipment i by the edge server;
a module for setting a decreasing coefficient p and an adjustment step size τ: the decreasing coefficient p is a manually set hyperparameter; an exponent q is computed from the total computing resource quantity β of the edge server, and the adjustment step size takes values from the list [p^q, p^(q-1), ..., p^2, p^1, 1];
the module adjusts the feasible solution vector (S, F) in turn according to each value in the adjustment step-size list;
it traverses the adjustment step-size list and, for each value τ:
starting from the current solution (S, F), attempts to transfer τ resources from other devices to the device with the longest delay; if a better locally optimal solution (S', F') is produced, it is kept and recorded as (S, F);
when the adjustment step size reaches τ = 1, the final solution (S, F) obtained is the global optimal solution.
Compared with the prior art, the beneficial effects are:
1. The multi-user deep neural network model segmentation and resource allocation optimization method in the edge computing scenario improves the computing efficiency of the deep neural network segmentation technique in multi-user scenarios by segmenting the neural networks of multiple users in parallel, offloading the segmented parts to an edge server, and solving for the optimal allocation scheme with an iterative alternating optimization algorithm, achieving efficient, low-delay inference of deep learning models on mobile devices.
2. The method considers multi-user, multi-choice deep neural network segmentation, estimates the execution delay of each user equipment through a heuristic function, and solves the joint scheme of optimal computation offloading and resource allocation with an iterative alternating optimization algorithm, giving it strong generalization ability and practicability;
3. The invention provides a data-driven fitting method to model and estimate multi-core CPU computing power allocation more accurately in real scenarios, which has high practicability.
Drawings
FIG. 1 is a flow chart of the multi-user deep neural network segmentation optimization algorithm execution steps disclosed in the present invention;
FIG. 2 is a graph of task execution latency versus the number of computing resources of an edge server at an average bandwidth of 5 Mb/s;
FIG. 3 is a graph of task execution latency versus the number of computing resources of an edge server at high bandwidth (10 Mb/s for mobile devices, 100Mb/s for fixed devices);
FIG. 4 is a relationship between task execution latency and average bandwidth under computing resources of 2 CPU cores;
fig. 5 is a relationship between task execution latency and average bandwidth under the computing resources of 7 CPU cores.
Detailed Description
The embodiment discloses a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario, which estimates the execution delay of user equipment through a heuristic function and solves the joint scheme of optimal computation offloading and resource allocation with an iterative alternating optimization algorithm.
The experimental environment of this embodiment is as follows: a workstation equipped with an eight-core 3.7 GHz Intel processor and 16 GB of memory serves as the edge server, providing computation offloading services for the user equipment. The user equipment consists of two Raspberry Pi development boards and two NVIDIA Jetson Nano boards. On the edge server side, virtual servers are built with Docker container technology to independently provide the DNN-partitioning-based computation offloading service to each user equipment. CPU cores (regarded as allocatable computing resources) are assigned to the different containers, with the minimum allocatable computing resource unit (MCRU) set to 0.1 core. The edge server cooperates with 4 user equipment. The two Raspberry Pi boards, running the MobileNetV2 model, connect to the edge server wirelessly via Wi-Fi, representing low-performance mobile devices (e.g., smartphones, smart wearable devices). The two NVIDIA Jetson Nano devices, running the VGG19 model, connect to the edge server through a wired LAN, representing higher-performance fixed devices (e.g., intelligent routers, smart home devices).
A multi-user deep neural network model segmentation and resource allocation optimization method under an edge computing scene comprises the following steps:
S1, deep neural network model segmentation modeling step: for the VGG19 model, defining a logic layer containing a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model containing a plurality of consecutive serial tasks, with the logic layer as the minimum partition unit;
S2, resource allocation and deep neural network model segmentation decision modeling in the edge computing multi-user environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and, combined with the computation graph model from S1, modeling the problem as a nonlinear integer programming problem;
S3, solving the user response delay optimization problem: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network model on the edge server according to the obtained solution. The specific steps of the iterative alternating optimization algorithm are shown in fig. 1.
Further, the step S1 specifically includes: abstracting parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logic layers, and thereby abstracting the deep neural network model into a computation graph of sequentially connected logic layers; the deep neural network model deployed on device i consists of k_i logic layers connected in sequence; an integer variable s_i indicates that the model is split after the s_i-th logic layer, with segmentation decision s_i ∈ {0, 1, 2, ..., k_i}; the computation loads of the neural network before and after the split point are represented by corresponding variables.
Further, the step S2 specifically includes:
S21, modeling the computing power partitioning of the edge server: the computing power of the minimum allocatable computing resource unit (MCRU) is denoted C_min; β denotes the total number of MCRUs on the edge server, and f_i the number of MCRUs allocated to each user i; naturally, Σ_{i∈N} f_i ≤ β;
S22, modeling the execution delay of the part of the model executed locally on the device:
S23, modeling the computation delay of the neural network portion offloaded to the edge server:
where θ is a unit step function, and its expression is:
γ is an approximate mapping fitted from real data, representing the multiple of the computing capacity C_min that the f_i allocated computing resource units actually achieve;
S24, modeling the transmission delay of the intermediate result:
where the numerator denotes the size of the intermediate result that user equipment i needs to transmit at split point s_i, and the denominator denotes the uplink bandwidth of user equipment i;
S25, modeling the transmission delay of returning the final result:
where the denominator denotes the downlink bandwidth of user equipment i;
S26, deriving and modeling the total inference delay of a single user equipment's segmented deep neural network model:
Combining the per-substep delay models (1), (2), (3) and (4) from steps S22, S23, S24 and S25, the total delay for the user equipment to perform inference with the segmented deep neural network model is:
S27, modeling the global delay minimization over the multi-user equipment:
where constraint (7) states that the total resources of the edge server are limited, constraint (8) states that the segmentation decision must not exceed the total number of logic layers, and when the edge server allocates no computing resources to user equipment i (f_i = 0), s_i = k_i must hold, i.e., all computing tasks are executed locally; in constraint (9), N denotes the set of natural numbers, and f_i and s_i are both non-negative integers.
Further, the step S3 specifically includes:
S31, generating an initial feasible solution vector (S, F): where S represents the neural network segmentation decisions of all user equipment, and F the amount of computing resources the edge server allocates to each user equipment i;
S32, setting a decreasing coefficient p and an adjustment step size τ: the decreasing coefficient p is a manually set hyperparameter; an exponent q is computed from the total computing resource quantity β of the edge server, and the adjustment step size takes values from the list [p^q, p^(q-1), ..., p^2, p^1, 1];
S33, adjusting the feasible solution vector (S, F) in turn according to each value in the adjustment step-size list;
S34, traversing the adjustment step-size list; for each value τ:
starting from the current solution (S, F), attempt to transfer τ resources from other devices to the device with the longest delay; if a better locally optimal solution (S', F') is produced, keep it and record it as (S, F);
when the adjustment step size reaches τ = 1, the final solution (S, F) obtained is the global optimal solution.
Further, p is 2.
Further, the step S33 specifically includes:
S331, according to the current solution (S, F), traverse all user equipment i and compute, by formula (5), each user equipment's optimal computation delay under its allocation f_i; find the user equipment k with the longest computation delay and record that delay;
S332, traverse all unmarked user equipment according to the adjustment step size τ, and compute the optimal computation delay user equipment j would have after losing τ resources; if that delay would not be lower than the recorded longest delay, mark the user equipment; otherwise, transfer τ resources from user equipment j to k, obtaining a new solution vector (S, F);
S333, repeat step S332 until all user equipment are marked.
Fig. 2 and fig. 3 show the influence of the edge server's resource abundance on the final computation delay under different bandwidths; the experimental results show that the solution obtained by the disclosed scheme achieves the best performance.
Fig. 4 and fig. 5 show the influence of network bandwidth on the final computation delay under different amounts of computing resources; the experimental results show that the solution obtained by the disclosed scheme achieves the best performance.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
It should be understood that the above embodiments are merely examples given to clearly illustrate the present invention and are not intended to limit its embodiments; it is neither necessary nor possible to exhaustively list all embodiments here. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.
Claims (10)
1. A multi-user deep neural network model segmentation and resource allocation optimization method under an edge computing scene is characterized by comprising the following steps:
S1, deep neural network model segmentation modeling step: defining a logical layer comprising a plurality of deep neural network conceptual layers, and abstracting the deep neural network model, with the logical layer as the minimum segmentation unit, into a computation graph model comprising a plurality of consecutive serial tasks;
S2, resource allocation and deep neural network model segmentation decision modeling step under the multi-user edge computing environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and modeling the problem as a nonlinear integer programming problem;
S3, user response delay optimization problem solving step: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network model on the edge server according to the obtained solution.
2. The method for multi-user deep neural network model segmentation and resource allocation optimization in an edge computing scenario according to claim 1, wherein the step S1 specifically comprises: abstracting parallel conceptual layers and conceptual layers with shortcut connections in the deep neural network model into single logical layers, thereby abstracting the deep neural network model into a computation graph formed by sequentially connected logical layers; the deep neural network model deployed on device i consists of k_i logical layers connected in sequence; an integer variable s_i indicates that the model is segmented after the s_i-th logical layer, and the segmentation decision satisfies s_i ∈ {0, 1, 2, …, k_i}; separate notation denotes the computation load of the neural network before the segmentation point and the computation load after the segmentation point.
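The split-point abstraction of claim 2 can be illustrated with the following sketch: a model is treated as a chain of logical layers with per-layer computation costs, and a segmentation decision s ∈ {0, …, k} divides the chain into a device-side part and an edge-side part. The layer costs below are made-up placeholders, not measurements from the disclosure.

```python
def split_workloads(layer_costs, s):
    """Workloads induced by segmenting a k-layer chain after layer s:
    layers 1..s run on the device, layers s+1..k on the edge server."""
    k = len(layer_costs)
    assert 0 <= s <= k, "segmentation decision must lie in {0, ..., k}"
    local = sum(layer_costs[:s], 0.0)   # computation before the split point
    edge = sum(layer_costs[s:], 0.0)    # computation after the split point
    return local, edge

# A toy 5-layer chain: s = 0 offloads everything, s = 5 runs fully locally.
costs = [4.0, 8.0, 8.0, 2.0, 1.0]
print(split_workloads(costs, 0))   # (0.0, 23.0)
print(split_workloads(costs, 2))   # (12.0, 11.0)
print(split_workloads(costs, 5))   # (23.0, 0.0)
```

Parallel branches and shortcut connections are merged into a single logical layer before this computation, which is what keeps the graph a simple chain and the decision a single integer.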
3. The method for optimizing the multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 2, wherein the step S2 specifically includes:
S21, modeling the computing power partitioning of the edge server: the computing capacity of the minimum allocatable computing resource unit (MCRU) is recorded as C_min; β denotes the total number of MCRUs on the edge server, and f_i denotes the number of MCRUs allocated to each user i; naturally, Σ_{i∈N} f_i ≤ β;
S22, modeling the execution delay of the part run locally on the device:
S23, modeling the computation delay of the neural network part offloaded to the edge server:
where θ is the unit step function, defined as θ(x) = 1 for x > 0 and θ(x) = 0 otherwise;
γ is an approximate mapping fitted from real data, representing the multiple of the computing capacity C_min actually achieved by f_i allocated computing resource units;
S24, modeling the transmission delay of the intermediate result:
where the first quantity denotes the size of the intermediate result that user equipment i needs to transmit at segmentation point s_i, and the second denotes the uplink bandwidth of user equipment i;
S25, modeling the transmission delay of returning the final result:
where the quantity denotes the downlink bandwidth of user equipment i;
S26, deriving and modeling the total delay of the deep neural network segmentation model of a single user device:
combining the per-substep delay models (1), (2), (3) and (4) of steps S22, S23, S24 and S25, the total inference delay of the deep neural network segmentation model of the user equipment is obtained as:
S27, modeling the global delay minimization over multiple user devices:
where formula (7) indicates that the total number of resources on the edge server is limited; formula (8) indicates that the segmentation decision must not exceed the total number of logical layers, and that when the edge server allocates no computing resources to user equipment i (f_i = 0), it must hold that s_i = k_i, i.e., all computation tasks are executed locally; in formula (9), N denotes the set of natural numbers, and f_i and s_i are both non-negative integers.
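The per-device delay model of steps S22 to S26 can be made concrete with the following numeric sketch. All constants and the linear speedup assumption γ(f) = f are illustrative stand-ins: in the disclosure γ is fitted from real measurements, and the component delays are given by formulas (1)-(4).

```python
def total_delay(pre_work, post_work, inter_size, final_size,
                c_local, c_min, f, b_up, b_down, gamma=lambda f: f):
    """Total inference delay of one device for a given split and f MCRUs,
    in the spirit of formula (5): local + edge + uplink + downlink terms."""
    offloaded = post_work > 0                  # role of the unit step theta
    if offloaded and f == 0:
        raise ValueError("f_i = 0 forces fully local execution (s_i = k_i)")
    t_local = pre_work / c_local                                  # (1) local part
    t_edge = post_work / (gamma(f) * c_min) if offloaded else 0.0 # (2) edge part
    t_up = inter_size / b_up if offloaded else 0.0                # (3) upload
    t_down = final_size / b_down if offloaded else 0.0            # (4) return
    return t_local + t_edge + t_up + t_down                       # (5) total

# Example: 12 units of work stay local, 11 are offloaded to 4 MCRUs.
t = total_delay(pre_work=12.0, post_work=11.0, inter_size=2.0, final_size=0.1,
                c_local=1.0, c_min=2.0, f=4, b_up=10.0, b_down=50.0)
print(round(t, 3))  # 13.577
```

The guard on f = 0 reproduces the coupling between formulas (8) and (9): a device with no allocated MCRUs must execute the whole model locally, so the transmission terms vanish together with the edge term.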
4. The method for optimizing the multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 3, wherein the step S3 specifically includes:
S31, generating an initial feasible solution vector (S, F): where S represents the neural network segmentation decisions of all user equipment, and F represents the amount of computing resources allocated by the edge server to each user equipment i;
S32, setting the decreasing coefficient p and the adjustment step length τ: the decreasing coefficient p is a manually set hyperparameter; an exponent q is computed from the total computing resource quantity β of the edge server, and the adjustment step length value list is [p^q, p^{q-1}, …, p^2, p^1, 1];
S33, sequentially adjusting the feasible solution vector (S, F) according to each value in the adjustment step length list;
S34, traversing the adjustment step length list, and for each value τ:
for the current solution (S, F), attempting to transfer τ resources from other devices to the device with the longest delay, and, if a better locally optimal solution (S′, F′) is generated, retaining it and recording it as (S, F);
when the adjustment step length reaches τ = 1, the final solution (S, F) obtained is the global optimal solution.
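The coarse-to-fine schedule of steps S32 to S34 can be sketched as follows. The rule choosing q (the largest exponent with p^q ≤ β) is our assumption for the computation left unspecified in step S32, and `adjust` stands in for the S331-S333 resource transfer pass.

```python
def step_list(beta, p=2):
    """Adjustment step lengths [p^q, p^(q-1), ..., p^1, 1] of step S32.
    Assumption: q is the largest exponent with p**q <= beta."""
    q = 0
    while p ** (q + 1) <= beta:
        q += 1
    return [p ** e for e in range(q, 0, -1)] + [1]

def iterative_optimize(S, F, beta, adjust, p=2):
    """Steps S33-S34: refine (S, F) with each step length, coarse to fine."""
    for tau in step_list(beta, p):
        S, F = adjust(S, F, tau)   # e.g. the S331-S333 adjustment routine
    return S, F

print(step_list(64))   # [64, 32, 16, 8, 4, 2, 1]
print(step_list(10))   # [8, 4, 2, 1]
```

Large early steps move resources in big blocks toward the bottleneck device; the final τ = 1 pass polishes the allocation one MCRU at a time, which is why the obtained solution is taken as the final answer.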
5. The method for the multi-user deep neural network model segmentation and resource allocation optimization in the edge computing scenario as claimed in claim 4, wherein p is 2.
6. The method for optimizing the multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 4, wherein the step S33 specifically includes:
S331, according to the current solution (S, F), traversing all user equipment i and calculating, from formula (5), the optimal computation delay of each user equipment under its allocation f_i; finding the user equipment k with the longest computation delay, and recording this delay as the bottleneck delay;
S332, traversing all unmarked user equipment according to the adjustment step length τ, and calculating the optimal computation delay that user equipment j would attain after giving up τ resources; if this delay is not smaller than the bottleneck delay, marking user equipment j; if it is smaller, transferring τ resources from user equipment j to k to obtain a new solution vector (S, F);
S333, repeating step S332 until all user equipment are marked.
7. A multi-user deep neural network model segmentation and resource allocation optimization system in an edge computing scenario, characterized by comprising:
a deep neural network model segmentation modeling module, configured to define a logical layer comprising a plurality of deep neural network conceptual layers, and to abstract the deep neural network model, with the logical layer as the minimum segmentation unit, into a computation graph model comprising a plurality of consecutive serial tasks;
a resource allocation and deep neural network model segmentation decision modeling module for the multi-user edge computing environment, configured to fit and estimate the computation delay of deep neural network model segmentation with a heuristic method in the multi-user environment, and to model the problem as a nonlinear integer programming problem;
a user response delay optimization problem solving module, configured to solve the problem modeled by the resource allocation and deep neural network model segmentation decision modeling module with an iterative alternating optimization algorithm, and to deploy the deep neural network model on the edge server according to the obtained solution.
8. The system for multi-user deep neural network model segmentation and resource allocation optimization in an edge computing scenario according to claim 7, wherein the deep neural network model segmentation modeling module is specifically configured to: abstract parallel conceptual layers and conceptual layers with shortcut connections in the deep neural network model into single logical layers, thereby abstracting the deep neural network model into a computation graph formed by sequentially connected logical layers; the deep neural network model deployed on device i consists of k_i logical layers connected in sequence; an integer variable s_i indicates that the model is segmented after the s_i-th logical layer, and the segmentation decision satisfies s_i ∈ {0, 1, 2, …, k_i}; separate notation denotes the computation load of the neural network before the segmentation point and the computation load after the segmentation point.
9. The system for multi-user deep neural network model segmentation and resource allocation optimization in an edge computing scenario according to claim 8, wherein the resource allocation and deep neural network model segmentation decision modeling module for the multi-user edge computing environment specifically comprises:
edge server computation partitioning modeling module for partitioning computation power of minimum allocable computational resource unit MCRUIs marked as Cmin(ii) a Denote the total number of MCRUs on the edge server by β, and fiIndicating the number of MCRUs allocated to each user i; naturally, there is ∑i∈Nfi≤β;
a device-local execution delay modeling module, configured to obtain by modeling:
a modeling module for the computation delay of the neural network part offloaded to the edge server, configured to obtain by modeling:
where θ is the unit step function, defined as θ(x) = 1 for x > 0 and θ(x) = 0 otherwise;
γ is an approximate mapping fitted from real data, representing the multiple of the computing capacity C_min actually achieved by f_i allocated computing resource units;
a modeling module for the transmission delay of the intermediate result, configured to obtain by modeling:
where the first quantity denotes the size of the intermediate result that user equipment i needs to transmit at segmentation point s_i, and the second denotes the uplink bandwidth of user equipment i;
a modeling module for the transmission delay of returning the final result, configured to obtain by modeling:
where the quantity denotes the downlink bandwidth of user equipment i;
a modeling module for deriving the total delay of the deep neural network segmentation model of a single user device, configured to combine the per-substep delay models (1), (2), (3) and (4) of steps S22, S23, S24 and S25 to obtain the total inference delay of the deep neural network segmentation model of the user equipment as:
a modeling module for global delay minimization over multiple user devices, configured to obtain by modeling:
where formula (7) indicates that the total number of resources on the edge server is limited; formula (8) indicates that the segmentation decision must not exceed the total number of logical layers, and that when the edge server allocates no computing resources to user equipment i (f_i = 0), it must hold that s_i = k_i, i.e., all computation tasks are executed locally; in formula (9), N denotes the set of natural numbers, and f_i and s_i are both non-negative integers.
10. The system for optimizing multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 9, wherein the user response delay optimization problem solving module comprises:
a module for generating an initial feasible solution vector (S, F): where S represents the neural network segmentation decisions of all user equipment, and F represents the amount of computing resources allocated by the edge server to each user equipment i;
a module for setting the decreasing coefficient p and the adjustment step length τ: the decreasing coefficient p is a manually set hyperparameter; an exponent q is computed from the total computing resource quantity β of the edge server, and the adjustment step length value list is [p^q, p^{q-1}, …, p^2, p^1, 1];
a module for sequentially adjusting the feasible solution vector (S, F) according to each value in the adjustment step length list, which traverses the adjustment step length list and, for each value τ:
for the current solution (S, F), attempts to transfer τ resources from other devices to the device with the longest delay, and, if a better locally optimal solution (S′, F′) is generated, retains it and records it as (S, F);
when the adjustment step length reaches τ = 1, the final solution (S, F) obtained is the global optimal solution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011638611.0A CN112822701A (en) | 2020-12-31 | 2020-12-31 | Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112822701A true CN112822701A (en) | 2021-05-18 |
Family
ID=75857638
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011638611.0A Pending CN112822701A (en) | 2020-12-31 | 2020-12-31 | Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112822701A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312368A (en) * | 2020-01-20 | 2020-06-19 | 广西师范大学 | Method for accelerating medical image processing speed based on edge calculation |
CN113315669A (en) * | 2021-07-28 | 2021-08-27 | 江苏电力信息技术有限公司 | Cloud edge cooperation-based throughput optimization machine learning inference task deployment method |
CN113987692A (en) * | 2021-12-29 | 2022-01-28 | 华东交通大学 | Deep neural network partitioning method for unmanned aerial vehicle and edge computing server |
CN114064280A (en) * | 2021-11-20 | 2022-02-18 | 东南大学 | Edge collaborative inference method under multiple constraints |
CN115277452A (en) * | 2022-07-01 | 2022-11-01 | 中铁第四勘察设计院集团有限公司 | ResNet self-adaptive acceleration calculation method based on edge-end cooperation and application |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110996393A (en) * | 2019-12-12 | 2020-04-10 | 大连理工大学 | Single-edge computing server and multi-user cooperative computing unloading and resource allocation method |
CN112148492A (en) * | 2020-09-28 | 2020-12-29 | 南京大学 | Service deployment and resource allocation method considering multi-user mobility |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110996393A (en) * | 2019-12-12 | 2020-04-10 | 大连理工大学 | Single-edge computing server and multi-user cooperative computing unloading and resource allocation method |
CN112148492A (en) * | 2020-09-28 | 2020-12-29 | 南京大学 | Service deployment and resource allocation method considering multi-user mobility |
Non-Patent Citations (1)
Title |
---|
XIN TANG et al.: "Joint Multiuser DNN Partitioning and Computational Resource Allocation for Collaborative Edge Intelligence", IEEE Internet of Things Journal * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312368A (en) * | 2020-01-20 | 2020-06-19 | 广西师范大学 | Method for accelerating medical image processing speed based on edge calculation |
CN113315669A (en) * | 2021-07-28 | 2021-08-27 | 江苏电力信息技术有限公司 | Cloud edge cooperation-based throughput optimization machine learning inference task deployment method |
CN114064280A (en) * | 2021-11-20 | 2022-02-18 | 东南大学 | Edge collaborative inference method under multiple constraints |
CN113987692A (en) * | 2021-12-29 | 2022-01-28 | 华东交通大学 | Deep neural network partitioning method for unmanned aerial vehicle and edge computing server |
CN115277452A (en) * | 2022-07-01 | 2022-11-01 | 中铁第四勘察设计院集团有限公司 | ResNet self-adaptive acceleration calculation method based on edge-end cooperation and application |
CN115277452B (en) * | 2022-07-01 | 2023-11-28 | 中铁第四勘察设计院集团有限公司 | ResNet self-adaptive acceleration calculation method based on edge-side coordination and application |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112822701A (en) | Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene | |
CN107995660B (en) | Joint task scheduling and resource allocation method supporting D2D-edge server unloading | |
CN111918339B (en) | AR task unloading and resource allocation method based on reinforcement learning in mobile edge network | |
CN109246761B (en) | Unloading method based on alternating direction multiplier method considering delay and energy consumption | |
CN110968426B (en) | Edge cloud collaborative k-means clustering model optimization method based on online learning | |
CN110519370B (en) | Edge computing resource allocation method based on facility site selection problem | |
CN110968366B (en) | Task unloading method, device and equipment based on limited MEC resources | |
CN108416465B (en) | Workflow optimization method in mobile cloud environment | |
CN113615137B (en) | CDN optimization platform | |
CN110162390B (en) | Task allocation method and system for fog computing system | |
CN112686374A (en) | Deep neural network model collaborative reasoning method based on adaptive load distribution | |
KR102668157B1 (en) | Apparatus and method for dynamic resource allocation in cloud radio access networks | |
CN114265631A (en) | Mobile edge calculation intelligent unloading method and device based on federal meta-learning | |
US20210111736A1 (en) | Variational dropout with smoothness regularization for neural network model compression | |
Li et al. | Computation offloading and service allocation in mobile edge computing | |
CN110167031B (en) | Resource allocation method, equipment and storage medium for centralized base station | |
CN111158893B (en) | Task unloading method, system, equipment and medium applied to fog computing network | |
Malazi et al. | Distributed service placement and workload orchestration in a multi-access edge computing environment | |
Zhang et al. | An online learning-based task offloading framework for 5G small cell networks | |
CN116302481B (en) | Resource allocation method and system based on sparse knowledge graph link prediction | |
CN114745386B (en) | Neural network segmentation and unloading method in multi-user edge intelligent scene | |
CN110944335B (en) | Resource allocation method and device for virtual reality service | |
KR20220085824A (en) | High quality video super high resolution with micro-structured masks | |
Selmy et al. | Energy efficient resource management for Cloud Computing Environment | |
CN114116052A (en) | Edge calculation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210518 |