CN117829208A - Multi-outlet neural network design method and device for edge scene dynamic resources - Google Patents

Multi-outlet neural network design method and device for edge scene dynamic resources

Info

Publication number
CN117829208A
Authority
CN
China
Prior art keywords
branch
neural network
branches
outlet
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410167266.9A
Other languages
Chinese (zh)
Inventor
高艺
董玮
李福�
黄家名
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202410167266.9A
Publication of CN117829208A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and device for designing a multi-outlet neural network for dynamic resources in edge scenes. The method comprises the following steps: (1) given a pre-trained neural network, generate a library of valid branches and screen it; (2) self-merge the branches in the library to further reduce memory occupation, then quickly recover their accuracy through few-sample knowledge distillation and retraining to form a branch candidate library; (3) perform a priority-based depth-first search over the branch candidate library to find an optimal branch combination that meets the memory and time requirements; (4) after receiving the selected branch combination, the on-device scheduler updates the branches of the pre-trained neural network on the fly through the SBRAN component. The invention converts a traditional single-outlet neural network into a multi-outlet neural network with heterogeneous and dynamic characteristics, so that the network adapts more flexibly to changes in memory resources, improving both operating efficiency and accuracy on edge devices.

Description

Multi-outlet neural network design method and device for edge scene dynamic resources
Technical Field
The invention provides a multi-outlet neural network design method and device for dynamic resources in edge scenes, focusing in particular on the use of edge devices in environments where memory resources change dynamically.
Background
With the rapid development of edge computing, the demand for efficiently utilizing the computing power of edge devices keeps growing. The core advantage of edge computing is the ability to process data close to its source, reducing latency and improving response speed. However, edge devices are often constrained by low computational power and limited memory, which greatly restricts their ability to handle complex tasks. In such environments, managing and optimizing memory resources becomes particularly critical.
The computing resources of edge devices, including but not limited to CPU/GPU cores and VRAM, are highly uncertain: they can be affected by factors such as parallel tasks and energy limits, so their availability fluctuates over time. In this context, deep neural networks (DNNs) face a challenging task, namely how to maintain efficient inference performance under constantly changing memory resources. Multi-outlet neural networks are considered a potential solution because they allow results to be output early during inference on demand, providing greater flexibility and accuracy in resource-constrained situations.
While multi-outlet neural networks have shown potential for handling non-deterministic computing resources, the prior art still has limitations in several critical areas. In particular, the limited accuracy of intermediate inference results, together with the homogeneous and static structure of the exit branches, restricts the effectiveness of these networks in dynamic environments. It is therefore particularly important to efficiently determine the structure, number and placement of heterogeneous branches for a given pre-trained neural network, and to dynamically update the branches as device memory changes.
Disclosure of Invention
To overcome the limitations of the prior art, the invention provides an innovative design method and device for multi-outlet neural networks, aimed in particular at dynamically changing memory resources. Its core is a system that automatically converts a traditional single-outlet neural network into a heterogeneous, dynamic multi-outlet neural network. Through this conversion, the network can adapt more flexibly to changes in memory resources, improving operating efficiency and accuracy on edge devices.
The first aspect of the invention provides a multi-outlet neural network design method for edge scene dynamic resources, which comprises the following steps:
(1) Given a pre-trained neural network, generate a library of valid branches and screen it;
(2) Self-merge the branches in the library to further reduce memory occupation, then quickly recover their accuracy through few-sample knowledge distillation and retraining to form a branch candidate library;
(3) Perform a priority-based depth-first search over the branch candidate library to find an optimal branch combination that meets the memory and time requirements;
(4) After receiving the selected branch combination, the on-device scheduler updates the branches of the pre-trained neural network on the fly through the SBRAN component.
Step 1 specifically includes:
Step 1.1: several convolutional layer structures and fully connected layer structures that can be combined into valid branches are selected in advance, and a branch library is formed from the different layer structures and the different exit points.
Step 1.2: the branches in the library are trained on the given pre-trained neural network to obtain their configuration information, including weights, parameter counts, inference accuracy, and execution completion time.
Step 1.3: the branch library is screened. The screening criteria: branches with unsatisfactory accuracy are deleted, as are branches that are both less accurate and slower than the inference of the pre-trained neural network itself.
In step 2, the memory occupied by the branches in the library is further reduced through weight sharing, specifically:
Step 2.1: traditional weight-sharing schemes select several branches to share weights, but because the branches differ in layers and in numbers of neurons, the benefit of sharing is limited. The invention instead proposes branch self-merging, which maximizes the memory saved with almost no loss of precision.
Step 2.1.1: the errors between different neurons in a branch are calculated. Unlike Hessian-based error calculation, which consumes a large amount of memory, the invention uses the L2 norm to compute the errors so that all branches can be merged more efficiently.
Step 2.1.2: the neuron pairs to be merged are selected according to these errors. All neurons are grouped by error magnitude, the neuron pairs are sorted by the difference between them, and the 1/3 of pairs with the lowest errors are merged.
Step 2.1.3: a shared weight is computed for each neuron pair to be fused. Again, no Hessian-based method is used; the average of the two neurons' weights is taken directly as the shared weight, which is simple and effective.
Step 2.2: the branch self-merging inevitably brings a certain degree of precision loss, and the precision of the branch is quickly recovered by adopting the distillation and retraining of less sample knowledge.
Step 2.2.1: and constructing a proper loss function for retraining. The loss function L consists of three parts: the supervision information MSE from the tag, KL divergence from distillation, and L1-Norm loss of shared weights, y is the tag,is the prediction of the ith branch, y c Is the final prediction, W s 、/>Alpha and beta are settable tuning parameters corresponding to the shared weight and the back-propagation updated weight, respectively. PreferablyThe present invention suggests parameter settings α=0.5, β=1.
Step 2.2.2: for self-merging branches, the branches are added into a branch library after screening according to the screening standard.
In step 3, given the device's memory budget M and time budget T_budget, the invention uses a priority-based depth-first search (DFS) to select the optimal branch combination from the branch library, specifically:
Step 3.1: the area enclosed by accuracy and time, P = Acc · Time, is proposed as the performance metric, where Acc is the accuracy of the branch and Time is the deadline minus the branch's inference completion time. The improvement in the current branch combination's performance per unit of branch parameters serves as the priority of a branch; the same branch therefore has different priorities under different current combinations and must be recomputed against the current combination.
Step 3.2: let the branch transmission rate be R. The DFS is subject to two constraints, the memory budget M and the search time T_search, and the order in which branches are expanded at each layer of the DFS is determined by their priority ranking. Within the search time T_search, a branch combination B_selected with optimal performance that satisfies the memory requirement is found.
In step 4, the SBRAN component updates the memory with the selected branch combination B_selected to achieve memory-elastic inference.
The second aspect of the invention provides a multi-outlet neural network design device for edge scene dynamic resources, comprising a memory and one or more processors, where executable code is stored in the memory and the one or more processors, when executing the code, implement the multi-outlet neural network design method for edge scene dynamic resources of the invention.
A third aspect of the present invention relates to a computer readable storage medium having stored thereon a program which, when executed by a processor, implements the edge scene dynamic resource oriented multi-outlet neural network design method of the present invention.
The core of the invention is to design and implement an innovative system that converts a traditional single-outlet neural network into a multi-outlet neural network with heterogeneous and dynamic characteristics. Through this conversion, the network adapts more flexibly to changes in memory resources, improving operating efficiency and accuracy on edge devices.
The advantages of the invention: it automatically converts a single-outlet neural network into a multi-outlet neural network and is particularly suited to edge computing environments. This addresses the challenge of designing and deploying efficient heterogeneous branches under varying memory conditions and significantly reduces the memory overhead of dynamic branches. By cutting memory consumption while retaining high-accuracy inference, the invention offers a powerful scheme for efficiently exploiting the non-deterministic computing capability of resource-limited devices and has broad application prospects.
Drawings
In order to more clearly illustrate the examples of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of the operation of the method of the present invention.
Fig. 2 is a data structure diagram in the method of the present invention.
Fig. 3 is a flow chart of priority-based DFS in the method of the present invention.
Fig. 4 is a schematic diagram of priority-based DFS in the method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
Example 1
Referring to fig. 1, the invention provides a multi-outlet neural network design method for dynamic resources in edge scenes. The core of this approach is to design and implement an innovative system that converts a traditional single-outlet neural network into a multi-outlet neural network with heterogeneous and dynamic characteristics. Through this conversion, the network adapts more flexibly to changes in memory resources, improving operating efficiency and accuracy on edge devices.
The method specifically comprises the following steps:
step 1: given the pre-trained neural network, branch candidates are generated, as shown in fig. 2.
Step 1.1: and a plurality of convolution layer structures and full connection layer structures which can be combined to form effective branches are selected in advance, and a branch library is formed according to different convolution layer structures and full connection layer structures and different outlet points.
Step 1.2: and training branches of the branch library on a given pre-training neural network to obtain configuration information of the branches, wherein the configuration information comprises weights, parameter numbers, reasoning accuracy and execution completion time.
Step 1.3: screening the branch library, and screening the standard: branches with unsatisfactory execution accuracy and branches with lower accuracy and longer execution time than the reasoning time of the pre-trained neural network are deleted.
Step 2: the memory occupied by the branches of the branch library can be further reduced through weight sharing.
Step 2.1: the traditional weight sharing scheme selects a plurality of branches to share weight, but the sharing benefit is not ideal due to the different layers and the different numbers of neurons among the branches. The invention provides branch self-merging, which can maximize the saved memory space under the condition of almost not losing precision.
Step 2.1.1: errors between different neurons in the branches are calculated. Different from the Hessian error calculation method, a large amount of memory is occupied. In order to enable all branches to be combined more efficiently, the invention adopts L2-Norm for error calculation.
Step 2.1.2: the neuron pairs to be merged are selected based on the error. All neurons are grouped according to the error magnitude, and are ordered according to the difference magnitude between the neuron pairs, and the first 1/3 neuron pairs with lower errors are combined.
Step 2.1.3: a shared weight is calculated for each neuron pair to be fused. Also, the method of Hessian is not adopted, and the average value of the weights of two neurons is directly calculated as the shared weight, so that the method is simple and effective.
Step 2.2: the branch self-merging inevitably brings a certain degree of precision loss, and the precision of the branch is quickly recovered by adopting the distillation and retraining of less sample knowledge.
Step 2.2.1: and constructing a proper loss function for retraining. The loss function L consists of three parts: the supervision information MSE from the tag, KL divergence from distillation, and L1-Norm loss of shared weights, y is the tag,is the prediction of the ith branch, y c Is the final prediction, W s 、/>The present invention suggests parameter settings α=0.5, β=1, corresponding to the shared weight and the back propagation updated weight, respectively.
Step 2.2.2: for self-merging branches, the branches are added into a branch library after screening according to the screening standard.
Step 3: memory budget M and time budget T for a given device budget The present invention proposes to select the optimal branch combination from the branch library based on the priority-based depth-first search DFS, as shown in fig. 3, 4.
Step 3.1: the invention provides the accuracy and the area p=Acc.Time of the area enclosed by the Time as the performance index, acc is the accuracy of the branch, and Time is the cut-off Time minus the branch reasoning completion Time. Promotion of current branch combination performance per branch unit parameterAs the priority of the branches, the same branch has different priorities under different current branch combinations, and the same branch needs to be calculated according to the current branch combinations.
Step 3.2: the branch transmission rate is R, and there are two limitations to the DFS that can be calculated: memory budget M and search time T search And determining the branch selection sequence of the DFS of each layer according to the priority order of the branches. At search time T search Finally find a branch combination B with optimal performance meeting the memory requirement selected
Step 4: the scheduler on the device predicts the dynamic change and time of the memory through the memory budget component and the delay budget component, and based on the steps 1-3, the AI task on the device realizes the memory dynamic reasoning,
example 2
This embodiment relates to a multi-outlet neural network design device for edge scene dynamic resources, comprising a memory and one or more processors, where executable code is stored in the memory and the one or more processors, when executing the code, implement the multi-outlet neural network design method for edge scene dynamic resources of embodiment 1.
Example 3
The present embodiment relates to a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the multi-outlet neural network design method for edge-scene-oriented dynamic resources of embodiment 1.
The invention accounts for dynamic changes in the memory of edge devices and realizes branch updates from the server to the device. Even when memory changes dynamically, a weak device can still perform model inference effectively, enabling efficient and reliable AI task execution in resource-limited edge computing environments.
The embodiments described in this specification are merely examples of implementations of the inventive concept. The scope of protection of the present invention should not be construed as limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that can be conceived by those skilled in the art based on the inventive concept.

Claims (8)

1. A design method of a multi-outlet neural network for edge scene dynamic resources, comprising the following steps:
(1) given a pre-trained neural network, generating a library of valid branches and screening it;
(2) self-merging the branches in the library to further reduce memory occupation, then quickly recovering their accuracy through few-sample knowledge distillation and retraining to form a branch candidate library;
(3) performing a priority-based depth-first search over the branch candidate library to find an optimal branch combination that meets the memory and time requirements;
(4) after receiving the selected branch combination, the on-device scheduler updating the branches of the pre-trained neural network on the fly through the SBRAN component.
2. The method for designing a multi-outlet neural network for dynamic resources of an edge scene as claimed in claim 1, wherein the step 1 specifically comprises:
step 1.1: a plurality of convolution layer structures and full-connection layer structures which can be combined to form effective branches are selected in advance, and a branch library is formed according to different convolution layer structures and full-connection layer structures and different outlet points;
step 1.2: training branches of a branch library on a given pre-training neural network to obtain configuration information of the branches, wherein the configuration information comprises weights, parameters, reasoning accuracy and execution completion time;
step 1.3: screening the branch library, and screening the standard: branches with unsatisfactory execution accuracy and branches with lower accuracy and longer execution time than the reasoning time of the pre-trained neural network are deleted.
3. The method for designing a multi-outlet neural network for dynamic resources of an edge scene as claimed in claim 1, wherein the step 2 specifically comprises:
step 2.1: providing branch self-merging, and maximizing saved memory space under the condition of almost not losing precision, wherein the method specifically comprises the following steps:
step 2.1.1: calculating errors among different neurons in the branches, and calculating the errors by adopting L2-Norm;
step 2.1.2: selecting neuron pairs to be merged according to the error; all neurons are grouped according to the error magnitude, sorting is carried out according to the difference magnitude between the neuron pairs, and the front 1/3 neuron pairs with lower errors are combined;
step 2.1.3: calculating the sharing weight of each neuron pair to be fused; directly calculating the average value of the two neuron weights as a shared weight;
step 2.2: in order to avoid the precision loss caused by the self-merging of the branches, the precision of the branches is quickly recovered by adopting the distillation and retraining of less sample knowledge;
step 2.2.1: constructing a proper loss function for retraining; the loss function L consists of three parts: the supervision information MSE from the tag, KL divergence from distillation, and L1-Norm loss of shared weights, y is the tag,is the prediction of the ith branch, y c Is the final prediction, W s 、/>Alpha and beta are settable adjustment parameters corresponding to the shared weight and the back propagation updated weight respectively;
step 2.2.2: for self-merging branches, the branches are added into a branch library after screening according to the screening standard.
4. The method for designing a multi-outlet neural network for dynamic resources of an edge scene according to claim 1, wherein α=0.5 and β=1 are set in step 2.2.1.
5. The method for designing a multi-outlet neural network for dynamic resources of an edge scene as set forth in claim 1, wherein, given the device's memory budget M and time budget T_budget, step 3 selects an optimal branch combination from the branch library by a priority-based depth-first search (DFS), specifically comprising:
step 3.1: taking the area enclosed by accuracy and time, P = Acc · Time, as the performance metric, where Acc is the accuracy of the branch and Time is the deadline minus the branch's inference completion time; taking the improvement in the current branch combination's performance per unit of branch parameters as the priority of a branch, so that the same branch has different priorities under different current combinations and must be recomputed for the current combination;
step 3.2: letting the branch transmission rate be R, the DFS is subject to two constraints, the memory budget M and the search time T_search; the branch selection order at each layer of the DFS is determined by the priority ranking of the branches; within the search time T_search, a branch combination B_selected with optimal performance that satisfies the memory requirement is found.
6. The method for designing a multi-outlet neural network for dynamic resources of an edge scene as recited in claim 1, wherein in step 4 the SBRAN component updates the memory with the selected branch combination B_selected to achieve memory-elastic inference.
7. A multi-outlet neural network design device for edge-scene-oriented dynamic resources, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors are configured to implement the multi-outlet neural network design method for edge-scene-oriented dynamic resources of any one of claims 1-6 when the executable code is executed.
8. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements the edge-scene dynamic resource-oriented multi-outlet neural network design method of any one of claims 1-6.
CN202410167266.9A 2024-02-06 2024-02-06 Multi-outlet neural network design method and device for edge scene dynamic resources Pending CN117829208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410167266.9A CN117829208A (en) 2024-02-06 2024-02-06 Multi-outlet neural network design method and device for edge scene dynamic resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410167266.9A CN117829208A (en) 2024-02-06 2024-02-06 Multi-outlet neural network design method and device for edge scene dynamic resources

Publications (1)

Publication Number Publication Date
CN117829208A true CN117829208A (en) 2024-04-05

Family

ID=90519348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410167266.9A Pending CN117829208A (en) 2024-02-06 2024-02-06 Multi-outlet neural network design method and device for edge scene dynamic resources

Country Status (1)

Country Link
CN (1) CN117829208A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination