CN117829208A - Multi-outlet neural network design method and device for edge scene dynamic resources - Google Patents

Multi-outlet neural network design method and device for edge scene dynamic resources

Info

Publication number
CN117829208A
Authority
CN
China
Prior art keywords
branch
neural network
branches
outlet
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410167266.9A
Other languages
Chinese (zh)
Inventor
高艺
董玮
李福�
黄家名
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202410167266.9A
Publication of CN117829208A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and device for designing a multi-outlet neural network for dynamic resources in edge scenes. The method comprises the following steps: (1) given a pre-trained neural network, generate a library of valid branches and screen it; (2) self-merge the branches in the library to further reduce memory occupation, then quickly recover their accuracy through few-sample knowledge distillation and retraining to form a branch candidate library; (3) perform a priority-based depth-first search over the branch candidate library to find an optimal branch combination that meets the memory and time requirements; (4) after receiving the selected branch combination, the on-device scheduler updates the branches of the pre-trained neural network on the fly through the SBRAN component. The invention converts a traditional single-outlet neural network into a multi-outlet neural network with heterogeneous and dynamic characteristics, so that the network adapts more flexibly to changes in memory resources, improving both operating efficiency and accuracy on edge devices.

Description

Multi-outlet neural network design method and device for edge scene dynamic resources
Technical Field
The invention provides a multi-outlet neural network design method and device for dynamic resources in edge scenes, focusing in particular on the use of edge devices in environments where memory resources change dynamically.
Background
With the rapid development of edge computing, the demand for efficiently utilizing the computing power of edge devices keeps growing. The core advantage of edge computing is the ability to process data close to its source, reducing latency and improving response speed. However, edge devices are often constrained by low computational power and limited memory, which greatly restricts their ability to handle complex tasks. In such environments, managing and optimizing memory resources becomes particularly critical.
The computing resources of edge devices, including but not limited to CPU/GPU cores and VRAM, are highly uncertain: they can be affected by factors such as parallel tasks and energy limits, so their availability fluctuates over time. In this context, deep neural networks (DNNs) face a challenging task, namely how to maintain efficient inference performance under constantly changing memory resources. Multi-outlet neural networks are considered a potential solution because they allow results to be output early during inference on demand, providing greater flexibility and accuracy in resource-constrained situations.
While multi-outlet neural networks have shown potential for handling non-deterministic computing resources, the prior art still has limitations in several critical areas. In particular, the limited accuracy of intermediate inference results, together with the homogeneous and static structure of the exit branches, restricts the effectiveness of these networks in dynamic environments. It is therefore particularly important to efficiently determine the structure, number and placement of heterogeneous branches for a given pre-trained neural network, and to dynamically update the branches as device memory changes.
Disclosure of Invention
To overcome the limitations of the prior art, the invention provides an innovative design method and device for multi-outlet neural networks, aimed in particular at dynamically changing memory resources. Its core is a system that automatically converts a traditional single-outlet neural network into a heterogeneous, dynamic multi-outlet neural network. Through this conversion, the network can adapt more flexibly to changes in memory resources, improving operating efficiency and accuracy on edge devices.
The first aspect of the invention provides a multi-outlet neural network design method for edge scene dynamic resources, which comprises the following steps:
(1) Given a pre-trained neural network, generate a library of valid branches and screen it;
(2) Self-merge the branches in the library to further reduce memory occupation, then quickly recover their accuracy through few-sample knowledge distillation and retraining to form a branch candidate library;
(3) Perform a priority-based depth-first search over the branch candidate library to find an optimal branch combination that meets the memory and time requirements;
(4) After receiving the selected branch combination, the on-device scheduler updates the branches of the pre-trained neural network on the fly through the SBRAN component.
Step 1 specifically includes:
Step 1.1: several convolutional layer structures and fully connected layer structures that can be combined into valid branches are selected in advance, and a branch library is formed from the different layer structures and the different exit points.
Step 1.2: the branches in the library are trained on the given pre-trained neural network to obtain their configuration information, including weights, parameter counts, inference accuracy, and execution completion time.
Step 1.3: the branch library is screened. The screening criteria: branches with unsatisfactory accuracy are deleted, as are branches that are both less accurate and slower than the inference of the pre-trained neural network itself.
In step 2, the memory occupied by the branches in the library is further reduced through weight sharing, specifically:
Step 2.1: traditional weight-sharing schemes select several branches to share weights, but because the branches differ in layers and in numbers of neurons, the benefit of sharing is limited. The invention instead proposes branch self-merging, which maximizes the memory saved with almost no loss of precision.
Step 2.1.1: the errors between different neurons in a branch are calculated. Unlike Hessian-based error calculation, which consumes a large amount of memory, the invention uses the L2 norm to compute the errors so that all branches can be merged more efficiently.
Step 2.1.2: the neuron pairs to be merged are selected according to these errors. All neurons are grouped by error magnitude, the neuron pairs are sorted by the difference between them, and the 1/3 of pairs with the lowest errors are merged.
Step 2.1.3: a shared weight is computed for each neuron pair to be fused. Again, no Hessian-based method is used; the average of the two neurons' weights is taken directly as the shared weight, which is simple and effective.
Step 2.2: the branch self-merging inevitably brings a certain degree of precision loss, and the precision of the branch is quickly recovered by adopting the distillation and retraining of less sample knowledge.
Step 2.2.1: and constructing a proper loss function for retraining. The loss function L consists of three parts: the supervision information MSE from the tag, KL divergence from distillation, and L1-Norm loss of shared weights, y is the tag,is the prediction of the ith branch, y c Is the final prediction, W s 、/>Alpha and beta are settable tuning parameters corresponding to the shared weight and the back-propagation updated weight, respectively. PreferablyThe present invention suggests parameter settings α=0.5, β=1.
Step 2.2.2: for self-merging branches, the branches are added into a branch library after screening according to the screening standard.
In step 3, given the device's memory budget M and time budget T_budget, the invention uses a priority-based depth-first search (DFS) to select the optimal branch combination from the branch library, specifically:
Step 3.1: the area enclosed by accuracy and time, P = Acc · Time, is proposed as the performance metric, where Acc is the accuracy of the branch and Time is the deadline minus the branch's inference completion time. The improvement in the current branch combination's performance per unit of branch parameters serves as the priority of a branch; the same branch therefore has different priorities under different current combinations and must be recomputed against the current combination.
Step 3.2: let the branch transmission rate be R. The DFS is subject to two constraints, the memory budget M and the search time T_search, and the order in which branches are expanded at each layer of the DFS is determined by their priority ranking. Within the search time T_search, a branch combination B_selected with optimal performance that satisfies the memory requirement is found.
In step 4, the SBRAN component updates the memory with the selected branch combination B_selected to achieve memory-elastic inference.
The second aspect of the invention provides a multi-outlet neural network design device for edge scene dynamic resources, comprising a memory and one or more processors, where executable code is stored in the memory and the one or more processors, when executing the code, implement the multi-outlet neural network design method for edge scene dynamic resources of the invention.
A third aspect of the present invention relates to a computer readable storage medium having stored thereon a program which, when executed by a processor, implements the edge scene dynamic resource oriented multi-outlet neural network design method of the present invention.
The core of the invention is to design and implement an innovative system that converts a traditional single-outlet neural network into a multi-outlet neural network with heterogeneous and dynamic characteristics. Through this conversion, the network adapts more flexibly to changes in memory resources, improving operating efficiency and accuracy on edge devices.
The advantages of the invention: it automatically converts a single-outlet neural network into a multi-outlet neural network and is particularly suited to edge computing environments. This addresses the challenge of designing and deploying efficient heterogeneous branches under varying memory conditions and significantly reduces the memory overhead of dynamic branches. By cutting memory consumption while retaining high-accuracy inference, the invention offers a powerful scheme for efficiently exploiting the non-deterministic computing capability of resource-limited devices and has broad application prospects.
Drawings
In order to more clearly illustrate the examples of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of the operation of the method of the present invention.
Fig. 2 is a data structure diagram in the method of the present invention.
Fig. 3 is a flow chart of priority-based DFS in the method of the present invention.
Fig. 4 is a schematic diagram of priority-based DFS in the method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
Example 1
Referring to fig. 1, the invention provides a multi-outlet neural network design method for dynamic resources in edge scenes. The core of this approach is to design and implement an innovative system that converts a traditional single-outlet neural network into a multi-outlet neural network with heterogeneous and dynamic characteristics. Through this conversion, the network adapts more flexibly to changes in memory resources, improving operating efficiency and accuracy on edge devices.
The method specifically comprises the following steps:
step 1: given the pre-trained neural network, branch candidates are generated, as shown in fig. 2.
Step 1.1: and a plurality of convolution layer structures and full connection layer structures which can be combined to form effective branches are selected in advance, and a branch library is formed according to different convolution layer structures and full connection layer structures and different outlet points.
Step 1.2: and training branches of the branch library on a given pre-training neural network to obtain configuration information of the branches, wherein the configuration information comprises weights, parameter numbers, reasoning accuracy and execution completion time.
Step 1.3: screening the branch library, and screening the standard: branches with unsatisfactory execution accuracy and branches with lower accuracy and longer execution time than the reasoning time of the pre-trained neural network are deleted.
Step 2: the memory occupied by the branches of the branch library can be further reduced through weight sharing.
Step 2.1: the traditional weight sharing scheme selects a plurality of branches to share weight, but the sharing benefit is not ideal due to the different layers and the different numbers of neurons among the branches. The invention provides branch self-merging, which can maximize the saved memory space under the condition of almost not losing precision.
Step 2.1.1: errors between different neurons in the branches are calculated. Different from the Hessian error calculation method, a large amount of memory is occupied. In order to enable all branches to be combined more efficiently, the invention adopts L2-Norm for error calculation.
Step 2.1.2: the neuron pairs to be merged are selected based on the error. All neurons are grouped according to the error magnitude, and are ordered according to the difference magnitude between the neuron pairs, and the first 1/3 neuron pairs with lower errors are combined.
Step 2.1.3: a shared weight is calculated for each neuron pair to be fused. Also, the method of Hessian is not adopted, and the average value of the weights of two neurons is directly calculated as the shared weight, so that the method is simple and effective.
Step 2.2: the branch self-merging inevitably brings a certain degree of precision loss, and the precision of the branch is quickly recovered by adopting the distillation and retraining of less sample knowledge.
Step 2.2.1: and constructing a proper loss function for retraining. The loss function L consists of three parts: the supervision information MSE from the tag, KL divergence from distillation, and L1-Norm loss of shared weights, y is the tag,is the prediction of the ith branch, y c Is the final prediction, W s 、/>The present invention suggests parameter settings α=0.5, β=1, corresponding to the shared weight and the back propagation updated weight, respectively.
Step 2.2.2: for self-merging branches, the branches are added into a branch library after screening according to the screening standard.
Step 3: memory budget M and time budget T for a given device budget The present invention proposes to select the optimal branch combination from the branch library based on the priority-based depth-first search DFS, as shown in fig. 3, 4.
Step 3.1: the invention provides the accuracy and the area p=Acc.Time of the area enclosed by the Time as the performance index, acc is the accuracy of the branch, and Time is the cut-off Time minus the branch reasoning completion Time. Promotion of current branch combination performance per branch unit parameterAs the priority of the branches, the same branch has different priorities under different current branch combinations, and the same branch needs to be calculated according to the current branch combinations.
Step 3.2: the branch transmission rate is R, and there are two limitations to the DFS that can be calculated: memory budget M and search time T search And determining the branch selection sequence of the DFS of each layer according to the priority order of the branches. At search time T search Finally find a branch combination B with optimal performance meeting the memory requirement selected
Step 4: the scheduler on the device predicts the dynamic change and time of the memory through the memory budget component and the delay budget component, and based on the steps 1-3, the AI task on the device realizes the memory dynamic reasoning,
example 2
This embodiment relates to a multi-outlet neural network design device for edge scene dynamic resources, comprising a memory and one or more processors, where executable code is stored in the memory and the one or more processors, when executing the code, implement the multi-outlet neural network design method for edge scene dynamic resources of embodiment 1.
Example 3
The present embodiment relates to a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the multi-outlet neural network design method for edge-scene-oriented dynamic resources of embodiment 1.
The invention accounts for dynamic changes in the memory of edge devices and realizes branch updates from the server to the device. Even when memory changes dynamically, a weak device can still perform model inference effectively, enabling efficient and reliable AI task execution in resource-limited edge computing environments.
The embodiments described in this specification are merely examples of implementations of the inventive concept. The scope of protection of the present invention should not be construed as limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that can be conceived by those skilled in the art based on the inventive concept.

Claims (8)

1. A design method of a multi-outlet neural network for edge scene dynamic resources, comprising the following steps:
(1) given a pre-trained neural network, generating a library of valid branches and screening it;
(2) self-merging the branches in the library to further reduce memory occupation, then quickly recovering their accuracy through few-sample knowledge distillation and retraining to form a branch candidate library;
(3) performing a priority-based depth-first search over the branch candidate library to find an optimal branch combination that meets the memory and time requirements;
(4) after receiving the selected branch combination, the on-device scheduler updating the branches of the pre-trained neural network on the fly through the SBRAN component.
2. The method for designing a multi-outlet neural network for dynamic resources of an edge scene as claimed in claim 1, wherein the step 1 specifically comprises:
step 1.1: a plurality of convolution layer structures and full-connection layer structures which can be combined to form effective branches are selected in advance, and a branch library is formed according to different convolution layer structures and full-connection layer structures and different outlet points;
step 1.2: training branches of a branch library on a given pre-training neural network to obtain configuration information of the branches, wherein the configuration information comprises weights, parameters, reasoning accuracy and execution completion time;
step 1.3: screening the branch library, and screening the standard: branches with unsatisfactory execution accuracy and branches with lower accuracy and longer execution time than the reasoning time of the pre-trained neural network are deleted.
3. The method for designing a multi-outlet neural network for dynamic resources of an edge scene as claimed in claim 1, wherein the step 2 specifically comprises:
step 2.1: providing branch self-merging, and maximizing saved memory space under the condition of almost not losing precision, wherein the method specifically comprises the following steps:
step 2.1.1: calculating errors among different neurons in the branches, and calculating the errors by adopting L2-Norm;
step 2.1.2: selecting neuron pairs to be merged according to the error; all neurons are grouped according to the error magnitude, sorting is carried out according to the difference magnitude between the neuron pairs, and the front 1/3 neuron pairs with lower errors are combined;
step 2.1.3: calculating the sharing weight of each neuron pair to be fused; directly calculating the average value of the two neuron weights as a shared weight;
step 2.2: in order to avoid the precision loss caused by the self-merging of the branches, the precision of the branches is quickly recovered by adopting the distillation and retraining of less sample knowledge;
step 2.2.1: constructing a proper loss function for retraining; the loss function L consists of three parts: the supervision information MSE from the tag, KL divergence from distillation, and L1-Norm loss of shared weights, y is the tag,is the prediction of the ith branch, y c Is the final prediction, W s 、/>Alpha and beta are settable adjustment parameters corresponding to the shared weight and the back propagation updated weight respectively;
step 2.2.2: for self-merging branches, the branches are added into a branch library after screening according to the screening standard.
4. The method for designing a multi-outlet neural network for dynamic resources of an edge scene according to claim 1, wherein α=0.5 and β=1 are set in step 2.2.1.
5. The method for designing a multi-outlet neural network for dynamic resources of an edge scene as set forth in claim 1, wherein, given the device's memory budget M and time budget T_budget, step 3 selects an optimal branch combination from the branch library by a priority-based depth-first search (DFS), specifically comprising:
step 3.1: taking the area enclosed by accuracy and time, P = Acc · Time, as the performance metric, where Acc is the accuracy of the branch and Time is the deadline minus the branch's inference completion time; taking the improvement in the current branch combination's performance per unit of branch parameters as the priority of a branch, so that the same branch has different priorities under different current combinations and must be recomputed for the current combination;
step 3.2: letting the branch transmission rate be R, the DFS is subject to two constraints, the memory budget M and the search time T_search; the branch selection order at each layer of the DFS is determined by the priority ranking of the branches; within the search time T_search, a branch combination B_selected with optimal performance that satisfies the memory requirement is found.
6. The method for designing a multi-outlet neural network for dynamic resources of an edge scene as recited in claim 1, wherein in step 4 the SBRAN component updates the memory with the selected branch combination B_selected to achieve memory-elastic inference.
7. A multi-outlet neural network design device for edge-scene-oriented dynamic resources, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors are configured to implement the multi-outlet neural network design method for edge-scene-oriented dynamic resources of any one of claims 1-6 when the executable code is executed.
8. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements the edge-scene dynamic resource-oriented multi-outlet neural network design method of any one of claims 1-6.
CN202410167266.9A 2024-02-06 2024-02-06 Multi-outlet neural network design method and device for edge scene dynamic resources Pending CN117829208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410167266.9A CN117829208A (en) 2024-02-06 2024-02-06 Multi-outlet neural network design method and device for edge scene dynamic resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410167266.9A CN117829208A (en) 2024-02-06 2024-02-06 Multi-outlet neural network design method and device for edge scene dynamic resources

Publications (1)

Publication Number Publication Date
CN117829208A true CN117829208A (en) 2024-04-05

Family

ID=90519348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410167266.9A Pending CN117829208A (en) 2024-02-06 2024-02-06 Multi-outlet neural network design method and device for edge scene dynamic resources

Country Status (1)

Country Link
CN (1) CN117829208A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination