CN113673684A - Edge end DNN model loading system and method based on input pruning - Google Patents
- Publication number
- CN113673684A
- Authority
- CN
- China
- Prior art keywords
- model
- input
- compression
- data
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1734—Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Abstract
The invention discloses an edge-end DNN model loading system and method based on input pruning. The system comprises a management module and a compression module: the management module comprises a sample management module, a model training module and a model management module; the compression module comprises a compression mode selection module, a data compression module, a model compression and retraining module and a compression log recording module. With this system, a Bayesian network is constructed and trained on sample data, the association between the model's inputs and outputs is analyzed through the Bayesian network, and a bidirectional list representing each input's degree of influence is constructed according to Pareto optimality, avoiding the high time complexity of deleting input attributes one by one. Finally, a clustering algorithm is used to analyze the similarity between input data attributes, and personalized, intelligent compression and loading of the model is completed according to these two compression strategies and the strategies derived from them, improving the overall performance of the edge end and the number of models that can be deployed.
Description
Technical Field
The invention relates to the technical field of intelligent network edge terminals, and in particular to an edge-end DNN model loading system and method based on input pruning.
Background
With the development of edge intelligence technology, the large amount of data generated at the device edge no longer needs to be sent to a cloud server for centralized processing; it can be processed efficiently and with low latency through cloud-edge or edge-edge collaboration, which makes it possible to deploy and run deep neural network models at the edge. However, deep neural network models have an extremely large number of parameters, with some models reaching the order of millions. Training and testing deep neural network models that contain so many parameters has several shortcomings:
(1) The constructed model's parameters must be trained to achieve the expected effect, which consumes a large amount of computing resources and requires equipment with strong computing power to obtain acceptable training and testing speeds.
(2) Persisting the large number of parameters of a deep neural network also occupies high-capacity memory or disk resources.
Because of these shortcomings, neural networks are currently trained and tested on high-performance servers or clusters; the limited storage and computing resources of edge smart devices restrict the deployment of neural networks on them. Compressing DNN models efficiently while maintaining computational accuracy, so that they are suitable for running on edge devices, has therefore become a hot research issue in the field of edge intelligence.
At present, common model compression methods such as network pruning mainly produce small, fast neural networks by removing values from redundant weight tensors. However, conventional model compression requires the developer to manually set the number of parameters to delete, retrain the model, adjust the deletion settings according to the model's performance, and repeat these steps until the expected compression effect is achieved; the compression result then still requires extensive experimental testing, which imposes a huge workload on developers. Because compression cannot be automated by setting thresholds in advance, this workflow is unfriendly to many applications. In addition, none of the conventional model compression methods reduces the size of the input data, which continues to place significant memory and bandwidth pressure on the model. Model compression at the edge must consider not only model size and capacity but also reduce the amount of input data as much as possible, compressing the deep neural network model by processing the data generated at the edge in a personalized manner.
Therefore, in order for model compression technology to be better applied to deep neural network models so that they can be deployed and run on resource-limited edge devices, the model compression method needs further improvement. It should not only reduce the size of the compressed model, the amount of edge-side data, and the number of sensors used, improving the overall performance of the edge end and the number of deployable models, but also further reduce the cost of data transmission, saving network bandwidth and storage resources. In addition, no mature model compression system serving the edge currently meets the requirements of personalization and intelligence.
Disclosure of Invention
In view of the above-mentioned deficiencies of the prior art, the present invention provides an edge-end DNN model loading system and method based on input pruning.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: an edge-end DNN model loading system based on input pruning comprises a management module and a compression module;
the management module comprises a sample management module, a model training module and a model management module;
the sample management module is used for collecting data to construct a sample and using the sample as input data of a training model;
the sample is data collected by the edge device, and the data is processed and integrated, and the method specifically comprises the following steps: and performing missing value completion and normalization operation on the collected data, storing the data into a file, and then further storing the description information of the data and the file storage path into a database.
The model training module trains the model according to the constructed sample;
Further, the model training module records and stores the basic information of the model, including evaluation indexes, model size, model name, model storage path, and the numbers of input and output attributes, providing a reference basis for the user's compression service.
The model management module is used for providing model management for a user, and the user can conveniently check the trained evaluation indexes;
furthermore, the model management module also provides the functions of downloading and uploading the model.
The compression module comprises a compression mode selection module, a data compression module, a model compression and retraining module and a compression log recording module;
the compression mode selection module selects different compression modes according to the relevance among different data;
the compression mode in the compression mode selection module comprises: the compression mode comprises a compression mode based on an input and output association pruning strategy, a compression mode based on an input and output similarity pruning strategy, a compression mode based on an input and output association and then based on an input and output similarity pruning strategy and a compression mode based on an input and output association pruning strategy.
The data compression module generates the correspondingly compressed data and stores it under a set path; at the same time, the data information is persisted, and the compressed data serves as the input data of the model compression and retraining module;
the model compression and retraining module executes model compression operation according to the compressed data and a compression model threshold value set by a user, constructs a compressed model and executes retraining operation on the compressed model;
the compressed model threshold includes a model size threshold and a model accuracy threshold.
After the model compression method has been executed, the compression log recording module stores the basic information of the model, together with the model's update time, accuracy and recall, in the compression log for the user to compare and review.
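A compression log entry of this kind might look like the sketch below. The field names and the in-memory list are assumptions for illustration; the patent does not define a log schema, and a real system would persist the records to the database.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CompressionLogEntry:
    """One record written after a compression run: basic model information
    plus update time, accuracy and recall (hypothetical field names)."""
    model_name: str
    model_size_bytes: int
    model_path: str
    accuracy: float
    recall: float
    updated_at: str = field(default_factory=lambda: datetime.now().isoformat())

def append_log(log: list, entry: CompressionLogEntry) -> list:
    # Stored in memory here; a database write would replace this in practice.
    log.append(entry)
    return log
```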
On the other hand, the invention also provides a method for loading a model by adopting the edge-end DNN model loading system based on input pruning, which comprises the following steps:
step 1: performing missing value completion and normalization processing on the data collected by the edge end, storing the data into a file, and further storing the data stored into the file into a sample information table of a database;
step 2: acquiring data in a sample information table, setting relevant parameters of a training model by a user, training the specified model, and storing basic information of the trained model, including evaluation indexes, model size, model name, model storage path, model input attribute and output attribute number information, in a model information table of a database;
and step 3: according to data stored in the sample information table, a Bayesian network is constructed to analyze the relevance between the input and the output of the model, and the input influence degree of each input attribute on the output result is calculated, so that a compression mode based on the relevance pruning strategy between the input and the output is constructed;
the step 3 comprises the following substeps:
step 3.1: designing a Bayesian network structure according to initial model input data in a sample information table, training the Bayesian network, and setting a model size and a model accuracy threshold;
step 3.2: calculating the influence degree of each input attribute of the model on the output according to the trained Bayesian network;
step 3.3: a bi-directional list representing the degree of influence of the input is constructed using a multi-objective optimization algorithm pareto optimality.
Step 4: according to the input influence degrees obtained in step 3, analyze the correlation of the input attributes with a clustering algorithm to construct the compression mode based on the inter-input relation pruning strategy;
the step 4 comprises the following substeps:
step 4.1: performing clustering analysis by using a k-means clustering algorithm according to the input influence degree of each input attribute obtained in the step 3, and clustering and sorting the attributes with smaller input influence degree difference and higher similarity;
step 4.2: and on the basis of the step 4.1, compressing the input data according to the set pruning number and proportion.
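Step 4.1 can be sketched with a minimal one-dimensional k-means over the influence scores: attributes whose influence degrees land in the same cluster are treated as similar, so only a proportion of each cluster needs to be kept in step 4.2. The patent names k-means but not the distance measure or initialization, so both are assumptions here.

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Cluster 1-D influence scores into k groups; returns the cluster
    index assigned to each value."""
    rng = random.Random(seed)
    centers = rng.sample(list(values), k)  # initialize from the data
    assign = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        assign = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign
```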
And 5: arranging and combining the compression modes constructed in the step 3 and the step 4 to obtain four different compression modes, selecting different compression modes to compress the input data according to the relevance among different data, and compressing the model trained in the step 2 according to the compressed input data by using the Kolmogorov theorem;
step 6: retraining the model compressed in the step 5;
and 7: judging whether the model retraining result obtained in the step 6 reaches the threshold of the model size and the model accuracy set by the user, if so, determining that the compression is successful, storing the compressed model and data, and recording a compression log; otherwise, the compression is determined to be failed, and the failure reason and the related parameters are returned to provide parameter basis for the user to re-compress;
and 8: and loading the successfully compressed model to the edge terminal.
The beneficial effects produced by the above technical scheme are as follows:
1. To enable deep neural network models to be deployed and run efficiently on edge devices, the invention provides a personalized, intelligent model loading system that meets user requirements while reducing model compression processing time. The system is mainly applied at the edge, allowing a user to select different model compression modes according to the specific scenario and the characteristics of the model's training data. This reduces the workload of model compression developers while serving the user, resulting in a model loading tool that is convenient to operate, efficient, and clearly effective at compression, and improving the overall performance of the edge end and the number of deployable models.
2. The method analyzes the association between the inputs and outputs of the model through a Bayesian network and constructs a bidirectional list representing each input's degree of influence according to Pareto optimality, avoiding the high time complexity of deleting input attributes one by one.
3. The method analyzes the association between the inputs and outputs of the model through a Bayesian network to construct a compression mode based on the input-output association pruning strategy; it analyzes the similarity between input data attributes with a clustering algorithm to construct a compression mode based on the inter-input relation pruning strategy; and it completes personalized, intelligent compression of the model according to these two compression strategies and the strategies derived from them.
Drawings
Fig. 1 is a schematic structural diagram of an input pruning-based edge-end DNN model loading system provided in an embodiment of the present invention;
fig. 2 is a flowchart of a method for loading a model by using an input pruning based edge DNN model loading system in the embodiment of the present invention.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings and examples. The examples are intended to illustrate the invention, not to limit its scope.
As shown in fig. 1, the edge-end DNN model loading system based on input pruning in this embodiment is as follows:
the system comprises a management module and a compression module;
the management module comprises a sample management module, a model training module and a model management module;
the sample management module is used for collecting data to construct a sample and using the sample as input data of a training model;
the sample is data collected by the edge device, and the data is processed and integrated, and the method specifically comprises the following steps: and performing missing value completion and normalization operation on the collected data, storing the data into a file, and then further storing the description information of the data and the file storage path into a database.
The model training module trains the model according to the constructed sample;
further, the model training module records and stores basic information of the model, including evaluation indexes, model size, model name, model storage path, model input attribute and output attribute number information, and provides reference basis for user compression service.
The model management module is used for providing model management for a user, and the user can conveniently check the trained evaluation indexes;
furthermore, the model management module also provides the functions of downloading and uploading the model.
The compression module comprises a compression mode selection module, a data compression module, a model compression and retraining module and a compression log recording module;
the compression mode selection module selects different compression modes according to the relevance among different data;
the compression mode in the compression mode selection module comprises: the compression mode comprises a compression mode based on an input and output association pruning strategy, a compression mode based on an input and output similarity pruning strategy, a compression mode based on an input and output association and then based on an input and output similarity pruning strategy and a compression mode based on an input and output association pruning strategy.
The data compression module generates correspondingly compressed data and stores the data in a set path, meanwhile, data information is stored persistently, and the compressed data is used as input data of the model compression and retraining module;
the model compression and retraining module executes model compression operation according to the compressed data and a compression model threshold value set by a user, constructs a compressed model and executes retraining operation on the compressed model;
the compressed model threshold includes a model size threshold and a model accuracy threshold.
After the model compression method has been executed, the compression log recording module stores the basic information of the model, together with the model's update time, accuracy and recall, in the compression log for the user to compare and review.
On the other hand, the embodiment also provides a method for loading a model by using the above input pruning based edge DNN model loading system, and a flow of the method is shown in fig. 2, and includes the following steps:
step 1: performing missing value completion and normalization processing on the data collected by the edge end, storing the data into a file, and further storing the data stored into the file into a sample information table of a database;
step 2: acquiring data in a sample information table, setting relevant parameters of a training model by a user, training the specified model, and storing basic information of the trained model, including evaluation indexes, model size, model name, model storage path, model input attribute and output attribute number information, in a model information table of a database;
and step 3: according to data stored in the sample information table, a Bayesian network is constructed to analyze the relevance between the input and the output of the model, and the input influence degree of each input attribute on the output result is calculated, so that a compression mode based on the relevance pruning strategy between the input and the output is constructed;
the step 3 comprises the following substeps:
step 3.1: designing a Bayesian network structure according to initial model input data in a sample information table, training the Bayesian network, and setting a model size and a model accuracy threshold;
step 3.2: calculating the influence degree of each input attribute of the model on the output according to the trained Bayesian network;
step 3.3: a bi-directional list representing the degree of influence of the input is constructed using a multi-objective optimization algorithm pareto optimality.
In this embodiment, the input attributes are pruned according to the input influence degree list: input attributes weakly associated with the output result are deleted, which corresponds to deleting the corresponding node of the list.
Step 4: according to the input influence degrees obtained in step 3, analyze the correlation of the input attributes with a clustering algorithm to construct the compression mode based on the inter-input relation pruning strategy;
the step 4 comprises the following substeps:
step 4.1: performing clustering analysis by using a k-means clustering algorithm according to the input influence degree of each input attribute obtained in the step 3, and clustering and sorting the attributes with smaller input influence degree difference and higher similarity;
step 4.2: and on the basis of the step 4.1, compressing the input data according to the set pruning number and proportion.
And 5: arranging and combining the compression modes constructed in the step 3 and the step 4 to obtain four different compression modes, selecting different compression modes to compress the input data according to the relevance among different data, and compressing the model trained in the step 2 according to the compressed input data by using the Kolmogorov theorem;
step 6: retraining the model compressed in the step 5;
and 7: judging whether the model retraining result obtained in the step 6 reaches the threshold of the model size and the model accuracy set by the user, if so, determining that the compression is successful, storing the compressed model and data, and recording a compression log; otherwise, the compression is determined to be failed, and the failure reason and the related parameters are returned to provide parameter basis for the user to re-compress;
and 8: and loading the successfully compressed model to the edge terminal.
In this embodiment, the system requires the user to select the model and the corresponding data for performing the compression operation, set the compression mode to be used, fill in the necessary compression parameters, and automatically perform the compression loading operation of the model. The user can correspondingly adjust the compression strategy of the model according to the requirement of the user or the feedback of the compression loading operation.
Claims (9)
1. An edge-end DNN model loading system based on input pruning is characterized in that: the system comprises a management module and a compression module;
the management module comprises a sample management module, a model training module and a model management module;
the sample management module is used for collecting data to construct a sample and using the sample as input data of a training model;
the model training module trains the model according to the constructed sample;
the model management module is used for providing model management for a user, and the user can conveniently check the trained evaluation indexes;
the compression module comprises a compression mode selection module, a data compression module, a model compression and retraining module and a compression log recording module;
the compression mode selection module selects different compression modes according to the relevance among different data;
the data compression module generates correspondingly compressed data and stores the data in a set path, meanwhile, data information is stored persistently, and the compressed data is used as input data of the model compression and retraining module;
the model compression and retraining module executes model compression operation according to the compressed data and a compression model threshold value set by a user, constructs a compressed model and executes retraining operation on the compressed model;
after the model compression method has been executed, the compression log recording module stores the basic information of the model, together with the model's update time, accuracy and recall, in the compression log for the user to compare and review.
2. The input pruning-based edge-end DNN model loading system of claim 1, wherein: the sample is data collected by the edge device, and the data is processed and integrated, and the method specifically comprises the following steps: and performing missing value completion and normalization operation on the collected data, storing the data into a file, and then further storing the description information of the data and the file storage path into a database.
3. The input pruning-based edge-end DNN model loading system of claim 1, wherein: the model training module is also used for recording and storing basic information of the model, including evaluation indexes, model size, model name, model storage path, model input attribute and output attribute number information, and providing reference basis for user compression service.
4. The input pruning-based edge-end DNN model loading system of claim 1, wherein: the model management module also provides the functions of downloading and uploading the model.
5. The input pruning-based edge-end DNN model loading system of claim 1, wherein: the compression modes in the compression mode selection module comprise: a mode based on the input-output association pruning strategy; a mode based on the inter-input similarity pruning strategy; a mode that applies the input-output association strategy first and then the inter-input similarity strategy; and a mode that applies the inter-input similarity strategy first and then the input-output association strategy.
6. The input pruning-based edge-end DNN model loading system of claim 1, wherein: the compressed-model thresholds include a model size threshold and a model accuracy threshold.
7. The method for model loading using the input pruning-based edge-end DNN model loading system of any one of claims 1 to 6, characterized by comprising the following steps:
Step 1: perform missing-value completion and normalization on the data collected at the edge, store the data to a file, and record the filed data in the sample information table of the database;
Step 2: obtain the data in the sample information table, let the user set the relevant training parameters, train the specified model, and store the trained model's basic information, including its evaluation indexes, model size, model name, model storage path, and the numbers of input and output attributes, in the model information table of the database;
Step 3: according to the data stored in the sample information table, construct a Bayesian network to analyze the association between the model's inputs and outputs, and calculate each input attribute's degree of influence on the output result, thereby constructing the compression mode based on the input-output association pruning strategy;
Step 4: using the input influence degrees obtained in step 3, analyze the correlation between input attributes with a clustering algorithm, constructing the compression mode based on the inter-input similarity pruning strategy;
Step 5: permute and combine the compression modes constructed in steps 3 and 4 to obtain four different compression modes; select a compression mode according to the associations among the data and compress the input data; then, using the Kolmogorov theorem, compress the model trained in step 2 according to the compressed input data;
Step 6: retrain the model compressed in step 5;
Step 7: judge whether the retrained model from step 6 meets the model size and model accuracy thresholds set by the user; if so, the compression succeeds: store the compressed model and data, and record a compression log; otherwise, the compression fails: return the failure reason and the related parameters as a basis for the user to re-compress;
Step 8: load the successfully compressed model onto the edge device.
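The acceptance check in step 7 reduces to two threshold comparisons. A minimal sketch follows; it returns the failure reason so it can be reported back to the user, as the claim requires. The function and parameter names are illustrative, not from the patent.

```python
def compression_succeeded(model_size_mb, accuracy,
                          size_threshold_mb, accuracy_threshold):
    """Step-7 check: compression succeeds only if the retrained model is
    both small enough AND still accurate enough (thresholds set by the
    user).  Returns (ok, reason) so a failed run can report why."""
    if model_size_mb > size_threshold_mb:
        return False, ("model size %.2f MB exceeds threshold %.2f MB"
                       % (model_size_mb, size_threshold_mb))
    if accuracy < accuracy_threshold:
        return False, ("accuracy %.3f is below threshold %.3f"
                       % (accuracy, accuracy_threshold))
    return True, "ok"
```

On success the compressed model and data are stored and a log entry is written; on failure the returned reason and parameters guide the user's next compression attempt.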
8. The method for model loading using the input pruning-based edge-end DNN model loading system according to claim 7, wherein step 3 comprises the following sub-steps:
Step 3.1: design the Bayesian network structure from the initial model input data in the sample information table, train the Bayesian network, and set the model size and model accuracy thresholds;
Step 3.2: from the trained Bayesian network, calculate the degree of influence of each input attribute of the model on the output;
Step 3.3: construct a bidirectional list representing the input influence degrees, using Pareto optimality as the multi-objective optimization criterion.
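Sub-steps 3.2 and 3.3 can be sketched as follows. The patent derives the influence degrees from a trained Bayesian network; as a hedged stand-in, this sketch scores each input attribute by its discrete mutual information with the output, then orders the attributes by score (a single-objective simplification of the claim's Pareto-optimality ranking). All function names are illustrative.

```python
from collections import Counter
import math

def influence_degrees(columns, output):
    """Score each input attribute's influence on the output.

    Stand-in for the patent's Bayesian-network score: discrete mutual
    information I(X_i; Y) between each input column and the output.
    """
    n = len(output)
    p_y = Counter(output)
    scores = []
    for col in columns:
        p_x = Counter(col)
        p_xy = Counter(zip(col, output))
        mi = 0.0
        for (xv, yv), c in p_xy.items():
            # (c/n) * log( (c/n) / ((p_x/n)*(p_y/n)) ) = (c/n)*log(c*n/(p_x*p_y))
            mi += (c / n) * math.log(c * n / (p_x[xv] * p_y[yv]))
        scores.append(mi)
    return scores

def influence_order(scores):
    """Order attribute indices by ascending influence: the head of the
    list holds the least influential inputs, i.e. pruning candidates."""
    return sorted(range(len(scores)), key=lambda i: scores[i])
```

An attribute that perfectly predicts the output gets a high score, while one statistically independent of the output scores zero and surfaces at the head of the list as a pruning candidate.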
9. The method for model loading using the input pruning-based edge-end DNN model loading system according to claim 7, wherein step 4 comprises the following sub-steps:
Step 4.1: using the input influence degrees obtained in step 3, perform cluster analysis with the k-means clustering algorithm, grouping and ordering the attributes whose influence degrees differ little and whose similarity is high;
Step 4.2: on the basis of step 4.1, compress the input data according to the set pruning number and proportion.
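Claim 9's sub-steps (k-means clustering of the influence degrees, then pruning by a set number or proportion) can be sketched as below. The parameters `k` and `prune_ratio` stand for the user-set cluster count and pruning proportion, and the simple 1-D Lloyd-iteration k-means is an illustrative implementation, not the patent's.

```python
import numpy as np

def cluster_and_prune(influences, k=2, prune_ratio=0.25):
    """Step 4.1: group attributes whose influence degrees are close via
    1-D k-means.  Step 4.2: mark the given proportion of attributes for
    pruning, least influential first."""
    x = np.asarray(influences, dtype=float)
    # Initialize cluster centres on evenly spaced quantiles of the scores.
    centres = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(100):  # Lloyd iterations; converges quickly in 1-D
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    n_prune = int(len(x) * prune_ratio)
    pruned = sorted(np.argsort(x)[:n_prune].tolist())
    return labels, pruned
```

The returned `pruned` indices are the input attributes to drop before the model itself is compressed and retrained (steps 5 and 6).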
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110973801.6A CN113673684A (en) | 2021-08-24 | 2021-08-24 | Edge end DNN model loading system and method based on input pruning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113673684A true CN113673684A (en) | 2021-11-19 |
Family
ID=78545571
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113673684A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102955857A (en) * | 2012-11-09 | 2013-03-06 | Beihang University | Class center compression transformation-based text clustering method in search engine |
CN111144561A (en) * | 2018-11-05 | 2020-05-12 | Hangzhou Hikvision Digital Technology Co., Ltd. | Neural network model determining method and device |
CN111709522A (en) * | 2020-05-21 | 2020-09-25 | Harbin Institute of Technology | Deep learning target detection system based on server-embedded cooperation |
CN112132005A (en) * | 2020-09-21 | 2020-12-25 | Fuzhou University | Face detection method based on cluster analysis and model compression |
CN112184391A (en) * | 2020-10-16 | 2021-01-05 | Institute of Computing Technology, Chinese Academy of Sciences | Recommendation model training method, medium, electronic device and recommendation model |
CN112200104A (en) * | 2020-10-15 | 2021-01-08 | Chongqing University of Science and Technology | Chemical engineering fault diagnosis method based on novel Bayesian framework for enhanced principal component analysis |
US20210081789A1 (en) * | 2019-09-13 | 2021-03-18 | Latent AI, Inc. | Optimizing execution of a neural network based on operational performance parameters |
CN112685139A (en) * | 2021-01-11 | 2021-04-20 | Northeastern University | K8S and Kubeedge-based cloud edge deep learning model management system and model training method |
CN112906294A (en) * | 2021-01-28 | 2021-06-04 | Samsung (China) Semiconductor Co., Ltd. | Quantization method and quantization device for deep learning model |
CN112948532A (en) * | 2021-04-08 | 2021-06-11 | Henan Gaotong Internet of Things Co., Ltd. | Linked-list data compression strategy selection method and system based on industrial big data analysis |
Non-Patent Citations (4)
Title |
---|
VAHIDEH HAYYOLALAM et al.: "Edge Intelligence for Empowering IoT-Based Healthcare Systems", IEEE Wireless Communications, vol. 28, no. 3, 30 June 2021 (2021-06-30), pages 6-14, XP011867282, DOI: 10.1109/MWC.001.2000345 *
WANG Zhenxi et al.: "Zone-level compression schemes and compression strategy selection for column-store data", Chinese Journal of Computers, vol. 33, no. 8, 31 August 2010 (2010-08-31), pages 1523-1530 *
LAI Yejing et al.: "Methods and progress in deep neural network model compression", Journal of East China Normal University (Natural Science), no. 5, 30 September 2020 (2020-09-30), pages 68-82 *
ZHAO Haifeng: "Research on association rule mining and Bayesian network representation", China Master's Theses Full-text Database, Information Science and Technology, no. 2007, 31 May 2007 (2007-05-31), pages 140-39 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||