CN114579207A

CN114579207A - Model file layered loading calculation method of convolutional neural network

Info

Publication number: CN114579207A
Application number: CN202210304705.7A
Authority: CN
Inventors: 沈志熙; 徐赞林
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2022-03-22
Filing date: 2022-03-22
Publication date: 2022-06-03
Anticipated expiration: 2042-03-22
Also published as: CN114579207B

Abstract

The invention relates to a model file layered loading calculation method of a convolutional neural network, S1 loads a model file into an external storage device such as a hard disk or an SD card of an embedded device, and the size of the model file is set to be m_w(ii) a S2 records the memory size of the embedded device as m_aRecording the maximum operation memory of the detection program as m_bThen the embedded device can allocate the memory size m for storing the model file_cIs m_c＝(m_a‑m_b) X α; s3 if m_w<＝m_cDirectly loading the model file into the memory of the embedded device at one time, and turning to S5; s4 if m_w>m_cModel of lawFiles need to be loaded in multiple times; s5 performs a forward calculation process of the algorithm model. The method utilizes the thought of step-by-step loading to analyze and calculate the model files with different sizes by combining with the memory space of the specific embedded equipment, thereby ensuring that the step-by-step loading of the model files is realized by the least memory access times, considering the requirement of real-time performance and breaking through the limitation of smaller storage space of the embedded equipment.

Description

Model file layered loading calculation method of convolutional neural network

Technical Field

The invention relates to the field of computer vision, in particular to a model file layered loading calculation method of a convolutional neural network.

Background

With the continuous development of deep learning in the field of target recognition and detection, networks such as VGG, GoogleNet, ResNet and the like have been developed to a deeper network layer number from AlexNet so as to seek better detection accuracy. Related researchers extract deep level features of a detection target by methods of increasing the number of convolution layers, increasing the number of convolution kernels and the like; although the deep network model is superior in many problems, the deep network model is limited in time and space in practical application, the large and deep network model has huge computation amount, even with the help of a graphic processor, the deep network model is difficult to embed and develop on equipment with limited computing resources and storage resources, and the deep network model is difficult to meet many scene requirements in daily life in time; and the high-performance computer has higher production and maintenance cost and is not suitable for popularization and promotion in large quantity. Therefore, in many current applications, especially in application deployment of mobile terminals and embedded systems, such as automatic driving, fatigue detection, robots, and the like, which are limited by integration equipment and processing speed, neural network model files with different sizes from tens of megabytes to hundreds of megabytes cannot be loaded and calculated, model compression research has been carried out, lightweight deep neural networks are continuously proposed, however, compression of models at one step can cause loss of detection accuracy, and simple compression is undoubtedly not preferable.

For the deep neural network, a large number of parameters of the model are concentrated in the convolutional layer, and how to avoid the conflict between the large number of model parameters of the deep neural network and the limited computing and storing capacity of the embedded device makes the deep neural network break through the application limit of the deep neural network model to a certain extent is the first problem of the deep neural network in the development and application of the embedded terminal.

Disclosure of Invention

Aiming at the problems in the prior art, the technical problems to be solved by the invention are as follows: the technical problem that the deep neural network is difficult to embed and develop on equipment with limited computing resources and storage resources is solved.

In order to solve the technical problems, the invention adopts the following technical scheme: a model file layered loading calculation method of a convolutional neural network comprises the following steps:

s1, loading the model file into an external storage device such as a hard disk or an SD card of the embedded device, and assuming that the size of the model file is m_w；

S2, recording the memory size of the embedded device as m_aRecording the maximum operation memory of the detection program as m_bThen the embedded device can allocate the memory size m for storing the model file_cIs m_c＝(m_a-m_b) X α, α is a margin factor, and is set to 0.9;

s3, if m_w＜＝m_cDirectly loading the model file into a memory of the embedded device at one time, and turning to S5;

s4, if m_w＞m_cThen the model file needs to be loaded in several times;

and S5, performing a forward calculation process of the convolutional neural network.

As an improvement, the step of fractional loading in S4 is specifically as follows:

the number of times of planned loading of the model file is n,

the model file size for the first n-1 planned single loads is then

Then

The model file size of the nth scheduled load is

In order to ensure that the model file loaded each time is all parameters of N layers, the size of the loaded model file is recorded as m_o，m_oThe initial value is 0; marking L as the L-th convolution layer of the model, wherein the initial value of L is 1;

let n be the nth time (n is the initial value)Is 1) the size of the actually loaded model file is m_t，m_tThe initial value is the memory size occupied by the Lth convolutional layer parameter; memory m_LThe memory size occupied by the Lth convolution layer parameter; the process of actually loading the model file for the nth time is as follows:

if

And L is<N_oIf L is equal to L +1, update m_t＝m_t+m_LRepeating the first step, wherein N_oRepresenting the number of model convolution layers;

② if

Then, if L is L-1, m is updated_t＝m_t-m_L；

③ if

Then [ m ] of the model file is loaded_o,m_o+m_t]Part, update m_o＝m_o+m_t；

Fourthly, if m_o<m_wUpdating L to be L +1, and turning to the first step, namely starting the process of actually loading the model file for the (n + 1) th time;

wu if m_o＝m_wAnd if so, finishing the loading of the whole model file.

As an improvement, the forward calculation process of the convolutional neural network performed in S5 includes the following specific steps: if the last step is S4, starting the forward calculation process of the whole convolutional neural network layer by layer according to the model files loaded in the memory, and after the model files in the memory participate in calculation, turning to S4 to carry out next loading; if the last step is S3, the forward calculation process is performed directly until the end.

Compared with the prior art, the invention has at least the following advantages:

1. the invention uses the thought of step-by-step loading to analyze and calculate the model files with different sizes by combining the memory space of the specific embedded device, thereby ensuring that the step-by-step loading of the model files is realized by the least memory access times, considering the requirement of real-time performance and breaking through the limitation of smaller memory space of the embedded device.

2. The method is characterized in that a forward calculation process of the deep convolutional neural network is carried out by applying a layered calculation idea, and is different from the existing end-to-end neural network operation process with good real-time performance.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention.

Detailed Description

The present invention will be described in further detail below.

The invention provides a model file layered loading calculation method of a convolutional neural network, which is characterized in that a model file is dynamically loaded into a memory of an embedded device according to different storage capacities and different algorithm models of different devices, and is different from the traditional one-time loading method, so that a large and deep convolutional neural network can be embedded and developed on a mobile terminal with limited calculation and storage capacities, the technical problem that the deep neural network is difficult to be embedded and developed on devices with limited calculation resources and storage resources is solved, the problem that the deep convolutional neural network is difficult to fall into an actual application scene is solved to a certain extent, and the method has a good application prospect. The idea of the invention is to dynamically load and compute according to the idea of layering (i.e. each layer is considered as a whole).

The invention provides a model file layered loading calculation method of a convolutional neural network, which dynamically adjusts the process of loading a model file into an internal memory of embedded equipment according to a specific algorithm model and adopts a layered calculation method to carry out the operation process of the algorithm model. Loading the trained model file into the memory of the embedded device by using a dynamic loading method, wherein the convolution kernel parameters of each convolution layer are stored in the model file, the convolution kernels are generally an s × s matrix, the parameter number of a single convolution layer is the product of s × s and the convolution kernel number of the layer, the model file stores the parameters layer by layer one by one according to a one-dimensional array format, and the dynamic loading method comprises the following specific steps:

the model file is a text file used for storing convolution kernel parameters of each convolution layer in the convolutional neural network, wherein the convolution kernel parameters are stored in a floating point number format.

The training process of the convolutional neural network is a process of continuously updating convolutional kernel parameters, the final purpose is to optimize the performance of the network, which is to visually reflect that the loss value is converged, and the convolutional kernel parameters at the moment are stored to obtain a trained model file. The convolutional neural network is trained by the existing method.

Referring to fig. 1, a model file hierarchical loading calculation method of a convolutional neural network includes the following steps:

s1, loading the model file into an external storage device such as a hard disk or an SD card of the embedded device, and assuming that the size of the model file is m_w。

S2, recording the memory size of the embedded device as m_aRecording the maximum operation memory of the detection program as m_bThen the embedded device can allocate the memory size m for storing the model file_cIs m_c＝(m_a-m_b) And x α, α is a margin factor and is set to 0.9.

S3, if m_w＜＝m_cThen, the model file is directly loaded into the memory of the embedded device at one time, and the process goes to S5.

S4, if m_w＞m_cThen the model file needs to be loaded in multiple times.

Specifically, the step of sub-loading in S4 is as follows:

according to the forward propagation principle of the convolutional neural network, the input image data passes through the backhaul bo layer by layerAnd (5) carrying out convolution operation on the convolution layer in ne until the convolution layer reaches the detection head, and outputting a result after processing. Therefore, it is required to ensure that each loaded model file is a complete parameter of N layers, where N is between 1 and the number of model convolution layers N_oIn the meantime.

The number of times of planned loading of the model file is n,

the model file size for the first n-1 planned single loads is then

Then

The model file size of the nth scheduled load is

Because the forward calculation process of the convolutional neural network is carried out layer by layer, in order to ensure that the model file loaded each time is all parameters of N layers, N is any integer between 1 and the maximum number of convolutional layers, and the size of the loaded model file is recorded as m_o，m_oThe initial value is 0; marking L as the L-th convolution layer of the model, wherein the initial value of L is 1;

let the size of the model file actually loaded n times (n is 1 as the initial value) be m_t，m_tThe initial value is the memory size occupied by the Lth convolutional layer parameter; memory m_LThe memory size occupied by the Lth convolution layer parameter; the process of actually loading the model file for the nth time is as follows:

if

② if

Then, if L is L-1, m is updated_t＝m_t-m_L；

③ if

Then [ m ] of the model file is loaded_o,m_o+m_t]Part, update m_o＝m_o+m_t；

wu if m_o＝m_wAnd if so, finishing the loading of the whole model file.

Specifically, the forward calculation process of the convolutional neural network performed in S5 includes the following steps: if the last step is S4, starting the forward calculation process of the whole convolutional neural network layer by layer according to the model files loaded in the memory, and after the model files in the memory participate in calculation, turning to S4 to carry out next loading; if the last step is S3, the forward calculation process is performed directly until the end.

The invention provides a model file layered loading calculation method of a convolutional neural network, which can apply a deep convolutional neural network to embedded equipment with extremely limited memory resources on the premise of not considering real-time performance, and breaks through the limitation of the traditional neural network based on a high-performance computer; if the real-time performance is high, the algorithm model can be lightened firstly, and then the method provided by the invention is combined, so that the application in the embedded terminal is realized.

Memory: and the memory requirement is used for measuring the memory occupation condition of each convolution layer of the convolution neural network.

Parameters: i.e. the number of parameters, is used to scale a convolutional neural network.

In the following, the convolutional neural network VGG16 is taken as an example to analyze the original memory requirement and parameter, and compare the memory requirement and parameter when the method of the present invention is used. Since the last three layers of VGG16 are fully connected, we only list the first 13 convolutional layers here. Table 1 shows the number of parameters and memory requirements of each convolutional layer of VGG 16.

TABLE 1

As can be seen from table 1, if a conventional convolutional neural network loading method is adopted, that is, the entire model file needs to be loaded into the memory at one time, 56MB of memory size needs to be occupied, which makes migration and development of most embedded devices with small memory very difficult. By adopting the dynamic loading and hierarchical calculation method provided by the invention, the minimum memory requirement is only 12MB, which is reduced by 78% compared with the prior art, and the effect is very obvious.

Finally, the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all of them should be covered in the claims of the present invention.

Claims

1. A model file layered loading calculation method of a convolutional neural network is characterized by comprising the following steps: the method comprises the following steps:

S2, recording the memory size of the embedded device as m_aRecording the maximum operation memory of the detection program as m_bThen the embedded device can allocate the memory size m for storing the model file_cIs m_c＝(m_a-m_b) X α, α being a margin factor, typically between 0.7 and 0.9 depending on the computational overhead setting of the device resources;

s4, if m_w＞m_cThen the model file needs to be loaded in several times;

2. The model file hierarchical loading computation method of a convolutional neural network as claimed in claim 1, characterized in that: the step of fractional loading in S4 is specifically as follows:

the number of times of planned loading of the model file is n,

the model file size for the first n-1 planned single loads is

Then the

The model file size of the nth scheduled load is

let the size of the model file actually loaded n times (n is 1 as the initial value) be m_t，m_tThe initial value is the memory size occupied by the Lth convolutional layer parameter; memory m_LThe memory size occupied by the Lth convolution layer parameter; n th time practiceThe process of loading the model file is as follows:

if

② if

Then L is equal to L-1, update m_t＝m_t-m_L；

③ if

Then [ m ] of the model file is loaded_o,m_o+m_t]Part, update m_o＝m_o+m_t；

wu if m_o＝m_wAnd if so, finishing the loading of the whole model file.

3. The model file hierarchical loading calculation method of a convolutional neural network as claimed in claim 1 or 2, characterized in that: the forward calculation process of the convolutional neural network performed by the S5 specifically includes the following steps: if the last step is S4, starting the forward calculation process of the whole convolutional neural network layer by layer according to the model files loaded in the memory, and after the model files in the memory participate in calculation, turning to S4 to carry out next loading; if the last step is S3, the forward calculation process is performed directly until the end.