CN114579207A - Model file layered loading calculation method of convolutional neural network - Google Patents

Model file layered loading calculation method of convolutional neural network

Info

Publication number
CN114579207A
CN114579207A (application CN202210304705.7A)
Authority
CN
China
Prior art keywords
model file
loading
model
memory
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210304705.7A
Other languages
Chinese (zh)
Other versions
CN114579207B (en)
Inventor
沈志熙
徐赞林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210304705.7A priority Critical patent/CN114579207B/en
Publication of CN114579207A publication Critical patent/CN114579207A/en
Application granted granted Critical
Publication of CN114579207B publication Critical patent/CN114579207B/en
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/445 Program loading or initiating
    • G06F 9/44521 Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 Allocation of resources to service a request, the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a model file layered loading calculation method for a convolutional neural network. S1: load the model file into an external storage device of the embedded device, such as a hard disk or an SD card, and denote the size of the model file as m_w. S2: denote the memory size of the embedded device as m_a and the maximum running memory of the detection program as m_b; the memory size m_c that the embedded device can allocate for storing the model file is then m_c = (m_a − m_b) × α. S3: if m_w ≤ m_c, load the model file into the memory of the embedded device in one pass and go to S5. S4: if m_w > m_c, the model file must be loaded in multiple passes. S5: perform the forward calculation of the algorithm model. Combining the idea of stepwise loading with the memory capacity of the specific embedded device, the method analyzes and loads model files of different sizes so that stepwise loading is achieved with the fewest memory accesses, balancing the real-time requirement while breaking through the limited storage space of embedded devices.

Description

Model file layered loading calculation method of convolutional neural network
Technical Field
The invention relates to the field of computer vision, in particular to a model file layered loading calculation method of a convolutional neural network.
Background
With the continuous development of deep learning in target recognition and detection, networks have evolved from AlexNet to deeper architectures such as VGG, GoogLeNet and ResNet in pursuit of better detection accuracy, and researchers extract deeper features of a detection target by adding convolutional layers and increasing the number of convolution kernels. Although deep network models excel on many problems, they are constrained in time and space in practical applications: a large, deep network entails an enormous amount of computation, and even with the help of a graphics processor it is difficult to embed and develop on devices with limited computing and storage resources, or to meet the timing demands of many everyday scenarios. High-performance computers, meanwhile, carry high production and maintenance costs and are unsuitable for large-scale deployment. Consequently, in many current applications, especially mobile-terminal and embedded-system deployments constrained by integration and processing speed, such as automatic driving, fatigue detection and robotics, neural network model files ranging from tens to hundreds of megabytes cannot be loaded and computed. Model compression has therefore been studied and lightweight deep neural networks are continually proposed; however, compressing a model in one step costs detection accuracy, so simple compression is clearly not the answer.
For a deep neural network, the bulk of the model parameters are concentrated in the convolutional layers. How to resolve the conflict between the huge number of parameters of a deep neural network and the limited computing and storage capacity of an embedded device, so that the deep model can to some extent break through its application limits, is the primary problem for developing and applying deep neural networks on embedded terminals.
Disclosure of Invention
Aiming at the problems in the prior art, the invention addresses the following technical problem: deep neural networks are difficult to embed and develop on devices with limited computing and storage resources.
In order to solve the technical problems, the invention adopts the following technical scheme: a model file layered loading calculation method of a convolutional neural network comprises the following steps:
S1, load the model file into an external storage device of the embedded device, such as a hard disk or an SD card, and denote the size of the model file as m_w;
S2, denote the memory size of the embedded device as m_a and the maximum running memory of the detection program as m_b; the memory size m_c that the embedded device can allocate for storing the model file is then m_c = (m_a − m_b) × α, where α is a margin factor, here set to 0.9;
S3, if m_w ≤ m_c, load the model file into the memory of the embedded device in one pass and go to S5;
S4, if m_w > m_c, the model file must be loaded in multiple passes;
S5, perform the forward calculation of the convolutional neural network.
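To make the memory-budget rule in S2 and the branch in S3/S4 concrete, the following minimal C sketch computes m_c and chooses between one-pass and multi-pass loading; the device sizes are illustrative assumptions, not values from the patent, and ALPHA is the 0.9 margin factor of this embodiment.

```c
#include <stdio.h>

/* Margin factor alpha from S2 (0.9 in this embodiment). */
#define ALPHA 0.9

/* m_c = (m_a - m_b) * alpha: memory the device can spare for weights. */
static size_t model_budget(size_t m_a, size_t m_b) {
    return (size_t)((double)(m_a - m_b) * ALPHA);
}

int main(void) {
    size_t m_w = 56u << 20;  /* model file size, e.g. 56 MB (hypothetical)     */
    size_t m_a = 64u << 20;  /* total device memory (hypothetical)             */
    size_t m_b = 40u << 20;  /* detection program peak memory (hypothetical)   */
    size_t m_c = model_budget(m_a, m_b);

    if (m_w <= m_c)
        printf("S3: load the whole model in one pass\n");
    else
        printf("S4: load the model in multiple passes\n");
    return 0;
}
```

With these example numbers m_c is about 21.6 MB, so the 56 MB model takes the S4 path.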
As an improvement, the loading in multiple passes in S4 proceeds as follows:
The planned number of loads of the model file is n, with n = ⌈m_w / m_c⌉; the planned size of each of the first n − 1 loads is then m_c, so that (n − 1) · m_c < m_w ≤ n · m_c, and the planned size of the nth load is m_w − (n − 1) · m_c.
To ensure that each load contains the complete parameters of N layers, denote the size of the already-loaded portion of the model file as m_o, with initial value 0, and let L index the Lth convolutional layer of the model, with initial value 1.
Let m_t denote the size of the model file actually loaded in the nth pass (n has initial value 1); its initial value is the memory occupied by the parameters of the Lth convolutional layer, and m_L denotes the memory occupied by the parameters of the Lth convolutional layer. The nth actual load proceeds as follows:
① If m_t ≤ m_c and L < N_o, set L = L + 1, update m_t = m_t + m_L and repeat ①, where N_o is the number of convolutional layers of the model;
② if m_t > m_c, set L = L − 1 and update m_t = m_t − m_L;
③ if m_t ≤ m_c, load the [m_o, m_o + m_t] portion of the model file and update m_o = m_o + m_t;
④ if m_o < m_w, set L = L + 1 and go to ①, i.e., start the (n + 1)th actual load;
⑤ if m_o = m_w, the loading of the whole model file is complete.
As an improvement, the forward calculation of the convolutional neural network in S5 proceeds as follows: if the preceding step was S4, the forward calculation of the whole convolutional neural network starts layer by layer over the model file segments loaded in memory, and once those segments have been consumed by the calculation, control returns to S4 for the next load; if the preceding step was S3, the forward calculation runs directly to completion.
Compared with the prior art, the invention has at least the following advantages:
1. The invention combines the idea of stepwise loading with the memory capacity of the specific embedded device to analyze and load model files of different sizes, ensuring that stepwise loading is achieved with the fewest memory accesses, balancing the real-time requirement while breaking through the limited storage space of embedded devices.
2. The method carries out the forward calculation of the deep convolutional neural network with a layered calculation approach, in contrast to the existing end-to-end network execution, which offers good real-time performance but requires the entire model in memory.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Detailed Description
The present invention will be described in further detail below.
The invention provides a model file layered loading calculation method for a convolutional neural network. According to the storage capacity of the device and the algorithm model at hand, the model file is dynamically loaded into the memory of the embedded device, unlike the traditional one-shot loading method, so that large and deep convolutional neural networks can be embedded and developed on mobile terminals with limited computing and storage capacity. This solves the technical problem that deep neural networks are difficult to embed and develop on devices with limited computing and storage resources, alleviates to a certain extent the difficulty of bringing deep convolutional neural networks into practical application scenarios, and has good application prospects. The core idea is to load and compute dynamically in a layered fashion, i.e., treating each layer as a whole.
The invention provides a model file layered loading calculation method for a convolutional neural network: the process of loading the model file into the memory of the embedded device is adjusted dynamically according to the specific algorithm model, and the model is executed with a layered calculation method. The trained model file is loaded into the memory of the embedded device by dynamic loading. The model file stores the convolution kernel parameters of each convolutional layer; a kernel is in general an s × s matrix, so the parameter count of a single convolutional layer is the product of s × s and the number of kernels in that layer, and the file stores the parameters layer by layer in a one-dimensional array format. The dynamic loading proceeds as follows:
the model file is a text file used for storing convolution kernel parameters of each convolution layer in the convolutional neural network, wherein the convolution kernel parameters are stored in a floating point number format.
The training process of a convolutional neural network continually updates the convolution kernel parameters with the ultimate goal of optimizing network performance, which manifests intuitively as convergence of the loss value; saving the kernel parameters at that point yields the trained model file. The convolutional neural network is trained by existing methods.
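As a concrete reading of the layout just described, the sketch below treats the model file as consecutive 32-bit floats, one convolutional layer after another (the patent stores the floating-point values as text; raw binary is used here only to keep the sketch short). The struct fields and helper names are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Geometry of one convolutional layer as described above:
 * each kernel is an s x s matrix, and a layer holds `kernels` of them. */
typedef struct {
    int s;        /* kernel side length             */
    int kernels;  /* number of kernels in the layer */
} conv_layer_t;

/* m_L: memory occupied by the parameters of one layer. */
static size_t layer_bytes(const conv_layer_t *l) {
    return (size_t)l->s * l->s * l->kernels * sizeof(float);
}

/* Read one layer's parameters from the current position in the file;
 * returns NULL on allocation failure or a short read. */
static float *load_layer(FILE *fp, const conv_layer_t *l) {
    size_t n = layer_bytes(l) / sizeof(float);
    float *w = malloc(n * sizeof(float));
    if (w != NULL && fread(w, sizeof(float), n, fp) != n) {
        free(w);
        w = NULL;
    }
    return w;
}
```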
Referring to FIG. 1, a model file hierarchical loading calculation method of a convolutional neural network includes the following steps:
S1, load the model file into an external storage device of the embedded device, such as a hard disk or an SD card, and denote the size of the model file as m_w.
S2, denote the memory size of the embedded device as m_a and the maximum running memory of the detection program as m_b; the memory size m_c that the embedded device can allocate for storing the model file is then m_c = (m_a − m_b) × α, where α is a margin factor, here set to 0.9.
S3, if m_w ≤ m_c, load the model file into the memory of the embedded device in one pass and go to S5.
S4, if m_w > m_c, the model file must be loaded in multiple passes.
S5, perform the forward calculation of the convolutional neural network.
Specifically, the loading in multiple passes in S4 proceeds as follows:
According to the forward propagation principle of the convolutional neural network, input image data undergoes convolution layer by layer through the convolutional layers of the backbone until it reaches the detection head, where it is processed and the result is output. It must therefore be ensured that each loaded portion of the model file contains the complete parameters of N layers, where N lies between 1 and the number of convolutional layers N_o.
The planned number of loads of the model file is n, with n = ⌈m_w / m_c⌉; the planned size of each of the first n − 1 loads is then m_c, so that (n − 1) · m_c < m_w ≤ n · m_c, and the planned size of the nth load is m_w − (n − 1) · m_c.
Because the forward calculation of the convolutional neural network proceeds layer by layer, each load must contain the complete parameters of N layers, where N is any integer between 1 and the maximum number of convolutional layers. Denote the size of the already-loaded portion of the model file as m_o, with initial value 0, and let L index the Lth convolutional layer of the model, with initial value 1.
Let m_t denote the size of the model file actually loaded in the nth pass (n has initial value 1); its initial value is the memory occupied by the parameters of the Lth convolutional layer, and m_L denotes the memory occupied by the parameters of the Lth convolutional layer. The nth actual load proceeds as follows:
① If m_t ≤ m_c and L < N_o, set L = L + 1, update m_t = m_t + m_L and repeat ①, where N_o is the number of convolutional layers of the model;
② if m_t > m_c, set L = L − 1 and update m_t = m_t − m_L;
③ if m_t ≤ m_c, load the [m_o, m_o + m_t] portion of the model file and update m_o = m_o + m_t;
④ if m_o < m_w, set L = L + 1 and go to ①, i.e., start the (n + 1)th actual load;
⑤ if m_o = m_w, the loading of the whole model file is complete.
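The five rules above reduce to a greedy packing loop: grow the current group of layers while it fits in m_c, back off one layer on overflow, load the group, and continue from the next layer. The C sketch below is one way to write it; `load_range` stands for the platform file I/O and is an assumption, and the routine presumes m_c is at least as large as the biggest single layer (otherwise step ② could never terminate).

```c
#include <stddef.h>

/* Copies bytes [off, off + len) of the model file into the weight buffer;
 * placeholder for the platform I/O, not an API from the patent. */
extern void load_range(size_t off, size_t len);

/* m_L[0..N_o-1]: per-layer parameter sizes; m_w: total file size;
 * m_c: memory budget from S2.  Requires m_c >= max(m_L). */
void load_model_stepwise(const size_t *m_L, int N_o, size_t m_w, size_t m_c)
{
    size_t m_o = 0;   /* bytes already loaded (initial value 0)          */
    int L = 0;        /* current layer, 0-based in this sketch           */

    while (m_o < m_w) {             /* steps 4/5: loop until m_o == m_w  */
        size_t m_t = m_L[L];        /* pass starts with layer L's size   */

        /* step 1: append whole layers while the group still fits */
        while (m_t <= m_c && L < N_o - 1) {
            L++;
            m_t += m_L[L];
        }
        /* step 2: if the last layer pushed the group over budget, drop it */
        if (m_t > m_c) {
            m_t -= m_L[L];
            L--;
        }
        /* step 3: load the [m_o, m_o + m_t) part of the file */
        load_range(m_o, m_t);
        m_o += m_t;

        L++;  /* step 4: the next pass starts at the following layer */
    }
}
```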
Specifically, the forward calculation of the convolutional neural network in S5 proceeds as follows: if the preceding step was S4, the forward calculation of the whole convolutional neural network starts layer by layer over the model file segments loaded in memory, and once those segments have been consumed by the calculation, control returns to S4 for the next load; if the preceding step was S3, the forward calculation runs directly to completion.
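One way to realize this interleaving, under the assumption of a single reusable weight buffer of m_c bytes: `load_next_group` (a hypothetical counterpart of the S4 routine above) fills the buffer with layers first..last and reports when the file is exhausted, and `conv_forward` stands in for the per-layer convolution; neither name comes from the patent.

```c
#include <stddef.h>

/* Hypothetical helpers, not APIs from the patent. */
extern int  load_next_group(float *buf, size_t m_c,
                            int *first, int *last);  /* returns 1 when m_o == m_w */
extern void conv_forward(int layer, const float *weights, float *activations);

/* S5 interleaved with S4: load a group of layers, run them, repeat.
 * Per-layer offsets within weight_buf are omitted for brevity. */
void forward_hierarchical(float *weight_buf, size_t m_c, float *activations)
{
    int first, last, done;
    do {
        done = load_next_group(weight_buf, m_c, &first, &last);  /* S4 */
        for (int L = first; L <= last; L++)                      /* S5 */
            conv_forward(L, weight_buf, activations);
    } while (!done);
}
```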
The invention provides a model file layered loading calculation method for a convolutional neural network that makes it possible, where real-time performance is not critical, to run deep convolutional neural networks on embedded devices with extremely limited memory resources, breaking the traditional dependence of neural networks on high-performance computers. Where the real-time requirement is high, the algorithm model can first be made lightweight and then combined with the proposed method to achieve deployment on an embedded terminal.
Memory: the memory requirement, used to measure the memory occupied by each convolutional layer of the convolutional neural network.
Parameters: the number of parameters, used to measure the scale of a convolutional neural network.
In the following, the convolutional neural network VGG16 is taken as an example to analyze the original memory requirement and parameter count, and to compare them with those obtained using the method of the present invention. Since the last three layers of VGG16 are fully connected, only the first 13 convolutional layers are listed here. Table 1 shows the number of parameters and the memory requirement of each convolutional layer of VGG16.
TABLE 1
[Table 1 is rendered as images in the source; it lists the parameter count and memory requirement of each of the 13 convolutional layers of VGG16.]
As Table 1 shows, the conventional loading method, in which the entire model file is loaded into memory at once, occupies 56 MB, which makes porting and development on most embedded devices with small memory very difficult. With the dynamic loading and hierarchical calculation method of the present invention, the minimum memory requirement is only 12 MB, a 78% reduction, which is a very significant effect.
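As a cross-check of the one-shot figure, the short program below totals the convolutional-layer weights of the standard VGG16 configuration (3 × 3 kernels, channel widths 64 through 512, 4-byte floats); these shapes are public knowledge rather than values copied from Table 1, and they reproduce the roughly 56 MB total quoted above, with the largest single layer at 9 MB.

```c
#include <stdio.h>

int main(void) {
    /* channel widths around the 13 conv layers of standard VGG16 */
    const int ch[14] = {3, 64, 64, 128, 128, 256, 256, 256,
                        512, 512, 512, 512, 512, 512};
    double total_mb = 0.0, max_mb = 0.0;

    for (int L = 0; L < 13; L++) {
        /* 3x3 kernels, ch[L] input channels, ch[L+1] kernels, 4 bytes each */
        double mb = 3.0 * 3.0 * ch[L] * ch[L + 1] * 4.0 / (1024.0 * 1024.0);
        total_mb += mb;
        if (mb > max_mb)
            max_mb = mb;
        printf("conv %2d: %6.3f MB\n", L + 1, mb);
    }
    /* prints ~56.1 MB total and 9.0 MB for the largest layers */
    printf("total: %.1f MB, largest layer: %.1f MB\n", total_mb, max_mb);
    return 0;
}
```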
Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications are intended to be covered by the claims of the present invention.

Claims (3)

1. A model file layered loading calculation method of a convolutional neural network, characterized by comprising the following steps:
s1, loading the model file into an external storage device such as a hard disk or an SD card of the embedded device, and assuming that the size of the model file is mw
S2, recording the memory size of the embedded device as maRecording the maximum operation memory of the detection program as mbThen the embedded device can allocate the memory size m for storing the model filecIs mc=(ma-mb) X α, α being a margin factor, typically between 0.7 and 0.9 depending on the computational overhead setting of the device resources;
s3, if mw<=mcDirectly loading the model file into a memory of the embedded device at one time, and turning to S5;
s4, if mw>mcThen the model file needs to be loaded in several times;
and S5, performing a forward calculation process of the convolutional neural network.
2. The model file hierarchical loading calculation method of a convolutional neural network as claimed in claim 1, characterized in that the loading in multiple passes in S4 proceeds as follows:
the planned number of loads of the model file is n, with n = ⌈m_w / m_c⌉; the planned size of each of the first n − 1 loads is then m_c, so that (n − 1) · m_c < m_w ≤ n · m_c, and the planned size of the nth load is m_w − (n − 1) · m_c;
to ensure that each load contains the complete parameters of N layers, the size of the already-loaded portion of the model file is denoted m_o, with initial value 0, and L indexes the Lth convolutional layer of the model, with initial value 1;
m_t denotes the size of the model file actually loaded in the nth pass (n has initial value 1), its initial value being the memory occupied by the parameters of the Lth convolutional layer, and m_L denotes the memory occupied by the parameters of the Lth convolutional layer; the nth actual load proceeds as follows:
① if m_t ≤ m_c and L < N_o, set L = L + 1, update m_t = m_t + m_L and repeat ①, where N_o is the number of convolutional layers of the model;
② if m_t > m_c, set L = L − 1 and update m_t = m_t − m_L;
③ if m_t ≤ m_c, load the [m_o, m_o + m_t] portion of the model file and update m_o = m_o + m_t;
④ if m_o < m_w, set L = L + 1 and go to ①, i.e., start the (n + 1)th actual load;
⑤ if m_o = m_w, the loading of the whole model file is complete.
3. The model file hierarchical loading calculation method of a convolutional neural network as claimed in claim 1 or 2, characterized in that the forward calculation of the convolutional neural network in S5 proceeds as follows: if the preceding step was S4, the forward calculation of the whole convolutional neural network starts layer by layer over the model file segments loaded in memory, and once those segments have been consumed by the calculation, control returns to S4 for the next load; if the preceding step was S3, the forward calculation runs directly to completion.
CN202210304705.7A 2022-03-22 2022-03-22 Model file hierarchical loading calculation method of convolutional neural network Active CN114579207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210304705.7A CN114579207B (en) 2022-03-22 2022-03-22 Model file hierarchical loading calculation method of convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210304705.7A CN114579207B (en) 2022-03-22 2022-03-22 Model file hierarchical loading calculation method of convolutional neural network

Publications (2)

Publication Number Publication Date
CN114579207A true CN114579207A (en) 2022-06-03
CN114579207B CN114579207B (en) 2023-09-01

Family

ID=81776296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210304705.7A Active CN114579207B (en) 2022-03-22 2022-03-22 Model file hierarchical loading calculation method of convolutional neural network

Country Status (1)

Country Link
CN (1) CN114579207B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304802A * 2018-01-30 2018-07-20 Huazhong University of Science and Technology A fast filtering system for large-scale video analysis
US20200050555A1 (en) * 2018-08-10 2020-02-13 Lg Electronics Inc. Optimizing data partitioning and replacement strategy for convolutional neural networks
CN111242180A (en) * 2020-01-03 2020-06-05 南京邮电大学 Image identification method and system based on lightweight convolutional neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304802A * 2018-01-30 2018-07-20 Huazhong University of Science and Technology A fast filtering system for large-scale video analysis
US20200050555A1 (en) * 2018-08-10 2020-02-13 Lg Electronics Inc. Optimizing data partitioning and replacement strategy for convolutional neural networks
CN111242180A (en) * 2020-01-03 2020-06-05 南京邮电大学 Image identification method and system based on lightweight convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAO VU DUNG et al.: "Autonomous concrete crack detection using deep fully convolutional neural network", Automation in Construction, vol. 99, pages 52-58, XP085574683, DOI: 10.1016/j.autcon.2018.11.028 *

Also Published As

Publication number Publication date
CN114579207B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
US11907760B2 (en) Systems and methods of memory allocation for neural networks
CN110546628B (en) Minimizing memory reads with directed line buffers to improve neural network environmental performance
KR102434729B1 (en) Processing method and apparatus
TW202026858A (en) Exploiting activation sparsity in deep neural networks
CN108573305B (en) Data processing method, equipment and device
US11468316B2 (en) Cluster compression for compressing weights in neural networks
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
CN111242180B (en) Image identification method and system based on lightweight convolutional neural network
CN112163601B (en) Image classification method, system, computer device and storage medium
US10699190B1 (en) Systems and methods for efficiently updating neural networks
US11487342B2 (en) Reducing power consumption in a neural network environment using data management
CN113705775A (en) Neural network pruning method, device, equipment and storage medium
CN109145107B (en) Theme extraction method, device, medium and equipment based on convolutional neural network
WO2023236319A1 (en) Convolutional neural network deployment and optimization method for microcontroller
US20210326702A1 (en) Processing device for executing convolutional neural network computation and operation method thereof
CN112200310B (en) Intelligent processor, data processing method and storage medium
CN114579207A (en) Model file layered loading calculation method of convolutional neural network
JP2024516514A (en) Memory mapping of activations for implementing convolutional neural networks
CN112784818A (en) Identification method based on grouping type active learning on optical remote sensing image
CN114781634B (en) Automatic mapping method and device of neural network array based on memristor
CN113642724B (en) CNN accelerator for high bandwidth storage
CN113743448B (en) Model training data acquisition method, model training method and device
US20220318634A1 (en) Method and apparatus for retraining compressed model using variance equalization
JP2023024960A (en) Optimization of memory usage for efficiently executing neural network
TW202215300A (en) Convolutional neural network operation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant