CN114742211B - Convolutional neural network deployment and optimization method facing microcontroller - Google Patents


Info

Publication number: CN114742211B
Application number: CN202210653260.3A
Authority: CN (China)
Prior art keywords: data, layer, convolution, neural network, model
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN114742211A
Inventors: 孙雁飞, 王子牛, 亓晋, 许斌
Current assignee: Nanjing University of Posts and Telecommunications
Original assignee: Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202210653260.3A
Publication of CN114742211A
Priority to PCT/CN2022/106634 (WO2023236319A1)
Application granted
Publication of CN114742211B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation using electronic means
    • G06N3/08: Learning methods
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT)
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

A microcontroller-oriented convolutional neural network deployment and optimization method comprises three parts: design of the convolutional neural network model, optimization of convolution calculation memory, and deployment of the convolutional neural network. The model design is based on neural network architecture search and finds a convolutional neural network model suited to a microcontroller, with small computation cost, parameter count and memory requirement. The standard convolution, depthwise convolution and pointwise convolution commonly used in convolutional neural networks are each optimized, reducing memory occupation during inference so that convolutional neural networks can run on more memory-limited microcontrollers. A complete procedure, from construction to application, for running a convolutional neural network on a microcontroller is provided, improving the usability and practicality of running convolutional neural network models on microcontrollers.

Description

Convolutional neural network deployment and optimization method facing microcontroller
Technical Field
The invention relates to the field of microcontroller design, and in particular to a microcontroller-oriented convolutional neural network deployment and optimization method.
Background
A microcontroller typically has only tens to hundreds of KB of memory and storage, with operating frequencies from a few MHz to a few hundred MHz, whereas mainstream convolutional neural network models have from several million to several hundred million parameters, making it difficult to satisfy the storage constraints of a microcontroller. In response to the need for lightweight convolutional neural network models, academia and industry have proposed several lightweight network design methods; although these effectively reduce model parameters and computation, they remain insufficient for microcontrollers. Taking the lightweight model MobileNetV3 as an example, its 2.9M parameters cannot be stored on a microcontroller even after weight quantization, and its large computation cost makes real-time detection on a microcontroller difficult. In addition, the academic community mainly focuses on the accuracy, computation cost and parameter count of convolutional neural networks while neglecting memory consumption during inference, which also determines whether a convolutional neural network can run on a microcontroller.
At present, convolutional neural network computation requires a large amount of memory and is difficult to run on a microcontroller. In practical applications, the microcontroller is therefore mainly responsible for acquiring data: sensor readings are transmitted to a server, and the convolutional neural network runs on the server for decision making. This mode limits the application scenarios of convolutional neural networks.
In the prior art, "An image processing method and apparatus based on embedded GPU and convolution calculation" (CN110246078B) discloses a method that reduces runtime memory overhead relative to the im2col convolution method. im2col accelerates convolution by using additional memory to rearrange the data layout, thereby reducing the number of calls to general matrix multiplication. Compared with plain convolution calculation, both im2col and the method disclosed in that patent consume more memory. "A method for optimizing convolution calculation of visual images" (CN108564524A) discloses a convolution optimization method that improves memory-transfer efficiency but does not reduce memory usage. A further prior-art patent only provides a method for training a deep learning algorithm, quantizing the model and deploying it on a microcontroller; the model depends on manual design or selection, and no microcontroller-specific model design, model compression, memory optimization or computation acceleration is performed.
Disclosure of Invention
The invention aims to provide a microcontroller-oriented convolutional neural network deployment and optimization method. Aiming at the problems that a microcontroller has low computing power and limited storage space and can hardly run mainstream convolutional neural networks, a method based on neural network architecture search is provided; the search process considers constraints on the accuracy, computation time and parameter count of the convolutional neural network, so that a model suited to the microcontroller, with small computation cost and parameter count, is found. Aiming at the limited memory of the microcontroller, an optimization method for the memory occupied by convolution calculation is provided: the standard convolution, depthwise convolution and pointwise convolution commonly used in convolutional neural networks are each optimized, and memory occupation during inference is reduced by methods such as partial, in-place computation. Aiming at applying convolutional neural networks on microcontrollers, a method covering the whole process from construction to application is designed, including data acquisition, network design, training, deployment and acceleration.
A microcontroller-oriented convolutional neural network deployment and optimization method comprises three parts: design of the convolutional neural network model, optimization of convolution calculation memory, and deployment of the convolutional neural network. Specifically:
designing a convolutional neural network model:
Using the neural network architecture search technique, an optimal network structure is searched in a set search space against three indexes: accuracy, computation time and memory consumption. Fig. 1 is a flow chart of the neural network architecture search.
The search space is a series of optional operations; a supernet is formed from the modules in the search space. The computation time and memory consumption on the microcontroller are added to the supernet's loss function and, together with accuracy, serve as the optimization target. After the search finishes, the module with the highest probability in each layer of the supernet is kept as that layer's module, the other modules are removed, and the retained modules of all layers form the searched target network.
The searched target model is then compressed. Model compression can use an AutoML-based automatic model compression algorithm, with the model searched in the previous step as the reference model. The agent uses a deep deterministic policy gradient: it receives the embedding of layer l, outputs a sparsity ratio, and compresses layer l accordingly; the environment then moves on to layer l+1. After all layers have been processed, the accuracy of the whole network is evaluated (the evaluation is the same as for a conventional network: test-set data is fed into the model and the number of correct predictions is divided by the total number of test samples). Finally a reward covering accuracy, parameter count and actual computation time is fed back to the agent. The following reward function is designed for the microcontroller application scenario (the original formula images are not reproduced here); in the formula, Reward is the reward, Lat denotes the model's computation latency, Mem denotes the model's memory consumption, Error denotes the model's error, and the remaining terms are coefficients.
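As the exact formula is only available as unreproduced images, the reward shaping can only be illustrated under assumptions. The sketch below uses an AMC-style form that jointly penalizes error, latency and memory; the function name, the logarithmic form and the coefficients a and b are all assumptions, not the patent's formula.

```python
import math

def compression_reward(error, lat, mem, a=1.0, b=1.0):
    """Illustrative reward for the compression agent: higher accuracy
    (lower error) and lower latency/memory both increase the reward.
    The log keeps the latency/memory penalty from dominating."""
    return -error * math.log(a * lat + b * mem)
```

With this shape, halving the error or reducing latency at equal accuracy both raise the reward, which matches the qualitative description in the text.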
And (3) optimizing convolution calculation memory:
the convolution common in the convolutional neural network includes standard convolution, deep convolution and point convolution, and aiming at the three types of common convolution, the invention provides a convolution calculation method with optimized memory, and a memory multiplexing method is adopted to reduce the memory consumption.
Symbol conventions:
C_in, W_in, H_in: channel number, width and height of the convolution input layer;
C_out, W_out, H_out: channel number, width and height of the convolution output layer;
K_w, K_h: convolution kernel width and height;
h: height of the allocated memory space.
Standard convolution calculation:
a standard convolution calculation flow chart is shown in fig. 2.
Case one: C_out × W_out × H_out ≤ C_in × W_in × H_in,
i.e., the convolution output layer is not larger than the convolution input layer (in this case the input layer space can store all of the output layer data); the calculation process is shown in fig. 3.
Step 1: allocate a memory space m of size C_out × W_out × h (h ≪ H_out).
Step 2: compute part of the convolution input layer with the convolution kernel and fill the memory space m.
Step 3: copy the lower rows of data in m to the appropriate position of the convolution input layer, overwriting the original input data.
Step 4: copy the upper rows of data in m onto the lower rows of m, overwriting the original data.
In steps 2, 3 and 4 the output is buffered in m because convolution involves adjacent rows and columns, so a computed result cannot be stored directly in the input layer; data in m may be copied to the corresponding position of the input layer only when the input data at that position will no longer be used by the convolution of adjacent rows and columns.
Step 5: continue, in order, computing part of the input layer with the convolution kernel and filling the upper and middle rows of m.
Step 6: copy the lower rows of data in m to the appropriate position of the convolution input layer, overwriting the original input data.
Step 7: repeat steps 4 to 6 until all data of the convolution input layer have been processed.
Step 8: perform a reshape operation on the data stored in the input layer so that it matches the channel number, width and height of the output layer.
Memory consumption before optimization: C_out × W_out × H_out (the output layer is fully allocated);
memory consumption after optimization: C_out × W_out × h (the output layer reuses the input layer space, h ≪ H_out).
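The row-reuse scheme of case one can be sketched for a single channel. This is an illustrative simplification, not the patent's implementation: it assumes stride 1, valid padding, a buffer height of h = 1 (one output row), and the function name is hypothetical.

```python
def conv2d_inplace(x, k):
    """Valid 2-D convolution (stride 1) that reuses the input buffer x
    for the output, needing only a one-row scratch buffer m.
    x: H x W list of lists (overwritten); k: Kh x Kw kernel."""
    H, W = len(x), len(x[0])
    Kh, Kw = len(k), len(k[0])
    Ho, Wo = H - Kh + 1, W - Kw + 1
    for r in range(Ho):
        # compute one output row into the scratch buffer m (step 2)
        m = [sum(x[r + a][c + b] * k[a][b]
                 for a in range(Kh) for b in range(Kw))
             for c in range(Wo)]
        # input row r is no longer needed by later output rows (which
        # read rows r+1 and below), so the buffered row may overwrite it
        x[r][:Wo] = m
    # the reshape of step 8 is a no-op in this single-channel sketch
    return [row[:Wo] for row in x[:Ho]]
```

The key invariant is the one stated in the text: a result is copied back into the input only once the data at that position can no longer be read by a later convolution window.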
Case two: C_out × W_out × H_out > C_in × W_in × H_in,
i.e., the convolution output layer is larger than the convolution input layer (in this case the input layer space cannot store all of the output layer data and an additional memory space M is needed); the calculation process is shown in fig. 4.
Step 1: allocate a memory space m of size C_out × W_out × h (h ≪ H_out), and allocate a memory space M of size C_out × W_out × H_out - C_in × W_in × H_in.
Step 2: compute part of the convolution input layer with the convolution kernel and fill the memory space M.
Step 3: continuing in calculation order, compute part of the input layer with the convolution kernel and fill the memory space m.
Step 4: copy the lower rows of data in m to the appropriate position of the convolution input layer, overwriting the original input data.
Step 5: copy the upper rows of data in m onto the lower rows of m, overwriting the original data.
In steps 3, 4 and 5 the output is buffered in m because convolution involves adjacent rows and columns, so a computed result cannot be stored directly in the input layer; data in m may be copied to the corresponding position of the input layer only when the input data at that position will no longer be used by the convolution of adjacent rows and columns.
Step 6: continue, in order, computing part of the input layer with the convolution kernel and filling the upper and middle rows of m.
Step 7: copy the lower rows of data in m to the appropriate position of the convolution input layer, overwriting the original input data.
Step 8: repeat steps 5 to 7 until all data of the convolution input layer have been processed.
Step 9: concatenate the computed data stored in the input layer with the data in M, and perform a reshape operation so that the data matches the channel number, width and height of the output layer.
Memory consumption before optimization: C_out × W_out × H_out (the output layer is fully allocated);
memory consumption after optimization: C_out × W_out × h + (C_out × W_out × H_out - C_in × W_in × H_in) (part of the output layer reuses the input layer space, h ≪ H_out).
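The before/after accounting of the two standard-convolution cases can be captured in a small helper. This is a sketch under the assumption that the optimized footprint is the buffer C_out × W_out × h plus, in case two, the overflow space M; the function name is hypothetical.

```python
def standard_conv_memory(C_in, W_in, H_in, C_out, W_out, H_out, h):
    """Return (baseline, optimized) extra memory in elements for the
    standard convolution: baseline allocates the whole output layer,
    the optimized scheme allocates the buffer m (plus M in case two)."""
    before = C_out * W_out * H_out
    if before <= C_in * W_in * H_in:      # case one: output fits in input
        after = C_out * W_out * h
    else:                                  # case two: overflow space M needed
        after = C_out * W_out * h + (before - C_in * W_in * H_in)
    return before, after
```

For example, a 3-channel 10x10 input producing a 3-channel 8x8 output with h = 1 needs 24 buffered elements instead of the 192-element output layer.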
Depthwise convolution calculation:
Fig. 5 is a flowchart of the depthwise convolution calculation; the specific steps are as follows:
Step 1: allocate a memory space m of size 1 × W_out × H_out, i.e., the memory occupied by a single output channel.
Step 2: perform the depthwise convolution of the 1st input channel with the 1st convolution kernel, and store the output in the memory space m.
Step 3: store the result of the depthwise convolution of the nth input channel (n > 1) with the corresponding nth convolution kernel in channel n - 1.
Step 4: copy the data stored in the memory space m to the last channel.
Step 5: release the memory space m.
Step 6: perform a reshape operation on the data stored in the input layer so that it matches the channel number, width and height of the output layer.
A schematic diagram of the depth convolution calculation is shown in fig. 6.
Memory consumption before optimization: C_out × W_out × H_out (the output layer is fully allocated);
memory consumption after optimization: 1 × W_out × H_out (the output layer reuses the input layer space).
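The channel-shifting idea of the depthwise steps can be sketched as follows. This is an illustrative single-stride, valid-padding simplification with a hypothetical function name; note that storing channel n's result in slot n - 1 leaves the stored order rotated by one, which the sketch undoes when building the returned result.

```python
def depthwise_conv_inplace(x, k):
    """Depthwise valid convolution that writes channel n's result into
    the already-consumed channel n-1, buffering only channel 0's output.
    x: C x H x W nested lists (overwritten); k: C kernels of Kh x Kw."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    Kh, Kw = len(k[0]), len(k[0][0])
    Ho, Wo = H - Kh + 1, W - Kw + 1
    # steps 1-2: channel 0's output goes to the scratch buffer m
    m = [[sum(x[0][r + a][c + b] * k[0][a][b]
              for a in range(Kh) for b in range(Kw))
          for c in range(Wo)] for r in range(Ho)]
    # step 3: channel n reads x[n] and writes x[n-1], which is free
    for n in range(1, C):
        for r in range(Ho):
            for c in range(Wo):
                x[n - 1][r][c] = sum(x[n][r + a][c + b] * k[n][a][b]
                                     for a in range(Kh) for b in range(Kw))
    # step 4: the buffered channel-0 result goes into the last slot
    for r in range(Ho):
        x[C - 1][r][:Wo] = m[r]
    slots = [[row[:Wo] for row in ch[:Ho]] for ch in x]
    # stored order is rotated by one channel; restore it in the result
    return [slots[-1]] + slots[:-1]
```

Only one channel-sized buffer is ever live, matching the 1 × W_out × H_out figure above.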
Pointwise convolution calculation:
Pointwise convolution can be regarded as standard convolution with a 1 × 1 kernel, so the standard convolution method described above can be used. In addition, exploiting the fact that pointwise convolution involves no adjacent-position values, the invention further provides a memory-optimized pointwise convolution method that compresses the memory space m allocated in the standard convolution optimization down to C_out × 1 × 1, achieving lower memory consumption. The pointwise convolution flowchart is shown in fig. 7, and the steps are as follows:
the first condition is as follows:
Figure 422655DEST_PATH_IMAGE031
i.e. the number of output channels is not greater than the number of input channels (at this time, the input layer space can store all the output layer data), fig. 8 is a calculation diagram of this case.
Step 1, allocating a memory space m with the size of
Figure 484152DEST_PATH_IMAGE029
×
Figure 768503DEST_PATH_IMAGE030
×
Figure 998672DEST_PATH_IMAGE030
I.e. each output channel is allocated a position size, and the point convolution calculation data is temporarily stored.
Step 2: perform the pointwise convolution at position (i, j) of every input channel (i ∈ [1, W_in], j ∈ [1, H_in]) and store the result in the memory space m.
Step 3: copy the data in m to position (i, j) of the corresponding input layer channels, overwriting the original data.
Step 4: repeat steps 2 and 3 until all input data have been processed.
Step 5: release the memory space m.
Step 6: perform a reshape operation on the data stored in the input layer so that it matches the channel number, width and height of the output layer.
Memory consumption before optimization: C_out × W_out × H_out (the output layer is fully allocated);
memory consumption after optimization: C_out × 1 × 1 (the output layer reuses the input layer space).
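Case one of the pointwise optimization can be sketched directly: one pixel position is transformed at a time, and the C_out-element buffer is immediately written back over the input. The function name is hypothetical; the scheme itself follows the steps above.

```python
def pointwise_conv_inplace(x, w):
    """1x1 convolution reusing the input tensor, with a buffer m of
    C_out scalars (one output pixel across all channels).
    x: C_in x H x W nested lists; w: C_out x C_in weights."""
    C_in, H, W = len(x), len(x[0]), len(x[0][0])
    C_out = len(w)
    assert C_out <= C_in              # case one precondition
    for i in range(H):
        for j in range(W):
            # step 2: pointwise convolution at (i, j) into buffer m
            m = [sum(w[o][c] * x[c][i][j] for c in range(C_in))
                 for o in range(C_out)]
            # step 3: overwrite the same position of the input layer;
            # safe because no other position ever reads (i, j) again
            for o in range(C_out):
                x[o][i][j] = m[o]
    return x[:C_out]
```

Because a 1x1 kernel never reads neighboring positions, no row buffering is needed, which is exactly why m shrinks to C_out × 1 × 1.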
Case two: C_out > C_in,
i.e., the number of output channels is greater than the number of input channels (in this case the input layer space cannot store all of the output layer data and an additional memory space M is needed); fig. 9 is the calculation diagram of this case.
Step 1: allocate a memory space m of size C_out × 1 × 1, i.e., one position per output channel, to buffer the pointwise convolution results; allocate a memory space M of size (C_out - C_in) × W_out × H_out.
Step 2: perform the pointwise convolution at position (i, j) of every input channel (i ∈ [1, W_in], j ∈ [1, H_in]) and store the result in the memory space m.
Step 3: copy the first C_in values in m to position (i, j) of the corresponding input layer channels, overwriting the original data; copy the remaining C_out - C_in values in m to position (i, j) of the corresponding channels of the memory space M.
Step 4: repeat steps 2 and 3 until all input data have been processed.
Step 5: release the memory space m.
Step 6: concatenate the computed data stored in the input layer with the data in M, and perform a reshape operation so that the data matches the channel number, width and height of the output layer.
Memory consumption before optimization: C_out × W_out × H_out (the output layer is fully allocated);
memory consumption after optimization: C_out × 1 × 1 + (C_out - C_in) × W_out × H_out (part of the output layer reuses the input layer space).
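Case two differs only in where the buffered pixel is written back: the first C_in channels reuse the input layer and the overflow channels go into the extra space M. A sketch under the same assumptions (hypothetical function name, nested-list tensors):

```python
def pointwise_conv_expand(x, w):
    """Case two of the pointwise optimization: C_out > C_in, so the
    first C_in output channels overwrite the input in place and the
    remaining C_out - C_in channels go into an extra buffer M."""
    C_in, H, W = len(x), len(x[0]), len(x[0][0])
    C_out = len(w)
    assert C_out > C_in               # case two precondition
    M = [[[0.0] * W for _ in range(H)] for _ in range(C_out - C_in)]
    for i in range(H):
        for j in range(W):
            m = [sum(w[o][c] * x[c][i][j] for c in range(C_in))
                 for o in range(C_out)]
            for o in range(C_in):             # reuse the input layer space
                x[o][i][j] = m[o]
            for o in range(C_in, C_out):      # overflow channels into M
                M[o - C_in][i][j] = m[o]
    return x + M                              # step 6: concatenate
```

The live footprint is the C_out-element buffer plus M, matching the optimized consumption stated above.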
Deployment of convolutional neural networks:
the convolutional neural network deployment method facing the microcontroller comprises three parts of convolutional neural network model design (namely the design of the convolutional neural network model in the above), convolutional neural network model verification and convolutional neural network model deployment, and is shown in fig. 10.
For these components, the specific technical scheme is as follows:
1. Model design: comprises data set acquisition, data preprocessing, model search and training, and model compression.
(1) Data set acquisition: taking image data as an example, the data set uses images acquired by the microcontroller. The acquired image data is stored in a storage unit such as a memory card or FLASH; after acquisition, the data set is transferred to a computer and labeled to form the training and validation sets.
(2) Data preprocessing: includes image augmentation, in which acquired images are cropped, rotated and color-adjusted to enlarge the number of data set samples; resizing to a size suitable for convolutional neural network model training; and normalization, in which the acquired image data is processed with the mean and standard deviation to accelerate model training.
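The normalization step above (scale, then subtract the mean and divide by the standard deviation) can be sketched for a flat list of 8-bit pixel values; the function name and the example mean/std values are illustrative, not the patent's.

```python
def preprocess(pixels, mean, std):
    """Scale 8-bit pixel values to [0, 1], then normalize with the
    dataset mean and standard deviation."""
    return [((p / 255.0) - mean) / std for p in pixels]
```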
(3) Convolutional neural network model search and training: using the neural network architecture search technique, a suitable network structure is searched in the set search space against the three indexes of accuracy, computation time and memory consumption; the searched model is compressed with an AutoML (automated machine learning) based automatic model compression algorithm to obtain the target convolutional neural network model, which is then trained on a computer with the preprocessed image data to obtain the trained model.
2. Model verification: comprises two steps, computer-side model verification and microcontroller-side model verification.
(1) Computer-side model verification: first, verify on the computer with the TensorFlow Lite for Micro deep learning inference framework whether the operators used in the trained model file (convolution, pooling, activation functions, etc.) are supported; if an operator is not supported, replace it with a supported one. Second, verify that the inference results of the TensorFlow Lite for Micro framework are consistent with those of the deep learning framework used to train the model.
(2) Microcontroller-side model verification: verify that the results of the TensorFlow Lite for Micro inference framework running on the microcontroller are consistent with those of the deep learning framework used to train the model.
3. Model deployment: comprises data acquisition, data preprocessing and convolutional neural network detection.
(1) Data acquisition: taking a camera as the acquisition device, the microcontroller controls the camera to acquire data; the acquired image data is passed to the data preprocessing step and stored in an external storage unit.
(2) Data preprocessing: preprocess the image data to be detected by cropping and normalization, processing the image with the mean and standard deviation.
(3) Convolutional neural network detection: the preprocessed data is fed into the model inference framework to obtain the detection result, which is handed to the application part for subsequent processing and execution of corresponding actions. The deployment comprises the following layers; the detection deployment block diagram is shown in fig. 11.
Convolutional neural network application layer: adopts different detection strategies according to the actual application scenario, such as a single detection model or several cascaded models for the data to be detected.
Model layer: the convolutional neural network model used to detect the data, obtained from the model design in the first part.
Model inference framework layer: parses the model and executes inference; TensorFlow Lite for Micro is adopted to execute inference computation on the microcontroller.
CMSIS-NN computation layer: used to accelerate model inference. By wrapping the digital signal processor (DSP) in the ARM core, it provides hardware acceleration to the upper inference framework; compared with inference on a general-purpose CPU, DSP-based inference can improve speed by a factor of 5 to 10. This layer is optional: for a microcontroller without a DSP it can be removed and the CPU used directly for inference.
ARM Cortex-M layer: executes the actual operations of model inference and is also responsible for the other modules' functions, including data acquisition, data preprocessing and action execution.
Storage layer: comprises RAM and FLASH. The RAM stores the temporary data of intermediate layers during model inference; the FLASH stores the model's weight file. The storage layer also stores the programs of the other modules.
The invention achieves the following beneficial effects:
(1) A neural network architecture search based method is provided for convolutional neural networks running on microcontrollers, searching for a model suited to the microcontroller with small computation cost, parameter count and memory requirement.
(2) A convolution calculation method that optimizes memory occupation is provided. The standard convolution, depthwise convolution and pointwise convolution commonly used in convolutional neural networks are each optimized, reducing memory occupation during inference so that convolutional neural networks can run on more memory-limited microcontrollers.
(3) A method covering construction to application of a convolutional neural network running on a microcontroller is designed, improving the usability and practicality of running convolutional neural network models on microcontrollers.
Drawings
Fig. 1 is a flow chart of neural network architecture search in the present invention.
Fig. 2 is a flow chart of a standard convolution calculation in the present invention.
FIG. 3 is a diagram of the standard convolution calculation in case one in the present invention.
FIG. 4 is a diagram of the standard convolution calculation in case two in the present invention.
Fig. 5 is a flowchart of the depth convolution calculation in the present invention.
Fig. 6 is a schematic diagram of the depth convolution calculation in the present invention.
Fig. 7 is a flow chart of point convolution in the present invention.
FIG. 8 is a diagram illustrating the first case of the point convolution calculation in the present invention.
FIG. 9 is a diagram illustrating the second case of the point convolution calculation in the present invention.
Fig. 10 is a block diagram of a workpiece surface detection method based on the deep learning technique in the present invention.
FIG. 11 is a block diagram of a convolutional neural network detection deployment in the present invention.
FIG. 12 is a block diagram of a neural network architecture search module according to an embodiment of the present invention.
Fig. 13 is a schematic diagram of a neural network architecture search in an embodiment of the present invention.
FIG. 14 is a comparison graph of memory overhead histograms for the convolution algorithm in an embodiment of the invention.
Detailed Description
The technical solution of the invention is explained in further detail below with reference to the accompanying drawings.
A convolutional neural network deployment and optimization method facing a microcontroller comprises three parts, namely design of a convolutional neural network model, optimization of convolutional calculation memory and deployment of a convolutional neural network.
(1) Designing a convolutional neural network model:
1) Define n modules as candidates for the neural network structure search; each module may be composed of several operators, such as convolution operators, as shown in fig. 12.
2) The number of module layers L contained in the neural network is specified.
3) Defining a super network, wherein the network comprises L layers, each layer comprises n modules, and the output dimensions of the n modules in the same layer are the same.
4) The output of each of the n modules in a layer is multiplied by a corresponding scalar α_ij and the results are summed to form the output of that layer, where α_ij denotes the scalar corresponding to the j-th module of the i-th layer.
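As a brief illustration of step 4), the weighted combination of one super-network layer's module outputs can be sketched in Python with NumPy; the function and argument names (`layer_output`, `module_outs`, `alpha_l`) are this sketch's own, not the patent's:

```python
import numpy as np

def layer_output(module_outs, alpha_l):
    """Weighted sum of the n module outputs of one super-network layer.

    module_outs: array of shape (n, ...) -- the n modules' outputs, which
                 the patent requires to have identical dimensions.
    alpha_l:     array of shape (n,) -- the learnable scalars of this layer.
    """
    # sum_j alpha_l[j] * module_outs[j], contracting over the module axis
    return np.tensordot(alpha_l, module_outs, axes=1)
```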
5) Defining a loss function:

L_total = (1/N) · Σ_{i=1}^{N} Loss(y_i, f(x_i; w, α)) + λ_lat · lat + λ_mem · mem

wherein N is the number of samples in the training set; Loss is the loss function (a cross-entropy loss is used here); y_i is the actual label value; f(x_i; w, α) is the value predicted by the network from input x_i and parameters w, α; the cross-entropy loss measures the gap between the predicted and actual values.

lat represents the computation time of the network:

lat = Σ_{i=1}^{L} Σ_{j=1}^{n} ( exp(α_ij) / Σ_{k=1}^{n} exp(α_ik) ) · lat_ij

where lat_ij is a constant obtained by measurement on the microcontroller that runs the network model; α_ij denotes the scalar corresponding to the j-th module of the i-th layer; exp denotes the exponential function with base e, exp(x) = e^x.

mem indicates the size of the memory occupied by the network:

mem = Σ_{i=1}^{L} Σ_{j=1}^{n} ( exp(α_ij) / Σ_{k=1}^{n} exp(α_ik) ) · Σ_k w_ijk · h_ijk · c_ijk

where w_ijk, h_ijk and c_ijk respectively denote the width, height and number of channels of the feature output by the k-th operator of the j-th module of the i-th layer.

λ_lat and λ_mem are the loss weights of computation time and memory consumption; the larger λ_lat and λ_mem are, the smaller the computation time and memory consumption of the searched network.
6) Train the super network, learning the parameters w and α.

7) Compute p_ij = exp(α_ij) / Σ_{k=1}^{n} exp(α_ik) for each layer of the super network, and keep in each layer the module with the maximum p_ij, obtaining the searched optimal network model. As shown in fig. 13, the dark modules are retained to form the searched network, and the other modules are discarded to reduce the size of the network.
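The per-layer module selection of steps 6)-7) amounts to a softmax over each layer's scalars followed by an argmax. A small NumPy sketch (illustrative, not the patent's code):

```python
import numpy as np

def select_modules(alpha):
    """alpha: (L, n) matrix of learned scalars, one row per super-network layer.

    Returns, for each layer, the index of the module with maximal
    p_ij = exp(alpha_ij) / sum_k exp(alpha_ik).
    """
    # subtract the row max for numerical stability; it cancels in the ratio
    e = np.exp(alpha - alpha.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    return p.argmax(axis=1)
```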
8) Model compression uses an AutoML-based automatic model compression algorithm, with the model searched in the previous step as the reference model. The agent part uses a deep deterministic policy gradient: it receives an embedding from layer l, outputs a sparsity ratio, and compresses layer l of the model according to that ratio; the environment part then moves on to layer l+1, and after all layers have been processed, the accuracy of the whole network is evaluated. Finally, a reward comprising the accuracy, the parameter count and the actual computation time is fed back to the agent part. The following reward algorithm is designed for the microcontroller application scenario:

Reward_lat = -Error × log(Lat)

Reward_mem = -Error × log(Mem)

in the formula, Reward is the acquired reward, Lat represents the model computation time, Mem represents the memory consumption of the model, and Error is a coefficient.
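The two reward terms can be written directly as functions; `error`, `lat` and `mem` correspond to the Error, Lat and Mem of the formulas (a minimal sketch, not production code):

```python
import math

def reward_lat(error, lat):
    """Reward_lat = -Error x log(Lat): lower latency yields a higher reward."""
    return -error * math.log(lat)

def reward_mem(error, mem):
    """Reward_mem = -Error x log(Mem): lower memory use yields a higher reward."""
    return -error * math.log(mem)
```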
(2) Optimization of convolution calculation memory:
1) standard convolution calculation:
Calculate the input layer parameters c_in, w_in, h_in and the output layer parameters c_out, w_out, h_out of the current operator.

Case one: c_out × w_out × h_out ≤ c_in × w_in × h_in, i.e., the convolution output layer is not larger than the convolution input layer (in this case the input layer space can store all the output layer data); the calculation process is shown in fig. 3.
Step 1, allocate a memory space m large enough to hold a few rows of output data; its size is determined by the convolution kernel height. In the present embodiment the kernel height is 3 and m holds 2 rows.
Step 2, operate on part of the convolution input layer data with the convolution kernel and fill memory space m with the result.
Step 3, copy the lower-row data in memory space m to the appropriate position of the convolution input layer, covering the original input data.
Step 4, copy the upper-row data in memory space m onto the lower-row data in memory space m, covering the original data.
Step 5, continuing in order, operate on the next part of the convolution input layer data with the convolution kernel and fill the upper rows of memory space m.
Step 6, copy the lower-row data in memory space m to the appropriate position of the convolution input layer, covering the original input data.
Step 7, repeat steps 4-6 until all data of the convolution input layer have been processed.
Step 8, perform a reshape operation on the computed data stored in the input layer so that it matches the channel number, width and height of the output layer.
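A single-channel NumPy sketch of the case-one scheme, assuming a 3×3 kernel, stride 1 and zero ('same') padding so the output exactly fits the input buffer; the two-row scratch buffer plays the role of memory space m (the function names and the exact buffer size are this sketch's assumptions, not taken verbatim from the patent):

```python
import numpy as np

def out_row(x, k, r):
    """One output row of a 3x3 'same' convolution (zero padding),
    reading only input rows r-1, r, r+1 of x."""
    H, W = x.shape
    acc = np.zeros(W)
    for dr in (-1, 0, 1):
        rr = r + dr
        if 0 <= rr < H:
            p = np.concatenate(([0.0], x[rr], [0.0]))  # zero-pad the columns
            for dc in (-1, 0, 1):
                acc += k[dr + 1, dc + 1] * p[1 + dc : 1 + dc + W]
    return acc

def conv_same_inplace(x, k):
    """Overwrite x with its own 3x3 'same' convolution, using only a
    two-row scratch buffer m: input row r-2 is overwritten only once
    no later output row needs it."""
    H, W = x.shape
    m = np.empty((2, W))                  # "memory space m": two output rows
    m[0] = out_row(x, k, 0)
    if H > 1:
        m[1] = out_row(x, k, 1)
    for r in range(2, H):
        x[r - 2] = m[0]                   # copy the lower buffer row into the freed input row
        m[0] = m[1]                       # shift the upper buffer row down
        m[1] = out_row(x, k, r)           # compute the next output row
    x[H - 2] = m[0]                       # flush the remaining buffered rows
    if H > 1:
        x[H - 1] = m[1]
    return x
```

Because computing output row r reads only input rows r-1..r+1, overwriting input row r-2 beforehand never corrupts data that is still needed, which is exactly the invariant the patent's steps maintain.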
Case two: c_out × w_out × h_out > c_in × w_in × h_in, i.e., the convolution output layer is larger than the convolution input layer (in this case the input layer space cannot store all the output layer data, and an additional memory space M is needed); the calculation process is shown in fig. 4.

Step 1, allocate a memory space m as in case one (in the present embodiment the kernel height is 3 and m holds 2 rows), and allocate a memory space M large enough to hold the output data that will not fit in the input layer space.
Step 2, operate on part of the convolution input layer data with the convolution kernel and fill memory space M with the result.
Step 3, continuing in the calculation order, operate on the next part of the convolution input layer with the convolution kernel and fill memory space m.
Step 4, copy the lower-row data in memory space m to the appropriate position of the convolution input layer, covering the original input data.
Step 5, copy the upper-row data in memory space m onto the lower-row data in memory space m, covering the original data.
Step 6, continuing in order, operate on the next part of the convolution input layer data with the convolution kernel and fill the upper rows of memory space m.
Step 7, copy the lower-row data in memory space m to the appropriate position of the convolution input layer, covering the original input data.
Step 8, repeat steps 5-7 until all data of the convolution input layer have been processed.
Step 9, connect the computed data stored in the input layer with the data in M, and perform a reshape operation so that it matches the channel number, width and height of the output layer.
2) Depth convolution calculation:
a schematic diagram of the depth convolution calculation is shown in fig. 6, and the specific steps are as follows:
Step 1, allocate a memory space m of size 1 × w_out × h_out, i.e. the memory space occupied by a single output channel.
Step 2, perform the depth convolution of the 1st input channel with the 1st convolution kernel and store the output in memory space m.
Step 3, perform the depth convolution of the n-th (n > 1) input channel with the corresponding n-th convolution kernel and store the result in the (n-1)-th channel.
Step 4, copy the data stored in memory space m to the last channel.
Step 5, release memory space m.
Step 6, perform a reshape operation on the computed data stored in the input layer so that it matches the channel number, width and height of the output layer.
3) Point convolution calculation:
Calculate the input layer parameters (including the channel number c_in) and the output layer parameters (including the channel number c_out) of the current operator.

Case one: c_out ≤ c_in, i.e., the number of output channels is not greater than the number of input channels (in this case the input layer space can store all the output layer data); the calculation process is shown in fig. 8.
Step 1, allocate a memory space m of size c_out × 1 × 1, i.e. one position per output channel, to temporarily store the point convolution results.
Step 2, take the data at position (i, j) of every channel of the input layer (i ∈ [1, h_in], j ∈ [1, w_in]), perform the point convolution calculation with it, and store the result in memory space m.
Step 3, copy the data in memory space m to position (i, j) of the corresponding input layer channels, covering the original data.
Step 4, repeat steps 2 and 3 until all input data have been processed.
Step 5, release memory space m.
Step 6, perform a reshape operation on the computed data stored in the input layer so that it matches the channel number, width and height of the output layer.
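When c_out ≤ c_in, each spatial position can be processed independently: compute all output channels at (i, j) into the small buffer m, then write them back over the first c_out input channels at the same position. A NumPy sketch (the names are illustrative):

```python
import numpy as np

def pointwise_inplace(x, w):
    """In-place 1x1 (point) convolution with C_out <= C_in.

    x: (C_in, H, W) input, overwritten with the output; w: (C_out, C_in)."""
    c_in, H, W = x.shape
    c_out = w.shape[0]
    assert c_out <= c_in
    for i in range(H):
        for j in range(W):
            m = w @ x[:, i, j]       # buffer m: one value per output channel
            x[:c_out, i, j] = m      # overwrite the input at (i, j)
    return x[:c_out]                 # "reshape": keep the first C_out channels
```

Overwriting position (i, j) is safe because every spatial position is read exactly once, just before it is written.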
Case two: c_out > c_in, i.e., the number of output channels is greater than the number of input channels (in this case the input layer space cannot store all the output layer data, and an additional memory space M is needed); the calculation process is shown in fig. 9.
Step 1, allocate a memory space m of size c_out × 1 × 1, i.e. one position per output channel, to temporarily store the point convolution results; also allocate a memory space M large enough to hold the extra c_out − c_in output channels.
Step 2, take the data at position (i, j) of every channel of the input layer (i ∈ [1, h_in], j ∈ [1, w_in]), perform the point convolution calculation with it, and store the result in memory space m.
Step 3, copy the first c_in values in memory space m to position (i, j) of the corresponding input layer channels, covering the original data; copy the remaining c_out − c_in values in memory space m to position (i, j) of the corresponding channels of memory space M.
And 4, repeating the step 2 and the step 3 until all input data are calculated.
And 5, releasing the memory space m.
And 6, connecting the calculated data stored in the input layer with the data in the M, and performing reshape operation to make the data meet the number, width and height of channels of the output layer.
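For c_out > c_in the same per-position loop applies, except the overflow channels go into the extra buffer M and the two are connected at the end (a sketch with hypothetical names):

```python
import numpy as np

def pointwise_expand(x, w):
    """1x1 convolution with C_out > C_in: the first C_in output channels
    reuse the input buffer, the remaining C_out - C_in go into M.

    x: (C_in, H, W); w: (C_out, C_in)."""
    c_in, H, W = x.shape
    c_out = w.shape[0]
    assert c_out > c_in
    M = np.empty((c_out - c_in, H, W))        # extra memory space M
    for i in range(H):
        for j in range(W):
            m = w @ x[:, i, j]                # buffer m: all output channels at (i, j)
            x[:, i, j] = m[:c_in]             # first C_in values back into the input
            M[:, i, j] = m[c_in:]             # remainder into M
    return np.concatenate([x, M], axis=0)     # connect the input-layer data with M
```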
(3) Deployment of convolutional neural networks:
1. Collect sample data through data acquisition and store it in a storage unit in the microcontroller, such as FLASH or a memory card.
2. Import the acquired data into a computer and label it with defect-type information for use by the deep learning algorithm.
3. Use the neural network structure search method to search, within a search space constrained by computation time and memory consumption, for the optimal network model.
4. Build a deep learning environment on a computer; frameworks such as TensorFlow, PyTorch and Caffe can be used, and deep neural network training can be accelerated with a GPU, for example an NVIDIA graphics card configured for GPU computation.
5. Train on the workpiece surface defect sample data at the computer side, using the deep learning model generated by the algorithm and the configured deep learning framework. Adjust the deep learning model structure, hyper-parameters and other configurations according to the training results until the target requirements are met.
6. Perform model compression on the trained deep learning model; model compression greatly reduces memory occupation and computation time, and the compressed model can be stored in formats such as tflite, onnx or h5.
7. Storing the deep learning model file data on the microcontroller.
8. Deploy the TensorFlow Lite for Micro inference framework and the CMSIS-NN neural network hardware acceleration component on the microcontroller. The two are combined through an intermediate layer of glue code: TensorFlow Lite for Micro parses and executes the deep learning model and calls the CMSIS-NN computing layer to carry out the computation, while the CMSIS-NN computing layer invokes the DSP to perform the actual calculations of model inference. For a microcontroller whose core does not include a DSP, CMSIS-NN may be omitted and the CPU performs the actual calculations of the inference process.
9. At the computer side, use the TensorFlow Lite for Micro inference framework to verify whether every deep learning operator used in the trained model file is supported, and replace any unsupported operator with a supported one. Then verify that three inference results are consistent: that of TensorFlow Lite for Micro on the computer side, that of the deep learning framework used to train the model, and that of TensorFlow Lite for Micro on the microcontroller side.
10. The microcontroller sends the acquired image data to the inference framework; the inference framework returns the inference result after inference, and the microcontroller executes the corresponding action according to the inference result and the actual needs.
The method is then compared experimentally with several other algorithms, as follows:
TABLE 1 Test set information (table image not reproduced)

TABLE 2 Memory overhead comparison of several convolution algorithms (table image not reproduced)
Table 1 shows the experimental test data. Table 2 shows the experimental results, where the memory usage includes the extra memory used during the convolution calculation and the memory of the output matrix, and excludes the memory of the input matrix and of the convolution kernel; M1, M2, M3 and M4 respectively denote the memory usage of im2col + GEMM, MEC, direct convolution and the present method. Fig. 14 compares the data of Table 2 as a histogram. It can be seen that the method significantly reduces the runtime memory overhead.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the present disclosure should be included in the scope of the present invention as set forth in the appended claims.

Claims (6)

1. A convolutional neural network deployment and optimization method facing to a microcontroller is characterized in that: the method comprises three parts, namely design of a convolutional neural network model, optimization of convolutional calculation memory and deployment of a convolutional neural network;
in the design of the convolutional neural network model, firstly, a network structure with the optimal index is obtained through searching to form a super network, and a target network is obtained by combining the index requirement of a microcontroller; then compressing the target network, and evaluating the accuracy rate of the compressed network and a corresponding design reward function;
in the optimization of convolution calculation memory, memory use of three calculation modes of standard convolution, deep convolution and point convolution is optimized respectively, and memory consumption is reduced based on memory multiplexing; for standard convolution, carrying out classification processing according to the relation between the size of a convolution output layer and the size of a convolution input layer; for point convolution, classifying processing is carried out according to the number of output channels and the number of input channels;
for a standard convolution: when the size of the convolution output layer is not larger than that of the convolution input layer, allocating a memory space m; after the convolution input layer part data and the convolution kernel are operated, filling the memory space m; copying the lower layer data in the memory space m to a proper position of a convolution input layer at the moment, and covering the original input data; copying the upper layer data in the memory space m to the lower layer data in the memory space m to cover the original data; calculating partial data of the convolution input layer and upper layer data in the memory space m after convolution kernel operation according to the sequence; copying the lower layer data in the memory space m to a proper position of a convolution input layer at the moment, and covering the original input data; repeating the above process until all data of the convolution input layer are calculated; performing reshape operation on the calculated data stored in the input layer to enable the calculated data to accord with the number, width and height of channels of the output layer;
when the size of the convolution output layer is larger than that of the convolution input layer, allocating a memory space M and a memory space M; after the convolution input layer part data and the convolution kernel are operated, filling the memory space M; calculating the convolution input layer part and the convolution kernel part according to the calculation sequence, and filling the memory space m after the operation; copying the lower layer data in the memory space m to a proper position of a convolution input layer at the moment, and covering the original input data; copying the upper layer data in the memory space m to the lower layer data in the memory space m to cover the original data; calculating partial data of the convolution input layer and upper-layer data in the memory space m after convolution kernel operation according to the sequence; copying the lower layer data in the memory space m to a proper position of a convolution input layer at the moment, and covering the original input data; repeating the above steps until all data of the convolution input layer are calculated; connecting the calculated data stored in the input layer with the data in the M, and performing reshape operation to make the data meet the number, width and height of channels of the output layer;
for the depth convolution calculation, allocating a memory space m, namely allocating and outputting the memory space occupied by a single channel; performing deep convolution on the input 1 st channel and the 1 st convolution kernel, and outputting and storing in a memory space m; the nth channel of the input layer and the corresponding nth convolution kernel are subjected to deep convolution, and the result is stored in the (n-1) th channel, wherein n is greater than 1; copying data stored in the memory space m to the last channel; releasing the memory space m; performing reshape operation on the calculated data stored in the input layer to enable the calculated data to accord with the number, width and height of channels of the output layer;
for point convolution: when the number of output channels is not more than the number of input channels, allocating a memory space m, allocating a position size to each output channel, and temporarily storing point convolution calculation data; performing convolution calculation on the positions of all channels of an input layer and points, and storing the calculation result in a memory space m; copying data in the middle m of the memory to a position of a corresponding channel of an input layer, and covering original data; repeating the above process until all input data are calculated; releasing the memory space m; performing reshape operation on the calculated data stored in the input layer to enable the calculated data to accord with the number, width and height of channels of the output layer;
when the number of output channels is larger than that of input channels, allocating a memory space M, allocating a position size to each output channel, temporarily storing point convolution calculation data, and allocating a memory space M; performing convolution calculation on the positions of all channels of the input layer and the points, and storing the calculation result in a memory space m; copying the previous data corresponding to the number of the channels of the convolution input layer in the middle M of the memory to the position of the corresponding channel of the input layer, covering the original data, and copying the rest data in the middle M of the memory to the position of the corresponding channel of the memory space M; repeating the steps until all input data are calculated; releasing the memory space m; connecting the calculated data stored in the input layer with the data in the M, and performing reshape operation to make the data meet the number, width and height of channels of the output layer;
in the deployment of the convolutional neural network, based on the design of a convolutional neural network model, the convolutional neural network model verification and the deployment of the convolutional neural network model are further included;
the model verification comprises computer-side model verification and microcontroller-side model verification; the model deployment comprises data acquisition, data preprocessing and convolutional neural network detection.
2. The microcontroller-oriented convolutional neural network deployment and optimization method of claim 1, wherein: searching an optimal network structure in a set search space according to three indexes of accuracy, calculation time and memory consumption by using a neural network architecture search technology, forming a super network by using modules in the search space, adding calculation time consumption and memory space consumption of a micro controller end into a loss function of the super network, and taking the calculation time consumption and the memory space consumption together with the accuracy as an optimization target; and after the search is finished, selecting the module with the maximum probability in each layer of the super network as the module reserved in the layer, removing other modules, and forming the searched target network together with the modules reserved in other layers.
3. The microcontroller-oriented convolutional neural network deployment and optimization method of claim 2, wherein: in model compression, the model searched in the previous step is used as the reference model; the agent part uses a deep deterministic policy gradient to receive an embedding from layer l, outputs a sparsity ratio, and compresses layer l of the model according to that ratio; the environment part then moves on to layer l+1, and after all layers have been processed, the accuracy of the whole network is evaluated; finally, a reward comprising the accuracy, the parameter count and the actual computation time is fed back to the agent part, and the following reward algorithm is designed for the microcontroller application scenario:
Reward_lat = -Error × log(Lat)
Reward_mem = -Error × log(Mem)
in the formula, Reward is the acquired Reward, Lat represents the calculation time of the model, Mem represents the memory consumption of the model, and Error is a coefficient.
4. The microcontroller-oriented convolutional neural network deployment and optimization method of claim 1, wherein: the model verification specifically comprises the following steps:
computer-side model verification: firstly, a deep learning inference frame is used at a computer end to verify whether a convolution operator, a pooling operator and an activation function operator used in a trained model file are supported, and if not, the supported operator is replaced; secondly, verifying the consistency of the deep learning reasoning frame reasoning result and the deep learning frame result of the training deep learning model;
and (3) verifying the microcontroller end model: and verifying the result consistency of the deep learning frame of the microcontroller end using the deep learning inference frame and the training deep learning model.
5. The microcontroller-oriented convolutional neural network deployment and optimization method of claim 1, wherein: the model deployment specifically comprises the following sub-steps:
data acquisition: the microcontroller controls external equipment to collect data, the collected data is sent to a data preprocessing step, and the data is stored in an external storage unit;
data preprocessing: the data preprocessing performs cropping, normalization, and mean and standard deviation processing on the acquired data;
detecting a convolutional neural network: the convolutional neural network detection inputs the preprocessed data into a model reasoning framework to obtain a detection result; the deployed convolutional neural network comprises an application layer, a model inference framework layer, a CMSIS-NN hardware acceleration layer, an ARM Cortex-M layer and a storage layer.
6. The microcontroller-oriented convolutional neural network deployment and optimization method of claim 1, wherein: in the convolutional neural network,
the convolutional neural network application layer is used for adopting different detection strategies according to actual conditions;
different detection models are replaced in the model layer according to actual needs;
the model reasoning framework layer is used for analyzing and executing model reasoning;
the CMSIS-NN computing layer is used for accelerating the model reasoning speed and provides hardware acceleration for an upper layer reasoning framework by packaging a Digital Signal Processor (DSP) in an ARM core;
the ARM Cortex-M layer is used for executing actual operation of model reasoning and is also responsible for executing functions of other modules, including functions for data acquisition, data preprocessing and action execution;
the storage layer comprises an RAM and a FLASH part, the RAM is used for storing temporary data of the middle layer in the model reasoning process, and the FLASH is used for storing weight files of the model.
CN202210653260.3A 2022-06-10 2022-06-10 Convolutional neural network deployment and optimization method facing microcontroller Active CN114742211B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210653260.3A CN114742211B (en) 2022-06-10 2022-06-10 Convolutional neural network deployment and optimization method facing microcontroller
PCT/CN2022/106634 WO2023236319A1 (en) 2022-06-10 2022-07-20 Convolutional neural network deployment and optimization method for microcontroller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210653260.3A CN114742211B (en) 2022-06-10 2022-06-10 Convolutional neural network deployment and optimization method facing microcontroller

Publications (2)

Publication Number Publication Date
CN114742211A CN114742211A (en) 2022-07-12
CN114742211B true CN114742211B (en) 2022-09-23

Family

ID=82287414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210653260.3A Active CN114742211B (en) 2022-06-10 2022-06-10 Convolutional neural network deployment and optimization method facing microcontroller

Country Status (2)

Country Link
CN (1) CN114742211B (en)
WO (1) WO2023236319A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742211B (en) * 2022-06-10 2022-09-23 南京邮电大学 Convolutional neural network deployment and optimization method facing microcontroller
CN115630578B (en) * 2022-10-30 2023-04-25 四川通信科研规划设计有限责任公司 Calculation power system prediction layout optimization method

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109447239A (en) * 2018-09-26 2019-03-08 华南理工大学 A kind of embedded convolutional neural networks accelerated method based on ARM
CN111768458A (en) * 2020-06-28 2020-10-13 苏州鸿鹄骐骥电子科技有限公司 Sparse image processing method based on convolutional neural network
CN112766467A (en) * 2021-04-06 2021-05-07 深圳市一心视觉科技有限公司 Image identification method based on convolution neural network model
CN113011570A (en) * 2021-04-30 2021-06-22 电子科技大学 Adaptive high-precision compression method and system of convolutional neural network model

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10776668B2 (en) * 2017-12-14 2020-09-15 Robert Bosch Gmbh Effective building block design for deep convolutional neural networks using search
CN114742211B (en) * 2022-06-10 2022-09-23 南京邮电大学 Convolutional neural network deployment and optimization method facing microcontroller

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN109447239A (en) * 2018-09-26 2019-03-08 华南理工大学 A kind of embedded convolutional neural networks accelerated method based on ARM
CN111768458A (en) * 2020-06-28 2020-10-13 苏州鸿鹄骐骥电子科技有限公司 Sparse image processing method based on convolutional neural network
CN112766467A (en) * 2021-04-06 2021-05-07 深圳市一心视觉科技有限公司 Image identification method based on convolution neural network model
CN113011570A (en) * 2021-04-30 2021-06-22 电子科技大学 Adaptive high-precision compression method and system of convolutional neural network model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of an Embedded Inference Framework for Deep Convolutional Neural Networks; Zhang Yingjie; China Masters' Theses Full-Text Database; 2021-02-15; pp. 19-44 of main text *

Also Published As

Publication number Publication date
CN114742211A (en) 2022-07-12
WO2023236319A1 (en) 2023-12-14

Similar Documents

Publication Publication Date Title
CN114742211B (en) Convolutional neural network deployment and optimization method facing microcontroller
CN109587713B (en) Network index prediction method and device based on ARIMA model and storage medium
CN108764471B (en) Neural network cross-layer pruning method based on feature redundancy analysis
US20230297846A1 (en) Neural network compression method, apparatus and device, and storage medium
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN113420651B (en) Lightweight method and system for deep convolutional neural networks, and target detection method
CN115222950A (en) Lightweight target detection method for embedded platform
CN113674862A (en) Acute renal function injury onset prediction method based on machine learning
CN117131449A (en) Data management-oriented anomaly identification method and system with propagation learning capability
CN114882497A (en) Method for realizing fruit classification and identification based on deep learning algorithm
CN114974306A (en) Transformer abnormal voiceprint detection and identification method and device based on deep learning
CN113762503A (en) Data processing method, device, equipment and computer readable storage medium
CN112685374B (en) Log classification method and device and electronic equipment
CN115357718B (en) Method, system, device and storage medium for discovering repeated materials of theme integration service
CN117058079A (en) Thyroid imaging image automatic diagnosis method based on improved ResNet model
CN111860601A (en) Method and device for predicting large fungus species
CN116741159A (en) Audio classification and model training method and device, electronic equipment and storage medium
CN116740808A (en) Animal behavior recognition method based on deep learning target detection and image classification
CN113160987B (en) Health state prediction method, apparatus, computer device and storage medium
CN110879934B (en) Text prediction method based on Wide & Deep learning model
CN115374687A (en) Numerical-shape combined intelligent diagnosis method for working conditions of oil well
CN112818164A (en) Music type identification method, device, equipment and storage medium
CN111813975A (en) Image retrieval method and device and electronic equipment
CN116431355B (en) Computing load prediction method and system based on power field super computing platform
CN116959489B (en) Quantization method and device for voice model, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant