CN117197576A - Image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution


Info

Publication number: CN117197576A
Application number: CN202311208995.6A
Authority: CN (China)
Prior art keywords: mcu, convolution, block, model, layer
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 陈彦明, 武钢, 张以文
Current assignee / original assignee: Anhui University
Application filed by Anhui University (CN202311208995.6A); publication of CN117197576A

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution, and relates to the field of machine learning. The method specifically comprises the following steps. Step 1, obtain MCU-BLOCK-A: improve the depthwise separable convolution of the lightweight neural network MobileNet by using one depthwise convolution and one pointwise convolution, adding a BN layer and an efficient channel attention mechanism between them, appending a final depthwise convolution layer, and connecting the input and output of the last depthwise convolution with a residual connection. Step 2, obtain MCU-BLOCK-B. Step 3, construct the model in combination with a nonlinear pooling layer. The proposed model has a low parameter count and a small peak memory footprint, can meet the resource constraints of most MCUs, and achieves good classification performance. Running the machine learning model on the MCU avoids uploading data to the cloud, which greatly protects data privacy, accelerates real-time processing and response, and greatly reduces energy consumption.

Description

Image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution
Technical Field
The invention relates to the field of machine learning, in particular to an image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution.
Background
In recent years, as machine learning (ML) continues to advance, new opportunities have emerged for applying machine learning to resource-constrained Internet of Things (IoT) nodes. At present, machine learning algorithms are widely applied in industries such as smart home, precision agriculture, and consumer electronics. Although running a machine learning model (e.g., for image classification) on a microcontroller (MCU) can avoid uploading data to the cloud, accelerate real-time processing and response, greatly protect data privacy, and greatly reduce energy consumption, deploying intelligent algorithms on MCUs still faces a number of problems.
1) Model size: the flash memory (FLASH) of the MCU stores the model parameters and mostly ranges from 0 to 2 MB, whereas typical lightweight neural network models exceed 10 MB; for example, MobileNetV2 is 13.6 MB, so its parameter efficiency is too low.
2) Peak memory: the static random access memory (SRAM) of the MCU stores the temporary intermediate data of neural network inference, including the input and output activation matrices, and is typically 0-512 KB. The peak memory of MobileNetV2 and EfficientNet-B0 reaches 2.29 MB, which is unsuitable for most existing MCUs.
3) Existing methods that compress models with techniques such as pruning and quantization focus only on reducing parameters and computation and do not solve the memory bottleneck; in addition, neural architecture search (NAS) methods require a great deal of hardware resources, and the manually designed networks MobileNetV2-0.35 and EtinyNet do not balance the trade-off among peak memory, computation, and accuracy.
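For context, the parameter saving that motivates depthwise separable convolution can be made concrete (an illustrative calculation, not taken from the patent). A standard convolution layer with a k×k kernel, C_in input channels, and C_out output channels has

    params_std = k^2 · C_in · C_out

while a depthwise separable convolution factorizes it into a depthwise and a pointwise part:

    params_dsc = k^2 · C_in + C_in · C_out

For k = 3, C_in = 64, C_out = 128 this gives 9·64·128 = 73,728 parameters versus 576 + 8,192 = 8,768, roughly an 8.4× reduction, which is why such factorized blocks fit a 0-2 MB flash budget far more easily.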
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the deficiencies of the prior art, the invention provides an image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution.
(II) technical scheme
In order to achieve the above purpose, the invention is realized by the following technical scheme: an image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution, which specifically comprises the following steps:
step 1 acquisition of MCU-BLOCK-A
The method comprises the steps of improving the depth separable convolution of A lightweight neural network MobileNet to obtain MCU-BLOCK-A, utilizing A depth convolution (DWConv) and A point-by-point convolution (PWConv), adding A BN layer and A high-efficiency channel attention mechanism (Efficient Channel Attention, ECA for short) between the two, adding A layer of depth convolution at last, and carrying out residual connection on input and output of the last layer of depth convolution;
step 2 acquisition of MCU-BLOCK-B
Based on the MCU-BLOCK-A obtained in the step 1, removing residual connection between the input and the final layer of depth convolution output on the basis of the MCU-BLOCK-A, adding residual connection between the input and the first layer of point-by-point convolution output, and carrying out residual connection between the connected output and the final layer of depth convolution output;
Step 3: obtain the nonlinear pooling layer
Based on a nonlinear pooling module, nonlinear pooling is introduced after the first convolution layer to rapidly downsample the picture and bypass the large intermediate activation layers, completing image aggregation while attenuating computation;
Step 4: model construction
The model is built by combining convolution, nonlinear pooling, and the MCU-BLOCK-A and MCU-BLOCK-B modules, balancing peak memory, model size, computation, and accuracy, and comprises the following stages:
1) The first stage, extracting local features by convolution with a stride of 2;
2) In the second stage, feature information is extracted along the row direction and the column direction in the picture by utilizing nonlinear pooling, then the picture size is reduced rapidly, and the peak memory of the CNN model is ensured not to be higher than the static random access memory of the MCU;
3) The third stage, extracting features through A plurality of MCU-BLOCK-A modules;
4) A fourth stage, adopting a constructed MCU-BLOCK-B module;
5) In the fifth stage, dimensionality reduction is performed with global pooling, and the classification result is finally obtained through a fully connected layer;
Step 5: model training and deployment
The model is trained and tested with the ImageNet dataset and the Visual Wake Words dataset, and the model trained on the VWW dataset is deployed on a microcontroller to test its performance.
Preferably, in the MCU-BLOCK-A of step 1 and the MCU-BLOCK-B of step 2, no nonlinear activation function is used after the first depthwise convolution, and ReLU activation functions are adopted between the pointwise convolution and the depthwise convolution and after the last depthwise convolution.
Preferably, the efficient channel attention mechanism in step 1 specifically includes:
firstly, global pooling is performed on the input original feature map, and then the weight of each channel is calculated with a learnable 1D convolution operation, according to the formula:

w_i = σ(C1D_k(y))

wherein C1D_k is a fast 1D convolution, k denotes how many neighboring channels participate in the attention prediction for a channel, and σ is the Sigmoid activation function; the resulting weights are then multiplied element-wise with the original feature map.
Preferably, the specific operation of the nonlinear pooling module in step 3 comprises: a receptive-field patch (r×c×k) in a specified format is extracted from the input picture, where r is the receptive field row size, c is the receptive field column size, and k is the number of receptive field channels. Features are then extracted along the rows with a fast gated recurrent neural network (FastGRNN1), yielding r feature blocks of length h_1, where h_1 is the hidden-layer size of FastGRNN1; a bidirectional FastGRNN2 is then run over these r feature blocks of length h_1 to obtain two feature blocks of length h_2. Analogously to the row-level feature extraction, c feature blocks of length h_1 are obtained at the column level and passed through the bidirectional FastGRNN2 to obtain two more feature blocks of length h_2. Finally, the four feature blocks of length h_2 are concatenated to obtain the feature vector produced by the nonlinear pooling operation on a single receptive field.
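The following is a minimal PyTorch sketch of this row/column aggregation over a single receptive field; it is an illustration under stated assumptions, not the patented implementation: nn.GRU stands in for FastGRNN, and the module and variable names (NonlinearPoolSketch, h1, h2) are ours.

    import torch
    import torch.nn as nn

    class NonlinearPoolSketch(nn.Module):
        # Aggregate one r x c x k receptive field into a single feature vector:
        # a first RNN scans each row and each column, a bidirectional RNN then
        # summarizes the r row features and the c column features, and the four
        # resulting blocks of length h2 are concatenated.
        def __init__(self, k=8, h1=8, h2=8):
            super().__init__()
            self.rnn1 = nn.GRU(k, h1, batch_first=True)   # stand-in for FastGRNN1
            self.rnn2 = nn.GRU(h1, h2, batch_first=True, bidirectional=True)  # stand-in for FastGRNN2

        def forward(self, field):                               # field: (r, c, k)
            rows, _ = self.rnn1(field)                          # scan along each row -> (r, c, h1)
            cols, _ = self.rnn1(field.transpose(0, 1).contiguous())  # scan along each column -> (c, r, h1)
            _, hr = self.rnn2(rows[:, -1, :].unsqueeze(0))      # bidirectional pass over r row features
            _, hc = self.rnn2(cols[:, -1, :].unsqueeze(0))      # bidirectional pass over c column features
            return torch.cat([hr.flatten(), hc.flatten()])      # 4 blocks of length h2

    field = torch.randn(6, 6, 8)                 # r = c = 6, k = 8 as in the embodiment
    print(NonlinearPoolSketch()(field).shape)    # torch.Size([32]) = 4 * h2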
Preferably, deploying the CNN on the MCU specifically comprises the following steps:
based on step 4, the model is constructed, the image classification dataset is loaded, training is performed, and the model weight file with the highest accuracy is saved; the weight file is converted into the Open Neural Network Exchange (ONNX) format and 8-bit asymmetric quantization is applied, with the quantization formula:

val_fp32 = scale * val_quantized
and the model is parsed with the STM32Cube.AI toolkit, the corresponding C-language base code is generated, and an upper-layer application is developed to run the image classification algorithm on the MCU.
Preferably, a device for image classification is deployed based on the MCU, the device comprising an MCU microprocessor and a camera, wherein:
an STM32F746G-DISCO board serves as the MCU; the flash memory in the MCU stores the neural network model weights, the basic code framework, and the OS; the SRAM in the MCU stores the intermediate activation values and other buffers produced during CNN inference; and an Arducam camera acquires the images.
(III) beneficial effects
The invention provides an image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution. The beneficial effects are as follows:
1. The invention provides an image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution. Constrained by the memory and computing power of the MCU, the conventional SE attention mechanism has many parameters and a large computation cost and is unsuitable for deployment on an MCU. Adding ECA, which introduces almost no additional parameters compared with conventional attention, lets the model pay more attention to important feature channels while avoiding the parameter count and computation cost of the fully connected layers in common attention mechanisms, thereby improving computational efficiency. The invention achieves good classification performance with fewer than 1M parameters.
2. The invention provides an image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution that has a small peak memory: the first stage uses a stride-2 convolution with 8 output channels, and nonlinear pooling then downsamples the picture to 1/4 of its original size, so compared with other models the peak memory is smaller and the computation is further reduced, making the method suitable for many MCUs while obtaining good classification performance. Running the machine learning model on the MCU avoids uploading data to the cloud, which greatly protects data privacy, accelerates real-time processing and response, and greatly reduces energy consumption. The invention can be widely applied in industries such as smart home, precision agriculture, and consumer electronics.
Drawings
FIG. 1 is a block diagram of a depthwise separable convolution of the present invention;
FIG. 2 is A BLOCK diagram of an improved module MCU-BLOCK-A of the present invention;
FIG. 3 is a BLOCK diagram of an improved module MCU-BLOCK-B of the present invention;
FIG. 4 is a graph of intermediate activation value changes for a network of the present invention;
FIG. 5 is a graph comparing accuracy and peak memory with other models in the Visual Wake Words experiment of the present invention;
FIG. 6 is a graph comparing accuracy and parameter count with other models in the Visual Wake Words experiment;
FIG. 7 is a flow chart of a model training deployment of the present invention;
FIG. 8 is a block diagram of a model deployment hardware device of the present invention;
fig. 9 is a flow chart of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples:
As shown in FIGS. 1-9, an embodiment of the present invention provides an image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution, which specifically comprises the following steps:
Step 1: obtaining MCU-BLOCK-A
The depthwise separable convolution of the lightweight neural network MobileNet is improved to obtain MCU-BLOCK-A. First, DWConv and PWConv are used, with a BN layer and an ECA layer added between them, realizing cross-channel interaction with almost no increase in parameter count or computation and increasing the model's classification accuracy. Finally, a layer of DWConv is appended, and the input and output of this last DWConv are connected by a residual connection, which alleviates gradient vanishing and improves the flow of information.
FIG. 1 shows the structure of MobileNet's depthwise separable convolution. FIG. 2 shows our improved MCU-BLOCK-A: we first use a DWConv to extract information channel by channel, then use ECA to learn the importance of each channel. Here global pooling is applied first, followed by a learnable 1D convolution operation that computes the weight of each channel, according to the formula:

w_i = σ(C1D_k(y))

where C1D_k is a fast 1D convolution, k denotes how many neighboring channels participate in the attention prediction for a channel, the size of k is determined adaptively by the number of channels, and σ is the Sigmoid activation function. The kernel size is given by:

k = ψ(C) = | log2(C)/γ + b/γ |_odd

where C is the number of channels, |t|_odd denotes the odd number nearest to t, γ is 2, and b is 1. The weights are then multiplied element-wise with the original feature map, and PWConv is used to interact and combine the features between channels to obtain a richer feature representation. For the last layer we use DWConv again, because DWConv has fewer parameters and a lower computation cost and is therefore better suited to resource-constrained scenarios; we accordingly increase the proportion of DWConv in the model. At the same time, a residual connection is introduced between the input and the output of the last layer, which allows a deeper network to be built and improves the representation capability and performance of the model.
Step 2: obtaining MCU-BLOCK-B
As shown in FIG. 3, starting from the MCU-BLOCK-A obtained in step 1, the residual connection between the input and the output of the last depthwise convolution is removed, a residual connection is added between the input and the output of the first pointwise convolution, and the connected output is joined to the output of the last depthwise convolution by a further residual connection. As shown in FIG. 4, the picture size is large in the early stages of a CNN, so if too many residual connections are used there, the peak memory of the model will exceed the SRAM of the MCU. In the later stages the picture size is small, so more residual connections can be used to increase the equivalent width of the model without exceeding the SRAM limit of the MCU. Introducing several residual connections therefore increases the equivalent width while keeping the network width and parameter count small, preserving accuracy.
ReLU activation functions are adopted between the first PWConv and the last DWConv in MCU-BLOCK-A and MCU-BLOCK-B and after the last DWConv, because ReLU is quantization-friendly and effectively increases the nonlinearity of the network at low parameter and computation cost. The first DWConv does not use a ReLU activation function, since ReLU would block the flow of information in low-dimensional data and thereby weaken the capacity and expressive power of the model.
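A minimal PyTorch sketch of the two blocks as just described, reusing the ECA sketch above; the channel counts and names are illustrative assumptions, and MCU-BLOCK-B keeps equal input/output widths so its residual additions are shape-compatible.

    import torch.nn as nn

    def dwconv(c):                                 # 3x3 depthwise convolution
        return nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False)

    class MCUBlockA(nn.Module):
        # DWConv (no ReLU) -> BN -> ECA -> PWConv -> ReLU -> DWConv -> ReLU,
        # with a residual connection around the last DWConv.
        def __init__(self, cin, cout):
            super().__init__()
            self.dw1, self.bn, self.eca = dwconv(cin), nn.BatchNorm2d(cin), ECA(cin)
            self.pw = nn.Conv2d(cin, cout, 1, bias=False)
            self.dw2, self.relu = dwconv(cout), nn.ReLU()

        def forward(self, x):
            y = self.relu(self.pw(self.eca(self.bn(self.dw1(x)))))
            return self.relu(self.dw2(y)) + y      # residual: input and output of last DWConv

    class MCUBlockB(MCUBlockA):
        # Same layers, but the residual from the input now targets the first
        # PWConv output, and that sum is residually connected to the last
        # DWConv output.
        def __init__(self, c):
            super().__init__(c, c)                 # equal widths keep the sums valid

        def forward(self, x):
            y = self.relu(self.pw(self.eca(self.bn(self.dw1(x)))) + x)
            return self.relu(self.dw2(y)) + y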
Step 3: obtaining the nonlinear pooling layer
After the first convolution layer, nonlinear pooling is introduced to rapidly downsample the picture and bypass the large intermediate activation layers, completing image aggregation while attenuating computation. The nonlinear pooling lets the model perform finer aggregation over a large receptive field, extracting more picture information than ordinary pooling while reducing the memory footprint and computation at run time, without losing much precision.
As shown in FIG. 4, a receptive-field patch (r×c×k) in a specified format is extracted from the input picture, where r is the receptive field row size, c is the receptive field column size, and k is the number of receptive field channels. Features are then extracted along the rows with a fast gated recurrent neural network (FastGRNN1), yielding r feature blocks of length h_1, where h_1 is the hidden-layer size of FastGRNN1; a bidirectional FastGRNN2 is then run over these r feature blocks of length h_1 to obtain two feature blocks of length h_2. Analogously to the row-level feature extraction, c feature blocks of length h_1 are obtained at the column level and passed through the bidirectional FastGRNN2 to obtain two more feature blocks of length h_2. Finally, the four feature blocks of length h_2 are concatenated to obtain the feature vector produced by the nonlinear pooling operation on a single receptive field. In the present embodiment h_1 = h_2 = 8, c = r = 6, and k = 8. After the nonlinear pooling operation is performed over the different receptive fields of the picture, this finer aggregation extracts more picture information than ordinary pooling while reducing the memory and computation used at run time. The FastGRNN formulas are as follows:
z_t = σ(W p_t + U h_{t-1} + b_z)
h̃_t = tanh(W p_t + U h_{t-1} + b_h)
h_t = (ζ(1 − z_t) + ν) ⊙ h̃_t + z_t ⊙ h_{t-1}

where p_t is the t-th input feature vector, h_{t-1} is the output of the previous step, and W, U, b_z, b_h are all parameters. z_t is the update gate, which indicates how much information from the previous state can be carried into the current state; h̃_t is the candidate hidden state, generated by fusing the current input with the previous hidden state; h_t is the resulting hidden state. Compared with GRU and LSTM, FastGRNN requires less computation and trains faster, and can effectively capture the edges and directions of a picture, making it better suited to resource-constrained devices.
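A minimal PyTorch sketch of a FastGRNN cell implementing these three formulas; ζ and ν are the trainable scalars of FastGRNN, and the class and parameter names are ours.

    import torch
    import torch.nn as nn

    class FastGRNNCell(nn.Module):
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.W = nn.Linear(input_size, hidden_size, bias=False)
            self.U = nn.Linear(hidden_size, hidden_size, bias=False)
            self.b_z = nn.Parameter(torch.zeros(hidden_size))
            self.b_h = nn.Parameter(torch.zeros(hidden_size))
            self.zeta = nn.Parameter(torch.ones(1))    # zeta: scales (1 - z_t)
            self.nu = nn.Parameter(torch.zeros(1))     # nu: additive floor on the gate

        def forward(self, p_t, h_prev):
            pre = self.W(p_t) + self.U(h_prev)         # shared term W p_t + U h_{t-1}
            z_t = torch.sigmoid(pre + self.b_z)        # update gate
            h_tilde = torch.tanh(pre + self.b_h)       # candidate hidden state
            return (self.zeta * (1 - z_t) + self.nu) * h_tilde + z_t * h_prev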
Table 1
Step 4: model construction
Conventional convolution, nonlinear pooling, the MCU-BLOCK-A and MCU-BLOCK-B modules, global pooling, and a fully connected layer are combined to construct the model. The network structure is shown in Table 1, where Input is the input picture size and channel count, Operator is the specific operation, c is the number of output channels, n is the number of repetitions, s is the stride, and ECA indicates whether the ECA module is used in the operation.
Balancing peak memory, model size, computation, and accuracy, the construction specifically comprises the following stages (a code sketch follows the list):
1) The first stage, extracting local features by convolution with a stride of 2;
2) In the second stage, feature information is extracted along the row direction and the column direction in the picture by utilizing nonlinear pooling, then the picture size is reduced rapidly, and the peak memory of the CNN model is ensured not to be higher than the static random access memory of the MCU;
3) The third stage, extracting features through A plurality of MCU-BLOCK-A modules;
4) A fourth stage, adopting a constructed MCU-BLOCK-B module;
5) In the fifth stage, dimensionality reduction is performed with global pooling, and the classification result is finally obtained through the fully connected layer.
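A minimal PyTorch sketch of this five-stage assembly, reusing the block sketches above. An ordinary stride-4 pooling stands in for the nonlinear pooling layer to keep the sketch short, and the stage widths and repeat counts are placeholders, not the values in Table 1.

    import torch
    import torch.nn as nn

    class MCUNetSketch(nn.Module):
        def __init__(self, num_classes=2):
            super().__init__()
            self.stage1 = nn.Conv2d(3, 8, 3, stride=2, padding=1)  # stage 1: stride-2 conv
            self.stage2 = nn.MaxPool2d(4)                          # stage 2: stand-in for nonlinear pooling (1/4 size)
            self.stage3 = nn.Sequential(MCUBlockA(8, 32), MCUBlockA(32, 64))   # stage 3
            self.stage4 = nn.Sequential(MCUBlockB(64), MCUBlockB(64))          # stage 4
            self.fc = nn.Linear(64, num_classes)                   # stage 5: global pool + FC

        def forward(self, x):
            x = self.stage4(self.stage3(self.stage2(self.stage1(x))))
            return self.fc(x.mean(dim=(2, 3)))                     # global average pooling

    print(MCUNetSketch()(torch.randn(1, 3, 80, 80)).shape)         # torch.Size([1, 2])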
Step 5: model training and deployment
The model's performance is tested by training and testing on the ImageNet dataset and the Visual Wake Words (VWW) dataset and deploying the model trained on the VWW dataset on an STM32F746 microcontroller. The ImageNet dataset is the most convincing benchmark, with 1,281,167 images in the training set and 50,000 images in the validation set. The images are first preprocessed: all training images are resized to 224 × 224, then randomly flipped horizontally and normalized using the mean and standard deviation. The model is trained with a stochastic gradient descent (SGD) optimizer with momentum, with a weight decay of 4 × 10^-5 and a momentum of 0.9. A cosine learning-rate decay strategy is adopted with an initial learning rate of 0.1; training runs for 400 epochs, with the learning rate finally decaying to 0.00001 and the batch size set to 1024. Training uses the PyTorch framework and 4 NVIDIA A100 GPUs. The experimental results are shown in Table 2. Compared with other lightweight models, the proposed model has fewer parameters and a smaller peak memory, so it can be deployed on MCUs with fewer memory resources, and it achieves higher accuracy.
Table 2
EtinyNet-1.0 in Table 2 denotes the result of reducing the output channels of the model's first layer to 8 and retraining while keeping the same peak RAM as our model;
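A minimal PyTorch sketch of the ImageNet optimizer and schedule configuration described above (data loading and the training step are omitted; the variable names are ours):

    import torch
    from torch.optim import SGD
    from torch.optim.lr_scheduler import CosineAnnealingLR

    model = MCUNetSketch()  # from the sketch above
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=4e-5)
    scheduler = CosineAnnealingLR(optimizer, T_max=400, eta_min=1e-5)  # decay 0.1 -> 0.00001

    for epoch in range(400):
        # ... one epoch over ImageNet with batch size 1024 goes here ...
        scheduler.step()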
visual wake Words (VWW for short) are the benchmark for evaluating the performance of a miniature vision model on a microcontroller. The task includes determining the presence of a person in the image, the data set being a realistic representation of a generic use case of the microcontroller-based vision application and providing a standardized basis for evaluating the accuracy and effectiveness of the small vision model. The training dataset contained 115,000 images and the validation dataset contained 8,000 images.
Specifically, this example uses the same data augmentation strategy as the ImageNet experiments, with an initial learning rate of 0.05, a cosine learning-rate decay strategy, and SGD with momentum, with a weight decay of 3e-4. Training runs for 300 epochs with a batch size of 256. This example tests the effectiveness and efficiency of the model on images with resolutions of 80 × 80, 144 × 144, and 244 × 244 pixels. The multiplier of the model is set to 0.75, i.e., all output channel counts of the model are multiplied by 0.75, so the first-layer output channels become 6, the nonlinear pooling output channels become 24, and so on. The standard conditions of the visual wake-word task are thus met, namely a peak memory not exceeding 256 KB, MMACs not exceeding 60M, and a parameter count not exceeding 300K, ensuring the model is suitable for deployment on resource-constrained MCU devices. The experimental comparison results, shown in FIGS. 5 and 6, demonstrate that better classification is achieved with a smaller peak memory and parameter count (265K). The actual deployment flow is shown in FIG. 7: first, the model file with the highest classification accuracy obtained by training on the dataset at 80 × 80 resolution is converted to the ONNX format, and then 8-bit static asymmetric quantization is applied, with the quantization formula:
val_fp32 = scale * val_quantized

where val_fp32 is the original value of a model weight, val_quantized is the quantized value, and the scale is

scale = (val_max_fp32 − val_min_fp32) / (val_max_quantized − val_min_quantized)

where val_max_fp32 and val_min_fp32 are the maximum and minimum of the original model weights, and, for the 8-bit quantization used here, val_max_quantized = 255 and val_min_quantized = 0.
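A minimal NumPy sketch of 8-bit asymmetric weight quantization under this scheme. The zero point is the standard companion of an asymmetric scale and is implied by the [0, 255] range above; the function names are ours.

    import numpy as np

    def quantize_8bit_asymmetric(w):
        # scale = (max - min) / (255 - 0); zero_point shifts the range onto [0, 255].
        w_min, w_max = float(w.min()), float(w.max())
        scale = (w_max - w_min) / 255.0
        zero_point = int(round(-w_min / scale))
        q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
        return q, scale, zero_point

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.float32) - zero_point)

    w = np.random.randn(64, 32).astype(np.float32)
    q, s, z = quantize_8bit_asymmetric(w)
    print(np.abs(dequantize(q, s, z) - w).max())   # reconstruction error is at most ~scale/2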
The model is parsed with the STM32Cube.AI toolkit, the corresponding C-language base code is generated, and the quantized model weights are initialized as static arrays. The hardware configuration is shown in FIG. 8. The invention adopts an STM32F746G-DISCO board as the main controller and an Arducam camera to acquire images: the STM32F746G-DISCO serves as the MCU, the flash memory in the MCU stores the neural network model weights, the basic code framework, and the OS, and the SRAM in the MCU stores the intermediate activation values and other buffers produced during CNN inference. The method and device thus realize real-time detection of whether a person is present in the video.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. An image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution, characterized by comprising the following steps:
Step 1: obtain MCU-BLOCK-A
The depthwise separable convolution of the lightweight neural network MobileNet is improved to obtain MCU-BLOCK-A: one depthwise convolution and one pointwise convolution are used, a BN layer and an efficient channel attention mechanism are added between the two, a final layer of depthwise convolution is appended, and the input and output of the last depthwise convolution layer are connected by a residual connection;
Step 2: obtain MCU-BLOCK-B
Starting from the MCU-BLOCK-A obtained in step 1, the residual connection between the input and the output of the last depthwise convolution is removed, a residual connection is added between the input and the output of the pointwise convolution, and the connected output is joined to the output of the last depthwise convolution by a further residual connection;
Step 3: obtain the nonlinear pooling layer
Based on the nonlinear pooling module, nonlinear pooling is applied after the initial convolution layer to rapidly downsample the picture and bypass the large intermediate activation layers, completing image aggregation while attenuating computation;
Step 4: model construction
The model is built by combining convolution, nonlinear pooling, and the MCU-BLOCK-A and MCU-BLOCK-B modules, balancing peak memory, model size, computation, and accuracy, and comprises the following stages:
1) The first stage, extracting local features by convolution with a stride of 2;
2) In the second stage, feature information is extracted along the row direction and the column direction in the picture by utilizing nonlinear pooling, then the picture size is reduced rapidly, and the peak memory of the CNN model is ensured not to be higher than the capacity of the static random access memory of the MCU;
3) The third stage, extracting features through A plurality of MCU-BLOCK-A modules;
4) A fourth stage, adopting a constructed MCU-BLOCK-B module;
5) In the fifth stage, dimensionality reduction is performed with global pooling, and the classification result is finally obtained through a fully connected layer;
Step 5: model training and deployment
The model is trained and tested with the ImageNet dataset and the Visual Wake Words dataset, and the model trained on the VWW dataset is deployed on a microcontroller to test its performance.
2. The image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution according to claim 1, wherein: in the MCU-BLOCK-A of step 1 and the MCU-BLOCK-B of step 2, no nonlinear activation function is used after the first depthwise convolution, and ReLU activation functions are adopted between the pointwise convolution and the depthwise convolution and after the last depthwise convolution.
3. The image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution according to claim 1, wherein the efficient channel attention mechanism in step 1 specifically comprises:
firstly, global pooling is performed on the input original feature map, and then the weight of each channel is calculated with a learnable 1D convolution operation, according to the formula:
w_i = σ(C1D_k(y))
wherein C1D_k is a fast 1D convolution, k denotes how many neighboring channels participate in the attention prediction for a channel, and σ is the Sigmoid activation function; the resulting weights are then multiplied element-wise with the original feature map.
4. The image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution according to claim 1, wherein the specific operation of the nonlinear pooling module in step 3 comprises: a receptive-field patch (r×c×k) in a specified format is extracted from the input picture, where r is the receptive field row size, c is the receptive field column size, and k is the number of receptive field channels; features are then extracted along the rows with a fast gated recurrent neural network (FastGRNN1), yielding r feature blocks of length h_1, where h_1 is the hidden-layer size of FastGRNN1, and a bidirectional FastGRNN2 is run over these r feature blocks of length h_1 to obtain two feature blocks of length h_2; analogously to the row-level feature extraction, c feature blocks of length h_1 are obtained at the column level and passed through the bidirectional FastGRNN2 to obtain two more feature blocks of length h_2; finally, the four feature blocks of length h_2 are concatenated to obtain the feature vector produced by the nonlinear pooling operation on a single receptive field.
5. The image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution according to claim 1, comprising deploying the CNN on the MCU, specifically comprising the following steps:
based on step 4, the model is constructed, the image classification dataset is loaded, training is performed, and the model weight file with the highest accuracy is saved; the weight file is converted into the Open Neural Network Exchange format and 8-bit asymmetric quantization is applied, with the quantization formula:
val_fp32 = scale * val_quantized
and the model is parsed with the STM32Cube.AI toolkit, the corresponding C-language base code is generated, and an upper-layer application is developed to run the image classification algorithm on the MCU.
6. The image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution according to claim 1, comprising a device for image classification deployed based on the MCU, the device comprising an MCU microprocessor and a camera, wherein:
an STM32F746G-DISCO board serves as the MCU; the flash memory in the MCU stores the neural network model weights, the basic code framework, and the OS; the SRAM in the MCU stores the intermediate activation values and other buffers produced during CNN inference; and an Arducam camera acquires the images.
CN202311208995.6A 2023-09-19 2023-09-19 Image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution Pending CN117197576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311208995.6A CN117197576A (en) 2023-09-19 2023-09-19 Image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311208995.6A CN117197576A (en) 2023-09-19 2023-09-19 Image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution

Publications (1)

Publication Number Publication Date
CN117197576A (en) 2023-12-08

Family

ID=88988523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311208995.6A CN117197576A (en) 2023-09-19 2023-09-19 Image classification method suitable for MCU deployment based on nonlinear pooling and depthwise separable convolution

Country Status (1)

Country Link
CN (1) CN117197576A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination