CN113743602A - Method for improving model post-processing speed - Google Patents
Method for improving model post-processing speed
- Publication number
- CN113743602A CN113743602A CN202010460920.7A CN202010460920A CN113743602A CN 113743602 A CN113743602 A CN 113743602A CN 202010460920 A CN202010460920 A CN 202010460920A CN 113743602 A CN113743602 A CN 113743602A
- Authority
- CN
- China
- Prior art keywords
- confidence
- post
- class
- channels
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a method for improving the post-processing speed of a model. Channel rearrangement is applied to the post-processing part of a detection model, which raises the CPU cache hit rate and reduces the time spent reading data; the SIMD technique then completes the calculation of 8 data points at once, raising calculation efficiency. Together these reduce the post-processing time of the detection model and thus improve its overall running efficiency.
Description
Technical Field
The invention relates to the field of convolutional neural network acceleration, and in particular to a method for improving model post-processing speed.
Background
With the rapid development of computer technology and the arrival of the big-data era, algorithms based on convolutional neural networks have been successfully applied in many recognition fields. Deep learning, which takes the deep neural network (DNN) as its model, has achieved remarkable results in key areas of machine intelligence such as image recognition, reinforcement learning, and semantic analysis. The convolutional neural network (CNN) is a typical DNN structure that can effectively extract the hidden-layer features of an image and classify it accurately, and in recent years it has been widely applied to image recognition and detection.
In the prior art, the corresponding calculation is performed by sequentially traversing the data to obtain the final target boxes. For a detection model based on preset anchors (e.g., YOLOv3), the channel distribution of the last convolution layer is generally (x, y, w, h, confidence, pred_class) × anchors_num. When the final result is computed, the confidence score is calculated first, and the corresponding coordinates are calculated only if the score exceeds a preset threshold. Because the confidences are distributed discontinuously in memory, this calculation is inefficient and slows down the whole detection model.
Further, technical terms commonly used in the prior art are as follows:
Convolutional Neural Network (CNN): a class of feedforward neural networks that performs convolution computations and has a deep structure.
Post-processing of the detection model: because current detection models are trained on the offset of the ground-truth box relative to a preset (anchor) box, the predicted box position must be recovered from the preset box at inference time.
batch size: the number of samples processed in one pass; an important parameter of a convolutional network.
feature map: at each convolutional layer, the data exists in three dimensions. It can be seen that a plurality of two-dimensional pictures are overlapped, each of which is called a feature map. On the input layer, if the picture is a gray picture, only one feature map exists; in the case of color pictures, there are typically 3 feature maps (red, green, and blue). There are several convolution kernels (kernel) between layers, and the convolution of the previous layer and each feature map with each convolution kernel will generate a feature map of the next layer.
SIMD (Single Instruction, Multiple Data): a technique in which one controller drives multiple processing units, performing the same operation on each element of a data set (a "data vector") at the same time, thereby achieving data-level (spatial) parallelism.
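As an illustration only, the data-parallel idea behind SIMD can be sketched with NumPy, whose vectorized operators apply one operation across a whole data vector at once (the array values below are made up for the example):

```python
import numpy as np

# Illustrative only: NumPy applies one operation across a whole data
# vector at once, the same data-parallel idea that hardware SIMD
# instructions implement on 8 or 16 lanes per register.
confidences = np.array([-3.1, -0.7, 2.4, -5.0, 0.2, -1.8, -0.9, -2.2],
                       dtype=np.float16)

# One "instruction" compares all eight values against 0 simultaneously.
mask = confidences > 0
print(mask.any())   # True: at least one candidate survives the screen
```

Hardware SIMD performs the same eight comparisons inside a single register, which is the property the patent exploits below.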
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to improve the efficiency of detection-model post-processing by optimizing the post-processing calculation flow, thereby reducing the time needed to compute the detection boxes. It addresses the problem that post-processing in existing schemes under-utilizes system resources and leaves room to optimize the calculation process.
Specifically, the invention provides a method for improving the post-processing speed of a model, which comprises the following steps:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; there are 9 anchors, and each anchor has the channel distribution [x, y, w, h, confidence, class 1, class 2], i.e., 7 channels per anchor, for a total of 7 × 9 = 63 channels;
S2, channel rearrangement is performed: in the output of the last layer described in S1, the confidence channels of all anchors of each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence, class 1, class 2], and the confidence of each anchor is extracted and packed into a contiguous run [confidence1, confidence2, ..., confidence9];
S3, SIMD optimization is applied: a rule is obtained by collecting statistics on the output of the last convolution layer, and, using the properties of the Sigmoid function, the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; and because scale is greater than 0, the test is finally converted into comparing confidence + bias with 0. Because the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and only when this condition fails are the 8 confidences compared one by one.
In S1, because the loss-function part of the detection model uses the YOLOv3 loss function, the post-processing part uses the same operations as YOLOv3.
In the data layout output by the last layer in S1, the confidences of the anchors of each data point are not contiguous, so nearly every access must read data from main memory.
In S1, assuming the input image resolution of the detection model is 1920x1080, the model output is 240x135x63 (down-sampled 3 times with stride 2 each time), and 240x135x9 values need to be compared with 0.5.
In S3, the statistical output of the last layer of convolution is as follows:
a) confidences less than 0 account for 99.75% of all confidences;
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points.
Step S3 further includes: since there are 9 confidences in total (not a multiple of 8), one additional comparison is needed.
Thus, the advantages of the present application are: channel rearrangement in the post-processing part of the detection model raises the CPU cache hit rate and reduces the time spent reading data, while the SIMD technique computes 8 data points simultaneously, raising calculation efficiency; together these reduce the post-processing time and thereby improve the overall running efficiency of the detection model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram of the process of the present invention.
Fig. 2 is an S-curve of a Sigmoid function in the prior art.
Detailed Description
In order that the technical contents and advantages of the present invention can be more clearly understood, the present invention will now be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the present invention relates to a method for increasing model post-processing speed, the method comprising the steps of:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; there are 9 anchors, and each anchor has the channel distribution [x, y, w, h, confidence, class 1, class 2], i.e., 7 channels per anchor, for a total of 7 × 9 = 63 channels;
S2, channel rearrangement is performed: in the output of the last layer described in S1, the confidence channels of all anchors of each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence, class 1, class 2], and the confidence of each anchor is extracted and packed into a contiguous run [confidence1, confidence2, ..., confidence9];
S3, SIMD optimization is applied: a rule is obtained by collecting statistics on the output of the last convolution layer, and, using the properties of the Sigmoid function, the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; and because scale is greater than 0, the test is finally converted into comparing confidence + bias with 0. Because the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and only when this condition fails are the 8 confidences compared one by one.
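The channel rearrangement of S2 can be sketched in NumPy. This is a minimal illustration under assumed small sizes (the patent's own example uses H = 135, W = 240, and 9 anchors × 7 channels = 63 output channels); the variable names are hypothetical:

```python
import numpy as np

# Hypothetical small sizes for illustration; the patent's example uses
# H, W = 135, 240 with 9 anchors x 7 channels = 63 output channels.
N, H, W, A, CH = 1, 4, 5, 9, 7        # CH = [x, y, w, h, conf, cls1, cls2]
rng = np.random.default_rng(0)
out = rng.standard_normal((N, H, W, A * CH)).astype(np.float16)

# Original layout: the 9 confidences of one data point sit 7 channels
# apart (field index 4 of each anchor block), so reads are strided.
strided_conf = out[..., 4::CH]                    # shape (N, H, W, 9)

# Channel rearrangement: view channels as (anchor, field) and pack the
# confidence field of all 9 anchors into one contiguous run per point.
fields = out.reshape(N, H, W, A, CH)
packed = np.concatenate(
    [fields[..., 4:5].reshape(N, H, W, A),        # 9 contiguous confidences
     fields[..., 0:4].reshape(N, H, W, A * 4),    # box coordinates
     fields[..., 5:7].reshape(N, H, W, A * 2)],   # class scores
    axis=-1)

# The innermost confidence loop now reads consecutive memory.
assert np.array_equal(packed[..., :A], strided_conf)
```

In a C implementation the same permutation would be done once when copying the layer output, so that the confidence screen walks a contiguous 16-bit array.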
The invention can also be construed as follows:
1. The specific operations of the post-processing part are as follows:
Because the loss-function part of the detection model uses the YOLOv3 loss function, the post-processing part uses the same operations as YOLOv3; the post-processing mainly requires the following:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; there are 9 anchors, each with channel distribution [x, y, w, h, confidence, class 1, class 2], for a total of 7 × 9 = 63 channels;
2. specific optimization details are as follows:
the consistency of each anchor of each data point is discontinuous by observing the data distribution output by the last layer, so that the cache hit rate of the CPU is greatly reduced by directly operating, and almost every operation needs to read data from the memory. Assuming that the resolution of the input image of the detection model is 1920x1080 and the output of the model is 240x135x63 (down-sampled 3 times, stride 2 each), where roughly 240x135x9 data need to be compared with 0.5, which is the bottleneck of optimization.
The following optimization schemes are proposed for the problems:
the channels of the confidence of each anchor are put together, so that the fetching of the innermost loop is continuous, the cache hit rate of the CPU can be improved, and the operation efficiency is improved.
Counting the output of the last convolution layer (the input pictures contain 55 targets on average) reveals the following pattern:
a) confidences less than 0 account for 99.75% of all confidences (290871 / (240 × 135 × 9));
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points (32059 / (240 × 135)).
Because of the properties of the Sigmoid function, the comparison with 0.5 can be converted directly into a comparison of confidence with 0 (avoiding the exponential operation), and because scale is greater than 0, the test can finally be converted into comparing confidence + bias with 0. Building on the previous step, SIMD can then be used for further optimization: since the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time. If all 8 are less than 0 they are skipped directly; otherwise the 8 confidences are compared one by one (and because there are 9 confidences in total, one additional comparison is needed).
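A minimal Python sketch of this screening logic, with NumPy's block test standing in for a 128-bit SIMD comparison over eight 16-bit values (the function name and sample values are made up for the example):

```python
import numpy as np

def screen_confidences(conf9):
    """Screen the 9 confidences of one feature point as described:
    test the first 8 as one SIMD-style block, fall back to per-value
    checks only when the block test fails, and always check the 9th
    value separately (9 is not a multiple of 8)."""
    kept = []
    block = conf9[:8]
    if np.any(block > 0):                 # block not all < 0: inspect each
        kept.extend(int(i) for i in np.nonzero(block > 0)[0])
    if conf9[8] > 0:                      # the extra 9th confidence
        kept.append(8)
    return kept

# 16-bit storage, as in the patent's last-layer output.
point = np.array([-4, -2, -7, 1, -3, -9, -1, -6, 2], dtype=np.int16)
print(screen_confidences(point))   # prints [3, 8]
```

Since about 98.95% of feature points fail the block test entirely, the fast path (one block comparison, no per-value work) dominates, which is where the speedup comes from.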
By using channel rearrangement and SIMD optimization, the post-processing time is reduced from the previous 78 ms to around 20 ms.
Furthermore, the Sigmoid function is defined by the formula S(x) = 1 / (1 + e^(-x)).
Its derivative with respect to x can be expressed in terms of the function itself: S'(x) = S(x)(1 - S(x)).
the Sigmoid function is shown graphically as an S-curve, as shown in fig. 2. It can be seen that when approaching positive infinity or negative infinity, the function approaches a smooth state, the sigmoid function is often used for the probability of two classes because of the output range (0, 1), and in fact the logistic regression has the following advantages when using this function:
the 1 value range is between 0 and 1
The 2 function has very good symmetry
Thus, the function is insensitive to inputs beyond a certain range.
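The threshold identity that step S3 relies on — Sigmoid(x) > 0.5 exactly when x > 0, because the monotonic sigmoid crosses 0.5 at x = 0 — can be checked directly:

```python
import math

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

# Sigmoid is monotonic and crosses 0.5 exactly at x = 0, so the
# threshold test Sigmoid(x) > 0.5 reduces to x > 0: no exponential needed.
for x in [-6.0, -0.3, 0.0, 0.3, 6.0]:
    assert (sigmoid(x) > 0.5) == (x > 0)
print("equivalence holds")
```

This is why the post-processing never has to evaluate the exponential during the confidence screen.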
The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes to the embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (6)
1. A method for increasing the post-processing speed of a model, the method comprising the steps of:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; there are 9 anchors, each with channel distribution [x, y, w, h, confidence1, class 1, class 2], for a total of 7 × 9 = 63 channels;
S2, channel rearrangement is performed: in the output of the last layer described in S1, the confidence channels of all anchors of each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence1, class 1, class 2], and the confidence of each anchor is extracted and packed into a contiguous run [confidence1, confidence2, ..., confidence9];
S3, SIMD optimization is applied: a rule is obtained by collecting statistics on the output of the last convolution layer, and, using the properties of the Sigmoid function, the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; and because scale is greater than 0, the test is finally converted into comparing confidence + bias with 0. Because the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and only when this condition fails are the 8 confidences compared one by one.
2. The method of claim 1, wherein in S1, because the loss-function part of the detection model uses the YOLOv3 loss function, the post-processing part performs the same operations as YOLOv3.
3. The method of claim 1 or 2, wherein in S1 the confidences of the anchors in the data layout output by the last layer are not contiguous, so each operation must read data from memory.
4. The method of claim 1, wherein in S1, assuming the input image resolution of the detection model is 1920x1080, the model output is 240x135x63 (down-sampled 3 times with stride 2 each time), and 240x135x9 values need to be compared with 0.5.
5. The method of claim 4, wherein in step S3 the statistics of the last-layer convolution output are as follows:
a) confidences less than 0 account for 99.75% of all confidences;
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points.
6. The method of claim 1, wherein step S3 further comprises: since there are 9 confidences in total, one additional comparison is needed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010460920.7A CN113743602B (en) | 2020-05-27 | 2020-05-27 | Method for improving post-processing speed of model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113743602A | 2021-12-03
CN113743602B | 2024-05-03
Family
ID=78723690
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010460920.7A Active CN113743602B (en) | 2020-05-27 | 2020-05-27 | Method for improving post-processing speed of model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113743602B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685208A (en) * | 2018-12-24 | 2019-04-26 | 合肥君正科技有限公司 | A kind of method and device accelerated for the dilute combization of neural network processor data |
WO2019079895A1 (en) * | 2017-10-24 | 2019-05-02 | Modiface Inc. | System and method for image processing using deep neural networks |
CN109859190A (en) * | 2019-01-31 | 2019-06-07 | 北京工业大学 | A kind of target area detection method based on deep learning |
CN110060248A (en) * | 2019-04-22 | 2019-07-26 | 哈尔滨工程大学 | Sonar image submarine pipeline detection method based on deep learning |
CN110147252A (en) * | 2019-04-28 | 2019-08-20 | 深兰科技(上海)有限公司 | A kind of parallel calculating method and device of convolutional neural networks |
US20190294929A1 (en) * | 2018-03-20 | 2019-09-26 | The Regents Of The University Of Michigan | Automatic Filter Pruning Technique For Convolutional Neural Networks |
CN110544282A (en) * | 2019-08-30 | 2019-12-06 | 清华大学 | three-dimensional multi-energy spectrum CT reconstruction method and equipment based on neural network and storage medium |
CN110807170A (en) * | 2019-10-21 | 2020-02-18 | 中国人民解放军国防科技大学 | Multi-sample multi-channel convolution neural network Same convolution vectorization implementation method |
CN111160111A (en) * | 2019-12-09 | 2020-05-15 | 电子科技大学 | Human body key point detection method based on deep learning |
Non-Patent Citations (2)
Title |
---|
QI-CHAO MAO et al.: "Mini-YOLOv3: Real-Time Object Detector for Embedded Applications", IEEE Access, vol. 7
XU Hanzhi et al.: "A lightweight object detection network based on channel rearrangement", Computer and Modernization (《计算机与现代化》), no. 2
Also Published As
Publication number | Publication date |
---|---|
CN113743602B (en) | 2024-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230089380A1 (en) | Neural network construction method and apparatus | |
US20200401812A1 (en) | Method and system for detecting and recognizing target in real-time video, storage medium, and device | |
WO2021218517A1 (en) | Method for acquiring neural network model, and image processing method and apparatus | |
WO2022052601A1 (en) | Neural network model training method, and image processing method and device | |
WO2021218470A1 (en) | Neural network optimization method and device | |
WO2022007867A1 (en) | Method and device for constructing neural network | |
CN110263855B (en) | Method for classifying images by utilizing common-basis capsule projection | |
CN112036475A (en) | Fusion module, multi-scale feature fusion convolutional neural network and image identification method | |
JP7226696B2 (en) | Machine learning method, machine learning system and non-transitory computer readable storage medium | |
US20220270341A1 (en) | Method and device of inputting annotation of object boundary information | |
WO2022156475A1 (en) | Neural network model training method and apparatus, and data processing method and apparatus | |
CN115908908A (en) | Remote sensing image gathering type target identification method and device based on graph attention network | |
WO2023282569A1 (en) | Method and electronic device for generating optimal neural network (nn) model | |
Peng et al. | New network based on D-LinkNet and densenet for high resolution satellite imagery road extraction | |
CN117034100A (en) | Self-adaptive graph classification method, system, equipment and medium based on hierarchical pooling architecture | |
CN113743602B (en) | Method for improving post-processing speed of model | |
CN112487927B (en) | Method and system for realizing indoor scene recognition based on object associated attention | |
CN110211041B (en) | Optimization method of neural network image classifier based on receptive field integration | |
CN113095493A (en) | System and method for reducing memory requirements in a neural network | |
Chen et al. | Research on warehouse object detection algorithm based on fused densenet and ssd | |
CN112836729A (en) | Construction method of image classification model and image classification method | |
Li et al. | Deep reinforcement learning for automatic thumbnail generation | |
CN113313249B (en) | Dynamic integrated training method based on reinforcement learning system | |
Cao et al. | Bypass enhancement rgb stream model for pedestrian action recognition of autonomous vehicles | |
WO2024078308A1 (en) | Image optimization method and apparatus, electronic device, medium, and program product |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |