CN113743602A - Method for improving model post-processing speed - Google Patents

Method for improving model post-processing speed

Info

Publication number
CN113743602A
Authority
CN
China
Prior art keywords
confidence
post
class
channels
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010460920.7A
Other languages
Chinese (zh)
Inventor
张东 (Zhang Dong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Ingenic Technology Co ltd
Original Assignee
Hefei Ingenic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Ingenic Technology Co ltd filed Critical Hefei Ingenic Technology Co ltd
Priority to CN202010460920.7A priority Critical patent/CN113743602A/en
Publication of CN113743602A publication Critical patent/CN113743602A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention provides a method for improving the post-processing speed of a model. Channel rearrangement is applied to the post-processing part of a detection model, which raises the CPU cache hit rate and reduces the time spent reading data; SIMD is then used to process 8 data points at a time, which raises computational efficiency. Post-processing time is thereby reduced, improving the overall running efficiency of the detection model.

Description

Method for improving model post-processing speed
Technical Field
The invention relates to the field of acceleration of convolutional neural networks, in particular to a method for improving model post-processing speed.
Background
With the rapid development of computer technology, algorithms based on convolutional neural networks have been successfully applied in many recognition fields. In recent years, with the rapid advance of science and technology, the era of big data has arrived. Deep learning, which uses deep neural networks (DNN) as its models, has achieved remarkable results in key areas of machine intelligence such as image recognition, reinforcement learning, and semantic analysis. The convolutional neural network (CNN) is a typical DNN structure; it can effectively extract the hidden-layer features of an image and classify the image accurately, and it has been widely applied to image recognition and detection in recent years. In existing technology, the final target boxes are obtained by sequentially traversing the data and performing the corresponding calculations.
In the prior art, the calculations are performed by traversing the data sequentially. For a detection model based on preset anchors (YOLOv3), the channel layout of the last convolution layer is generally [x, y, w, h, confidence, pred_class] × anchors_num. When computing the final result, the confidence score is calculated first, and only if it exceeds the preset threshold are the corresponding coordinates computed. Because the confidences are distributed discretely in memory, this calculation is inefficient and slows down the whole detection model.
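As a hedged illustration (not the patented method itself), the prior-art sequential traversal described above might look like the following NumPy sketch; the per-anchor layout [x, y, w, h, confidence, class …] and the 0.5 threshold come from the text, while all names and shapes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def naive_postprocess(out, num_anchors=9, ch_per_anchor=7, thresh=0.5):
    """Sequentially traverse an [H, W, C] output, C = ch_per_anchor * num_anchors.
    Per anchor the channels are [x, y, w, h, confidence, class1, class2]."""
    H, W, _ = out.shape
    boxes = []
    for i in range(H):
        for j in range(W):
            for a in range(num_anchors):
                base = a * ch_per_anchor
                conf = out[i, j, base + 4]      # confidence channel of anchor a
                if sigmoid(conf) > thresh:      # score computed first, as in the text
                    boxes.append((i, j, a, out[i, j, base:base + 4]))
    return boxes

out = np.random.randn(4, 4, 63).astype(np.float32)
boxes = naive_postprocess(out)
```

Note how the confidence reads at `base + 4` jump through memory in steps of 7 channels — the discrete distribution the patent identifies as the bottleneck.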
Further, technical terms commonly used in the prior art are as follows:
Convolutional Neural Network (CNN): a type of feedforward neural network that performs convolution calculations and has a deep structure.
Post-processing of the detection model: because current detection models are trained on the offset of the ground-truth box relative to a preset anchor box, the predicted box position must be recovered from the preset box when the model runs inference.
batch size: an important parameter of a convolutional network; the number of samples processed together in one pass.
feature map: at each convolutional layer the data is three-dimensional and can be viewed as a stack of two-dimensional images, each of which is called a feature map. At the input layer, a grayscale picture has only one feature map, while a color picture typically has 3 (red, green, and blue). Between layers there are several convolution kernels; convolving every feature map of the previous layer with one kernel produces one feature map of the next layer.
SIMD (Single Instruction, Multiple Data): a technique that uses one controller to drive multiple processing elements, performing the same operation on each element of a data set (a "data vector") simultaneously, thereby achieving data-level parallelism.
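As a loose analogy to the definition above (NumPy vectorization standing in for true CPU intrinsics), here is one comparison applied to 8 values at once — all values are invented for illustration:

```python
import numpy as np

# One "instruction" (the comparison with 0) applied to eight 16-bit values
# at once, as if they sat together in a single 128-bit SIMD register.
lanes = np.array([-1.2, 0.3, -0.5, 2.0, -3.1, 0.0, 1.7, -0.8], dtype=np.float16)
mask = lanes < 0   # element-wise: all 8 lanes compared in one step
```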
Disclosure of Invention
In order to solve the problems in the prior art, the aim of the invention is: to improve the efficiency of detection-model post-processing by optimizing the flow of the post-processing calculation, thereby reducing the time needed to compute the detection boxes. The method addresses the fact that detection-model post-processing in existing schemes under-utilizes system resources and that its calculation flow can therefore be optimized.
Specifically, the invention provides a method for improving the post-processing speed of a model, which comprises the following steps:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; the channels are distributed as [x, y, w, h, confidence, class 1, class 2] × 9, i.e. there are 9 anchors; each of the 9 anchors has the channel distribution [x, y, w, h, confidence1, class 1, class 2], for a total of 7 × 9 = 63 channels;
s2, channel rearrangement is carried out: in the last-layer output from S1, the confidence channels of all anchors at each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence1, class 1, class 2], and the confidence of every anchor is extracted and arranged contiguously as [confidence1, confidence2, …, confidence9];
s3, optimization with SIMD: a regularity is obtained by profiling the output of the last convolution layer, and by the property of the Sigmoid function the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; further, because scale is greater than 0, this finally becomes a comparison of confidence + bias with 0. Since the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and otherwise the 8 confidences are compared one by one.
In S1, since the loss function of YOLOv3 is used for the loss-function part of the detection model, the post-processing part uses the same operations as YOLOv3.
In the data distribution output by the last layer in S1, the confidences of the anchors at each data point are not contiguous, so each operation requires reading data from memory.
In S1, assuming the resolution of the detection model's input image is 1920x1080, the output of the model is 240x135x63 (down-sampled 3 times with stride 2 each time), and 240x135x9 values need to be compared with 0.5.
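The shape arithmetic above can be checked directly; as a hedged sketch, three stride-2 down-samplings divide each spatial dimension by 2^3 = 8 (an assumption consistent with the stated 240x135 output):

```python
# Input 1920x1080, down-sampled 3 times with stride 2 -> divide by 2**3 = 8.
w, h = 1920 // 8, 1080 // 8     # 240, 135
channels = 7 * 9                # 7 channels per anchor x 9 anchors = 63
confidences = w * h * 9         # confidence values to compare with 0.5
```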
In S3, the statistics of the last convolution layer's output are as follows:
a) confidences less than 0 account for 99.75% of all confidences;
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points.
Step S3 further includes: since there are 9 confidences in total, which is not a multiple of 8, one additional judgment is needed.
Thus, the advantages of the present application are: channel rearrangement is adopted in the post-processing part of the detection model, which raises the CPU cache hit rate and reduces the time spent reading data; SIMD is used to process 8 data points simultaneously, which raises computational efficiency; post-processing time is thereby reduced, and the overall running efficiency of the detection model is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram of the process of the present invention.
Fig. 2 is an S-curve of a Sigmoid function in the prior art.
Detailed Description
In order that the technical contents and advantages of the present invention can be more clearly understood, the present invention will now be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the present invention relates to a method for increasing model post-processing speed, the method comprising the steps of:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; the channels are distributed as [x, y, w, h, confidence, class 1, class 2] × 9, i.e. there are 9 anchors; each of the 9 anchors has the channel distribution [x, y, w, h, confidence1, class 1, class 2], for a total of 7 × 9 = 63 channels;
s2, channel rearrangement is carried out: in the last-layer output from S1, the confidence channels of all anchors at each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence1, class 1, class 2], and the confidence of every anchor is extracted and arranged contiguously as [confidence1, confidence2, …, confidence9];
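A minimal NumPy sketch of the channel rearrangement in S2 (shapes and function names are assumptions; the patent itself targets an in-memory layout on an embedded CPU): the confidence channel of every anchor is gathered into one contiguous run so that the innermost loop reads consecutive memory.

```python
import numpy as np

def rearrange_confidences(out, num_anchors=9, ch_per_anchor=7):
    """out: [H, W, 63] with per-anchor layout [x, y, w, h, conf, c1, c2].
    Returns [H, W, 9]: the 9 confidences made contiguous per data point."""
    H, W, C = out.shape
    per_anchor = out.reshape(H, W, num_anchors, ch_per_anchor)
    # Gather channel 4 (confidence) of every anchor into contiguous memory.
    return np.ascontiguousarray(per_anchor[..., 4])

out = np.arange(2 * 2 * 63, dtype=np.float32).reshape(2, 2, 63)
conf = rearrange_confidences(out)
# At point (0, 0) the confidences are original channels 4, 11, 18, ..., 60.
```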
s3, optimization with SIMD: a regularity is obtained by profiling the output of the last convolution layer, and by the property of the Sigmoid function the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; further, because scale is greater than 0, this finally becomes a comparison of confidence + bias with 0. Since the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and otherwise the 8 confidences are compared one by one.
The invention can also be construed as follows:
1. The specific operations of the post-processing part are as follows:
since the loss function of YOLOv3 is used for the loss-function part of the detection model, the post-processing part uses the same operations as YOLOv3; the post-processing part mainly requires the following operations:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; the channels are distributed as [x, y, w, h, confidence, class 1, class 2] × 9, i.e. 9 anchors;
2. specific optimization details are as follows:
the consistency of each anchor of each data point is discontinuous by observing the data distribution output by the last layer, so that the cache hit rate of the CPU is greatly reduced by directly operating, and almost every operation needs to read data from the memory. Assuming that the resolution of the input image of the detection model is 1920x1080 and the output of the model is 240x135x63 (down-sampled 3 times, stride 2 each), where roughly 240x135x9 data need to be compared with 0.5, which is the bottleneck of optimization.
The following optimization schemes are proposed for the problems:
The confidence channels of all anchors are placed together so that the reads of the innermost loop are contiguous; this raises the CPU cache hit rate and thus the operating efficiency.
The following regularity was found by profiling the output of the last convolution layer (there are on average 55 targets per input picture):
a) confidences less than 0 account for 99.75% of all confidences (290871/(240 × 135 × 9));
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points (32059/(240 × 135)).
By the property of the Sigmoid function, the comparison with 0.5 can be converted directly into a comparison of confidence with 0 (thereby avoiding the exponential operation); and because scale is greater than 0, the result can finally be converted into a comparison of confidence + bias with 0. On this basis, SIMD can be used for further optimization: because the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at once; if all 8 are less than 0 they are skipped directly, and otherwise the 8 confidences are compared one by one (because there are 9 confidences in total, one additional judgment is needed).
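The 8-at-a-time skip described above can be sketched as follows, with Python/NumPy standing in for the 16-bit SIMD compares; the function name, the `>= 0` keep-rule, and the leftover handling are assumptions for illustration:

```python
import numpy as np

def filter_confidences(conf, lanes=8):
    """conf: contiguous 1-D confidence array (already offset so the
    threshold is 0). Compare `lanes` values at once; if all are < 0 the
    whole chunk is skipped, otherwise it is checked one by one."""
    kept = []
    n = len(conf)
    for start in range(0, n - lanes + 1, lanes):
        chunk = conf[start:start + lanes]
        if np.all(chunk < 0):             # one wide compare: skip 8 at once
            continue
        for k, v in enumerate(chunk):     # rare path: sequential check
            if v >= 0:
                kept.append(start + k)
    for k in range(n - n % lanes, n):     # leftover (9 is not a multiple of 8)
        if conf[k] >= 0:
            kept.append(k)
    return kept
```

With 9 confidences per data point, the common case (all below 0, about 99% per the statistics above) costs one wide compare plus one extra scalar judgment.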
By using channel rearrangement and SIMD optimization, the post-processing time is reduced from the previous 78 ms to around 20 ms.
Furthermore, the Sigmoid function is defined by the following formula:
Sigmoid(x) = 1 / (1 + e^(-x))
its derivative to x can be expressed by itself:
Sigmoid'(x) = Sigmoid(x) · (1 - Sigmoid(x))
the Sigmoid function is shown graphically as an S-curve, as shown in fig. 2. It can be seen that when approaching positive infinity or negative infinity, the function approaches a smooth state, the sigmoid function is often used for the probability of two classes because of the output range (0, 1), and in fact the logistic regression has the following advantages when using this function:
the 1 value range is between 0 and 1
The 2 function has very good symmetry
Thus, the function is insensitive to inputs beyond a certain range.
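The key equivalence exploited in S3 — Sigmoid(x) > 0.5 exactly when x > 0 — follows from Sigmoid(0) = 0.5, monotonicity, and the symmetry noted above, and can be spot-checked numerically (illustrative sketch only):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

assert sigmoid(0.0) == 0.5
for x in (-3.0, -0.7, 0.2, 5.0):
    # Symmetry: sigmoid(-x) = 1 - sigmoid(x)
    assert abs(sigmoid(-x) - (1.0 - sigmoid(x))) < 1e-12
    # Thresholding the output at 0.5 equals thresholding the input at 0,
    # which lets the post-processing skip the exponential entirely.
    assert (sigmoid(x) > 0.5) == (x > 0)
```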
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A method for increasing the post-processing speed of a model, the method comprising the steps of:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; the channels are distributed as [x, y, w, h, confidence, class 1, class 2] × 9, i.e. 9 anchors; each of the 9 anchors has the channel distribution [x, y, w, h, confidence1, class 1, class 2], for a total of 7 × 9 = 63 channels;
s2, channel rearrangement is carried out: in the last-layer output from S1, the confidence channels of all anchors at each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence1, class 1, class 2], and the confidence of every anchor is extracted and arranged contiguously as [confidence1, confidence2, …, confidence9];
s3, optimization with SIMD: a regularity is obtained by profiling the output of the last convolution layer, and by the property of the Sigmoid function the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; further, because scale is greater than 0, this finally becomes a comparison of confidence + bias with 0. Since the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and otherwise the 8 confidences are compared one by one.
2. The method of claim 1, wherein in S1, since the loss function of YOLOv3 is used for the loss-function part of the detection model, the post-processing part uses the same operations as YOLOv3.
3. The method of claim 1 or 2, wherein in S1 the confidences of the anchors in the data distribution output by the last layer are not contiguous, and each operation requires reading data from memory.
4. The method of claim 1, wherein in S1, assuming the resolution of the detection model's input image is 1920x1080, the output of the model is 240x135x63 (down-sampled 3 times with stride 2 each time), where 240x135x9 values need to be compared with 0.5.
5. The method of claim 4, wherein in step S3 the statistics of the last convolution layer's output are as follows:
a) confidences less than 0 account for 99.75% of all confidences;
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points.
6. The method for improving model post-processing speed according to claim 1, wherein S3 further comprises: since there are 9 confidences in total, one additional judgment is needed.
CN202010460920.7A 2020-05-27 2020-05-27 Method for improving model post-processing speed Pending CN113743602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460920.7A CN113743602A (en) 2020-05-27 2020-05-27 Method for improving model post-processing speed

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460920.7A CN113743602A (en) 2020-05-27 2020-05-27 Method for improving model post-processing speed

Publications (1)

Publication Number Publication Date
CN113743602A true CN113743602A (en) 2021-12-03

Family

ID=78723690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460920.7A Pending CN113743602A (en) 2020-05-27 2020-05-27 Method for improving model post-processing speed

Country Status (1)

Country Link
CN (1) CN113743602A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685208A (en) * 2018-12-24 2019-04-26 合肥君正科技有限公司 A kind of method and device accelerated for the dilute combization of neural network processor data
WO2019079895A1 (en) * 2017-10-24 2019-05-02 Modiface Inc. System and method for image processing using deep neural networks
CN109859190A (en) * 2019-01-31 2019-06-07 北京工业大学 A kind of target area detection method based on deep learning
CN110060248A (en) * 2019-04-22 2019-07-26 哈尔滨工程大学 Sonar image submarine pipeline detection method based on deep learning
CN110147252A (en) * 2019-04-28 2019-08-20 深兰科技(上海)有限公司 A kind of parallel calculating method and device of convolutional neural networks
US20190294929A1 (en) * 2018-03-20 2019-09-26 The Regents Of The University Of Michigan Automatic Filter Pruning Technique For Convolutional Neural Networks
CN110544282A (en) * 2019-08-30 2019-12-06 清华大学 three-dimensional multi-energy spectrum CT reconstruction method and equipment based on neural network and storage medium
CN110807170A (en) * 2019-10-21 2020-02-18 中国人民解放军国防科技大学 Multi-sample multi-channel convolution neural network Same convolution vectorization implementation method
CN111160111A (en) * 2019-12-09 2020-05-15 电子科技大学 Human body key point detection method based on deep learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019079895A1 (en) * 2017-10-24 2019-05-02 Modiface Inc. System and method for image processing using deep neural networks
US20190294929A1 (en) * 2018-03-20 2019-09-26 The Regents Of The University Of Michigan Automatic Filter Pruning Technique For Convolutional Neural Networks
CN109685208A (en) * 2018-12-24 2019-04-26 合肥君正科技有限公司 A kind of method and device accelerated for the dilute combization of neural network processor data
CN109859190A (en) * 2019-01-31 2019-06-07 北京工业大学 A kind of target area detection method based on deep learning
CN110060248A (en) * 2019-04-22 2019-07-26 哈尔滨工程大学 Sonar image submarine pipeline detection method based on deep learning
CN110147252A (en) * 2019-04-28 2019-08-20 深兰科技(上海)有限公司 A kind of parallel calculating method and device of convolutional neural networks
CN110544282A (en) * 2019-08-30 2019-12-06 清华大学 three-dimensional multi-energy spectrum CT reconstruction method and equipment based on neural network and storage medium
CN110807170A (en) * 2019-10-21 2020-02-18 中国人民解放军国防科技大学 Multi-sample multi-channel convolution neural network Same convolution vectorization implementation method
CN111160111A (en) * 2019-12-09 2020-05-15 电子科技大学 Human body key point detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QI-CHAO MAO et al.: "Mini-YOLOv3: Real-Time Object Detector for Embedded Applications", IEEE Access, vol. 7
XU Hanzhi et al.: "A Lightweight Object Detection Network Based on Channel Rearrangement", Computer and Modernization, no. 2

Similar Documents

Publication Publication Date Title
CN109447034B (en) Traffic sign detection method in automatic driving based on YOLOv3 network
US11625921B2 (en) Method and system for detecting and recognizing target in real-time video, storage medium, and device
WO2021057056A1 (en) Neural architecture search method, image processing method and device, and storage medium
WO2022052601A1 (en) Neural network model training method, and image processing method and device
CN111652903B (en) Pedestrian target tracking method based on convolution association network in automatic driving scene
CN111079674A (en) Target detection method based on global and local information fusion
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
WO2021218470A1 (en) Neural network optimization method and device
WO2022007867A1 (en) Method and device for constructing neural network
CN110263855B (en) Method for classifying images by utilizing common-basis capsule projection
CN105701482A (en) Face recognition algorithm configuration based on unbalance tag information fusion
CN112036475A (en) Fusion module, multi-scale feature fusion convolutional neural network and image identification method
WO2023036157A1 (en) Self-supervised spatiotemporal representation learning by exploring video continuity
WO2023282569A1 (en) Method and electronic device for generating optimal neural network (nn) model
US20220270341A1 (en) Method and device of inputting annotation of object boundary information
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN111931572B (en) Target detection method for remote sensing image
CN110942463B (en) Video target segmentation method based on generation countermeasure network
CN117034100A (en) Self-adaptive graph classification method, system, equipment and medium based on hierarchical pooling architecture
CN113743602A (en) Method for improving model post-processing speed
CN111104831B (en) Visual tracking method, device, computer equipment and medium
JP7226696B2 (en) Machine learning method, machine learning system and non-transitory computer readable storage medium
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
US20210216868A1 (en) Systems and methods for reducing memory requirements in neural networks
CN110211041B (en) Optimization method of neural network image classifier based on receptive field integration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination