CN113743602A - Method for improving model post-processing speed - Google Patents

Method for improving model post-processing speed

Info

Publication number
CN113743602A
Authority
CN
China
Prior art keywords
confidence
post
class
channels
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010460920.7A
Other languages
Chinese (zh)
Inventor
张东 (Zhang Dong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Ingenic Technology Co ltd
Original Assignee
Hefei Ingenic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Ingenic Technology Co ltd filed Critical Hefei Ingenic Technology Co ltd
Priority to CN202010460920.7A priority Critical patent/CN113743602A/en
Publication of CN113743602A publication Critical patent/CN113743602A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention provides a method for improving the post-processing speed of a model. Channel rearrangement is applied to the post-processing part of a detection model, which raises the CPU cache hit rate and reduces the time spent reading data; SIMD is then used to process 8 data points at a time, which raises computational efficiency. Post-processing time is thereby reduced, improving the overall running efficiency of the detection model.

Description

Method for improving model post-processing speed
Technical Field
The invention relates to the field of acceleration of convolutional neural networks, in particular to a method for improving model post-processing speed.
Background
With the rapid development of computer technology, algorithms based on convolutional neural networks have been successfully applied in many recognition fields. In recent years, with the rapid advance of science and technology, the era of big data has arrived. Deep learning, which uses deep neural networks (DNN) as its models, has achieved remarkable results in key areas of machine intelligence such as image recognition, reinforcement learning, and semantic analysis. The convolutional neural network (CNN) is a typical DNN structure; it can effectively extract the hidden-layer features of an image and classify the image accurately, and it has been widely applied to image recognition and detection in recent years. In existing technology, the final target boxes are obtained by sequentially traversing the data and performing the corresponding calculations.
In the prior art, the calculations are performed by traversing the data sequentially. For a detection model based on preset anchors (YOLOv3), the channel layout of the last convolution layer is generally [x, y, w, h, confidence, pred_class] × anchors_num. When computing the final result, the confidence score is calculated first, and only if it exceeds the preset threshold are the corresponding coordinates computed. Because the confidences are distributed discretely in memory, this calculation is inefficient and slows down the whole detection model.
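As a hedged illustration (not the patented method itself), the prior-art sequential traversal described above might look like the following NumPy sketch; the per-anchor layout [x, y, w, h, confidence, class …] and the 0.5 threshold come from the text, while all names and shapes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def naive_postprocess(out, num_anchors=9, ch_per_anchor=7, thresh=0.5):
    """Sequentially traverse an [H, W, C] output, C = ch_per_anchor * num_anchors.
    Per anchor the channels are [x, y, w, h, confidence, class1, class2]."""
    H, W, _ = out.shape
    boxes = []
    for i in range(H):
        for j in range(W):
            for a in range(num_anchors):
                base = a * ch_per_anchor
                conf = out[i, j, base + 4]      # confidence channel of anchor a
                if sigmoid(conf) > thresh:      # score computed first, as in the text
                    boxes.append((i, j, a, out[i, j, base:base + 4]))
    return boxes

out = np.random.randn(4, 4, 63).astype(np.float32)
boxes = naive_postprocess(out)
```

Note how the confidence reads at `base + 4` jump through memory in steps of 7 channels — the discrete distribution the patent identifies as the bottleneck.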
Further, technical terms commonly used in the prior art are as follows:
Convolutional Neural Network (CNN): a type of feedforward neural network that performs convolution calculations and has a deep structure.
Post-processing of the detection model: because current detection models are trained on the offset of the ground-truth box relative to a preset anchor box, the predicted box position must be recovered from the preset box when the model runs inference.
batch size: an important parameter of a convolutional network; the number of samples processed together in one pass.
feature map: at each convolutional layer the data is three-dimensional and can be viewed as a stack of two-dimensional images, each of which is called a feature map. At the input layer, a grayscale picture has only one feature map, while a color picture typically has 3 (red, green, and blue). Between layers there are several convolution kernels; convolving every feature map of the previous layer with one kernel produces one feature map of the next layer.
SIMD (Single Instruction, Multiple Data): a technique that uses one controller to drive multiple processing elements, performing the same operation on each element of a data set (a "data vector") simultaneously, thereby achieving data-level parallelism.
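As a loose analogy to the definition above (NumPy vectorization standing in for true CPU intrinsics), here is one comparison applied to 8 values at once — all values are invented for illustration:

```python
import numpy as np

# One "instruction" (the comparison with 0) applied to eight 16-bit values
# at once, as if they sat together in a single 128-bit SIMD register.
lanes = np.array([-1.2, 0.3, -0.5, 2.0, -3.1, 0.0, 1.7, -0.8], dtype=np.float16)
mask = lanes < 0   # element-wise: all 8 lanes compared in one step
```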
Disclosure of Invention
In order to solve the problems in the prior art, the aim of the invention is: to improve the efficiency of detection-model post-processing by optimizing the flow of the post-processing calculation, thereby reducing the time needed to compute the detection boxes. The method addresses the fact that detection-model post-processing in existing schemes under-utilizes system resources and that its calculation flow can therefore be optimized.
Specifically, the invention provides a method for improving the post-processing speed of a model, which comprises the following steps:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; the channels are distributed as [x, y, w, h, confidence, class 1, class 2] × 9, i.e. there are 9 anchors; each of the 9 anchors has the channel distribution [x, y, w, h, confidence1, class 1, class 2], for a total of 7 × 9 = 63 channels;
s2, channel rearrangement is carried out: in the last-layer output from S1, the confidence channels of all anchors at each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence1, class 1, class 2], and the confidence of every anchor is extracted and arranged contiguously as [confidence1, confidence2, …, confidence9];
s3, optimization with SIMD: a regularity is obtained by profiling the output of the last convolution layer, and by the property of the Sigmoid function the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; further, because scale is greater than 0, this finally becomes a comparison of confidence + bias with 0. Since the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and otherwise the 8 confidences are compared one by one.
In S1, since the loss function of YOLOv3 is used for the loss-function part of the detection model, the post-processing part uses the same operations as YOLOv3.
In the data distribution output by the last layer in S1, the confidences of the anchors at each data point are not contiguous, so each operation requires reading data from memory.
In S1, assuming the resolution of the detection model's input image is 1920x1080, the output of the model is 240x135x63 (down-sampled 3 times with stride 2 each time), and 240x135x9 values need to be compared with 0.5.
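The shape arithmetic above can be checked directly; as a hedged sketch, three stride-2 down-samplings divide each spatial dimension by 2^3 = 8 (an assumption consistent with the stated 240x135 output):

```python
# Input 1920x1080, down-sampled 3 times with stride 2 -> divide by 2**3 = 8.
w, h = 1920 // 8, 1080 // 8     # 240, 135
channels = 7 * 9                # 7 channels per anchor x 9 anchors = 63
confidences = w * h * 9         # confidence values to compare with 0.5
```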
In S3, the statistics of the last convolution layer's output are as follows:
a) confidences less than 0 account for 99.75% of all confidences;
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points.
Step S3 further includes: since there are 9 confidences in total, which is not a multiple of 8, one additional judgment is needed.
Thus, the advantages of the present application are: channel rearrangement is adopted in the post-processing part of the detection model, which raises the CPU cache hit rate and reduces the time spent reading data; SIMD is used to process 8 data points simultaneously, which raises computational efficiency; post-processing time is thereby reduced, and the overall running efficiency of the detection model is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram of the process of the present invention.
Fig. 2 is an S-curve of a Sigmoid function in the prior art.
Detailed Description
In order that the technical contents and advantages of the present invention can be more clearly understood, the present invention will now be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the present invention relates to a method for increasing model post-processing speed, the method comprising the steps of:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; the channels are distributed as [x, y, w, h, confidence, class 1, class 2] × 9, i.e. there are 9 anchors; each of the 9 anchors has the channel distribution [x, y, w, h, confidence1, class 1, class 2], for a total of 7 × 9 = 63 channels;
s2, channel rearrangement is carried out: in the last-layer output from S1, the confidence channels of all anchors at each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence1, class 1, class 2], and the confidence of every anchor is extracted and arranged contiguously as [confidence1, confidence2, …, confidence9];
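A minimal NumPy sketch of the channel rearrangement in S2 (shapes and function names are assumptions; the patent itself targets an in-memory layout on an embedded CPU): the confidence channel of every anchor is gathered into one contiguous run so that the innermost loop reads consecutive memory.

```python
import numpy as np

def rearrange_confidences(out, num_anchors=9, ch_per_anchor=7):
    """out: [H, W, 63] with per-anchor layout [x, y, w, h, conf, c1, c2].
    Returns [H, W, 9]: the 9 confidences made contiguous per data point."""
    H, W, C = out.shape
    per_anchor = out.reshape(H, W, num_anchors, ch_per_anchor)
    # Gather channel 4 (confidence) of every anchor into contiguous memory.
    return np.ascontiguousarray(per_anchor[..., 4])

out = np.arange(2 * 2 * 63, dtype=np.float32).reshape(2, 2, 63)
conf = rearrange_confidences(out)
# At point (0, 0) the confidences are original channels 4, 11, 18, ..., 60.
```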
s3, optimization with SIMD: a regularity is obtained by profiling the output of the last convolution layer, and by the property of the Sigmoid function the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; further, because scale is greater than 0, this finally becomes a comparison of confidence + bias with 0. Since the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and otherwise the 8 confidences are compared one by one.
The invention can also be construed as follows:
1. The specific operations of the post-processing part are as follows:
since the loss function of YOLOv3 is used for the loss-function part of the detection model, the post-processing part uses the same operations as YOLOv3; the post-processing part mainly requires the following operations:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; the channels are distributed as [x, y, w, h, confidence, class 1, class 2] × 9, i.e. 9 anchors;
2. specific optimization details are as follows:
the consistency of each anchor of each data point is discontinuous by observing the data distribution output by the last layer, so that the cache hit rate of the CPU is greatly reduced by directly operating, and almost every operation needs to read data from the memory. Assuming that the resolution of the input image of the detection model is 1920x1080 and the output of the model is 240x135x63 (down-sampled 3 times, stride 2 each), where roughly 240x135x9 data need to be compared with 0.5, which is the bottleneck of optimization.
The following optimization schemes are proposed for the problems:
The confidence channels of all anchors are placed together so that the reads of the innermost loop are contiguous; this raises the CPU cache hit rate and thus the operating efficiency.
The following regularity was found by profiling the output of the last convolution layer (there are on average 55 targets per input picture):
a) confidences less than 0 account for 99.75% of all confidences (290871/(240 × 135 × 9));
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points (32059/(240 × 135)).
By the property of the Sigmoid function, the comparison with 0.5 can be converted directly into a comparison of confidence with 0 (thereby avoiding the exponential operation); and because scale is greater than 0, the result can finally be converted into a comparison of confidence + bias with 0. On this basis, SIMD can be used for further optimization: because the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at once; if all 8 are less than 0 they are skipped directly, and otherwise the 8 confidences are compared one by one (because there are 9 confidences in total, one additional judgment is needed).
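The 8-at-a-time skip described above can be sketched as follows, with Python/NumPy standing in for the 16-bit SIMD compares; the function name, the `>= 0` keep-rule, and the leftover handling are assumptions for illustration:

```python
import numpy as np

def filter_confidences(conf, lanes=8):
    """conf: contiguous 1-D confidence array (already offset so the
    threshold is 0). Compare `lanes` values at once; if all are < 0 the
    whole chunk is skipped, otherwise it is checked one by one."""
    kept = []
    n = len(conf)
    for start in range(0, n - lanes + 1, lanes):
        chunk = conf[start:start + lanes]
        if np.all(chunk < 0):             # one wide compare: skip 8 at once
            continue
        for k, v in enumerate(chunk):     # rare path: sequential check
            if v >= 0:
                kept.append(start + k)
    for k in range(n - n % lanes, n):     # leftover (9 is not a multiple of 8)
        if conf[k] >= 0:
            kept.append(k)
    return kept
```

With 9 confidences per data point, the common case (all below 0, about 99% per the statistics above) costs one wide compare plus one extra scalar judgment.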
By using channel rearrangement and SIMD optimization, the post-processing time is reduced from the previous 78 ms to around 20 ms.
Furthermore, the Sigmoid function is defined by the following formula:
Sigmoid(x) = 1 / (1 + e^(-x))
its derivative to x can be expressed by itself:
Sigmoid'(x) = Sigmoid(x) · (1 - Sigmoid(x))
the Sigmoid function is shown graphically as an S-curve, as shown in fig. 2. It can be seen that when approaching positive infinity or negative infinity, the function approaches a smooth state, the sigmoid function is often used for the probability of two classes because of the output range (0, 1), and in fact the logistic regression has the following advantages when using this function:
the 1 value range is between 0 and 1
The 2 function has very good symmetry
Thus, the function is insensitive to inputs beyond a certain range.
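The key equivalence exploited in S3 — Sigmoid(x) > 0.5 exactly when x > 0 — follows from Sigmoid(0) = 0.5, monotonicity, and the symmetry noted above, and can be spot-checked numerically (illustrative sketch only):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

assert sigmoid(0.0) == 0.5
for x in (-3.0, -0.7, 0.2, 5.0):
    # Symmetry: sigmoid(-x) = 1 - sigmoid(x)
    assert abs(sigmoid(-x) - (1.0 - sigmoid(x))) < 1e-12
    # Thresholding the output at 0.5 equals thresholding the input at 0,
    # which lets the post-processing skip the exponential entirely.
    assert (sigmoid(x) > 0.5) == (x > 0)
```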
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A method for increasing the post-processing speed of a model, the method comprising the steps of:
s1, the following operations are performed for the post-processing part:
the result output by the last layer is [N, H, W, C], where N is the batch size, H and W are the height and width of the feature map, and C is the number of channels; the channels are distributed as [x, y, w, h, confidence, class 1, class 2] × 9, i.e. 9 anchors; each of the 9 anchors has the channel distribution [x, y, w, h, confidence1, class 1, class 2], for a total of 7 × 9 = 63 channels;
s2, channel rearrangement is carried out: in the last-layer output from S1, the confidence channels of all anchors at each data point are placed together so that the reads of the innermost loop are contiguous; each anchor's channel distribution is [x, y, w, h, confidence1, class 1, class 2], and the confidence of every anchor is extracted and arranged contiguously as [confidence1, confidence2, …, confidence9];
s3, optimization with SIMD: a regularity is obtained by profiling the output of the last convolution layer, and by the property of the Sigmoid function the original comparison of Sigmoid(confidence) with 0.5 is converted directly into a comparison of confidence with 0; further, because scale is greater than 0, this finally becomes a comparison of confidence + bias with 0. Since the output of the last convolution layer is stored in 16 bits, 8 confidences can be compared at the same time: if all 8 are less than 0 they are skipped directly, and otherwise the 8 confidences are compared one by one.
2. The method of claim 1, wherein in S1, since the loss function of YOLOv3 is used for the loss-function part of the detection model, the post-processing part uses the same operations as YOLOv3.
3. The method of claim 1 or 2, wherein in S1 the confidences of the anchors in the data distribution output by the last layer are not contiguous, and each operation requires reading data from memory.
4. The method of claim 1, wherein in S1, assuming the resolution of the detection model's input image is 1920x1080, the output of the model is 240x135x63 (down-sampled 3 times with stride 2 each time), where 240x135x9 values need to be compared with 0.5.
5. The method of claim 4, wherein in step S3 the statistics of the last convolution layer's output are as follows:
a) confidences less than 0 account for 99.75% of all confidences;
b) feature points whose confidences are all less than 0 account for 98.95% of all feature points.
6. The method for improving model post-processing speed according to claim 1, wherein S3 further comprises: since there are 9 confidences in total, one additional judgment is needed.
CN202010460920.7A 2020-05-27 2020-05-27 Method for improving model post-processing speed Pending CN113743602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460920.7A CN113743602A (en) 2020-05-27 2020-05-27 Method for improving model post-processing speed

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460920.7A CN113743602A (en) 2020-05-27 2020-05-27 Method for improving model post-processing speed

Publications (1)

Publication Number Publication Date
CN113743602A true CN113743602A (en) 2021-12-03

Family

ID=78723690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460920.7A Pending CN113743602A (en) 2020-05-27 2020-05-27 Method for improving model post-processing speed

Country Status (1)

Country Link
CN (1) CN113743602A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685208A (en) * 2018-12-24 2019-04-26 合肥君正科技有限公司 A kind of method and device accelerated for the dilute combization of neural network processor data
WO2019079895A1 (en) * 2017-10-24 2019-05-02 Modiface Inc. System and method for image processing using deep neural networks
CN109859190A (en) * 2019-01-31 2019-06-07 北京工业大学 A kind of target area detection method based on deep learning
CN110060248A (en) * 2019-04-22 2019-07-26 哈尔滨工程大学 Sonar image submarine pipeline detection method based on deep learning
CN110147252A (en) * 2019-04-28 2019-08-20 深兰科技(上海)有限公司 A kind of parallel calculating method and device of convolutional neural networks
US20190294929A1 (en) * 2018-03-20 2019-09-26 The Regents Of The University Of Michigan Automatic Filter Pruning Technique For Convolutional Neural Networks
CN110544282A (en) * 2019-08-30 2019-12-06 清华大学 three-dimensional multi-energy spectrum CT reconstruction method and equipment based on neural network and storage medium
CN110807170A (en) * 2019-10-21 2020-02-18 中国人民解放军国防科技大学 Multi-sample multi-channel convolution neural network Same convolution vectorization implementation method
CN111160111A (en) * 2019-12-09 2020-05-15 电子科技大学 Human body key point detection method based on deep learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019079895A1 (en) * 2017-10-24 2019-05-02 Modiface Inc. System and method for image processing using deep neural networks
US20190294929A1 (en) * 2018-03-20 2019-09-26 The Regents Of The University Of Michigan Automatic Filter Pruning Technique For Convolutional Neural Networks
CN109685208A (en) * 2018-12-24 2019-04-26 合肥君正科技有限公司 A kind of method and device accelerated for the dilute combization of neural network processor data
CN109859190A (en) * 2019-01-31 2019-06-07 北京工业大学 A kind of target area detection method based on deep learning
CN110060248A (en) * 2019-04-22 2019-07-26 哈尔滨工程大学 Sonar image submarine pipeline detection method based on deep learning
CN110147252A (en) * 2019-04-28 2019-08-20 深兰科技(上海)有限公司 A kind of parallel calculating method and device of convolutional neural networks
CN110544282A (en) * 2019-08-30 2019-12-06 清华大学 three-dimensional multi-energy spectrum CT reconstruction method and equipment based on neural network and storage medium
CN110807170A (en) * 2019-10-21 2020-02-18 中国人民解放军国防科技大学 Multi-sample multi-channel convolution neural network Same convolution vectorization implementation method
CN111160111A (en) * 2019-12-09 2020-05-15 电子科技大学 Human body key point detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QI-CHAO MAO et al.: "Mini-YOLOv3: Real-Time Object Detector for Embedded Applications", IEEE Access, vol. 7
XU Hanzhi et al.: "A Lightweight Object Detection Network Based on Channel Rearrangement", Computer and Modernization, no. 2

Similar Documents

Publication Publication Date Title
CN109447034B (en) Traffic sign detection method in automatic driving based on YOLOv3 network
US11625921B2 (en) Method and system for detecting and recognizing target in real-time video, storage medium, and device
WO2021057056A1 (en) Neural architecture search method, image processing method and device, and storage medium
WO2022052601A1 (en) Neural network model training method, and image processing method and device
CN111652903B (en) Pedestrian target tracking method based on convolution association network in automatic driving scene
CN111079674A (en) Target detection method based on global and local information fusion
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
WO2021218470A1 (en) Neural network optimization method and device
WO2022007867A1 (en) Method and device for constructing neural network
CN110263855B (en) Method for classifying images by utilizing common-basis capsule projection
CN105701482A (en) Face recognition algorithm configuration based on unbalance tag information fusion
CN112036475A (en) Fusion module, multi-scale feature fusion convolutional neural network and image identification method
WO2023036157A1 (en) Self-supervised spatiotemporal representation learning by exploring video continuity
WO2023282569A1 (en) Method and electronic device for generating optimal neural network (nn) model
US20220270341A1 (en) Method and device of inputting annotation of object boundary information
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN111931572B (en) Target detection method for remote sensing image
CN110942463B (en) Video target segmentation method based on generation countermeasure network
CN117034100A (en) Self-adaptive graph classification method, system, equipment and medium based on hierarchical pooling architecture
CN113743602A (en) Method for improving model post-processing speed
CN111104831B (en) Visual tracking method, device, computer equipment and medium
JP7226696B2 (en) Machine learning method, machine learning system and non-transitory computer readable storage medium
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
US20210216868A1 (en) Systems and methods for reducing memory requirements in neural networks
CN110211041B (en) Optimization method of neural network image classifier based on receptive field integration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination