CN110895714A - Network compression method of YOLOv3 - Google Patents
Info
- Publication number
- CN110895714A
- Authority
- CN
- China
- Prior art keywords
- network
- yolov3
- training
- data set
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a network compression method for YOLOv3, comprising the following steps: obtaining a picture data set from captured video; annotating the picture data set; augmenting the data set; training the YOLOv3 model with the data set; evaluating the saved weight files and selecting the best one; sparsely training the selected YOLOv3 weights; pruning the network model; fine-tuning the network model to recover detection performance; judging whether the accuracy reaches a threshold, outputting the compressed network model if it does, and otherwise continuing sparse training, pruning and fine-tuning; and performing target detection on the campus data with the compressed network. By compressing the YOLOv3 network, the invention reduces the size of the network model, increases detection speed, and meets the requirements of detecting objects such as pedestrians and vehicles on a campus.
Description
Technical Field
The invention belongs to the technical field of deep neural network optimization, and particularly relates to a compression method of a deep neural network.
Background
The development of deep learning has enabled deep neural networks to achieve remarkable results in target classification and detection. In practical applications, however, deep neural networks face difficulties in this field: because they perform a large amount of computation during detection and classification, they place high demands on the computing equipment, without which fast recognition and classification are hard to achieve. YOLOv3 classifies 80 object categories, so when only a few categories need to be detected, such as pedestrians and vehicles on a campus, YOLOv3 carries detection redundancy. To reduce the computational load of deep neural networks and improve their portability to devices such as development boards, research has turned to network compression; the main current methods are network pruning, network quantization, low-rank decomposition, knowledge distillation, and compact network design.
The traditional pruning method sets a weight threshold and deletes all weights below it, compressing the network by removing individual weights. This approach performs no sparse training before deletion and removes each weight as an isolated parameter, so correlated weights are also deleted; as a result the compressed network has poor accuracy, and its running time is not reduced. Such unstructured pruning of individual weights is unsuitable for complex deep neural networks and requires specialized software libraries and hardware support, which further increases the complexity of deploying the compressed network.
Disclosure of Invention
The invention provides a network compression method for YOLOv3 that reduces the computational load of the deep neural network and improves its portability to devices such as development boards, compressing the network without degrading its detection performance.
The technical scheme for realizing the invention is as follows:
a network compression method of YOLOv3, comprising the following steps:
Obtaining a picture data set from the captured video: multiple video clips from different locations are obtained from the surveillance video stored by the school, and pictures are captured from the video clips and screened.
Carrying out information annotation on the picture data set: the four classes of targets to be detected are annotated in the saved picture data set.
Performing data augmentation on the data set: the data set is expanded by rotating, flipping and scaling the pictures.
Training the YOLOv3 network model with the data set: the YOLOv3 model is trained with the processed picture data set.
Weighing the weight files: the saved weight files are compared by computing the network recall rate and precision, and the best-performing file is selected as the weight file of the target detection network.
Sparse training: an L1 regularization constraint is applied to the gamma coefficients of the network's BN layers to generate a sparse weight matrix.
Network model pruning: a scaling factor gamma is introduced for each channel to measure the importance of the network channels; the closer the scaling factor is to zero, the less important the corresponding channel is to the network, and the channels with little influence are deleted.
Fine-tuning the network model: the accuracy of the pruned network model drops, and fine-tuning restores it.
Target detection: the compressed network is used to detect objects such as pedestrians and vehicles appearing in the campus surveillance video.
The invention also provides a network compression system for YOLOv3, comprising:
The video acquisition device: cameras arranged at different locations on the campus;
The picture processing module: divides the pictures saved from the video clips into a training set and a test set, labels the pictures, and augments the labeled data set through rotation and flipping operations;
The training module: trains the neural network with the augmented data set and extracts picture features;
The pruning module: deletes channels from the sparsely trained network and fine-tunes the network after channel deletion;
The detection module: tests the compressed YOLOv3 network with the campus data set, measuring the detection speed and accuracy of the compressed network.
Compared with the prior art, the invention has the following advantages:
the invention provides a network compression method of YOLOv3, which is used for sparsifying training a network, so that a model can adjust parameters towards a structural sparse direction, and the correlation among weights is considered; the importance of network channels is measured by introducing scaling factors to the sparse network, only the channels with small influence factors are deleted in the network pruning process, and the network feature learning capability and the detection precision are ensured; the trimmed network is finely adjusted, so that the detection effect of the network is recovered, the size of a network model is reduced, the detection accuracy of the deep neural network is guaranteed, and the calculation cost in the detection process is reduced.
Drawings
Fig. 1 is a flowchart of a network compression method of YOLOv3 according to the present invention;
fig. 2 is a schematic block diagram of a network compression method of YOLOv3 according to the present invention;
FIG. 3 is a labeled picture used to train the YOLOv3 network;
FIG. 4 is a flow chart of pruning a YOLOv3 network;
fig. 5 is a diagram of the results of network detection after compression.
Detailed description of the preferred embodiment
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments:
fig. 1 is a flowchart of a network compression method of YOLOv3 provided by the present invention, and fig. 2 is a schematic block diagram of a network compression method of YOLOv3 provided by the present invention, where the method includes:
obtaining a picture data set from a captured video, specifically comprising:
the method comprises the steps of obtaining multiple sections of video files through cameras in different places of a campus, and screening out 50 video clips. The method comprises the steps of intercepting a picture for each video file every 5 frames, screening 5000 pictures for the stored picture files, and adopting a screening principle that picture data comprise four types of data of pedestrians, bicycles, automobiles and backpacks, which are the four most common things in a campus.
Carrying out information annotation on the picture data set: the data set used to train the network is obtained by the picture processing module in fig. 2. The 5000 screened pictures are divided at a ratio of 8:2 into 4000 training pictures and 1000 test pictures, and the training pictures are annotated with the labeling tool labelImg. Pedestrians, bicycles, automobiles and backpacks appearing in the pictures are framed with rectangular boxes and labeled with the corresponding class information; fig. 3 shows a labeled picture with the four classes of objects annotated.
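The 8:2 division described above can be sketched as a shuffled split. The file names below are placeholders, not the actual campus pictures, and the fixed seed is only for reproducibility of the sketch.

```python
import random

def split_dataset(filenames, train_ratio=0.8, seed=0):
    """Shuffle the picture list and split it into (train, test) at train_ratio."""
    rng = random.Random(seed)
    files = list(filenames)
    rng.shuffle(files)
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]

pictures = [f"img_{i:04d}.jpg" for i in range(5000)]   # placeholder names
train_set, test_set = split_dataset(pictures)
print(len(train_set), len(test_set))  # → 4000 1000
```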
Performing data augmentation on the data set: the annotated picture data set is augmented to strengthen the deep neural network's ability to learn features of the campus pictures and improve detection. First all pictures are scaled by factors of 0.5, 0.75, 1.25 and 1.5, then the pictures are horizontally flipped, and finally all pictures are rotated by different angles, thereby enlarging the data set.
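The scale/flip/rotate transforms above can be illustrated with NumPy arrays standing in for pictures. This is only a sketch of the geometry: a real pipeline would resample the pixels when scaling and would also transform the bounding-box annotations, which is omitted here.

```python
import numpy as np

SCALES = (0.5, 0.75, 1.25, 1.5)

def scaled_size(h, w, factor):
    """Target (height, width) of a picture scaled by `factor`."""
    return int(round(h * factor)), int(round(w * factor))

def hflip(img):
    """Horizontal flip: reverse the width axis."""
    return img[:, ::-1]

img = np.arange(12).reshape(3, 4)                 # toy 3x4 "picture"
print([scaled_size(416, 416, s) for s in SCALES])  # the four scaled sizes
print(hflip(img)[0].tolist())                      # first row reversed
print(np.rot90(img).shape)                         # 90-degree rotation swaps h and w
```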
Training the YOLOv3 network model with the data set: the augmented campus picture data set is used to train the YOLOv3 network through the training module in fig. 2. The pictures and their corresponding annotation xml files are placed in the same folder, and the name of each picture is collected and saved in a train.txt file. The data, names and network structure files are then modified: the data file contains the number of classes and the paths, namely the address of the txt file listing the training pictures, the address of the names file, and the save address of the weight files; the names file contains the 4 classes of label information: pedestrians, bicycles, automobiles and backpacks. In the network parameter file the base learning rate is set to 0.001, the number of iterations to 20000, and the network to training mode; training then begins, a weight file is saved every 2000 iterations, and after training finishes 12 weight files are obtained, including the final and the latest weight files.
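The list-file bookkeeping above can be sketched as follows, assuming the Darknet-style convention of a train.txt containing one absolute image path per line. The directory layout and file names are hypothetical.

```python
import os
import tempfile

def write_train_list(image_names, image_dir, out_path):
    """Write the absolute path of every training picture, one per line."""
    with open(out_path, "w") as f:
        for name in image_names:
            f.write(os.path.abspath(os.path.join(image_dir, name)) + "\n")

tmp_dir = tempfile.mkdtemp()                      # stand-in for the picture folder
names = ["img_0001.jpg", "img_0002.jpg", "img_0003.jpg"]
list_path = os.path.join(tmp_dir, "train.txt")
write_train_list(names, tmp_dir, list_path)
with open(list_path) as f:
    lines = f.read().splitlines()
print(len(lines))  # → 3
```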
Weighing the weight files: through the training of the network, 12 trained weight files are saved. A loss curve is drawn from the saved log file and the mAP of each saved weight file is computed; the image paths added to the data file for the mAP calculation are the same as those used for training. The weight file with the highest mAP and a low loss is selected as the weight file for YOLOv3 compression.
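The selection rule above (highest mAP, preferring lower loss among ties) can be sketched as a toy comparison. The checkpoint names and their mAP/loss values are fabricated for illustration only.

```python
# hypothetical per-checkpoint metrics read from the evaluation step
checkpoints = {
    "yolov3_2000.weights":  {"mAP": 0.62, "loss": 1.90},
    "yolov3_12000.weights": {"mAP": 0.81, "loss": 0.95},
    "yolov3_20000.weights": {"mAP": 0.81, "loss": 1.10},
}

def select_best(ckpts):
    """Pick the checkpoint with the highest mAP; break ties by lower loss."""
    return max(ckpts, key=lambda k: (ckpts[k]["mAP"], -ckpts[k]["loss"]))

print(select_best(checkpoints))  # → yolov3_12000.weights
```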
Model compression of the network is implemented by the network pruning module in fig. 2, and the method includes:
As shown in fig. 4, the flow chart of network pruning, the weight file of the initial network is the selected weight file with the highest mAP, and the network first needs sparse training, which specifically includes:
the data set used for sparsely training the network is also a campus data set, and since the neural network only detects 4 types of objects, and the Yolov3 original network detects 80 types of objects, the network needs to be sparsely trained for detecting the campus objects, redundant weights are deleted, and the network only needs to learn the characteristics of the objects in the campus data set. The method comprises the steps of conducting sparse training on a network, applying L1 regularization constraint on gamma coefficients of a batch normalization layer of the network, enabling intersection points of isolines of square error terms and isolines of regularization terms to be generally arranged on coordinate axes when L1 regularization is adopted, adjusting adjustment factors, forcing weights to be equal to 0, selecting variables, and enabling a model to adjust parameters towards a structural sparse direction. At this time, the Gamma coefficient of the BN layer enables the network to force some weights to go to 0 according to the characteristics of the data set picture in the sparse training process.
Pruning the network after sparsifying the network, specifically comprising:
the importance of the network channels is weighted by introducing a scaling factor γ for each channel. The closer the scaling factor is to zero, the less important the corresponding channel is to the network. And in the training process, the network and the scaling factor are simultaneously trained, the channel with the small scaling factor is automatically deleted, and the advantage of introducing the scaling factor is that no additional overhead is brought to the network. Pruning the network according to the proportion of 20%, 40%, 60% and 80%, and then selecting a network model with low precision reduction and large compression ratio for precision recovery of the model calculation precision after pruning. The network after the channel deletion only needs less parameters and less running memory.
Fine-tuning the network model: deleting the channels whose scaling factors are close to zero also removes the inputs, outputs and weights connected to them, so the accuracy of the compressed neural network drops; this is compensated by fine-tuning. The network is fine-tuned on the campus data set: anchor values matched to the characteristics of the campus data set are generated by a clustering algorithm, and the pruned network model is retrained to improve detection accuracy. After fine-tuning, the detection performance of the network is evaluated; if it meets the requirement, the pruned network model is output, otherwise sparse training, channel deletion and fine-tuning are repeated in a loop.
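The anchor-generation step above can be sketched as k-means clustering of the labeled boxes' (width, height) pairs, whose centroids become the anchor values. Plain Euclidean k-means is used here for brevity and is an assumption of this sketch; YOLO implementations often cluster with an IoU-based distance instead. The box data is synthetic.

```python
import numpy as np

def kmeans_anchors(wh, k, iters=100, seed=0):
    """Cluster (N, 2) width-height pairs into k anchors, sorted by area."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each box to its nearest centroid, then recompute centroids
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]

# synthetic boxes: a small cluster (e.g. backpacks) and a large one (e.g. automobiles)
boxes = np.vstack([np.full((20, 2), 30.0) + np.arange(20)[:, None] % 3,
                   np.full((20, 2), 200.0) + np.arange(20)[:, None] % 5])
anchors = kmeans_anchors(boxes, k=2)
print(anchors.round(0))  # one small anchor, one large anchor
```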
Detection with the campus data set: because YOLOv3 is pruned with the campus data set, sparse training, pruning and fine-tuning are all performed for the pedestrians, automobiles, bicycles and bags on the campus during compression; as shown in fig. 5, the compressed network meets the requirement of detecting the four classes of campus objects.
Claims (7)
1. A network compression method of YOLOv3 is characterized in that:
Step 1: obtaining a picture data set from the acquired video;
Step 2: carrying out information annotation on the picture data set;
Step 3: performing data augmentation on the data set;
Step 4: training the YOLOv3 model with the data set;
Step 5: weighing the weight files saved during training and selecting the weight file with the best detection effect;
Step 6: sparsely training the selected YOLOv3 weights;
Step 7: pruning the network model and deleting unimportant channels;
Step 8: fine-tuning the network model to improve the detection effect;
Step 9: performing target detection on pedestrians, vehicles and the like appearing in the campus data with the compressed network.
2. The network compression method of YOLOv3 according to claim 1, wherein: the video clips in step 1 are surveillance videos from different areas of a school; one picture is saved every 10 frames from the multiple video files, the saved pictures are screened, and pictures without detection targets are removed. The selected pictures contain the four classes of pedestrians, bicycles, automobiles and bags, and the pictures are divided into a training set and a test set.
3. The network compression method of YOLOv3 according to claim 1, wherein: the data set is divided at a ratio of 8:2 into a training set and a test set, the pictures in the training set are annotated with a labeling tool, and the pedestrians, bicycles, automobiles and bags appearing in the pictures are marked with rectangular boxes.
4. The network compression method of YOLOv3 according to claim 1, wherein: to improve the feature-learning effect of the network and the detection accuracy, data augmentation is achieved by flipping, rotating and scaling the pictures together with their annotation files.
5. The network compression method of YOLOv3 according to claim 1, wherein: the YOLOv3 network model is trained with the augmented data set, generating two files, one storing the picture names and the other storing the absolute paths of all pictures, while the label information of each picture is stored in a separate file; the parameters in the network structure file are modified for training, and the training log is used to observe the training state of the model and adjust parameters to improve network performance. A weight file is saved every 2000 iterations during training. The saved weight files are compared by computing the network recall rate and precision, and the best-performing weights are selected as the weight file of the detection network to be compressed.
6. The network compression method of YOLOv3 according to claim 1, wherein: before pruning, the network is sparsely trained on the annotated campus picture data set, applying an L1 regularization constraint to the gamma coefficients of the network's BN layers to generate a sparse weight matrix. During model pruning, a scaling factor is introduced for each channel to measure its importance, and unimportant channels are deleted, thereby reducing the size of the network model.
7. The network compression method of YOLOv3 according to claim 1, wherein: as network channels are removed, the detection effect of the network declines; fine-tuning the network with the campus data set restores the detection effect.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911270679.5A CN110895714A (en) | 2019-12-11 | 2019-12-11 | Network compression method of YOLOv3 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911270679.5A CN110895714A (en) | 2019-12-11 | 2019-12-11 | Network compression method of YOLOv3 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110895714A true CN110895714A (en) | 2020-03-20 |
Family
ID=69787298
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911270679.5A Pending CN110895714A (en) | 2019-12-11 | 2019-12-11 | Network compression method of YOLOv3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110895714A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111612144A (en) * | 2020-05-22 | 2020-09-01 | 深圳金三立视频科技股份有限公司 | Pruning method and terminal applied to target detection |
CN111709489A (en) * | 2020-06-24 | 2020-09-25 | 广西师范大学 | Citrus identification method based on improved YOLOv4 |
CN111709522A (en) * | 2020-05-21 | 2020-09-25 | 哈尔滨工业大学 | Deep learning target detection system based on server-embedded cooperation |
CN111814902A (en) * | 2020-07-21 | 2020-10-23 | 南方电网数字电网研究院有限公司 | Target detection model training method, target identification method, device and medium |
CN111832607A (en) * | 2020-05-28 | 2020-10-27 | 东南大学 | Bridge disease real-time detection method based on model pruning |
CN111898591A (en) * | 2020-08-28 | 2020-11-06 | 电子科技大学 | Modulation signal identification method based on pruning residual error network |
CN112001259A (en) * | 2020-07-28 | 2020-11-27 | 联芯智能(南京)科技有限公司 | Aerial weak human body target intelligent detection method based on visible light image |
CN112101313A (en) * | 2020-11-17 | 2020-12-18 | 北京蒙帕信创科技有限公司 | Machine room robot inspection method and system |
CN112115837A (en) * | 2020-09-11 | 2020-12-22 | 中国电子科技集团公司第五十四研究所 | Target detection method based on YoloV3 and dual-threshold model compression |
CN112464718A (en) * | 2020-10-23 | 2021-03-09 | 西安电子科技大学 | Target detection method based on YOLO-Terse network and storage medium |
CN112614125A (en) * | 2020-12-30 | 2021-04-06 | 湖南科技大学 | Mobile phone glass defect detection method and device, computer equipment and storage medium |
CN112668451A (en) * | 2020-12-24 | 2021-04-16 | 南京泓图人工智能技术研究院有限公司 | Crowd density real-time monitoring method based on YOLOv5 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200320 |