CN112836657A - Pedestrian detection method and system based on lightweight YOLOv3

Info

Publication number: CN112836657A
Application number: CN202110171542.5A
Authority: CN
Inventors: 杨利红; 甘彤; 商国军; 张琦珺; 程剑; 刘海涛; 窦曼莉; 任好; 房思思; 卢安安; 聂建华; 姜少波
Original assignee: CETC 38 Research Institute
Current assignee: CETC 38 Research Institute
Priority date: 2021-02-08
Filing date: 2021-02-08
Publication date: 2021-05-25
Anticipated expiration: 2041-02-08
Also published as: CN112836657B

Abstract

The invention discloses a pedestrian detection method based on lightweight YOLOv3, which comprises the following steps: establishing a pedestrian data set aiming at a perimeter intrusion prevention application scene; constructing a lightweight YOLOv3 pedestrian detection network; dividing a pedestrian detection training set, and training a lightweight YOLOv3 pedestrian detection network to obtain a lightweight pedestrian detection model; dividing a pedestrian detection verification set, and verifying the effect of the lightweight pedestrian detection model obtained by training; the lightweight YOLOv3 pedestrian detection model is deployed into embedded front-end equipment. The method adopts the high-precision lightweight backbone network to replace the backbone network of the traditional YOLOv3 detection network, greatly reduces the forward calculation amount of the pedestrian detection network and the parameter data amount of the pedestrian detection network, greatly improves the pedestrian detection speed while ensuring the pedestrian detection precision, and is suitable for embedded equipment with lower computing capacity and smaller storage space.

Description

Pedestrian detection method and system based on lightweight YOLOv3

Technical Field

The invention relates to the technical field of target recognition, and in particular to a pedestrian detection method based on lightweight YOLOv3.

Background

Perimeter security systems are widely used at sites such as detention centers, prisons, airports, nuclear power plants and oil depots to prevent illegal intrusion. As technology advances, security challenges grow more severe, and stronger, more intelligent perimeter security systems are urgently needed. A traditional perimeter security system consists of a closed fence and a large number of surveillance cameras; it is easily affected by natural environmental factors such as bad weather, suffers from an excessively high false alarm rate, and gives users a poor experience.

In recent years, with rapid progress in hardware and the leap in deep learning, perimeter security systems have introduced artificial intelligence: a deep-learning-based target recognition algorithm judges illegal intrusion targets and accurately identifies the intrusion targets of interest, so that the system is not disturbed by factors such as illumination and shadows, rain, snow, fog, sand and dust, swaying trees and small animals. This greatly reduces the false alarm rate of the perimeter security system.

Deep-learning-based target recognition algorithms generally suffer from a huge forward computation load and an excessive volume of model parameter data, and need to run on high-performance servers with strong computing capability. Because the deployment environment of a perimeter security system is complex, transmitting the images collected by front-end surveillance cameras to a back-end high-performance server in real time for processing leads to problems such as delay and packet loss caused by the excessive data volume. Deploying the target recognition algorithm on front-end embedded devices and returning only the recognition results to the back end for display effectively reduces the pressure on the transmission system. To this end, a lightweight target recognition algorithm needs to be designed that reduces the forward computation of the network and the volume of model parameter data, so that it can be used on embedded devices with low computing capability and limited storage space.

For example, application number CN201910500483.4 discloses a vehicle and license plate detection method based on lightweight YOLOv3 with long-focus and short-focus fusion, which establishes a vehicle and license plate data set and designs and trains a lightweight YOLOv3 network. To address the large parameter count and long computation time of the YOLOv3 network, it replaces the backbone network with a lightweight network and reconstructs the other convolutional layers, greatly improving detection speed while maintaining detection precision, so that the target detection network can be ported to a vehicle-mounted embedded unit. The lightweight network designed in that invention greatly reduces the parameters and computation of the original YOLOv3 backbone network, but there is still room to reduce the computation further; designing a more efficient lightweight network can further improve the running efficiency of the detection algorithm on embedded devices.

Disclosure of Invention

The technical problem to be solved by the invention is how to improve the running speed of a pedestrian detection network on embedded devices while ensuring pedestrian detection accuracy; to this end, a pedestrian detection method based on lightweight YOLOv3 is provided.

The invention solves the technical problems through the following technical means:

A pedestrian detection method based on lightweight YOLOv3 comprises the following steps:

S1, establishing a perimeter security pedestrian detection data set, which comprises: collecting and annotating real pedestrian images in protected-site scenes; extracting pedestrian images in natural scenes contained in open-source data sets and converting their annotation information; collecting person-free images, comparable in number to the pedestrian images, as background images and constructing a blank file for each background image as its label;

S2, constructing a lightweight YOLOv3 pedestrian detection network; the lightweight backbone network adopted by the lightweight YOLOv3 pedestrian detection network comprises, in order: convolutional layer conv1, lightweight layer 3, lightweight layer 2, lightweight layer 1 × 2, lightweight layer 3, lightweight layer 1 × 3, lightweight layer 2, lightweight layer 1 × 2, lightweight layer 3, and convolutional layer conv2;

S3, dividing a pedestrian detection training set, and training the lightweight YOLOv3 pedestrian detection network;

S4, dividing a pedestrian detection verification set, and verifying the effect of the lightweight YOLOv3 pedestrian detection model;

S5, deploying the lightweight YOLOv3 pedestrian detection model on an embedded device.

The lightweight YOLOv3 pedestrian detection network is constructed, the lightweight backbone network is adopted to replace a darknet53 backbone network used by the traditional YOLOv3, the calculated amount of the lightweight YOLOv3 pedestrian detection network is reduced by 71% compared with the traditional YOLOv3 forward calculation amount, and the speed of detecting pedestrians in each frame of image is greatly improved; before convolution operation is carried out on each lightweight layer in the lightweight backbone network to extract features, the number of feature channels participating in operation is increased through the amplification convolution layers, and the extracted image features are richer; the lightweight layer 1 fuses low-dimensional features and high-dimensional features, further improves feature expression capability, and ensures that the whole lightweight backbone network has excellent feature expression capability.

Further, the lightweight YOLOv3 pedestrian detection network constructed in S2 extracts features and detects pedestrians with a three-scale detection module: the small-scale output is used for detecting pedestrians that occupy a large proportion of the image, the medium-scale output is used for detecting pedestrians that occupy a medium proportion, and the large-scale output is used for detecting pedestrians that occupy a small proportion.

Further, the detection module adopted by the lightweight YOLOv3 pedestrian detection network constructed in S2 has the following structure: the small scale comprises, in order, convolution layers conv3, conv4, conv5, conv6, conv7, conv8 and conv9; the medium scale comprises, in order, route layer 1, convolution layer conv10, up-sampling layer 1, route layer 2, and convolution layers conv11, conv12, conv13, conv14, conv15, conv16 and conv17; the large scale comprises, in order, route layer 3, convolution layer conv18, up-sampling layer 2, route layer 4, and convolution layers conv19, conv20, conv21, conv22, conv23, conv24 and conv25.

Further, lightweight layer 1 of the lightweight YOLOv3 pedestrian detection network constructed in S2 comprises, in order, an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv (stride 1), a compression convolutional layer 1 × 1 conv, and a shortcut layer; lightweight layer 2 comprises, in order, an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv (stride 2), and a compression convolutional layer 1 × 1 conv; lightweight layer 3 comprises, in order, an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv (stride 1), and a compression convolutional layer 1 × 1 conv.

Further, training the lightweight YOLOv3 pedestrian detection network in S3, randomly selecting image samples with a set proportion in the perimeter security pedestrian detection data set as a pedestrian detection training set, and performing online data enhancement on training images in the training process, including: randomly selecting two original training images to carry out random cutting, random scaling and random color transformation operations, and carrying out corresponding transformation on the marking information of the two original training images according to the cutting and scaling operations; and fusing the two transformed training images into a new training image, and combining the labeling information of the two transformed training images to be used as the label of the new training image. The formula for fusing the two training images is as follows:

I(x,y)＝0.5×I₁(x,y)+0.5×I₂(x,y)

wherein I₁(x, y) and I₂(x, y) respectively represent the pixel values of the two transformed training images at the coordinate point (x, y), and I(x, y) represents the pixel value of the new fused training image at the coordinate point (x, y).
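The fusion step can be sketched as follows. This is a minimal illustration, assuming the two transformed images already have the same size and are held as nested lists of grayscale pixel values; the function name `fuse_samples` and the `(x, y, w, h)` box tuples are hypothetical and not taken from the source.

```python
def fuse_samples(img1, boxes1, img2, boxes2):
    """Blend two equally sized images 50/50 and merge their box labels.

    Images are lists of rows of pixel values; boxes are (x, y, w, h) tuples.
    Both representations are assumptions made for this sketch.
    """
    if len(img1) != len(img2) or any(len(a) != len(b) for a, b in zip(img1, img2)):
        raise ValueError("images must have identical dimensions")
    # I(x, y) = 0.5 * I1(x, y) + 0.5 * I2(x, y), applied to every pixel
    fused = [[0.5 * p1 + 0.5 * p2 for p1, p2 in zip(row1, row2)]
             for row1, row2 in zip(img1, img2)]
    # The label of the new image is the combined label set of both inputs
    return fused, list(boxes1) + list(boxes2)


img_a = [[100, 200], [50, 0]]
img_b = [[0, 100], [150, 255]]
fused, labels = fuse_samples(img_a, [(0, 0, 1, 2)], img_b, [(1, 0, 1, 1)])
print(fused)
print(labels)
```

Because both weights are 0.5, pedestrians from either source image stay visible in the fused image, which is why both label sets are kept.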

Further, in S3, the lightweight YOLOv3 pedestrian detection network is trained until the loss function stabilizes and no longer decreases, at which point training stops. The loss function adopted in the training process is as follows:

wherein S represents the grid size at each detection scale of the detection module adopted by the lightweight pedestrian detection network, and B represents the number of target boxes predicted by each cell at each detection scale of the detection module;

$1_{ij}^{obj}$ indicates whether the j-th predicted target box of the i-th cell at a given scale contains a target: $1_{ij}^{obj} = 1$ if it contains a target, and $1_{ij}^{obj} = 0$ if it does not;

$x_i, y_i, w_i, h_i, C_i$ respectively represent the center-point x coordinate, center-point y coordinate, width, height and confidence of the predicted target box with $1_{ij}^{obj} = 1$ in the i-th cell at a given scale;

$\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{C}_i$ respectively represent the center-point x coordinate, center-point y coordinate, width, height and confidence of the pre-annotated target; classes represents the target classes to be detected, $p_i(c)$ is the predicted probability of each class, and $\hat{p}_i(c)$ is the true probability of each class;

the first row of the loss function represents the loss on the center coordinates of the effective predicted targets; the second row represents the loss on the width and height of the effective predicted targets; the third row represents the confidence loss over all prediction boxes; the fourth row represents the class loss of the effective predicted targets.
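One form consistent with the four rows described above, assuming the sum-of-squared-errors layout of the original YOLO loss, might read as follows. The indicator symbols $1_{ij}^{obj}$ and $1_{ij}^{noobj}$, the hat notation for annotated values, the square roots on width and height, and the weights $\lambda_{coord}$ and $\lambda_{noobj}$ are assumptions borrowed from that layout rather than taken from the source; YOLOv3 implementations often use cross-entropy for the confidence and class terms instead.

```latex
\begin{aligned}
Loss ={} & \lambda_{coord}\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} 1_{ij}^{obj}
           \left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
       & +\lambda_{coord}\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} 1_{ij}^{obj}
           \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2
                +\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
       & +\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} 1_{ij}^{obj}\,(C_i-\hat{C}_i)^2
         +\lambda_{noobj}\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} 1_{ij}^{noobj}\,(C_i-\hat{C}_i)^2 \\
       & +\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} 1_{ij}^{obj}
           \sum_{c\,\in\,classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```

Rows one, two and four are gated by $1_{ij}^{obj}$, so only boxes responsible for a target contribute; the third row sums over boxes with and without a target, matching the statement that confidence loss covers all prediction boxes.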

Further, to verify the effect of the lightweight pedestrian detection model in S4, a set proportion of samples is randomly selected from the perimeter security pedestrian detection data set as the pedestrian detection verification set; the trained lightweight pedestrian detection model detects the pedestrians and their positions in each image sample of the verification set; the detection results are saved and compared with the pedestrian positions in the verification set annotations, finally yielding the overall recall rate and accuracy of the lightweight pedestrian detection model on the pedestrian detection verification set.

The invention also provides a pedestrian detection system based on lightweight YOLOv3, which comprises:

a data set establishing module, used for establishing a perimeter security pedestrian detection data set, which comprises: collecting and annotating real pedestrian images in protected-site scenes; extracting pedestrian images in natural scenes contained in open-source data sets and converting their annotation information; collecting person-free images, comparable in number to the pedestrian images, as background images and constructing a blank file for each background image as its label;

a lightweight YOLOv3 pedestrian detection network construction module, used for constructing a lightweight YOLOv3 pedestrian detection network; the lightweight backbone network adopted by the lightweight YOLOv3 pedestrian detection network comprises, in order: convolutional layer conv1, lightweight layer 3, lightweight layer 2, lightweight layer 1 × 2, lightweight layer 3, lightweight layer 1 × 3, lightweight layer 2, lightweight layer 1 × 2, lightweight layer 3, and convolutional layer conv2;

a lightweight YOLOv3 pedestrian detection network training module, which divides a pedestrian detection training set and trains the lightweight YOLOv3 pedestrian detection network;

a lightweight YOLOv3 pedestrian detection network verification module, which divides a pedestrian detection verification set and verifies the effect of the lightweight YOLOv3 pedestrian detection model;

a lightweight YOLOv3 pedestrian detection model application module, which deploys the lightweight YOLOv3 pedestrian detection model on an embedded device.

Further, in the lightweight YOLOv3 pedestrian detection network construction module, a three-scale detection module is adopted to detect pedestrians: the small-scale output is used for detecting pedestrians with large target proportion, the medium-scale output is used for detecting pedestrians with medium target proportion, and the large-scale output is used for detecting pedestrians with small target proportion.

Further, the detection module adopted in the lightweight YOLOv3 pedestrian detection network construction module has the following structure: the small scale comprises, in order, convolution layers conv3, conv4, conv5, conv6, conv7, conv8 and conv9; the medium scale comprises, in order, route layer 1, convolution layer conv10, up-sampling layer 1, route layer 2, and convolution layers conv11, conv12, conv13, conv14, conv15, conv16 and conv17; the large scale comprises, in order, route layer 3, convolution layer conv18, up-sampling layer 2, route layer 4, and convolution layers conv19, conv20, conv21, conv22, conv23, conv24 and conv25; lightweight layer 1 comprises, in order, an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv, a compression convolutional layer 1 × 1 conv, and a shortcut layer; lightweight layer 2 comprises, in order, an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv, and a compression convolutional layer 1 × 1 conv; lightweight layer 3 comprises, in order, an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv, and a compression convolutional layer 1 × 1 conv.

The invention has the advantages that:

A lightweight YOLOv3 pedestrian detection network is constructed in which a lightweight backbone network replaces the darknet53 backbone network used by traditional YOLOv3. The forward computation of the lightweight network is 41.364 BFLOPS, 71% less than that of traditional YOLOv3, which greatly improves the speed of detecting pedestrians in each frame. The parameter data volume of the pedestrian detection model obtained by training the lightweight network is 89 MB, 62% less than that of traditional YOLOv3, which reduces the storage requirement on the hardware. Before the convolution that extracts features, each lightweight layer in the backbone network increases the number of feature channels taking part in the operation through an amplification convolutional layer, so the extracted image features are richer; in addition, lightweight layer 1 fuses low-dimensional and high-dimensional features, which further improves feature expression and ensures that the backbone network has excellent feature extraction capability. The detection head uses three scales to detect large, medium and small pedestrian targets, which greatly reduces the missed detection rate. The rich features extracted by the backbone network, combined with the multi-scale detection of the detection head, allow pedestrian detection to reach higher precision.
In conclusion, the lightweight YOLOv3 pedestrian detection method provided by the invention is suitable for embedded equipment with low computing power and small storage space, can ensure high detection precision, and is convenient for front-end application of perimeter security products.

Drawings

Fig. 1 is a general flowchart of the pedestrian detection method based on lightweight YOLOv3 in an embodiment of the present invention.

Fig. 2 is a structure diagram of the lightweight YOLOv3 pedestrian detection network in an embodiment of the present invention.

Fig. 3 is a structure diagram of the lightweight layers in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a pedestrian detection method based on lightweight YOLOv3 which, as shown in Fig. 1, comprises the following steps:

S1, establishing a perimeter security pedestrian detection data set

Images are collected for the perimeter security application scene to establish the pedestrian detection data set, with diverse image sources, specifically: acquiring real pedestrian images from protected-site scenes; extracting suitable pedestrian images in natural scenes from open-source data sets; and collecting images of protected-site scenes or natural scenes containing no people as background images. The ratio of pedestrian images to background images in the established perimeter security pedestrian detection data set is approximately 1:1, and the total number of images reaches 87,300.

The collected images are annotated; each annotated image has an annotation file of the same name in txt format. For the collected real pedestrian images, the position of each pedestrian is marked to generate the corresponding annotation file; for pedestrian images from open-source data sets, the position information of each pedestrian in the existing annotation file is converted to generate a new annotation file; for each background image, a blank txt document is generated as the annotation file. In the annotation file of a pedestrian image, the position information of each pedestrian is stored as one line containing, in order, the x coordinate of the upper-left corner, the y coordinate of the upper-left corner, the width and the height of the pedestrian bounding box.
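The annotation layout described above can be read back with a few lines of Python. This is a sketch under stated assumptions: values on a line are taken to be whitespace-separated, and the helper names `parse_annotation` and `to_center_format` are hypothetical. The conversion to a normalized center format is not described in the source; it is included only because the loss function works with center coordinates.

```python
def parse_annotation(text):
    """Parse one annotation file: one pedestrian per line as 'x y w h'."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:  # blank line, or the blank file of a background image
            continue
        if len(fields) != 4:
            raise ValueError(f"expected 4 values, got {len(fields)}: {line!r}")
        x, y, w, h = (float(v) for v in fields)
        boxes.append((x, y, w, h))
    return boxes


def to_center_format(box, img_w, img_h):
    """Convert a top-left (x, y, w, h) box to a normalized center-based box."""
    x, y, w, h = box
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


pedestrian_file = "120 80 40 100\n300 60 50 120\n"
background_file = ""
boxes = parse_annotation(pedestrian_file)
print(boxes)
print(parse_annotation(background_file))
print(to_center_format(boxes[0], 640, 480))
```

A background image yields an empty list, so every box the network predicts on it counts only toward the confidence loss.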

S2, constructing a lightweight YOLOv3 pedestrian detection network

The lightweight YOLOv3 pedestrian detection network adopts a lightweight backbone network to replace the darknet53 backbone network used by traditional YOLOv3, and detects pedestrians with a three-scale detection module: the small-scale output tensor 19 × 19 × 18 is used for detecting pedestrians that occupy a large proportion of the image, the medium-scale output tensor 38 × 38 × 18 is used for detecting pedestrians that occupy a medium proportion, and the large-scale output tensor 76 × 76 × 18 is used for detecting pedestrians that occupy a small proportion. The structure of the lightweight YOLOv3 pedestrian detection network is shown in Fig. 2, and the output tensors of each stage are shown in the following table:

The lightweight backbone network used in the lightweight YOLOv3 pedestrian detection network comprises, in order, convolutional layer conv1, lightweight layer 3, lightweight layer 2, lightweight layer 1 × 2, lightweight layer 3, lightweight layer 1 × 3, lightweight layer 2, lightweight layer 1 × 2, lightweight layer 3, and convolutional layer conv2.
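The backbone order can be written down as a small specification and expanded into one entry per layer. This sketch reads "lightweight layer 1 × 2" as two consecutive instances of lightweight layer 1, which is an interpretation rather than something the text states outright; the identifiers are hypothetical.

```python
# (layer kind, repeat count), in the order given in the description.
# Treating "x 2" / "x 3" as repeat counts is an assumption of this sketch.
BACKBONE_SPEC = [
    ("conv1", 1),
    ("lightweight3", 1),
    ("lightweight2", 1),
    ("lightweight1", 2),
    ("lightweight3", 1),
    ("lightweight1", 3),
    ("lightweight2", 1),
    ("lightweight1", 2),
    ("lightweight3", 1),
    ("conv2", 1),
]


def expand(spec):
    """Flatten (kind, repeat) pairs into one entry per layer instance."""
    return [kind for kind, repeat in spec for _ in range(repeat)]


layers = expand(BACKBONE_SPEC)
print(len(layers))
print(layers)
```

Under this reading the backbone has two plain convolutional layers and twelve lightweight layers, seven of which are the shortcut-bearing lightweight layer 1.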

The small-scale detection module adopted by the lightweight YOLOv3 pedestrian detection network comprises, in order, convolution layers conv3, conv4, conv5, conv6, conv7, conv8 and conv9; the medium scale comprises, in order, route layer 1, convolution layer conv10, up-sampling layer 1, route layer 2, and convolution layers conv11, conv12, conv13, conv14, conv15, conv16 and conv17; the large scale comprises, in order, route layer 3, convolution layer conv18, up-sampling layer 2, route layer 4, and convolution layers conv19, conv20, conv21, conv22, conv23, conv24 and conv25.
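The three output tensors quoted for this network follow from the input resolution and the number of boxes per cell. The sketch below assumes a 608 × 608 input (the resolution used for verification), three predicted boxes per cell as stated for the loss function, and a single "pedestrian" class; the downsampling factors 32, 16 and 8 are inferred from 608/19, 608/38 and 608/76, and the function name is hypothetical.

```python
def head_output_shapes(input_size, boxes_per_cell=3, num_classes=1):
    """Return (grid, grid, channels) for the small, medium and large scale."""
    # Each box carries 4 coordinates, 1 confidence and the class scores
    channels = boxes_per_cell * (4 + 1 + num_classes)
    # Strides inferred from the quoted grid sizes at a 608-pixel input
    return [(input_size // stride, input_size // stride, channels)
            for stride in (32, 16, 8)]


shapes = head_output_shapes(608)
print(shapes)
```

With one class, the 18 channels decompose as 3 × (4 + 1 + 1), which is why all three scales share the same depth.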

The lightweight backbone network uses three different lightweight layers, whose structures are shown in Fig. 3. In the following, channel counts and resolutions are stated relative to the input feature map of the lightweight layer.

Lightweight layer 1 comprises the following operations in order: an amplification convolutional layer 1 × 1 conv, whose output feature map has 6 times as many channels as the input feature map and the same resolution; a depth convolutional layer 3 × 3 DwConv (stride 1), whose output feature map has 6 times as many channels as the input feature map and the same resolution; a compression convolutional layer 1 × 1 conv, whose output feature map has the same number of channels and the same resolution as the input feature map; and a shortcut layer, whose output feature map has the same number of channels and the same resolution as the input feature map.

Lightweight layer 2 comprises the following operations in order: an amplification convolutional layer 1 × 1 conv, whose output feature map has 6 times as many channels as the input feature map and the same resolution; a depth convolutional layer 3 × 3 DwConv (stride 2), whose output feature map has 6 times as many channels as the input feature map and 1/2 of its resolution; and a compression convolutional layer 1 × 1 conv, whose output feature map has the same number of channels as the input feature map and 1/2 of its resolution.

Lightweight layer 3 comprises the following operations in order: an amplification convolutional layer 1 × 1 conv, whose output feature map has 6 times as many channels as the input feature map and the same resolution; a depth convolutional layer 3 × 3 DwConv (stride 1), whose output feature map has 6 times as many channels as the input feature map and the same resolution; and a compression convolutional layer 1 × 1 conv, whose output feature map has the same number of channels and the same resolution as the input feature map.
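The shapes and cost of one lightweight layer can be traced with a short calculation. This is a minimal sketch that follows the channel and resolution rules stated above; it assumes even feature-map dimensions, counts only multiply-accumulate operations of the three convolutions (no bias, normalization or activation), and treats the shortcut of lightweight layer 1 as shape-preserving. The function name and stage labels are hypothetical.

```python
def lightweight_layer(channels, height, width, stride=1, expansion=6):
    """Trace shapes and multiply-accumulate counts through one lightweight layer.

    Returns (stages, total_macs); each stage is (name, channels, height, width).
    """
    expanded = channels * expansion
    out_h, out_w = height // stride, width // stride
    stages = [
        ("amplify 1x1 conv", expanded, height, width),
        ("3x3 DwConv", expanded, out_h, out_w),
        ("compress 1x1 conv", channels, out_h, out_w),
    ]
    macs = (
        channels * expanded * height * width    # 1x1 amplification
        + 9 * expanded * out_h * out_w          # 3x3 depthwise: one filter per channel
        + expanded * channels * out_h * out_w   # 1x1 compression
    )
    return stages, macs


stages, macs = lightweight_layer(32, 76, 76)                         # layers 1 and 3
down_stages, down_macs = lightweight_layer(32, 76, 76, stride=2)     # layer 2
print(stages, macs)
print(down_stages, down_macs)
```

The depthwise term grows with 9 × channels rather than 9 × channels², which is where most of the saving over a dense 3 × 3 convolution at the expanded width comes from.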

The constructed lightweight YOLOv3 pedestrian detection network replaces the darknet53 backbone network used by traditional YOLOv3 with a lightweight backbone network. Its forward computation is 41.364 BFLOPS, 71% less than that of traditional YOLOv3, which greatly improves the speed of detecting pedestrians in each frame; at the same time, the amplification convolutional layer in each lightweight layer increases the number of feature channels taking part in the operation, which maintains high pedestrian detection accuracy.
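As a quick consistency check on these figures, the baseline implied by each stated reduction can be back-computed. The baseline values are not given in the source; the sketch below only derives what the stated percentages imply, using the 41.364 BFLOPS figure and the 89 MB model size quoted elsewhere in the description. The function name is hypothetical.

```python
def implied_baseline(reduced_value, reduction_fraction):
    """Baseline such that reduced_value = baseline * (1 - reduction_fraction)."""
    return reduced_value / (1.0 - reduction_fraction)


baseline_bflops = implied_baseline(41.364, 0.71)   # forward computation
baseline_mb = implied_baseline(89.0, 0.62)         # model parameter data volume
print(round(baseline_bflops, 1), round(baseline_mb, 1))
```

Both implied baselines (roughly 143 BFLOPS and 234 MB) are rounded reconstructions, since the percentages themselves are rounded.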

S3, dividing a pedestrian detection training set and training the lightweight YOLOv3 pedestrian detection network

The pedestrian detection training set consists of 90% of the images, randomly selected from the perimeter security pedestrian detection data set. Online data enhancement is applied to the training images during training: two original training images are randomly selected and subjected to random cropping, random scaling and random color transformation, and their annotation information is transformed accordingly for the cropping and scaling operations; the two transformed training images are then fused into a new training image, and their annotation information is merged to serve as the label of the new training image. The formula for fusing the two training images is as follows:

I(x,y)＝0.5×I₁(x,y)+0.5×I₂(x,y)

Multi-resolution training is adopted during training: the input resolution to which the training images are scaled is not fixed but changes randomly after every 20 training iterations. The selectable resolutions are: 320, 352, 384, 416, 448, 480, 512, 544, 576 and 608.
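The multi-resolution schedule can be sketched as a generator that redraws the resolution every 20 iterations. The resolution list and the interval come from the text; the uniform random draw, the function name `resolution_schedule` and the `seed` parameter are assumptions of this sketch.

```python
import random

RESOLUTIONS = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]


def resolution_schedule(num_iterations, interval=20, seed=None):
    """Yield the input resolution per iteration, redrawn every `interval` iterations."""
    rng = random.Random(seed)
    current = None
    for iteration in range(num_iterations):
        if iteration % interval == 0:
            current = rng.choice(RESOLUTIONS)  # uniform choice is an assumption
        yield current


schedule = list(resolution_schedule(60, seed=0))
print(schedule[0], schedule[20], schedule[40])
```

All candidate resolutions are multiples of 32, so each one divides evenly into the three detection grids.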

S3-3, the loss function used to train the lightweight YOLOv3 network is as follows:

wherein S represents the grid size at each detection scale of the detection module adopted by the lightweight pedestrian detection network, with values of 19, 38 and 76 respectively; B represents the number of target boxes predicted by each cell at each detection scale of the detection module, with a value of 3;

$1_{ij}^{obj}$ indicates whether the j-th predicted target box of the i-th cell at a given scale contains a target: $1_{ij}^{obj} = 1$ if it contains a target, and $1_{ij}^{obj} = 0$ if it does not;

$x_i, y_i, w_i, h_i, C_i$ respectively represent the center-point x coordinate, center-point y coordinate, width, height and confidence of the predicted target box with $1_{ij}^{obj} = 1$ in the i-th cell at a given scale;

$\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{C}_i$ respectively represent the center-point x coordinate, center-point y coordinate, width, height and confidence of the pre-annotated target. classes represents the target classes to be detected, $p_i(c)$ is the predicted probability of each class, and $\hat{p}_i(c)$ is the true probability of each class.

The first row of the loss function represents the loss on the center coordinates of the effective predicted targets; the second row represents the loss on the width and height of the effective predicted targets; the third row represents the confidence loss over all prediction boxes; the fourth row represents the class loss of the effective predicted targets.

The data volume of the trained lightweight YOLOv3 pedestrian detection model parameter is 89MB, which is reduced by 62% compared with the traditional YOLOv3, and the requirement on the storage space of the embedded device is reduced.

S4, dividing a pedestrian detection verification set, and verifying the effect of the lightweight YOLOv3 pedestrian detection model

The pedestrian detection verification set consists of 10% of the samples, randomly selected from the perimeter security pedestrian detection data set. The verification set and the training set share no images, and their union is the entire perimeter security pedestrian detection data set.
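A split with these properties (90% training, 10% verification, disjoint, jointly exhaustive) can be obtained by shuffling once and cutting the list. The sketch below uses the stated total of 87,300 images; the file-name pattern, the function name and the seed are placeholders.

```python
import random


def split_dataset(image_names, train_fraction=0.9, seed=None):
    """Randomly split names into disjoint training and verification lists."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    n_train = round(len(names) * train_fraction)
    return names[:n_train], names[n_train:]


all_images = [f"img_{i:05d}.jpg" for i in range(87300)]  # placeholder names
train, val = split_dataset(all_images, seed=42)
print(len(train), len(val))
```

Cutting a single shuffled list guarantees by construction that no image lands in both sets and that none is dropped.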

When verifying the effect of the lightweight YOLOv3 pedestrian detection model, each image in the verification set is selected in turn and scaled to 608 × 608; the trained lightweight pedestrian detection model detects the pedestrians in the image and their positions; the detection results are saved and compared with the pedestrian positions in the annotation file corresponding to the image. This finally yields the overall recall rate and accuracy of the lightweight pedestrian detection model on the pedestrian detection verification set, which are used to evaluate the detection effect of the model.
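The comparison between detections and annotations can be sketched as follows. The source does not say how a detection is matched to an annotated pedestrian; this sketch assumes greedy matching by intersection over union (IoU) with a threshold of 0.5, and reads "accuracy" as precision (correct detections divided by all detections). Boxes use the top-left `(x, y, w, h)` layout of the annotation files; all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) top-left boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def evaluate(detections, annotations, iou_threshold=0.5):
    """Return (recall, precision) over parallel per-image lists of boxes."""
    true_pos = n_det = n_gt = 0
    for dets, gts in zip(detections, annotations):
        n_det += len(dets)
        n_gt += len(gts)
        unmatched = list(gts)
        for det in dets:
            # Greedily match each detection to the best remaining annotation
            best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
            if best is not None and iou(det, best) >= iou_threshold:
                true_pos += 1
                unmatched.remove(best)
    recall = true_pos / n_gt if n_gt else 1.0
    precision = true_pos / n_det if n_det else 1.0
    return recall, precision


# Image 1: one pedestrian, two detections; image 2: background with a false alarm
detections = [[(10, 10, 50, 100), (200, 50, 40, 80)], [(5, 5, 20, 20)]]
annotations = [[(12, 8, 50, 100)], []]
recall, precision = evaluate(detections, annotations)
print(recall, precision)
```

Background images with blank annotation files matter here: any detection on them counts as a false alarm and lowers precision without affecting recall.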

S5, deploying the lightweight YOLOv3 pedestrian detection model to the embedded device

The forward computation of the constructed lightweight pedestrian detection network is 41.364 BFLOPs, 71% lower than that of the conventional YOLOv3, which greatly increases pedestrian detection speed while maintaining high detection precision. The trained lightweight YOLOv3 pedestrian detection model has 89 MB of parameter data, 62% less than the conventional YOLOv3, which lowers the storage requirement on the embedded device. The lightweight pedestrian detection model that meets the recall and accuracy requirements is deployed on the embedded device, where it suits the device's low computing capability and small storage space.
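
The baseline figures implied by these percentages can be backed out with simple arithmetic. The text gives only the reduced values and the reductions, so the baselines computed below are approximations derived from rounded percentages, not values stated in the source; the helper name `implied_baseline` is hypothetical.

```python
def implied_baseline(reduced_value, reduction_fraction):
    """Back out a baseline from a reduced value and its fractional reduction."""
    return reduced_value / (1.0 - reduction_fraction)

# 41.364 BFLOPs after a 71% reduction implies a baseline of roughly 142.6 BFLOPs.
baseline_bflops = implied_baseline(41.364, 0.71)

# 89 MB after a 62% reduction implies a baseline of roughly 234 MB.
baseline_mb = implied_baseline(89.0, 0.62)
```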

The invention also provides a pedestrian detection system based on lightweight YOLOv3; its flow chart is shown in Fig. 1, and it comprises the following modules:

Detection data set construction module

Lightweight YOLOv3 pedestrian detection network construction module

Lightweight YOLOv3 pedestrian detection network training module

I(x,y)＝0.5×I₁(x,y)+0.5×I₂(x,y)

wherein I₁(x, y) and I₂(x, y) respectively denote the pixel values of the two transformed training images at coordinate point (x, y), and I(x, y) denotes the pixel value of the new fused training image at coordinate point (x, y).
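
The fusion formula above can be sketched with NumPy as follows. The sketch assumes that both transformed images already have the same size and are stored as 8-bit arrays, and that the fused result is rounded back to 8 bits; the function name `fuse_images` and the example boxes are hypothetical. Merging the two label lists follows the statement that the labeling information of both images is combined as the label of the new image.

```python
import numpy as np

def fuse_images(img1, img2, boxes1, boxes2):
    """Blend two same-sized images 50/50 and merge their box labels."""
    if img1.shape != img2.shape:
        raise ValueError("both images must have the same shape")
    # I(x, y) = 0.5 * I1(x, y) + 0.5 * I2(x, y), computed in float to avoid overflow
    fused = 0.5 * img1.astype(np.float32) + 0.5 * img2.astype(np.float32)
    fused = np.clip(np.rint(fused), 0, 255).astype(np.uint8)
    labels = list(boxes1) + list(boxes2)  # the new image carries both label sets
    return fused, labels

# Two constant-valued images and made-up boxes, for illustration only.
img_a = np.full((608, 608, 3), 100, dtype=np.uint8)
img_b = np.full((608, 608, 3), 200, dtype=np.uint8)
fused, labels = fuse_images(img_a, img_b, [(10, 20, 50, 120)], [(300, 310, 340, 420)])
```

Every pixel of the fused image here is the average of 100 and 200, and the label list holds both boxes.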

S3-3, the loss function used to train the lightweight YOLOv3 is as follows:

If no target is included

x_i, y_i, w_i, h_i, C_i respectively representing the i-th cell at a certain scale

respectively represent the x coordinate of the center point, the y coordinate of the center point, the width, the height, and the confidence of the pre-labeled target. class denotes the class of targets to be detected, and p_i(c) is the predicted probability of each class.

Lightweight YOLOv3 pedestrian detection model verification module

Lightweight YOLOv3 pedestrian detection model deployment module

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A pedestrian detection method based on light-weight YOLOv3 is characterized by comprising the following steps:

2. The pedestrian detection method based on light-weight YOLOv3 according to claim 1, wherein, in the light-weight YOLOv3 pedestrian detection network constructed in S2, features are extracted and a three-scale detection module is used to detect pedestrians: the small-scale output is used for detecting pedestrians with large target proportion, the medium-scale output is used for detecting pedestrians with medium target proportion, and the large-scale output is used for detecting pedestrians with small target proportion.

3. The pedestrian detection method based on light-weight YOLOv3 according to claim 1, wherein the detection module structure adopted for constructing the light-weight YOLOv3 pedestrian detection network in S2 is as follows: the small-scale comprises a convolution layer conv3, a convolution layer conv4, a convolution layer conv5, a convolution layer conv6, a convolution layer conv7, a convolution layer conv8 and a convolution layer conv9 in sequence; the medium-scale sequentially comprises a route layer 1, a convolution layer conv10, an up-sampling layer 1, a route layer 2, a convolution layer conv11, a convolution layer conv12, a convolution layer conv13, a convolution layer conv14, a convolution layer conv15, a convolution layer conv16 and a convolution layer conv17; the large-scale sequentially comprises a route layer 3, a convolution layer conv18, an up-sampling layer 2, a route layer 4, a convolution layer conv19, a convolution layer conv20, a convolution layer conv21, a convolution layer conv22, a convolution layer conv23, a convolution layer conv24 and a convolution layer conv25.

4. The pedestrian detection method based on light-weight YOLOv3 according to claim 3, wherein, in the light-weight YOLOv3 pedestrian detection network constructed in step S2, the lightweight layer 1 sequentially comprises an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv, a compression convolutional layer 1 × 1 conv and a shortcut layer; the lightweight layer 2 comprises an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv and a compression convolutional layer 1 × 1 conv in this order; the lightweight layer 3 comprises an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv and a compression convolutional layer 1 × 1 conv in this order.

5. The pedestrian detection method based on light-weight YOLOv3 as claimed in claim 1, wherein the training of the light-weight YOLOv3 pedestrian detection network in S3 randomly selects image samples with a set proportion in a perimeter security pedestrian detection data set as a pedestrian detection training set, and performs online data enhancement on the training images during the training process, including: randomly selecting two original training images to carry out random cutting, random scaling and random color transformation operations, and carrying out corresponding transformation on the marking information of the two original training images according to the cutting and scaling operations; and fusing the two transformed training images into a new training image, and combining the labeling information of the two transformed training images to be used as the label of the new training image. The formula for fusing the two training images is as follows:

I(x,y)＝0.5×I₁(x,y)+0.5×I₂(x,y)

6. The pedestrian detection method based on light-weight YOLOv3 according to claim 1, wherein the training of the light-weight YOLOv3 pedestrian detection network in S3 continues until the loss function stabilizes and no longer decreases, at which point training is stopped, and the loss function used in the training process is as follows:

If no target is included

x_i, y_i, w_i, h_i, C_i respectively representing the i-th cell at a certain scale

true probability for each category;

7. The pedestrian detection method based on light weight YOLOv3 as claimed in claim 1, wherein the step S4 is to verify the effect of the light weight pedestrian detection model, randomly select a proportion of samples in the perimeter security pedestrian detection data set as a pedestrian detection verification set, detect pedestrians and their positions in each image sample of the verification set through the trained light weight pedestrian detection model, store the detection results and compare the detection results with the pedestrian positions in the verification set labeling information, and finally obtain the overall recall rate and accuracy data of the light weight pedestrian detection model on the pedestrian detection verification set.

8. A pedestrian detection system based on light-weight YOLOv3 is characterized by comprising

9. The pedestrian detection system based on light-weight YOLOv3 of claim 8, wherein in the light-weight YOLOv3 pedestrian detection network construction module, a three-scale detection module is used to detect pedestrians: the small-scale output is used for detecting pedestrians with large target proportion, the medium-scale output is used for detecting pedestrians with medium target proportion, and the large-scale output is used for detecting pedestrians with small target proportion.

10. The pedestrian detection system based on light-weight YOLOv3 of claim 8, wherein the detection module structure employed in the light-weight YOLOv3 pedestrian detection network construction module is as follows: the small-scale comprises a convolution layer conv3, a convolution layer conv4, a convolution layer conv5, a convolution layer conv6, a convolution layer conv7, a convolution layer conv8 and a convolution layer conv9 in sequence; the medium-scale sequentially comprises a route layer 1, a convolution layer conv10, an up-sampling layer 1, a route layer 2, a convolution layer conv11, a convolution layer conv12, a convolution layer conv13, a convolution layer conv14, a convolution layer conv15, a convolution layer conv16 and a convolution layer conv17; the large-scale sequentially comprises a route layer 3, a convolution layer conv18, an up-sampling layer 2, a route layer 4, a convolution layer conv19, a convolution layer conv20, a convolution layer conv21, a convolution layer conv22, a convolution layer conv23, a convolution layer conv24 and a convolution layer conv25; the lightweight layer 1 comprises an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv, a compression convolutional layer 1 × 1 conv and a shortcut layer in sequence; the lightweight layer 2 comprises an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv and a compression convolutional layer 1 × 1 conv in this order; the lightweight layer 3 comprises an amplification convolutional layer 1 × 1 conv, a depth convolutional layer 3 × 3 DwConv and a compression convolutional layer 1 × 1 conv in this order.