CN111598066A - Helmet wearing identification method based on cascade prediction - Google Patents

Helmet wearing identification method based on cascade prediction

Info

Publication number
CN111598066A
CN111598066A
Authority
CN
China
Prior art keywords
frame
pedestrian
pedestrian target
image
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010722851.2A
Other languages
Chinese (zh)
Inventor
郑影 (Zheng Ying)
徐芬 (Xu Fen)
张文广 (Zhang Wenguang)
王军 (Wang Jun)
徐晓刚 (Xu Xiaogang)
何鹏飞 (He Pengfei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202010722851.2A
Publication of CN111598066A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a helmet wearing identification method based on cascade prediction. For pedestrian detection, the position of the pedestrian target frame in each frame of the video is obtained by a detection algorithm based on a deep convolutional neural network; based on the pedestrian detection results, track association with Kalman filtering and the Hungarian algorithm yields optimized pedestrian target frames; and for the pedestrian in each target frame, a classifier based on a deep convolutional neural network judges whether a safety helmet is worn. By cascading three modules for prediction, the method is simple to implement and highly portable, and can accurately identify whether workers wear safety helmets in workplaces such as factories and construction sites captured by surveillance cameras.

Description

Helmet wearing identification method based on cascade prediction
Technical Field
The invention relates to the field of computer vision, in particular to a helmet wearing identification method based on cascade prediction.
Background
In workplaces such as high-temperature sites, power supply lines, factories, and construction sites, wearing a safety helmet is a basic safety requirement. However, manual supervision is time-consuming, labor-intensive, and unreliable, so safety accidents still occur because workers do not wear helmets. To solve this problem, it is necessary to identify whether workers wear safety helmets by intelligent means. Conventional helmet detection methods are generally aimed at road images of motorcycles and are intended to determine whether the rider wears a helmet while riding. Although such methods also judge helmet wearing, the road scene differs greatly from an actual construction environment, so their practical effect is poor.
In recent years, target detection methods based on deep convolutional neural networks have also been applied to helmet wearing identification. Such methods generally treat wearing and not wearing a helmet as two independent target classes, directly detect the two classes of people in the image with a popular detection method, and take the detections as the recognition result for each detected person. However, in an actual construction environment, complex and changeable scenes, target movement, mutual occlusion, and scale and illumination changes are very challenging, so existing detection methods struggle to achieve high accuracy when identifying whether a worker wears a safety helmet.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a helmet wearing identification method based on cascade prediction. The specific technical scheme is as follows:
a helmet wearing identification method based on cascade prediction specifically comprises the following steps:
S1: extracting every frame from a video shot by a surveillance camera, inputting the frames one by one in temporal order into a pedestrian detection model based on a deep convolutional neural network, and outputting the positions of all pedestrian target frames detected in each frame and the feature corresponding to each pedestrian target frame;
S2: based on the pedestrian target frames, and their features, detected in the several frames preceding the current frame, tracking the pedestrians in the video with a multi-target tracking algorithm: the position of the current frame's pedestrian target frame is predicted by a Kalman filter, the pedestrian target frames of the current frame are then matched against those of the previous frame by the Hungarian algorithm, and the predicted pedestrian target frames are corrected with the high-confidence matches, yielding the optimized pedestrian target frames of all frames;
S3: for the optimized pedestrian target frames of all frames output in S2, cropping the pedestrian image inside each target frame, inputting it into a classifier based on a deep convolutional neural network, and finally outputting a prediction of whether each pedestrian in the image wears a safety helmet.
Further, in S1, a YOLOv3 deep convolutional neural network is selected as the pedestrian detection model; before use, the model is pre-trained on the CrowdHuman pedestrian detection data set and then retrained on manually labeled complex work scene data to obtain the final pedestrian detection model.
Further, the step S2 is implemented by the following sub-steps:
S2.1: inputting the first frame of the video, creating and initializing a new pedestrian target tracker from the detected pedestrian target frames, and labeling the serial number id of each pedestrian target frame in the image;
S2.2: predicting the position of the current frame's pedestrian target frame with a Kalman filter, based on the pedestrian target frames and features detected in the several frames preceding the current frame, and then calculating the intersection-over-union (IOU) between the pedestrian target frame of the current frame output by the pedestrian detection model and the one predicted by the Kalman filter:

IOU = (Box1 ∩ Box2) / (Box1 ∪ Box2)

where Box1 and Box2 are the two pedestrian target frames, Box1 ∩ Box2 denotes the number of pixels contained in their intersection, and Box1 ∪ Box2 denotes the number of pixels contained in their union;
S2.3: associating the pedestrian target frames of the current frame with all those of the previous frame via the Hungarian algorithm to obtain, for each pedestrian target frame in the current frame, the unique match with maximum IOU, and then discarding matched pairs whose matching value is below the set threshold; three cases arise during association:
(a) if the detected target is found among the pedestrian targets of the previous frame, it is tracked normally and no extra operation is needed;
(b) if the detected target is not found among the pedestrian targets of the previous frame, it is a target newly appearing in the current frame and is recorded for later tracking association;
(c) if a target exists in the previous frame but no target in the current frame is associated with it, the target is deemed to have disappeared from the field of view and is removed;
S2.4: updating the pedestrian target tracker of S2.1 with the current frame's target detection frames obtained in S2.3, computing the Kalman gain, state update, and covariance update based on the IOU obtained in S2.2, and outputting the state update value as the pedestrian target frame of the current frame;
S2.5: repeating S2.2-S2.4 until all frames have been processed, yielding the optimized pedestrian target frames of all frames.
Further, S3 is divided into a training phase and a prediction phase, and the training phase is implemented by the following sub-steps:
S3.1: selecting a ResNet50 deep convolutional neural network as the classifier for helmet wearing identification: a ResNet50 network model pre-trained on ImageNet is first introduced; pedestrian images from manually labeled complex work scenes are then selected, and the square region image (x, y, w, w) corresponding to the upper half of the body is cropped out, where x and y are the coordinates of the upper-left corner of the target frame on the x and y axes and w is the side length of the square region; the cropped image is scaled to a fixed size, normalized, and input to ResNet50;
S3.2: after each input image passes through the network's forward computation and the final fully connected layer, a two-dimensional vector s = {s[1], s[2]} is output, where s[1] and s[2] represent the probability of wearing and of not wearing a safety helmet, respectively;
S3.3: calculating the cross-entropy loss between the prediction result s and the true value class:

Loss = -log( exp(s[class]) / (exp(s[1]) + exp(s[2])) )

where the true value class is the label indicating whether the image belongs to the helmet-worn or helmet-not-worn class, exp is the exponential function with the natural constant e as its base, and log is the logarithm with base e;
S3.4: based on the cross-entropy loss calculated in S3.3, back-propagating through the network and continuously updating the network parameters by a gradient descent algorithm, so that the network's predictions finally approach the true values; training stops after the set number of iterations, yielding the optimized ResNet50;
The prediction stage specifically comprises: for the optimized pedestrian target frames of all frames output in S2, the square region image (x, y, w, w) corresponding to the upper half of the body is cropped out; the cropped image is likewise scaled to a fixed size and normalized, then input to the ResNet50 optimized in S3.4, which outputs the prediction of whether a safety helmet is worn.
The invention has the following beneficial effects:
(1) the method decomposes helmet wearing identification into three cascaded steps, so that each step can pursue its own objective in a targeted way; the implementation is simple, and the choice of module at each step is flexible;
(2) the method separates pedestrian detection from helmet wearing classification, so the two do not interfere with each other; compared with directly detecting two classes of pedestrian targets (wearing and not wearing a helmet), this markedly improves the accuracy of helmet wearing identification;
(3) the cascade prediction scheme adapts well to complex and changeable work sites.
Drawings
Fig. 1 is a flowchart of a helmet wearing identification method based on cascade prediction.
Fig. 2 is a schematic diagram of cropping a labeled pedestrian target frame.
Fig. 3 is an example of a helmet wearing recognition result in a complex factory environment.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and preferred embodiments, so that its objects and effects become clearer. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
As shown in fig. 1, the method for identifying wearing of a helmet based on cascade prediction of the present invention specifically includes the following steps:
s1: extracting each frame of image from a video shot by a monitoring camera, inputting the images into a pedestrian detection model based on a deep convolutional neural network frame by frame according to a time sequence, and outputting the positions of all pedestrian target frames detected in each image and the characteristics corresponding to each pedestrian target frame.
A YOLOv3 deep convolutional neural network is selected as the pedestrian detection model. Before use, the model is pre-trained on the CrowdHuman pedestrian detection data set and then retrained on manually labeled complex work scene data from factories and similar sites, yielding the final pedestrian detection model.
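For illustration, a minimal Python sketch of this frame-by-frame detection step follows. The OpenCV video-reading calls are real APIs; the PedestrianDetector class, its detect() method, the module name, and the checkpoint and video file names are hypothetical placeholders standing in for the trained YOLOv3 model described above, not anything fixed by this patent.

import cv2  # OpenCV, used here only for reading video frames

# Hypothetical wrapper around the trained YOLOv3 pedestrian model (S1);
# detect() is assumed to return a list of (box, feature) pairs per frame.
from pedestrian_detection import PedestrianDetector  # placeholder module name

detector = PedestrianDetector(weights="yolov3_crowdhuman_finetuned.pt")  # assumed file

cap = cv2.VideoCapture("surveillance.mp4")  # assumed video path
per_frame_detections = []
while True:
    ok, frame = cap.read()  # frames are consumed in temporal order
    if not ok:  # end of video
        break
    # pedestrian target frames and the feature of each target frame (the S1 outputs)
    per_frame_detections.append(detector.detect(frame))
cap.release()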
Target frames obtained by such single-frame processing inevitably contain noise, so their positions are not accurate enough, and the temporal stability of the detection results is hard to guarantee. To address this, the pedestrians in the video are tracked with a multi-target tracking algorithm, yielding optimized pedestrian target frames.
S2: based on the pedestrian target frames, and their features, detected in the several frames preceding the current frame, tracking the pedestrians in the video with a multi-target tracking algorithm: the position of the current frame's pedestrian target frame is predicted by a Kalman filter, the pedestrian target frames of the current frame are then matched against those of the previous frame by the Hungarian algorithm, and the predicted pedestrian target frames are corrected with the high-confidence matches, yielding the optimized pedestrian target frames of all frames;
S2.1: inputting the first frame of the video, creating and initializing a new pedestrian target tracker from the detected pedestrian target frames, and labeling the serial number id of each pedestrian target frame in the image;
S2.2: predicting the position of the current frame's pedestrian target frame with a Kalman filter, based on the pedestrian target frames and features detected in the several frames preceding the current frame, and then calculating the intersection-over-union (IOU) between the pedestrian target frame of the current frame output by the pedestrian detection model and the one predicted by the Kalman filter (a code sketch of S2.2-S2.3 follows S2.5):

IOU = (Box1 ∩ Box2) / (Box1 ∪ Box2)

where Box1 and Box2 are the two pedestrian target frames, Box1 ∩ Box2 denotes the number of pixels contained in their intersection, and Box1 ∪ Box2 denotes the number of pixels contained in their union;
S2.3: associating the pedestrian target frames of the current frame with all those of the previous frame via the Hungarian algorithm to obtain, for each pedestrian target frame in the current frame, the unique match with maximum IOU, and then discarding matched pairs whose matching value is below the set threshold; three cases arise during association:
a) if the detected target is found among the pedestrian targets of the previous frame, it is tracked normally and no extra operation is needed;
b) if the detected target is not found among the pedestrian targets of the previous frame, it is a target newly appearing in the current frame and is recorded for later tracking association;
c) if a target exists in the previous frame but no target in the current frame is associated with it, the target is deemed to have disappeared from the field of view and is removed;
S2.4: updating the pedestrian target tracker of S2.1 with the current frame's target detection frames obtained in S2.3, computing the Kalman gain, state update, and covariance update based on the IOU obtained in S2.2, and outputting the state update value as the pedestrian target frame of the current frame;
S2.5: repeating S2.2-S2.4 until all frames have been processed, yielding the optimized pedestrian target frames of all frames.
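The following Python sketch illustrates the association logic of S2.2-S2.3 under stated assumptions: boxes are (x1, y1, x2, y2) pixel rectangles, scipy's linear_sum_assignment serves as the Hungarian algorithm, and the IOU threshold of 0.3 is an assumed value for the "set threshold". The Kalman prediction and update of S2.2/S2.4 are left outside the sketch, as in common SORT-style implementations; this is not the patent's exact code.

import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

def iou(box1, box2):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    xa, ya = max(box1[0], box2[0]), max(box1[1], box2[1])
    xb, yb = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0, xb - xa) * max(0, yb - ya)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0

def associate(detections, predictions, iou_threshold=0.3):  # threshold is assumed
    # Build the IOU matrix between detected boxes and Kalman-predicted boxes (S2.2),
    # find the maximum-IOU unique matching with the Hungarian algorithm (S2.3),
    # and split the result into the three cases a), b), c).
    if len(detections) == 0 or len(predictions) == 0:
        return [], list(range(len(detections))), list(range(len(predictions)))
    iou_matrix = np.array([[iou(d, p) for p in predictions] for d in detections])
    det_idx, pred_idx = linear_sum_assignment(-iou_matrix)  # maximize total IOU
    matches, new_targets, disappeared = [], [], []
    for d, p in zip(det_idx, pred_idx):
        if iou_matrix[d, p] >= iou_threshold:
            matches.append((d, p))  # case a): normal tracking
        else:
            new_targets.append(d)   # low-IOU pairs are discarded
            disappeared.append(p)
    new_targets += [d for d in range(len(detections)) if d not in det_idx]    # case b)
    disappeared += [p for p in range(len(predictions)) if p not in pred_idx]  # case c)
    return matches, new_targets, disappeared

The matched pairs then drive the Kalman gain, state, and covariance updates of S2.4; new targets seed new trackers and disappeared targets are removed, as in cases b) and c) above.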
S3: for the optimized pedestrian target frames of all frames output in S2, cropping the pedestrian image inside each target frame, inputting it into a classifier based on a deep convolutional neural network, and finally outputting a prediction of whether each pedestrian in the image wears a safety helmet.
S3 is divided into a training stage and a prediction stage, wherein the training stage is realized by the following sub-steps:
S3.1: selecting a ResNet50 deep convolutional neural network as the classifier for helmet wearing identification: a ResNet50 network model pre-trained on ImageNet is first introduced; pedestrian images from manually labeled complex work scenes are then selected, and the square region image (x, y, w, w) corresponding to the upper half of the body is cropped out, where x and y are the coordinates of the upper-left corner of the target frame on the x and y axes and w is the side length of the square region; the cropped image is scaled to a fixed size, normalized, and input to ResNet50;
S3.2: after each input image passes through the network's forward computation and the final fully connected layer, a two-dimensional vector s = {s[1], s[2]} is output, where s[1] and s[2] represent the probability of wearing and of not wearing a safety helmet, respectively;
S3.3: calculating the cross-entropy loss between the prediction result s and the true value class:

Loss = -log( exp(s[class]) / (exp(s[1]) + exp(s[2])) )

where the true value class is the label indicating whether the image belongs to the helmet-worn or helmet-not-worn class, exp is the exponential function with the natural constant e as its base, and log is the logarithm with base e;
S3.4: based on the cross-entropy loss calculated in S3.3, back-propagating through the network and continuously updating the network parameters by a gradient descent algorithm, so that the network's predictions finally approach the true values; training stops after the set number of iterations, yielding the optimized ResNet50;
The prediction stage specifically comprises: for the optimized pedestrian target frames of all frames output in S2, the square region image (x, y, w, w) corresponding to the upper half of the body is cropped out; the cropped image is likewise scaled to a fixed size and normalized, then input to the ResNet50 optimized in S3.4, which outputs the prediction of whether a safety helmet is worn.
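A minimal Python sketch of this prediction stage follows, assuming the frame is held as an RGB uint8 array and the target frame is given by its upper-left corner (x, y) and square side length w. The torchvision ResNet50 constructor and the ImageNet normalization constants are real APIs; the 224x224 input size, the predict_helmet helper, and the class ordering (index 0 = helmet worn, matching s[1]) are assumptions for illustration, not values fixed by the patent.

import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Two-class ResNet50: start from ImageNet-pretrained weights, then replace
# the final fully connected layer so it outputs s = {s[1], s[2]}.
model = models.resnet50(weights="IMAGENET1K_V1")  # torchvision >= 0.13 weights API
model.fc = nn.Linear(model.fc.in_features, 2)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # assumed "fixed size"
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def predict_helmet(frame, x, y, w):
    # Crop the square upper-body region (x, y, w, w) from the frame.
    crop = Image.fromarray(frame[y:y + w, x:x + w])
    inp = preprocess(crop).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        s = model(inp)  # two-dimensional score vector
    probs = torch.softmax(s, dim=1)[0]  # softmax matches the S3.3 loss
    return "worn" if probs[0] > probs[1] else "not worn"

# Training (S3.3-S3.4) would minimize the same softmax cross-entropy, e.g.
# loss = nn.CrossEntropyLoss()(s, target)  # target: 0 = worn, 1 = not worn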
Fig. 2 shows the pedestrian labeling frame and cropping frame used in the identification method of the present invention.
As shown in fig. 3, the identification method of the present invention recognizes whether pedestrians wear safety helmets in a complex factory environment. On a validation data set collected in complex work scenes such as factories, the invention reaches a mean average precision (mAP) of 69.7%, and 93.7% on the AP50 metric, which counts only detections whose IOU with the ground-truth target frame exceeds 50%; compared with existing methods, helmet wearing identification performance is effectively improved.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art may still modify the described embodiments or substitute equivalents for some of their features. All modifications and equivalents that come within the spirit and principle of the invention are intended to be included within its scope.

Claims (4)

1. A helmet wearing identification method based on cascade prediction is characterized by comprising the following steps:
S1: extracting every frame from a video shot by a surveillance camera, inputting the frames one by one in temporal order into a pedestrian detection model based on a deep convolutional neural network, and outputting the positions of all pedestrian target frames detected in each frame and the feature corresponding to each pedestrian target frame;
S2: based on the pedestrian target frames, and their features, detected in the several frames preceding the current frame, tracking the pedestrians in the video with a multi-target tracking algorithm: the position of the current frame's pedestrian target frame is predicted by a Kalman filter, the pedestrian target frames of the current frame are then matched against those of the previous frame by the Hungarian algorithm, and the predicted pedestrian target frames are corrected with the high-confidence matches, yielding the optimized pedestrian target frames of all frames;
S3: for the optimized pedestrian target frames of all frames output in S2, cropping the pedestrian image inside each target frame, inputting it into a classifier based on a deep convolutional neural network, and finally outputting a prediction of whether each pedestrian in the image wears a safety helmet.
2. The cascade prediction-based helmet wearing identification method of claim 1, wherein in S1, YOLOv3 deep convolutional neural network is selected as a pedestrian detection model, and is pre-trained on a CrowdHuman pedestrian detection data set before the model is used, and then is retrained on manually labeled complex work scene data to obtain a final pedestrian detection model.
3. The cascade prediction-based headgear wearing identification method according to claim 1, wherein the step S2 is implemented by the following sub-steps:
S2.1: inputting the first frame of the video, creating and initializing a new pedestrian target tracker from the detected pedestrian target frames, and labeling the serial number id of each pedestrian target frame in the image;
S2.2: predicting the position of the current frame's pedestrian target frame with a Kalman filter, based on the pedestrian target frames and features detected in the several frames preceding the current frame, and then calculating the intersection-over-union (IOU) between the pedestrian target frame of the current frame output by the pedestrian detection model and the one predicted by the Kalman filter:

IOU = (Box1 ∩ Box2) / (Box1 ∪ Box2)

where Box1 and Box2 are the two pedestrian target frames, Box1 ∩ Box2 denotes the number of pixels contained in their intersection, and Box1 ∪ Box2 denotes the number of pixels contained in their union;
S2.3: associating the pedestrian target frames of the current frame with all those of the previous frame via the Hungarian algorithm to obtain, for each pedestrian target frame in the current frame, the unique match with maximum IOU, and then discarding matched pairs whose matching value is below the set threshold; three cases arise during association:
(a) if the detected target is found among the pedestrian targets of the previous frame, it is tracked normally and no extra operation is needed;
(b) if the detected target is not found among the pedestrian targets of the previous frame, it is a target newly appearing in the current frame and is recorded for later tracking association;
(c) if a target exists in the previous frame but no target in the current frame is associated with it, the target is deemed to have disappeared from the field of view and is removed;
S2.4: updating the pedestrian target tracker of S2.1 with the current frame's target detection frames obtained in S2.3, computing the Kalman gain, state update, and covariance update based on the IOU obtained in S2.2, and outputting the state update value as the pedestrian target frame of the current frame;
S2.5: repeating S2.2-S2.4 until all frames have been processed, yielding the optimized pedestrian target frames of all frames.
4. The cascade prediction-based helmet wearing identification method according to claim 1, wherein the step S3 is divided into a training phase and a prediction phase, and the training phase is implemented by the following sub-steps:
S3.1: selecting a ResNet50 deep convolutional neural network as the classifier for helmet wearing identification: a ResNet50 network model pre-trained on ImageNet is first introduced; pedestrian images from manually labeled complex work scenes are then selected, and the square region image (x, y, w, w) corresponding to the upper half of the body is cropped out, where x and y are the coordinates of the upper-left corner of the target frame on the x and y axes and w is the side length of the square region; the cropped image is scaled to a fixed size, normalized, and input to ResNet50;
S3.2: after each input image passes through the network's forward computation and the final fully connected layer, a two-dimensional vector s = {s[1], s[2]} is output, where s[1] and s[2] represent the probability of wearing and of not wearing a safety helmet, respectively;
S3.3: calculating the cross-entropy loss between the prediction result s and the true value class:

Loss = -log( exp(s[class]) / (exp(s[1]) + exp(s[2])) )

where the true value class is the label indicating whether the image belongs to the helmet-worn or helmet-not-worn class, exp is the exponential function with the natural constant e as its base, and log is the logarithm with base e;
S3.4: based on the cross-entropy loss calculated in S3.3, back-propagating through the network and continuously updating the network parameters by a gradient descent algorithm, so that the network's predictions finally approach the true values; training stops after the set number of iterations, yielding the optimized ResNet50;
The prediction stage specifically comprises: for the optimized pedestrian target frames of all frames output in S2, the square region image (x, y, w, w) corresponding to the upper half of the body is cropped out; the cropped image is likewise scaled to a fixed size and normalized, then input to the ResNet50 optimized in S3.4, which outputs the prediction of whether a safety helmet is worn.
CN202010722851.2A 2020-07-24 2020-07-24 Helmet wearing identification method based on cascade prediction Pending CN111598066A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010722851.2A CN111598066A (en) 2020-07-24 2020-07-24 Helmet wearing identification method based on cascade prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010722851.2A CN111598066A (en) 2020-07-24 2020-07-24 Helmet wearing identification method based on cascade prediction

Publications (1)

Publication Number Publication Date
CN111598066A 2020-08-28

Family

ID=72184606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010722851.2A Pending CN111598066A (en) 2020-07-24 2020-07-24 Helmet wearing identification method based on cascade prediction

Country Status (1)

Country Link
CN (1) CN111598066A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263686A (en) * 2019-06-06 2019-09-20 温州大学 A kind of construction site safety of image cap detection method based on deep learning
CN111126152A (en) * 2019-11-25 2020-05-08 国网信通亿力科技有限责任公司 Video-based multi-target pedestrian detection and tracking method

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910428A (en) * 2019-12-05 2020-03-24 江苏中云智慧数据科技有限公司 Real-time multi-target tracking method based on neural network
CN110910428B (en) * 2019-12-05 2022-04-01 江苏中云智慧数据科技有限公司 Real-time multi-target tracking method based on neural network
CN112036360A (en) * 2020-09-10 2020-12-04 杭州云栖智慧视通科技有限公司 Method for identifying attributes of helmet of rider
CN112036360B (en) * 2020-09-10 2023-11-28 杭州云栖智慧视通科技有限公司 Riding helmet attribute identification method
CN112163545A (en) * 2020-10-12 2021-01-01 北京易华录信息技术股份有限公司 Head feature extraction method and device, electronic equipment and storage medium
CN112257620B (en) * 2020-10-27 2021-10-26 广州华微明天软件技术有限公司 Safe wearing condition identification method
CN112257620A (en) * 2020-10-27 2021-01-22 广州华微明天软件技术有限公司 Safe wearing condition identification method
CN112434827A (en) * 2020-11-23 2021-03-02 南京富岛软件有限公司 Safety protection identification unit in 5T fortune dimension
CN112488042A (en) * 2020-12-15 2021-03-12 东南大学 Pedestrian traffic bottleneck discrimination method and system based on video analysis
CN112597877A (en) * 2020-12-21 2021-04-02 中船重工(武汉)凌久高科有限公司 Factory personnel abnormal behavior detection method based on deep learning
CN112836644A (en) * 2021-02-04 2021-05-25 电子科技大学 Real-time safety helmet detection method based on hypergraph learning
CN112906533A (en) * 2021-02-07 2021-06-04 成都睿码科技有限责任公司 Safety helmet wearing detection method based on self-adaptive detection area
CN113255826A (en) * 2021-06-17 2021-08-13 广东电网有限责任公司中山供电局 Helmet wearing detection method and system based on improved YOLOV3
CN113554682A (en) * 2021-08-03 2021-10-26 同济大学 Safety helmet detection method based on target tracking
CN113988110A (en) * 2021-12-02 2022-01-28 深圳比特微电子科技有限公司 Red light running behavior detection method and device and readable storage medium
CN113988110B (en) * 2021-12-02 2022-04-05 深圳比特微电子科技有限公司 Red light running behavior detection method and device and readable storage medium
CN114220077A (en) * 2022-02-21 2022-03-22 金叶仪器(山东)有限公司 Method for realizing object quantity statistics and moving direction monitoring based on monitoring equipment
CN117315575A (en) * 2023-09-21 2023-12-29 山东高速济南发展有限公司 Intelligent cloud helmet with safety protection clothing wearing detection and vital sign monitoring functions
CN117315575B (en) * 2023-09-21 2024-02-20 山东高速济南发展有限公司 Intelligent cloud helmet with safety protection clothing wearing detection and vital sign monitoring functions


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200828)