CN112364734B - Abnormal dressing detection method based on yolov4 and CenterNet - Google Patents

Abnormal dressing detection method based on yolov4 and CenterNet

Info

Publication number
CN112364734B
CN112364734B (application CN202011188613.4A)
Authority
CN
China
Prior art keywords
prediction
data
topk
yolov4
maximum
Prior art date
Legal status
Active
Application number
CN202011188613.4A
Other languages
Chinese (zh)
Other versions
CN112364734A (en)
Inventor
柯逍 (Ke Xiao)
陈文垚 (Chen Wenyao)
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202011188613.4A priority Critical patent/CN112364734B/en
Publication of CN112364734A publication Critical patent/CN112364734A/en
Application granted granted Critical
Publication of CN112364734B publication Critical patent/CN112364734B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing


Abstract

The invention relates to an abnormal dressing detection method based on yolov4 and CenterNet, which comprises the following steps: S1, acquiring abnormal dressing data, preprocessing the data, and constructing an abnormal dressing detection data set; S2, tuning the hyperparameters of the yolov4 model and the CenterNet model and training each model; S3, performing target detection with the trained yolov4 detection model to obtain prediction results, decoding them, and screening out the final prediction boxes with non-maximum suppression; S4, performing target detection with the trained CenterNet detection model, computing the center-point heatmap and corner heatmap of the input data to predict keypoint positions, normalizing the heatmaps, finding the K maximum points as candidate targets, and computing the final target boxes from the center points; and S5, drawing the final prediction boxes and target boxes on the original image and adding information such as class names and confidence scores to obtain the result image. The invention can effectively identify abnormal dressing.

Description

Abnormal dressing detection method based on yolov4 and CenterNet
Technical Field
The invention relates to the field of image recognition and computer vision, in particular to an abnormal dressing detection method based on yolov4 and CenterNet.
Background
In daily production and life, certain occasions impose strict dressing requirements to ensure people's safety. On a construction site, workers must wear protective equipment such as safety helmets and safety vests to ensure their safety at work. Construction is among the most dangerous industries: data from the U.S. Bureau of Labor Statistics (BLS) show that the number of deaths in construction far exceeds that of other industries, and its fatal injury rate is above the national average across all industries. Inadequate supervision by responsible personnel, together with workers failing to strictly follow site safety regulations on wearing protective equipment such as safety helmets and safety vests, frequently leads to safety accidents. In hospitals, medical staff come into close contact with patients carrying various infectious diseases and face a high risk of infection. Respiratory infectious diseases are currently among the most dangerous infectious diseases, and their particular transmission route makes them difficult to prevent and control. A simple but very effective protection is wearing a mask, which provides bidirectional protection between medical personnel and patients; wearing masks is therefore of great significance for protecting both groups and preventing the wide spread of infectious diseases. On such occasions, abnormal dressing behaviors such as not wearing a safety helmet, a safety vest, or a mask greatly increase the danger in people's production and daily life.
At present, abnormal dressing is identified mainly by manual inspection. Manual inspection is easily disturbed by various factors, cannot provide around-the-clock monitoring, and wastes human resources; its efficiency is therefore low and it cannot fully meet the actual needs of safety supervision departments. By contrast, using monitoring equipment such as cameras together with a machine-vision detection algorithm, abnormal dressing can be identified automatically and monitored in real time, avoiding the missed and false detections caused by subjective factors in manual inspection, saving human resources, and improving the efficiency and automation level of supervision.
Disclosure of Invention
In view of the above, the present invention provides an abnormal dressing detection method based on yolov4 and CenterNet, which can effectively identify abnormal dressing while maintaining a relatively good detection speed alongside good detection accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
An abnormal dressing detection method based on yolov4 and CenterNet comprises the following steps:
S1, acquiring abnormal dressing data, preprocessing the data, and constructing abnormal dressing detection data sets according to the requirements of the yolov4 model and the CenterNet model, respectively;
S2, tuning the hyperparameters of the yolov4 model and the CenterNet model, and training each on its corresponding data set to obtain a yolov4 detection model and a CenterNet detection model;
S3, performing target detection with the trained yolov4 detection model to obtain prediction results, decoding them, and screening out the final prediction boxes with non-maximum suppression;
S4, performing target detection with the trained CenterNet detection model, computing the center-point and corner heatmaps of the input data to predict keypoint positions, normalizing the heatmaps, finding the K maximum points as candidate targets, and computing the final prediction boxes from the center points;
and S5, drawing the final prediction boxes obtained from the two detection models on the original image, and adding information such as class names and confidence scores to obtain the result image.
Further, the step S1 specifically includes:
Step S11: acquiring data pictures related to abnormal dressing, screening the data, and finishing data preprocessing;
Step S12: constructing a data set according to the requirements of the yolov4 model, dividing all data into a training set and a testing set by proportion, and generating the txt file required by the training model from the xml file containing the image annotation information;
Step S13: constructing a data set according to the requirements of the CenterNet model, dividing all data into a training set, a testing set and a validation set by proportion, and generating the json file required by the training model from the xml file containing the image annotation information.
Further, the step S2 specifically includes:
Step S21: acquiring optimal values of the hyperparameters and adjusting them so that the trained model is optimal;
Step S22: in the training file of yolov4, setting the momentum parameter of momentum gradient descent to 0.9 and the weight-decay regularization coefficient to 0.0005;
Step S23: adopting the steps policy for learning rate adjustment;
Step S24: computing anchors with a k-means clustering algorithm, normalizing the width and height of each bounding box by the width and height of its data picture, namely:

w = w_box / w_img, h = h_box / h_img

Let anchor = (w_a, h_a) and box = (w_b, h_b). Using the IOU as the metric, it is computed as:

IOU(box, anchor) = [min(w_b, w_a) × min(h_b, h_a)] / [w_b × h_b + w_a × h_a - min(w_b, w_a) × min(h_b, h_a)]

The IOU lies between 0 and 1, and the more similar two boxes are, the larger it is; the final distance metric is:

d(box, anchor) = 1 - IOU(box, anchor)

Randomly select k bounding boxes from the data set as initial anchors and assign each bounding box to its nearest anchor under this IOU metric; after traversing all bounding boxes, update each anchor to the mean width and height of the bounding boxes in its cluster, and repeat until the anchors no longer change or the maximum number of iterations is reached.
Further, the step S3 specifically includes:
Step S31: passing the input data through the feature extraction network darknet53 to obtain three feature maps of different sizes;
Step S32: according to the extracted feature maps of different sizes, one part of the results obtained by convolving the three initial feature maps is used to output the prediction result corresponding to each feature map, and the other part is combined with the other feature maps after deconvolution, finally yielding the prediction results of three effective feature maps;
Step S33: adjusting the preset prior boxes according to the obtained prediction results to obtain the size and position information of the prediction boxes;
Step S34: applying non-maximum suppression to the adjusted prediction boxes, searching locally among the candidate targets for the prediction box with the highest confidence and suppressing those with lower confidence.
further, in step S33, specifically:
(a) The characteristic diagram is divided into S multiplied by S grids, and then a preset prior frame is adjusted to the effective characteristic diagram
(b) Then obtaining coordinate information x _ offset, y _ offset, h and w of a prior frame from the yolov4 network prediction result;
(c) And (3) carrying out sigmoid function processing on the central point coordinates of the prior frame corresponding to the grid, then adding corresponding x _ offset and y _ offset to obtain the center of the prediction frame, then utilizing h and w to calculate the width and height of the prediction frame, and finally obtaining the size and position information of the prediction frame.
Further, step S34 specifically includes:
(a) during non-maximum suppression, sorting the prediction boxes of the same target from high to low confidence, taking out the prediction box with the highest confidence, and computing its IOU with each of the remaining prediction boxes; this is a process of finding local maxima using the intersection-over-union (IOU). Given two detection boxes B_1 and B_2, the IOU between them is:

IOU(B_1, B_2) = area(B_1 ∩ B_2) / area(B_1 ∪ B_2)

(b) if the computed result is larger than the set threshold, the corresponding prediction box is suppressed and not output as a result; after all prediction boxes have been processed, the prediction box with the highest confidence among the remaining ones is taken out and the operation is repeated.
Further, the step S4 specifically includes the following steps:
Step S41: extracting, via center pooling, the maximum values along the horizontal and vertical directions of each center point and adding them, providing information beyond the center-point location, to obtain the center-point heatmap;
Step S42: predicting the upper-left and lower-right corner points of an object with cascade corner pooling: extract the maximum value max_1 on the object boundary, search inward from the boundary maximum for the internal maximum max_2, then add the two, capturing boundary features and providing the corner features with semantic information of the associated object, to obtain the corner heatmap;
Step S43: normalizing the obtained heatmaps with the sigmoid function, whose formula is:

S(x) = 1 / (1 + e^(-x))

Step S44: applying a 3 × 3 max-pooling to the heatmap, then finding and screening the top k maximum points with the _topk function to indicate the presence of targets, obtaining topk_score, topk_inds, topk_clses, topk_ys and topk_xs, where topk_score holds the top k scores of each category, topk_inds holds the indices of the k scores for each category, topk_clses holds the categories of the top k points over all categories, and topk_ys and topk_xs are used to extract results from the corresponding tensors according to those indices;
Step S45: computing the boxes and their corresponding categories and confidence scores from topk_ys and topk_xs.
Further, the step S5 specifically includes:
Step S51: computing the coordinates of the upper-left corner of each prediction box from its center-point coordinates and its width and height, thereby obtaining the position of the prediction box in the output picture;
Step S52: drawing the prediction box, predicted category and confidence score on the output picture with a drawing function to obtain the final result.
Compared with the prior art, the invention has the following beneficial effects:
the invention can effectively identify the abnormal dressing phenomenon, and can keep relatively good detection speed while achieving good detection precision.
Drawings
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
The invention is further explained by the following embodiments in conjunction with the drawings.
Referring to fig. 1, the present invention provides an abnormal dressing detection method based on yolov4 and CenterNet, comprising the following steps:
S1, acquiring abnormal dressing data, preprocessing the data, and constructing abnormal dressing detection data sets according to the requirements of the yolov4 model and the CenterNet model, respectively;
S2, tuning the hyperparameters of the yolov4 model and the CenterNet model, and training each on its corresponding data set to obtain a yolov4 detection model and a CenterNet detection model;
S3, performing target detection with the trained yolov4 detection model: the backbone feature extraction network darknet53 extracts feature maps, which are convolved and further processed to produce prediction results; after decoding the prediction results, the final prediction boxes are screened out with non-maximum suppression;
S4, performing target detection with the trained CenterNet detection model, computing the center-point heatmap and corner heatmap of the input data to predict keypoint positions, normalizing the heatmaps, finding the K maximum points as candidate targets, and computing the final prediction boxes from the center points;
and S5, drawing the final prediction boxes obtained from the two detection models on the original image, and adding information such as class names and confidence scores to obtain the result image.
In this embodiment, the step S1 specifically includes:
Step S11: acquiring data pictures related to abnormal dressing, screening the data, and finishing data preprocessing;
Step S12: constructing a data set according to the requirements of the yolov4 model, dividing all data into a training set and a testing set by proportion, and generating the txt file required by the training model from the xml file containing the picture annotation information; the txt file contains the category and position information of every target in the picture data;
Step S13: constructing a data set according to the requirements of the CenterNet model, dividing all data into a training set, a testing set and a validation set by proportion, and generating the json file required by the training model from the xml file containing the image annotation information; the json file contains information such as the image name, the image size, and the category, position coordinates and size of each target in the image. A sketch of this annotation conversion follows.
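As a rough illustration of the annotation conversion in steps S12 and S13, the following sketch parses one VOC-style xml file into a txt line; the "image_name x1,y1,x2,y2,class_id ..." layout is a common yolov4 convention assumed here, and the function name is illustrative rather than taken from the patent:

```python
import xml.etree.ElementTree as ET

def voc_xml_to_txt_line(xml_path, class_names):
    """Parse one VOC-style annotation xml into a yolov4-style txt line
    carrying the category and position of every target in the picture."""
    root = ET.parse(xml_path).getroot()
    img_name = root.findtext("filename")
    parts = []
    for obj in root.iter("object"):
        cls_id = class_names.index(obj.findtext("name"))
        bnd = obj.find("bndbox")
        coords = [str(int(float(bnd.findtext(tag))))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        parts.append(",".join(coords) + "," + str(cls_id))
    return img_name + " " + " ".join(parts)
```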
In this embodiment, the step S2 specifically includes:
Step S21: obtaining the optimal values of the hyperparameters in the relevant training files through repeated experiments, and tuning the hyperparameters so that the trained model is optimal;
Step S22: setting the momentum parameter of momentum gradient descent to 0.9 in the yolov4 training file, which effectively prevents the loss function from falling into a local minimum during network training and speeds up convergence of the gradient toward the optimum, and setting the weight-decay regularization coefficient (decay) to 0.0005, which effectively prevents overfitting;
Step S23: if the learning rate is too large, the weights update quickly but the optimum is easily missed; if it is too small, the weights update slowly and training is inefficient; a suitable learning rate therefore improves both training speed and convergence to the optimum. The steps policy is adopted for learning rate adjustment: the learning rate is decayed by a fixed factor once a given number of iterations is reached (see the first sketch after step S24);
Step S24: computing anchors with a k-means clustering algorithm, normalizing the width and height of each bounding box by the width and height of its data picture, namely:

w = w_box / w_img, h = h_box / h_img

Let anchor = (w_a, h_a) and box = (w_b, h_b). Using the IOU as the metric, it is computed as:

IOU(box, anchor) = [min(w_b, w_a) × min(h_b, h_a)] / [w_b × h_b + w_a × h_a - min(w_b, w_a) × min(h_b, h_a)]

The IOU lies between 0 and 1, and the more similar two boxes are, the larger it is; the final distance metric is:

d(box, anchor) = 1 - IOU(box, anchor)

Randomly select k bounding boxes from the data set as initial anchors and assign each bounding box to its nearest anchor under this IOU metric; after traversing all bounding boxes, update each anchor to the mean width and height of the bounding boxes in its cluster, and repeat until the anchors no longer change or the maximum number of iterations is reached, as sketched in the second code block below.
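A minimal sketch of the steps learning-rate policy described in step S23; the boundary iterations (8000, 9000) and the 0.1 decay factors are illustrative values, not taken from the patent:

```python
def step_lr(base_lr, iteration, steps=(8000, 9000), scales=(0.1, 0.1)):
    """'steps' learning-rate policy: multiply the learning rate by a decay
    factor each time training passes a step boundary."""
    lr = base_lr
    for boundary, scale in zip(steps, scales):
        if iteration >= boundary:
            lr *= scale
    return lr

# For example: step_lr(1e-3, 8500) returns 1e-4; step_lr(1e-3, 9500) returns 1e-5.
```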
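The anchor clustering of step S24 can be sketched as follows; this is a minimal NumPy sketch under the IOU distance defined above, and the function names and the k = 9 default are assumptions rather than the patent's:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IOU between (w, h) pairs, assuming boxes and anchors share a corner.
    boxes: (N, 2), anchors: (k, 2); returns an (N, k) IOU matrix."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, max_iter=100, seed=0):
    """boxes: (N, 2) of (w, h) already normalized by the picture size."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(max_iter):
        # d(box, anchor) = 1 - IOU(box, anchor): assign each box to its nearest anchor
        assign = np.argmin(1.0 - iou_wh(boxes, anchors), axis=1)
        # update each anchor to the mean width/height of its cluster
        new_anchors = np.array([boxes[assign == i].mean(axis=0)
                                if np.any(assign == i) else anchors[i]
                                for i in range(k)])
        if np.allclose(new_anchors, anchors):
            break
        anchors = new_anchors
    return anchors
```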
In this embodiment, the step S3 specifically includes:
Step S31: darknet53 is the backbone of yolov4. After data is input, darknet53 first processes it with Conv2D, applying L2 regularization during convolution; after convolution, batch normalization is applied, followed by a LeakyReLU activation. A residual convolution block in darknet53 then applies a 3 × 3 convolution with stride 2 to the output, saves the resulting convolution layer, applies a 1 × 1 convolution and a 3 × 3 convolution again, and adds the result to the saved layer as the final output. This not only increases the depth of the neural network but also alleviates the vanishing-gradient problem that growing depth causes in deep neural networks. After passing through the feature extraction network darknet53, the input data yields three feature maps of different sizes, namely (52, 52, 256), (26, 26, 512) and (13, 13, 1024), as sketched below;
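A minimal PyTorch sketch of the building blocks just described; the channel halving inside the residual block follows the reference darknet53 and is an assumption not spelled out above:

```python
import torch.nn as nn

class ConvBNLeaky(nn.Module):
    """Conv2D -> batch normalization -> LeakyReLU, as described in step S31
    (the L2 regularization corresponds to weight decay in the optimizer)."""
    def __init__(self, c_in, c_out, k, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, stride, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class ResidualBlock(nn.Module):
    """Save the incoming layer, apply a 1 x 1 then a 3 x 3 convolution,
    and add the result back to the saved layer."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = ConvBNLeaky(channels, channels // 2, 1)
        self.conv2 = ConvBNLeaky(channels // 2, channels, 3)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

def make_stage(c_in, c_out, num_blocks):
    """One darknet53 stage: a stride-2 3 x 3 convolution downsamples,
    then residual blocks refine the result."""
    layers = [ConvBNLeaky(c_in, c_out, 3, stride=2)]
    layers += [ResidualBlock(c_out) for _ in range(num_blocks)]
    return nn.Sequential(*layers)
```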
Step S32: three feature maps of different sizes are extracted through the feature extraction network darknet53; one part of the results obtained by convolving the three initial feature maps is used to output the prediction result corresponding to each feature map, and the other part is combined with the other feature maps after deconvolution. This finally yields the prediction results of three effective feature maps, with sizes (255, 13, 13), (255, 26, 26) and (255, 52, 52), corresponding to the positions of 3 prediction boxes in each cell of the 13 × 13, 26 × 26 and 52 × 52 grids;
Step S33: the prediction results obtained from the feature maps are not yet the final prediction boxes; the preset prior boxes must be adjusted using the yolov4 network predictions. The feature map is first divided into S × S grids and the preset prior boxes are adjusted onto the effective feature map; the prior-box coordinate information x_offset, y_offset, h and w is then obtained from the yolov4 network prediction result; the center-point coordinates of the prior box corresponding to each grid cell are passed through the sigmoid function, and the corresponding x_offset and y_offset are added to obtain the center of the prediction box; the width and height of the prediction box are then computed from h and w, finally giving the size and position information of the prediction box, as sketched below;
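A sketch of this decoding; the (batch, anchors, S, S, 5 + classes) tensor layout and the exp() scaling of the width and height are standard YOLO conventions assumed here, since the patent only says the width and height are "computed from h and w":

```python
import torch

def decode_predictions(pred, anchors, stride):
    """pred: (B, A, S, S, 5 + num_classes) raw head output;
    anchors: (A, 2) tensor of prior widths/heights in pixels;
    stride: input size divided by S. Returns (B, A, S, S, 4) as (cx, cy, w, h)."""
    tx, ty = pred[..., 0], pred[..., 1]
    tw, th = pred[..., 2], pred[..., 3]
    S = pred.shape[2]
    gy, gx = torch.meshgrid(torch.arange(S), torch.arange(S), indexing="ij")
    # sigmoid the raw center offsets, then add the grid-cell coordinates
    cx = (torch.sigmoid(tx) + gx) * stride
    cy = (torch.sigmoid(ty) + gy) * stride
    # scale each prior box's width and height
    w = anchors[:, 0].view(1, -1, 1, 1) * torch.exp(tw)
    h = anchors[:, 1].view(1, -1, 1, 1) * torch.exp(th)
    return torch.stack([cx, cy, w, h], dim=-1)
```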
Step S34: each target usually has several prediction boxes, so non-maximum suppression is needed to search locally among the candidate targets, keeping the prediction box with the highest confidence and suppressing those with lower confidence; this is essentially a process of finding local maxima according to the intersection-over-union (IOU). The IOU is an important measure of the overlap between two prediction boxes: given two detection boxes B_1 and B_2, the IOU between them is

IOU(B_1, B_2) = area(B_1 ∩ B_2) / area(B_1 ∪ B_2)

During non-maximum suppression, the prediction boxes of the same target are first sorted from high to low confidence; the prediction box with the highest confidence is taken out and its IOU with each of the remaining prediction boxes is computed; if the result exceeds the set threshold, that prediction box is suppressed and not output as a result. After all prediction boxes have been processed, the box with the highest confidence among the remaining ones is taken out and the operation is repeated, as sketched below.
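A minimal NumPy sketch of this greedy non-maximum suppression; the corner-format (x1, y1, x2, y2) boxes and the 0.5 default threshold are assumptions:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    order = scores.argsort()[::-1]          # sort by confidence, high to low
    keep = []
    while order.size > 0:
        i = order[0]                        # box with the highest confidence
        keep.append(i)
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # suppress boxes whose IOU with the kept box exceeds the threshold
        order = rest[iou <= iou_thresh]
    return keep
```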
In this embodiment, the step S4 specifically includes the following steps:
Step S41: extracting, via center pooling, the maximum values along the horizontal and vertical directions of each center point and adding them, providing information beyond the center-point location, to obtain the center-point heatmap;
Step S42: predicting the upper-left and lower-right corner points of an object with cascade corner pooling: extract the maximum value max_1 on the object boundary, search inward from the boundary maximum for the internal maximum max_2, then add the two, capturing boundary features and providing the corner features with semantic information of the associated object, to obtain the corner heatmap; a sketch of center pooling follows;
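A simplified PyTorch sketch of the center pooling of step S41; the reference CenterNet builds this from directional corner-pooling operators, whereas here the row and column maxima are taken directly, which matches the description above:

```python
import torch

def center_pooling(fmap):
    """For every location, add the maximum response along its row (horizontal
    direction) and along its column (vertical direction). fmap: (B, C, H, W)."""
    horiz = fmap.max(dim=3, keepdim=True).values.expand_as(fmap)  # max over width
    vert = fmap.max(dim=2, keepdim=True).values.expand_as(fmap)   # max over height
    return horiz + vert
```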
Step S43: normalizing the obtained heatmaps with the sigmoid function, whose formula is:

S(x) = 1 / (1 + e^(-x))

Step S44: applying a 3 × 3 max-pooling to the heatmap, then finding and screening the top k maximum points with the _topk function to indicate the presence of targets, obtaining topk_score, topk_inds, topk_clses, topk_ys and topk_xs, where topk_score holds the top k scores of each category, topk_inds holds the indices of the k scores for each category, topk_clses holds the categories of the top k points over all categories, and topk_ys and topk_xs are used to extract results from the corresponding tensors according to those indices, as sketched below;
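Steps S43 and S44 can be sketched in PyTorch as follows; the k = 100 default and the exact return packaging are assumptions, while the variable names follow the patent:

```python
import torch
import torch.nn.functional as F

def topk_from_heatmap(heat, k=100):
    """heat: (B, C, H, W) raw heatmap logits. Returns the top k scores,
    their categories, and their (y, x) positions."""
    heat = torch.sigmoid(heat)                       # S(x) = 1 / (1 + e^(-x))
    hmax = F.max_pool2d(heat, kernel_size=3, stride=1, padding=1)
    heat = heat * (hmax == heat).float()             # keep only 3 x 3 local peaks
    B, C, H, W = heat.shape
    # top k points per category, with their flat indices (topk_inds)
    topk_score, topk_inds = torch.topk(heat.view(B, C, -1), k)
    topk_ys = torch.div(topk_inds, W, rounding_mode="floor").float()
    topk_xs = (topk_inds % W).float()
    # top k points over all categories, and the category of each
    score, idx = torch.topk(topk_score.view(B, -1), k)
    topk_clses = torch.div(idx, k, rounding_mode="floor")
    topk_ys = topk_ys.view(B, -1).gather(1, idx)
    topk_xs = topk_xs.view(B, -1).gather(1, idx)
    return score, topk_clses, topk_ys, topk_xs
```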
Step S45: computing the boxes and their corresponding categories and confidence scores from topk_ys and topk_xs.
In this embodiment, the step S5 specifically includes:
Step S51: computing the coordinates of the upper-left corner of each prediction box from the obtained position information of the detection box, that is, from its center-point coordinates and its width and height, thereby obtaining the position of the prediction box in the output picture;
Step S52: drawing the prediction box, predicted category and confidence score on the output picture with a drawing function to obtain the final result, as sketched below.
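A minimal sketch of steps S51 and S52, assuming OpenCV as the drawing library (the patent only says "a drawing function"):

```python
import cv2

def draw_result(image, cx, cy, w, h, cls_name, score):
    """Recover the top-left corner from the center point and box size,
    then draw the box and its label on the output picture."""
    x1, y1 = int(cx - w / 2), int(cy - h / 2)
    x2, y2 = int(cx + w / 2), int(cy + h / 2)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 0, 255), 2)
    label = cls_name + " " + format(score, ".2f")
    cv2.putText(image, label, (x1, max(y1 - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    return image
```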
The above description is only a preferred embodiment of the present invention; all equivalent changes and modifications made in accordance with the claims of the present invention shall fall within the scope of the present invention.

Claims (3)

1. An abnormal dressing detection method based on yolov4 and CenterNet, characterized by comprising the following steps:
S1, acquiring abnormal dressing data, preprocessing the data, and constructing abnormal dressing detection data sets according to the requirements of the yolov4 model and the CenterNet model, respectively;
S2, tuning the hyperparameters of the yolov4 model and the CenterNet model, and training each on its corresponding data set to obtain a yolov4 detection model and a CenterNet detection model;
S3, performing target detection with the trained yolov4 detection model to obtain prediction results, decoding them, and screening out the final prediction boxes with non-maximum suppression;
S4, performing target detection with the trained CenterNet detection model, computing the center-point and corner heatmaps of the input data to predict keypoint positions, normalizing the heatmaps, finding the K maximum points as candidate targets, and computing the final prediction boxes from the center points;
S5, drawing the final prediction boxes obtained from the two detection models on the original image, and adding information such as class names and confidence scores to obtain the result image;
the step S2 specifically includes:
Step S21: acquiring optimal values of the hyperparameters and adjusting them so that the trained model is optimal;
Step S22: in the training file of yolov4, setting the momentum parameter of momentum gradient descent to 0.9 and the weight-decay regularization coefficient to 0.0005;
Step S23: adopting the steps policy for learning rate adjustment;
Step S24: computing anchors with a k-means clustering algorithm, normalizing the width and height of each bounding box by the width and height of its data picture, namely:

w = w_box / w_img, h = h_box / h_img

Let anchor = (w_a, h_a) and box = (w_b, h_b). Using the IOU as the metric, it is computed as:

IOU(box, anchor) = [min(w_b, w_a) × min(h_b, h_a)] / [w_b × h_b + w_a × h_a - min(w_b, w_a) × min(h_b, h_a)]

The IOU lies between 0 and 1, and the more similar two boxes are, the larger it is; the final distance metric is:

d(box, anchor) = 1 - IOU(box, anchor)

Randomly select k bounding boxes from the data set as initial anchors and assign each bounding box to its nearest anchor under this IOU metric; after traversing all bounding boxes, update each anchor to the mean width and height of the bounding boxes in its cluster, and repeat until the anchors no longer change or the maximum number of iterations is reached;
the step S3 specifically includes:
Step S31: passing the input data through the feature extraction network darknet53 to obtain three feature maps of different sizes;
Step S32: according to the extracted feature maps of different sizes, one part of the results obtained by convolving the three initial feature maps is used to output the prediction result corresponding to each feature map, and the other part is combined with the other feature maps after deconvolution, finally yielding the prediction results of three effective feature maps;
Step S33: adjusting the preset prior boxes according to the obtained prediction results to obtain the size and position information of the prediction boxes;
Step S34: applying non-maximum suppression to the adjusted prediction boxes, searching locally among the candidate targets for the prediction box with the highest confidence and suppressing those with lower confidence;
the step S33 specifically includes:
(a) dividing the feature map into S × S grids and adjusting the preset prior boxes onto the effective feature map;
(b) obtaining the prior-box coordinate information x_offset, y_offset, h and w from the yolov4 network prediction result;
(c) applying the sigmoid function to the center-point coordinates of the prior box corresponding to each grid cell, adding the corresponding x_offset and y_offset to obtain the center of the prediction box, computing the width and height of the prediction box from h and w, and finally obtaining the size and position information of the prediction box;
the step S34 specifically includes:
(a) during non-maximum suppression, sorting the prediction boxes of the same target from high to low confidence, taking out the prediction box with the highest confidence, and computing its IOU with each of the remaining prediction boxes; this is a process of finding local maxima using the intersection-over-union (IOU): given two detection boxes B_1 and B_2, the IOU between them is

IOU(B_1, B_2) = area(B_1 ∩ B_2) / area(B_1 ∪ B_2)

(b) if the computed result is larger than the set threshold, the corresponding prediction box is suppressed and not output as a result; after all prediction boxes have been processed, the prediction box with the highest confidence among the remaining ones is taken out and the operation repeated;
the step S4 specifically includes the following steps:
Step S41: extracting, via center pooling, the maximum values along the horizontal and vertical directions of each center point and adding them, providing information beyond the center-point location, to obtain the center-point heatmap;
Step S42: predicting the upper-left and lower-right corner points of an object with cascade corner pooling: extract the maximum value max_1 on the object boundary, search inward from the boundary maximum for the internal maximum max_2, then add the two, capturing boundary features and providing the corner features with semantic information of the associated object, to obtain the corner heatmap;
Step S43: normalizing the obtained heatmaps with the sigmoid function, whose formula is:

S(x) = 1 / (1 + e^(-x))

Step S44: applying a 3 × 3 max-pooling to the heatmap, then finding and screening the top k maximum points with the _topk function to indicate the presence of targets, obtaining topk_score, topk_inds, topk_clses, topk_ys and topk_xs, where topk_score holds the top k scores of each category, topk_inds holds the indices of the k scores for each category, topk_clses holds the categories of the top k points over all categories, and topk_ys and topk_xs are used to extract results from the corresponding tensors according to those indices;
Step S45: computing the boxes and their corresponding categories and confidence scores from topk_ys and topk_xs.
2. The abnormal dressing detection method based on yolov4 and CenterNet according to claim 1, wherein the step S1 specifically comprises:
Step S11: acquiring data pictures related to abnormal dressing, screening the data, and finishing data preprocessing;
Step S12: constructing a data set according to the requirements of the yolov4 model, dividing all data into a training set and a testing set by proportion, and generating the txt file required by the training model from the xml file containing the image annotation information;
Step S13: constructing a data set according to the requirements of the CenterNet model, dividing all data into a training set, a testing set and a validation set by proportion, and generating the json file required by the training model from the xml file containing the image annotation information.
3. The abnormal dressing detection method based on yolov4 and CenterNet according to claim 1, wherein the step S5 specifically comprises:
Step S51: computing the coordinates of the upper-left corner of each prediction box from its center-point coordinates and its width and height, thereby obtaining the position of the prediction box in the output picture;
Step S52: drawing the prediction box, predicted category and confidence score on the output picture with a drawing function to obtain the final result.
CN202011188613.4A 2020-10-30 2020-10-30 Abnormal dressing detection method based on yolov4 and CenterNet Active CN112364734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011188613.4A CN112364734B (en) 2020-10-30 2020-10-30 Abnormal dressing detection method based on yolov4 and CenterNet


Publications (2)

Publication Number Publication Date
CN112364734A CN112364734A (en) 2021-02-12
CN112364734B (en) 2023-02-21

Family

ID=74513822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011188613.4A Active CN112364734B (en) 2020-10-30 2020-10-30 Abnormal dressing detection method based on yolov4 and CenterNet

Country Status (1)

Country Link
CN (1) CN112364734B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112915539B (en) * 2021-04-01 2023-01-06 腾讯科技(深圳)有限公司 Virtual object detection method and device and readable storage medium
CN113111754A (en) * 2021-04-02 2021-07-13 中国科学院深圳先进技术研究院 Target detection method, device, terminal equipment and storage medium
CN112990131B (en) * 2021-04-27 2021-10-26 广东科凯达智能机器人有限公司 Method, device, equipment and medium for acquiring working gear of voltage change-over switch
CN113408624A (en) * 2021-06-22 2021-09-17 福州大学 Task-oriented image quality evaluation method based on transfer learning
CN113837138B (en) * 2021-09-30 2023-08-29 重庆紫光华山智安科技有限公司 Dressing monitoring method, dressing monitoring system, dressing monitoring medium and electronic terminal
CN114882556B (en) * 2022-04-26 2024-03-15 西北大学 Improved YoloX-based drama character dressing face detection method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697441B (en) * 2017-10-23 2021-02-12 杭州海康威视数字技术股份有限公司 Target detection method and device and computer equipment
CN110532894B (en) * 2019-08-05 2021-09-03 西安电子科技大学 Remote sensing target detection method based on boundary constraint CenterNet
CN111553348A (en) * 2020-04-26 2020-08-18 中南大学 Anchor-based target detection method based on centernet
CN111709489B (en) * 2020-06-24 2022-04-08 广西师范大学 Citrus identification method based on improved YOLOv4
CN111709414A (en) * 2020-06-29 2020-09-25 济南浪潮高新科技投资发展有限公司 AR device, character recognition method and device thereof, and computer-readable storage medium
CN111738206B (en) * 2020-07-08 2020-11-17 浙江浙能天然气运行有限公司 Excavator detection method for unmanned aerial vehicle inspection based on CenterNet
CN111832513B (en) * 2020-07-21 2024-02-09 西安电子科技大学 Real-time football target detection method based on neural network
CN111814742A (en) * 2020-07-29 2020-10-23 南方电网数字电网研究院有限公司 Knife switch state identification method based on deep learning
CN111626277B (en) * 2020-08-03 2021-02-26 杭州智诚惠通科技有限公司 Vehicle tracking method and device based on over-station inter-modulation index analysis

Also Published As

Publication number Publication date
CN112364734A (en) 2021-02-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant