CN112364734A - Abnormal dressing detection method based on yolov4 and CenterNet - Google Patents

Abnormal dressing detection method based on yolov4 and CenterNet

Info

Publication number
CN112364734A
CN112364734A
Authority
CN
China
Prior art keywords
prediction
yolov4
data
topk
centernet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011188613.4A
Other languages
Chinese (zh)
Other versions
CN112364734B (en)
Inventor
柯逍 (Ke Xiao)
陈文垚 (Chen Wenyao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202011188613.4A priority Critical patent/CN112364734B/en
Publication of CN112364734A publication Critical patent/CN112364734A/en
Application granted granted Critical
Publication of CN112364734B publication Critical patent/CN112364734B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an abnormal dressing detection method based on yolov4 and CenterNet, which comprises the following steps: step S1, acquiring abnormal dressing data and preprocessing it to construct an abnormal dressing detection data set; step S2, optimizing the hyperparameters of the yolov4 model and the CenterNet model and training each separately; step S3, carrying out target detection with the trained yolov4 detection model to obtain a prediction result, decoding the prediction result, and screening out the final prediction frame by non-maximum suppression; step S4, carrying out target detection with the trained CenterNet detection model, calculating center-point and corner-point heatmaps of the input data to predict keypoint positions, normalizing the heatmaps, and finding the K maximum points as candidate targets, the final target frame then being computed from the center points; and step S5, drawing the final prediction frame and the final target frame on the original image and adding information such as category name and confidence to obtain the result image. The invention can effectively identify abnormal dressing.

Description

Abnormal dressing detection method based on yolov4 and CenterNet
Technical Field
The invention relates to the field of image recognition and computer vision, in particular to an abnormal dressing detection method based on yolov4 and CenterNet.
Background
In daily production and life, certain occasions impose strict dress requirements to ensure people's safety. On a construction site, workers must wear protective equipment such as safety helmets and safety vests to ensure their safety at work. Construction is one of the most dangerous industries: data from the U.S. Bureau of Labor Statistics (BLS) show that the number of deaths in construction is far higher than in any other industry, and its fatal injury rate is also above the national average across all industries. When supervisors fail to enforce oversight and workers do not strictly abide by site safety regulations or wear protective equipment such as safety helmets and safety vests, safety accidents occur frequently. In hospitals, medical staff come into close contact with patients carrying various infectious diseases and face a high risk of infection. Respiratory infectious diseases are currently among the most dangerous infectious diseases, and their transmission route makes them difficult to prevent and control. A simple but very effective way to protect both medical staff and patients from infection is to wear a mask, which provides bidirectional protection between them; wearing masks is therefore of great significance for protecting medical personnel and patients and preventing the wide spread of infectious diseases. In these special settings, abnormal dressing behaviors such as not wearing a safety helmet, safety vest or mask greatly increase the danger in people's production and daily life.
At present, abnormal dressing is identified mainly by manual inspection. Manual inspection is easily disturbed by various factors that reduce its efficiency, cannot provide around-the-clock monitoring, and wastes human resources, so it is inefficient and cannot fully meet the actual needs of safety supervision departments. To monitor abnormal dressing in real time, it can instead be identified automatically with monitoring equipment such as cameras and a machine-vision detection algorithm. This enables real-time monitoring of abnormal dressing, avoids the missed and false detections caused by subjective factors in manual inspection, and improves the efficiency and automation level of supervision while saving human resources.
Disclosure of Invention
In view of the above, the present invention provides an abnormal dressing detection method based on yolov4 and CenterNet, which can effectively identify abnormal dressing and maintain a relatively good detection speed while achieving good detection accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
an abnormal dressing detection method based on yolov4 and CenterNet comprises the following steps:
step S1, acquiring abnormal dressing data, preprocessing it, and constructing abnormal dressing detection data sets according to the requirements of the yolov4 model and of the CenterNet model respectively;
step S2, optimizing the hyperparameters of the yolov4 model and the CenterNet model respectively, and training on the corresponding data sets to obtain a yolov4 detection model and a CenterNet detection model;
step S3, carrying out target detection according to the trained yolov4 detection model to obtain a prediction result, decoding the prediction result, and screening out the final prediction frame by non-maximum suppression;
step S4, carrying out target detection according to the trained CenterNet detection model, calculating center-point and corner-point heatmaps of the input data to predict keypoint positions, normalizing the heatmaps, and finding the K maximum points as candidate targets; the final prediction frame is then computed from the center points;
and step S5, drawing the final prediction frames obtained by the two detection models on the original image, and adding information such as category name and confidence to obtain the result image.
Further, the step S1 specifically includes:
step S11: acquiring picture data related to abnormal dressing, screening the data and completing data preprocessing;
step S12: building a data set according to the requirements of the yolov4 model, dividing all data into a training set and a test set in proportion, and generating the txt files required for training from the xml files containing the image annotation information;
step S13: building a data set according to the requirements of the CenterNet model, dividing all data into a training set, a test set and a validation set in proportion, and generating the json files required for training from the xml files containing the image annotation information.
Further, the step S2 specifically includes:
step S21: acquiring the optimal values of the hyperparameters and adjusting them so that the trained model performs best;
step S22: in the training file of yolov4, setting the momentum parameter momentum of momentum gradient descent to 0.9 and the weight decay regularization coefficient to 0.0005;
step S23: adjusting the learning rate with the steps policy;
step S24: calculating the anchors with a k-means clustering algorithm, normalizing the width and height of each bounding box by the width and height of its data picture, namely:
w = w_box / w_img, h = h_box / h_img
Let anchor = (w_a, h_a) and box = (w_b, h_b). Using the IOU as the metric, the calculation is as follows:
IOU(box, anchor) = [min(w_a, w_b) · min(h_a, h_b)] / [w_a · h_a + w_b · h_b − min(w_a, w_b) · min(h_a, h_b)]
The IOU value lies between 0 and 1; the more similar two boxes are, the larger the IOU. The final distance metric is:
d(box, anchor) = 1 − IOU(box, anchor)
Randomly select k bounding boxes from the data set as the initial anchors; assign each bounding box to its closest anchor under the IOU metric; after traversing all bounding boxes, update each anchor to the mean width and height of the bounding boxes in its cluster; repeat until the anchors no longer change or the maximum number of iterations is reached.
Further, the step S3 specifically includes:
step S31: after input data passes through a feature extraction network darknet53, three feature maps with different sizes are obtained;
step S32: from the three extracted feature maps of different sizes, one part of the result obtained by convolving the three initial feature maps is used to output the prediction result corresponding to each feature map, and the other part is combined with the other feature maps after deconvolution, finally obtaining the prediction results of three effective feature maps;
step S33: adjusting a preset prior frame according to the obtained prediction result to obtain the size and the position information of the prediction frame;
step S34: according to the adjusted prediction frames, performing non-maximum suppression: a local search among the candidate targets finds the prediction frame with the highest confidence and suppresses the prediction frames with lower confidence.
further, in step S33, specifically, the step includes:
(a) dividing the feature map into S × S grids and adjusting the preset prior frames onto the effective feature map;
(b) obtaining the coordinate information x_offset, y_offset, h and w of each prior frame from the yolov4 network prediction result;
(c) applying the sigmoid function to the center-point coordinates of the grid cell corresponding to the prior frame and adding the corresponding x_offset and y_offset to obtain the center of the prediction frame, then computing the width and height of the prediction frame from h and w, finally obtaining the size and position information of the prediction frame.
Further, the step S34 specifically includes:
(a) when performing non-maximum suppression, sorting the prediction frames of the same target from high confidence to low confidence, then taking out the prediction frame with the highest confidence and calculating its IOU with each of the remaining prediction frames;
following the process of finding a local maximum by the intersection-over-union (IOU), let the two detection boxes be B_1 and B_2; the IOU between them is then:
IOU = area(B_1 ∩ B_2) / area(B_1 ∪ B_2)
(b) if the calculated IOU is greater than the set threshold, the prediction frame is suppressed and not output as a result; after all prediction frames have been calculated, the prediction frame with the highest confidence among the remaining ones is taken out and the operation is repeated.
Further, the step S4 specifically includes the following steps:
step S41: extracting the maximum values in the horizontal and vertical directions through the center point via center pooling and adding them, providing information beyond the center-point position itself, thereby obtaining the center-point heatmap;
step S42: predicting the top-left and bottom-right corner points of the object with cascade corner pooling: extracting the boundary maximum max_1 of the object, searching inward from it for the internal maximum max_2, and adding the two, so that the captured boundary features provide the corner features with semantic information of the associated object, thereby obtaining the corner heatmap;
step S43: normalizing the obtained heatmaps with the sigmoid function, whose formula is:
sigmoid(x) = 1 / (1 + e^(−x))
step S44: applying 3 × 3 max pooling to the heatmap, then using the _topk function to find and screen the top k maximum points, which indicate the presence of targets, obtaining topk_score, topk_inds, topk_clses, topk_ys and topk_xs, where topk_score holds the top k scores within each category, topk_inds holds the indices of those k scores within each category, topk_clses holds the categories of the overall top k scores, and topk_ys and topk_xs are the row and column coordinates extracted from the corresponding tensors according to those indices;
step S45: calculating the boxes and their corresponding categories and confidences from topk_ys and topk_xs.
Further, the step S5 specifically includes:
step S51: calculating the top-left corner coordinates of the prediction frame from its center-point coordinates and its width and height, thereby obtaining the position information of the prediction frame in the output picture;
step S52: drawing the prediction frame, predicted category and confidence on the output picture with a drawing function to obtain the final result.
Compared with the prior art, the invention has the following beneficial effects:
the invention can effectively identify the abnormal dressing phenomenon, and can keep relatively good detection speed while achieving good detection precision.
Drawings
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
Referring to fig. 1, the present invention provides an abnormal dressing detection method based on yolov4 and CenterNet, comprising the following steps:
step S1, acquiring abnormal dressing data, preprocessing it, and constructing abnormal dressing detection data sets according to the requirements of the yolov4 model and of the CenterNet model respectively;
step S2, optimizing the hyperparameters of the yolov4 model and the CenterNet model respectively, and training on the corresponding data sets to obtain a yolov4 detection model and a CenterNet detection model;
step S3, carrying out target detection according to the trained yolov4 detection model: the backbone feature extraction network darknet53 extracts the feature maps, convolution and related operations are applied to them to obtain a prediction result, and after the prediction result is decoded, the final prediction frame is screened out by non-maximum suppression;
step S4, carrying out target detection according to the trained CenterNet detection model, calculating center-point and corner-point heatmaps of the input data to predict keypoint positions, normalizing the heatmaps, and finding the K maximum points as candidate targets; the final prediction frame is then computed from the center points;
and step S5, drawing the final prediction frames obtained by the two detection models on the original image, and adding information such as category name and confidence to obtain the result image.
In this embodiment, the step S1 specifically includes:
step S11: acquiring picture data related to abnormal dressing, screening the data and completing data preprocessing;
step S12: building a data set according to the requirements of the yolov4 model, dividing all data into a training set and a test set in proportion, and generating the txt files required for training from the xml files containing the image annotation information; each txt file contains the category and position information of every target in the picture data;
step S13: building a data set according to the requirements of the CenterNet model, dividing all data into a training set, a test set and a validation set in proportion, and generating the json file required for training from the xml files containing the image annotation information; the json file contains the image name, image size, and the category, position coordinates and size of each target in the image data.
In this embodiment, the step S2 specifically includes:
step S21: obtaining the optimal values of the hyperparameters in the relevant training files through repeated experiments and tuning them so that the trained model performs best;
step S22: in the training file of yolov4, setting the momentum parameter momentum of momentum gradient descent to 0.9, which effectively prevents the loss function from falling into a local minimum during network training and speeds up convergence of the gradient toward the optimum, and setting the weight decay regularization coefficient decay to 0.0005, which effectively prevents overfitting;
step S23: if the learning rate is too large, the weights update quickly but the optimum is easily missed; if it is too small, the weights update slowly and training is inefficient; a suitable learning rate therefore effectively improves the training speed and the selection of the optimum. The learning rate is adjusted with the steps policy: whenever a set number of iterations is reached, the learning rate is decayed by a fixed factor, as in the sketch below;
step S24: calculating the anchors with a k-means clustering algorithm, normalizing the width and height of each bounding box by the width and height of its data picture, namely:
w = w_box / w_img, h = h_box / h_img
Let anchor = (w_a, h_a) and box = (w_b, h_b). Using the IOU as the metric, the calculation is as follows:
IOU(box, anchor) = [min(w_a, w_b) · min(h_a, h_b)] / [w_a · h_a + w_b · h_b − min(w_a, w_b) · min(h_a, h_b)]
The IOU value lies between 0 and 1; the more similar two boxes are, the larger the IOU. The final distance metric is:
d(box, anchor) = 1 − IOU(box, anchor)
Randomly select k bounding boxes from the data set as the initial anchors; assign each bounding box to its closest anchor under the IOU metric; after traversing all bounding boxes, update each anchor to the mean width and height of the bounding boxes in its cluster; repeat until the anchors no longer change or the maximum number of iterations is reached. A sketch of this clustering procedure is given below.
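By way of illustration only, a minimal Python sketch of the IOU-based k-means clustering of step S24, assuming boxes is an (N, 2) NumPy array of normalized (w, h) pairs; keeping an anchor unchanged when its cluster is empty is an implementation choice, not specified above.

    import numpy as np

    def iou_wh(boxes, anchors):
        """IOU between (w, h) pairs, all normalized to [0, 1] by the picture size."""
        inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
        union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
                (anchors[:, 0] * anchors[:, 1])[None, :] - inter
        return inter / union

    def kmeans_anchors(boxes, k, max_iter=300, seed=0):
        """Cluster bounding-box shapes with d(box, anchor) = 1 - IOU."""
        rng = np.random.default_rng(seed)
        anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
        for _ in range(max_iter):
            # minimizing d = 1 - IOU is the same as maximizing IOU
            assign = np.argmax(iou_wh(boxes, anchors), axis=1)
            new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                            else anchors[i] for i in range(k)])
            if np.allclose(new, anchors):
                break  # anchors no longer change: converged
            anchors = new
        return anchors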
In this embodiment, the step S3 specifically includes:
step S31: darknet53 is the backbone of yolov4. After data is input, darknet53 first processes it with Conv2D, applying L2 regularization during convolution, followed by batch normalization and a LeakyReLU activation function. A residual convolution block in darknet53 then applies a 3 × 3 convolution with stride 2 to the output, saves the resulting convolution layer, performs 1 × 1 and 3 × 3 convolutions on it, and adds the result to the saved layer as the final output; this increases the depth of the neural network while alleviating the vanishing-gradient problem that growing depth causes in deep neural networks (a sketch of such a residual block is given below). After the input data passes through the feature extraction network darknet53, three feature maps of different sizes are obtained, namely (52, 52, 256), (26, 26, 512) and (13, 13, 1024);
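By way of illustration only, a minimal PyTorch sketch of the residual block just described, assuming the convolution, batch normalization, LeakyReLU ordering given above; the L2 regularization mentioned in the text would be applied as weight decay in the optimizer rather than inside the block.

    import torch
    import torch.nn as nn

    class DarknetResidual(nn.Module):
        """1x1 then 3x3 convolution, each followed by batch normalization
        and LeakyReLU, with the saved block input added back at the end."""
        def __init__(self, channels):
            super().__init__()
            half = channels // 2
            self.conv1x1 = nn.Sequential(
                nn.Conv2d(channels, half, kernel_size=1, bias=False),
                nn.BatchNorm2d(half), nn.LeakyReLU(0.1))
            self.conv3x3 = nn.Sequential(
                nn.Conv2d(half, channels, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(channels), nn.LeakyReLU(0.1))

        def forward(self, x):
            return x + self.conv3x3(self.conv1x1(x))  # saved layer + new result

    # e.g. a (1, 64, 104, 104) feature map keeps its shape through the block
    out = DarknetResidual(64)(torch.randn(1, 64, 104, 104))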
step S32: the feature extraction network darknet53 yields three feature maps of different sizes; one part of the result obtained by applying convolution operations to the three initial feature maps is used to output the prediction result corresponding to each feature map, and the other part is combined with the other feature maps after deconvolution. Finally the prediction results of three effective feature maps are obtained, with sizes (255, 13, 13), (255, 26, 26) and (255, 52, 52), giving the positions of 3 prediction frames on each cell of the 13 × 13, 26 × 26 and 52 × 52 grids corresponding to each map;
step S33: the prediction result obtained from the feature maps is not yet the final prediction frame; the preset prior frames must be adjusted using the yolov4 network prediction. The feature map is first divided into S × S grids and the preset prior frames are adjusted onto the effective feature map; the coordinate information x_offset, y_offset, h and w of each prior frame is then obtained from the yolov4 network prediction result; the center-point coordinates of the grid cell corresponding to the prior frame are passed through the sigmoid function and added to the corresponding x_offset and y_offset to obtain the center of the prediction frame, the width and height of the prediction frame are computed from h and w, and finally the size and position information of the prediction frame are obtained, as in the sketch below;
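By way of illustration only, a minimal Python sketch of decoding one raw prediction into a box. The exponential scaling of the prior's width and height is the standard YOLO decode and is an assumption here, since the text only says the width and height are computed from h and w.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def decode_box(raw, grid_x, grid_y, prior_w, prior_h, stride):
        """raw = (x_offset, y_offset, w, h) for one prior frame at grid cell
        (grid_x, grid_y); returns the box center and size in input pixels."""
        tx, ty, tw, th = raw
        cx = (grid_x + sigmoid(tx)) * stride   # grid cell + sigmoid offset
        cy = (grid_y + sigmoid(ty)) * stride
        w = prior_w * np.exp(tw)               # prior width scaled (assumed)
        h = prior_h * np.exp(th)
        return cx, cy, w, h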
step S34: each target usually has several prediction frames, so non-maximum suppression must be performed: a local search among the candidate targets finds the prediction frame with the highest confidence and suppresses those with lower confidence, which is essentially a process of finding local maxima according to the intersection-over-union (IOU). The IOU is an important index of the overlap between two prediction frames; let the two detection boxes be B_1 and B_2, then the IOU between them is:
IOU = area(B_1 ∩ B_2) / area(B_1 ∪ B_2)
During non-maximum suppression, the prediction frames of the same target are first sorted by confidence from high to low; the prediction frame with the highest confidence is then taken out and its IOU with each remaining prediction frame is calculated; if the result exceeds the set threshold, that prediction frame is suppressed and not output as a result; once all prediction frames have been calculated, the prediction frame with the highest confidence among the remaining ones is taken out and the operation is repeated. A sketch of this procedure follows.
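By way of illustration only, a minimal Python sketch of the suppression loop just described, assuming boxes are given as corner coordinates [x1, y1, x2, y2]; the 0.5 threshold is an illustrative default, not a value fixed by the invention.

    import numpy as np

    def nms(boxes, scores, iou_thresh=0.5):
        """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,).
        Returns the indices of the prediction frames that survive."""
        order = scores.argsort()[::-1]          # confidence, high to low
        keep = []
        while order.size > 0:
            best = order[0]
            keep.append(int(best))
            rest = order[1:]
            # overlap of the best box with every remaining box
            x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
            y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
            x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
            y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
            inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
            area_best = ((boxes[best, 2] - boxes[best, 0]) *
                         (boxes[best, 3] - boxes[best, 1]))
            area_rest = ((boxes[rest, 2] - boxes[rest, 0]) *
                         (boxes[rest, 3] - boxes[rest, 1]))
            iou = inter / (area_best + area_rest - inter)
            order = rest[iou <= iou_thresh]     # suppress frames above threshold
        return keep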
In this embodiment, the step S4 specifically includes the following steps:
step S41: extracting the maximum values in the horizontal and vertical directions through the center point via center pooling and adding them, providing information beyond the center-point position itself, thereby obtaining the center-point heatmap;
step S42: predicting the top-left and bottom-right corner points of the object with cascade corner pooling: extracting the boundary maximum max_1 of the object, searching inward from it for the internal maximum max_2, and adding the two, so that the captured boundary features provide the corner features with semantic information of the associated object, thereby obtaining the corner heatmap; a simplified sketch of the center pooling of step S41 follows;
step S43: normalizing the obtained heatmaps with the sigmoid function, whose formula is:
sigmoid(x) = 1 / (1 + e^(−x))
step S44: applying 3 × 3 max pooling to the heatmap, then using the _topk function to find and screen the top k maximum points, which indicate the presence of targets, obtaining topk_score, topk_inds, topk_clses, topk_ys and topk_xs, where topk_score holds the top k scores within each category, topk_inds holds the indices of those k scores within each category, topk_clses holds the categories of the overall top k scores, and topk_ys and topk_xs are the row and column coordinates extracted from the corresponding tensors according to those indices, as in the sketch below;
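By way of illustration only, a minimal PyTorch sketch of this peak extraction, written from the description above; the exact tensor layout of the _topk function in any particular CenterNet implementation may differ.

    import torch
    import torch.nn.functional as F

    def topk_peaks(heat, k=100):
        """heat: (B, C, H, W) sigmoid-normalized heatmap. Keep only local
        maxima via 3x3 max pooling, then take the k highest peaks."""
        b, c, h, w = heat.shape
        pooled = F.max_pool2d(heat, kernel_size=3, stride=1, padding=1)
        heat = heat * (pooled == heat).float()          # suppress non-peaks
        # top k per category
        scores, inds = torch.topk(heat.view(b, c, -1), k)
        ys, xs = (inds // w).float(), (inds % w).float()
        # overall top k across all categories
        topk_score, topk_ind = torch.topk(scores.view(b, -1), k)
        topk_clses = topk_ind // k                      # category of each peak
        topk_ys = ys.view(b, -1).gather(1, topk_ind)
        topk_xs = xs.view(b, -1).gather(1, topk_ind)
        return topk_score, topk_clses, topk_ys, topk_xs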
step S45: calculating the boxes and their corresponding categories and confidences from topk_ys and topk_xs.
In this embodiment, the step S5 specifically includes:
step S51: calculating the top-left corner coordinates of the prediction frame from its center-point coordinates and its width and height, thereby obtaining the position information of the prediction frame in the output picture;
step S52: drawing the prediction frame, predicted category and confidence on the output picture with a drawing function to obtain the final result, for example as sketched below.
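By way of illustration only, a minimal OpenCV sketch of steps S51 and S52, assuming a box given as (center x, center y, width, height) in pixels; the class name, color and font used here are arbitrary examples.

    import cv2

    def draw_detection(image, box, class_name, score):
        """Convert a center-format box to corner coordinates, then draw
        the rectangle and a 'class confidence' label on the picture."""
        cx, cy, w, h = box
        x1, y1 = int(cx - w / 2), int(cy - h / 2)   # top-left from center
        x2, y2 = int(cx + w / 2), int(cy + h / 2)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 0, 255), 2)
        label = f"{class_name} {score:.2f}"
        cv2.putText(image, label, (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
        return image

    # e.g. draw_detection(frame, (320, 240, 80, 120), "no_helmet", 0.91)
    # "no_helmet" is a hypothetical class name for illustration.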
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention shall fall within the scope of the present invention.

Claims (8)

1. An abnormal dressing detection method based on yolov4 and CenterNet, characterized by comprising the following steps:
step S1, acquiring abnormal dressing data, preprocessing it, and constructing abnormal dressing detection data sets according to the requirements of the yolov4 model and of the CenterNet model respectively;
step S2, optimizing the hyperparameters of the yolov4 model and the CenterNet model respectively, and training on the corresponding data sets to obtain a yolov4 detection model and a CenterNet detection model;
step S3, carrying out target detection according to the trained yolov4 detection model to obtain a prediction result, decoding the prediction result, and screening out the final prediction frame by non-maximum suppression;
step S4, carrying out target detection according to the trained CenterNet detection model, calculating center-point and corner-point heatmaps of the input data to predict keypoint positions, normalizing the heatmaps, and finding the K maximum points as candidate targets; the final prediction frame is then computed from the center points;
and step S5, drawing the final prediction frames obtained by the two detection models on the original image, and adding information such as category name and confidence to obtain the result image.
2. The abnormal dressing detection method based on yolov4 and CenterNet of claim 1, wherein the step S1 specifically comprises:
step S11: acquiring picture data related to abnormal dressing, screening the data and completing data preprocessing;
step S12: building a data set according to the requirements of the yolov4 model, dividing all data into a training set and a test set in proportion, and generating the txt files required for training from the xml files containing the image annotation information;
step S13: building a data set according to the requirements of the CenterNet model, dividing all data into a training set, a test set and a validation set in proportion, and generating the json files required for training from the xml files containing the image annotation information.
3. The abnormal dressing detection method based on yolov4 and CenterNet of claim 1, wherein the step S2 specifically comprises:
step S21: acquiring the optimal values of the hyperparameters and adjusting them so that the trained model performs best;
step S22: in the training file of yolov4, setting the momentum parameter momentum of momentum gradient descent to 0.9 and the weight decay regularization coefficient to 0.0005;
step S23: adjusting the learning rate with the steps policy;
step S24: calculating the anchors with a k-means clustering algorithm, normalizing the width and height of each bounding box by the width and height of its data picture, namely:
w = w_box / w_img, h = h_box / h_img
Let anchor = (w_a, h_a) and box = (w_b, h_b). Using the IOU as the metric, the calculation is as follows:
IOU(box, anchor) = [min(w_a, w_b) · min(h_a, h_b)] / [w_a · h_a + w_b · h_b − min(w_a, w_b) · min(h_a, h_b)]
The IOU value lies between 0 and 1; the more similar two boxes are, the larger the IOU. The final distance metric is:
d(box, anchor) = 1 − IOU(box, anchor)
Randomly select k bounding boxes from the data set as the initial anchors; assign each bounding box to its closest anchor under the IOU metric; after traversing all bounding boxes, update each anchor to the mean width and height of the bounding boxes in its cluster; repeat until the anchors no longer change or the maximum number of iterations is reached.
4. The abnormal dressing detection method based on yolov4 and CenterNet of claim 1, wherein the step S3 specifically comprises:
step S31: after input data passes through a feature extraction network darknet53, three feature maps with different sizes are obtained;
step S32: from the extracted feature maps of different sizes, one part of the result obtained by convolving the three initial feature maps is used to output the prediction result corresponding to each feature map, and the other part is combined with the other feature maps after deconvolution, finally obtaining the prediction results of three effective feature maps;
step S33: adjusting a preset prior frame according to the obtained prediction result to obtain the size and the position information of the prediction frame;
step S34: according to the adjusted prediction frames, performing non-maximum suppression: a local search among the candidate targets finds the prediction frame with the highest confidence and suppresses the prediction frames with lower confidence.
5. The abnormal dressing detection method based on yolov4 and CenterNet of claim 4, wherein the step S33 specifically comprises:
(a) dividing the feature map into S × S grids and adjusting the preset prior frames onto the effective feature map;
(b) obtaining the coordinate information x_offset, y_offset, h and w of each prior frame from the yolov4 network prediction result;
(c) applying the sigmoid function to the center-point coordinates of the grid cell corresponding to the prior frame and adding the corresponding x_offset and y_offset to obtain the center of the prediction frame, then computing the width and height of the prediction frame from h and w, finally obtaining the size and position information of the prediction frame.
6. The abnormal dressing detection method based on yolov4 and CenterNet of claim 4, wherein the step S34 specifically comprises:
(a) when performing non-maximum suppression, sorting the prediction frames of the same target from high confidence to low confidence, then taking out the prediction frame with the highest confidence and calculating its IOU with each of the remaining prediction frames;
following the process of finding a local maximum by the intersection-over-union (IOU), let the two detection boxes be B_1 and B_2; the IOU between them is then:
IOU = area(B_1 ∩ B_2) / area(B_1 ∪ B_2)
(b) if the calculated IOU is greater than the set threshold, the prediction frame is suppressed and not output as a result; after all prediction frames have been calculated, the prediction frame with the highest confidence among the remaining ones is taken out and the operation is repeated.
7. The abnormal dressing detection method based on yolov4 and CenterNet as claimed in claim 1, wherein said step S4 specifically comprises the following steps:
step S41: extracting the maximum values in the horizontal and vertical directions through the center point via center pooling and adding them, providing information beyond the center-point position itself, thereby obtaining the center-point heatmap;
step S42: predicting the top-left and bottom-right corner points of the object with cascade corner pooling: extracting the boundary maximum max_1 of the object, searching inward from it for the internal maximum max_2, and adding the two, so that the captured boundary features provide the corner features with semantic information of the associated object, thereby obtaining the corner heatmap;
step S43: normalizing the obtained heatmaps with the sigmoid function, whose formula is:
sigmoid(x) = 1 / (1 + e^(−x))
step S44: applying 3 × 3 max pooling to the heatmap, then using the _topk function to find and screen the top k maximum points, which indicate the presence of targets, obtaining topk_score, topk_inds, topk_clses, topk_ys and topk_xs, where topk_score holds the top k scores within each category, topk_inds holds the indices of those k scores within each category, topk_clses holds the categories of the overall top k scores, and topk_ys and topk_xs are the row and column coordinates extracted from the corresponding tensors according to those indices;
step S45: calculating the boxes and their corresponding categories and confidences from topk_ys and topk_xs.
8. The abnormal dressing detection method based on yolov4 and CenterNet as claimed in claim 1, wherein the step S5 specifically comprises:
step S51: calculating the top-left corner coordinates of the prediction frame from its center-point coordinates and its width and height, thereby obtaining the position information of the prediction frame in the output picture;
step S52: drawing the prediction frame, predicted category and confidence on the output picture with a drawing function to obtain the final result.
CN202011188613.4A 2020-10-30 2020-10-30 Abnormal dressing detection method based on yolov4 and CenterNet Active CN112364734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011188613.4A CN112364734B (en) 2020-10-30 2020-10-30 Abnormal dressing detection method based on yolov4 and CenterNet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011188613.4A CN112364734B (en) 2020-10-30 2020-10-30 Abnormal dressing detection method based on yolov4 and CenterNet

Publications (2)

Publication Number Publication Date
CN112364734A true CN112364734A (en) 2021-02-12
CN112364734B CN112364734B (en) 2023-02-21

Family

ID=74513822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011188613.4A Active CN112364734B (en) 2020-10-30 2020-10-30 Abnormal dressing detection method based on yolov4 and CenterNet

Country Status (1)

Country Link
CN (1) CN112364734B (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200250487A1 (en) * 2017-10-23 2020-08-06 Hangzhou Hikvision Digital Technology Co., Ltd. Target detection method and apparatus, and computer device
CN110532894A (en) * 2019-08-05 2019-12-03 西安电子科技大学 Remote sensing target detection method based on boundary constraint CenterNet
CN111553348A (en) * 2020-04-26 2020-08-18 中南大学 Anchor-based target detection method based on centernet
CN111709489A (en) * 2020-06-24 2020-09-25 广西师范大学 Citrus identification method based on improved YOLOv4
CN111709414A (en) * 2020-06-29 2020-09-25 济南浪潮高新科技投资发展有限公司 AR device, character recognition method and device thereof, and computer-readable storage medium
CN111738206A (en) * 2020-07-08 2020-10-02 浙江浙能天然气运行有限公司 Excavator detection method for unmanned aerial vehicle inspection based on CenterNet
CN111832513A (en) * 2020-07-21 2020-10-27 西安电子科技大学 Real-time football target detection method based on neural network
CN111814742A (en) * 2020-07-29 2020-10-23 南方电网数字电网研究院有限公司 Knife switch state identification method based on deep learning
CN111626277A (en) * 2020-08-03 2020-09-04 杭州智诚惠通科技有限公司 Vehicle tracking method and device based on over-station inter-modulation index analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tan Xiaofeng et al.: "Table Tennis Ball Recognition Based on an Improved YOLOv4 Algorithm", Technology Innovation and Application (科技创新与应用), no. 27, 8 September 2020 (2020-09-08) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112915539A (en) * 2021-04-01 2021-06-08 腾讯科技(深圳)有限公司 Virtual object detection method and device and readable storage medium
CN112915539B (en) * 2021-04-01 2023-01-06 腾讯科技(深圳)有限公司 Virtual object detection method and device and readable storage medium
CN113111754A (en) * 2021-04-02 2021-07-13 中国科学院深圳先进技术研究院 Target detection method, device, terminal equipment and storage medium
CN112990131A (en) * 2021-04-27 2021-06-18 广东科凯达智能机器人有限公司 Method, device, equipment and medium for acquiring working gear of voltage change-over switch
CN113408624A (en) * 2021-06-22 2021-09-17 福州大学 Task-oriented image quality evaluation method based on transfer learning
CN113837138A (en) * 2021-09-30 2021-12-24 重庆紫光华山智安科技有限公司 Dressing monitoring method, system, medium and electronic terminal
CN113837138B (en) * 2021-09-30 2023-08-29 重庆紫光华山智安科技有限公司 Dressing monitoring method, dressing monitoring system, dressing monitoring medium and electronic terminal
CN114882556A (en) * 2022-04-26 2022-08-09 西北大学 Method for detecting makeup face of opera character based on improved YooloX
CN114882556B (en) * 2022-04-26 2024-03-15 西北大学 Improved YoloX-based drama character dressing face detection method

Also Published As

Publication number Publication date
CN112364734B (en) 2023-02-21

Similar Documents

Publication Publication Date Title
CN112364734B (en) Abnormal dressing detection method based on yolov4 and CenterNet
CN110399905B (en) Method for detecting and describing wearing condition of safety helmet in construction scene
CN103955699B (en) A kind of real-time fall events detection method based on monitor video
CN110969244B (en) Building construction safety monitoring method based on convolutional neural network
CN108960056A (en) A kind of fall detection method based on posture analysis and Support Vector data description
CN110070010A (en) A kind of face character correlating method identified again based on pedestrian
CN106997629A (en) Access control method, apparatus and system
US20220232247A1 (en) Image coding method, action recognition method, and action recognition apparatus
CN111062303A (en) Image processing method, system and computer storage medium
CN110414400A (en) A kind of construction site safety cap wearing automatic testing method and system
CN112580995A (en) Construction safety big data monitoring system and safety risk dynamic evaluation method
CN109948501A (en) The detection method of personnel and safety cap in a kind of monitor video
CN110287825A (en) It is a kind of that motion detection method is fallen down based on crucial skeleton point trajectory analysis
CN115294533B (en) Building construction state monitoring method based on data processing
Rentao et al. Indoor smoking behavior detection based on yolov3-tiny
CN109308444A (en) A kind of abnormal behaviour recognition methods under indoor environment
CN114973320A (en) Underground coal mine personnel detection method based on depth information
CN117475502B (en) Iris and face fusion recognition method and system based on mine
CN114360209A (en) Video behavior identification security system based on deep learning
Yi et al. Research on Helmet Wearing Detection in Multiple Scenarios Based on YOLOv5
CN114359831A (en) Risk omen reasoning-oriented intelligent identification system and method for worker side-falling
CN113111808B (en) Abnormal behavior detection method and system based on machine vision
CN115469626A (en) Industrial production monitoring system and method based on Internet of things
KR101781359B1 (en) A Method Of Providing For Searching Footprint And The System Practiced The Method
CN115170998A (en) Human behavior and action recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant