CN113920475A - Security protection equipment identification method based on autonomous learning strategy and storage medium

Security protection equipment identification method based on autonomous learning strategy and storage medium

Info

Publication number
CN113920475A
Authority
CN
China
Prior art keywords
training
box
prior
training picture
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111270186.9A
Other languages
Chinese (zh)
Inventor
马碧芳
王伟
吴衍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Polytechnic Normal University
Original Assignee
Fujian Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Polytechnic Normal University filed Critical Fujian Polytechnic Normal University
Priority to CN202111270186.9A
Publication of CN113920475A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of security monitoring, and particularly relates to a safety protection equipment identification method based on an autonomous learning strategy and a storage medium. The method comprises the following steps: S1, collecting a training picture set; S2, preprocessing the training picture set; S3, training the constructed deep network model according to the training picture set; S4, inputting the picture set to be detected into the deep network model for recognition to obtain a recognition result set; S5, classifying the recognition result set into a recognition success set and a recognition failure set; S6, outputting the recognition success set; and S7, taking the pictures to be detected corresponding to the recognition failure set as a new training picture set and returning to S2 to continue execution. The method does not occupy large amounts of manual or computing resources, generates new training samples semi-automatically, and suits different complex scenes. By introducing feature weights and a label smoothing mechanism, it guarantees the accuracy of the extracted network features and effectively prevents overfitting.

Description

Security protection equipment identification method based on autonomous learning strategy and storage medium
Technical Field
The invention belongs to the technical field of security monitoring, and particularly relates to a safety protection equipment identification method based on an autonomous learning strategy and a storage medium.
Background
Wearing protective equipment such as safety helmets is an effective means of preventing accidents on construction sites and protecting workers.
Traditional detection of protective-equipment wearing relies on manual inspection, which is time-consuming, costly in labor, and has a high false-detection rate. With the development of computer technology, more and more construction sites have gradually introduced intelligent detection systems, but most intelligent detection techniques are based on traditional machine vision, including:
1) Methods based on traditional digital image processing plus machine learning. Features are first hand-crafted from human experience (feature engineering), then a machine learning method (such as a support vector machine or template matching) is trained on the samples to judge whether workers wear safety protection equipment.
2) Safety-protection-equipment wearing detection methods based on deep learning. Compared with traditional machine learning, deep learning generally needs no feature engineering, adapts and generalizes better, and can obtain more accurate target detection results when trained on a large number of samples; it has been widely applied in recent years, but it places high demands on sample resources and cannot meet the usage requirements of scenes with different resources.
In view of these problems, how to design a safety-protection-equipment identification method with an autonomous learning strategy, which needs only a small set of training samples to begin identification and can continuously optimize and refine the model during identification, is a technical problem to be solved urgently.
Disclosure of Invention
One objective of the present invention is to overcome the above disadvantages and provide a safety-protection-equipment identification method with an autonomous learning strategy. Only a small manually labeled training sample set is required, and the network model trained for the first time only needs to detect the input video frames reasonably well, which reduces the time and effort spent on manually labeling samples; then, in actual production practice, human input is added a second time, and new training samples are generated semi-automatically to readjust and optimize the network model parameters.
In order to solve the above problem, the present application provides a method for identifying a security protection device based on an autonomous learning policy, including the following steps:
s1, collecting a training picture set;
s2, preprocessing the training picture set;
s3, training the constructed deep network model according to the training picture set;
s4, inputting the picture set to be detected into the deep network model for identification to obtain an identification result set consisting of identification results of each picture to be detected;
s5, classifying the recognition result set into a recognition success set and a recognition failure set;
s6, outputting a recognition success set;
and S7, taking the pictures to be detected corresponding to the recognition failure set as a new training picture set and returning to S2 to continue execution.
Further, the preprocessing the training picture set includes the following steps:
s21, carrying out normalization processing on each training picture;
s22, marking the region containing the target to be detected in each training picture;
and S23, generating a corresponding training data file for each marked training picture.
Further, the training of the constructed deep network model according to the training picture set includes the following steps:
s31, obtaining the prior frame position of the target to be detected in the training picture by using a k-means clustering method according to the training data file;
s32, extracting the network features of the training pictures by using a weighted bidirectional feature pyramid method;
s33, predicting the target to be detected in the training picture according to the position of the prior frame and the network characteristics of the training picture to obtain a prediction result;
s34, comparing the prediction result with the labeled training picture, and adjusting the model parameters according to the difference to optimize the deep network model;
S31 to S34 above are repeated until all training pictures in the training picture set have been processed.
Further, the method for acquiring the prior frame position of the target to be detected in the training picture by using a k-means clustering method according to the training data file comprises the following steps:
s311, randomly selecting k marking frames from the training picture as initial prior frames;
s312, calculating IoU values of each labeling box and each prior box in the training picture; the IoU is the ratio of the intersection of the prior box and the labeled box to the union of the prior box and the labeled box;
s313, calculating the distance between each labeling frame and each clustering center, wherein the clustering center is the center point of the prior frame, and the distance calculation formula is as follows:
d = 1 - IoU[(x_i, y_i, w_i, h_i), (x_j, y_j, W_j, H_j)]
wherein i ∈ {1, 2, ..., k}, j ∈ {1, 2, ..., N}, k is the number of prior boxes, N is the number of labeled boxes, x_i and y_i are the abscissa and ordinate of the center point of the i-th prior box, w_i and h_i are the width and height of the i-th prior box, x_j and y_j are the abscissa and ordinate of the center point of the j-th labeled box, and W_j and H_j are the width and height of the j-th labeled box;
s314, allocating each marking frame to the nearest clustering center;
s315, after all the label boxes are distributed, recalculating the clustering center for each cluster, and updating the width and the height of each prior box, wherein the calculation method is as follows:
W'_i = (1/N_i) · Σ_{j=1}^{N_i} w_ij
H'_i = (1/N_i) · Σ_{j=1}^{N_i} h_ij
wherein W'_i is the updated width of the i-th prior box, H'_i is the updated height of the i-th prior box, i ∈ {1, 2, ..., k}, k is the number of prior boxes (i.e., the number of clusters), N_i is the number of labeled boxes in the i-th cluster, w_ij and h_ij are the width and height of the j-th labeled box in the i-th cluster, and j ∈ {1, 2, ..., N_i};
And S312 to S315 are repeated until the distance d between each labeled box and its cluster center is smaller than a set threshold, so as to obtain the final prior box positions.
Further, the network features of the training picture are extracted by using a weighted bidirectional feature pyramid method, and the weight calculation mode of the features is as follows:
O = Σ_{i=1}^{M} ( w_i / (ε + Σ_{j=1}^{M} w_j) ) · I_i
wherein O is the fused output feature, w_i is a weight obtained by learning, I_i is the output of the preceding network layer, i ∈ {1, 2, ..., M}, j ∈ {1, 2, ..., M}, M is the number of inputs to the neural network, and ε is a very small constant that keeps the denominator non-zero.
Further, the prediction result comprises: the center coordinates of the prediction box, the width and height of the prediction box, the probability that the prediction box contains a target to be detected, and the probability that the target in the box belongs to each object class.
Further, adjusting the model parameters according to the difference to optimize the deep network model includes regularizing the classifier by estimating the marginalized effect of label dropout during training, and computing q'(k|x) in place of the actual one-hot target for sample x, with the calculation formula:
q'(k|x) = (1 - θ)·δ_{k,y} + θ·u(k)
wherein u(k) is the label distribution, x is a training sample, θ is the smoothing parameter, y is the input label, and δ_{k,y} is a parameter obtained by learning.
Accordingly, the present application also provides a computer readable storage medium having one or more programs stored thereon that are executable by one or more processors to perform the steps of any of the above autonomous-learning-strategy-based safety protection equipment identification methods.
The technical scheme of the invention has the beneficial effects that:
1. The deep network model trained for the first time only needs to detect the input video frames reasonably well, so the time and effort spent on manually labeling samples are reduced as much as possible; the method does not occupy large amounts of manual or computing resources and suits scenes with different resource constraints. In actual production practice, human input is added a second time, and new training samples are generated semi-automatically to readjust and optimize the network model parameters.
2. According to the method, the prior frame position of the target to be detected in the training picture is obtained by using a k-means clustering method, so that the classification can be ensured to be more accurate, and the accuracy of subsequent target detection can be improved.
3. According to the method, weight calculation of the features is introduced into the network features of the training picture extracted by using a weighted bidirectional feature pyramid method, different weight values can be given according to the importance of different features, the contribution degrees of different features can be reflected, and the accuracy of obtaining the network features is guaranteed.
4. According to the method, a mechanism for regularizing the classifier by pre-estimating marginal effect of label discarding in the training process is provided, a label smoothing mechanism is provided to encourage the model to reduce prediction, and overfitting can be effectively prevented.
Drawings
Fig. 1 is a flow chart of steps of an autonomous learning policy-based safety protection device identification method of the present invention.
FIG. 2 is a flowchart illustrating the steps of pre-processing a set of training pictures according to the present invention.
FIG. 3 is a flowchart illustrating the steps of training a deep web model according to a set of training pictures according to the present invention.
FIG. 4 is a flow chart of the steps of the present invention for obtaining the prior box position using the k-means clustering method.
FIG. 5 is a schematic diagram of multi-scale feature fusion in accordance with the present invention.
FIG. 6 is a schematic diagram of a classification prediction network structure model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the method for identifying a security protection device based on an autonomous learning policy according to the present invention includes the following steps:
and S1, collecting a training picture set. Extracting video key frames containing detection targets from videos collected by a high-definition camera to obtain a plurality of original sample pictures to form a training picture set. The detection target refers to safety protection equipment on a construction site, such as a safety helmet or a reflective garment.
And S2, preprocessing the training picture set. As shown in fig. 2, it is a flowchart of the steps of preprocessing the training picture set according to the present invention, and the method includes the following steps:
and S21, carrying out normalization processing on each training picture. Image normalization refers to a process of transforming an image into a fixed standard form by performing a series of standard processing transformations on the image, and the standard image is called a normalized image. The original image can obtain a plurality of duplicate images after being subjected to some processing or attack, and the images can obtain standard images in the same form after being subjected to image normalization processing with the same parameters. The image normalization technique can be classified into linear normalization and non-linear normalization.
And S22, labeling the region containing the target to be detected in each training picture.
In each training picture, the pixel regions containing personnel wearing safety protection equipment are manually marked with rectangular boxes. One training picture may contain several rectangular boxes, i.e., several target regions, but each rectangular box contains only one target to be detected.
And S23, generating a corresponding training data file for each marked training picture.
The labeling results are stored in xml files in the VOC data set format. In general, each training picture corresponds to an xml file with the same name, which contains the center-point coordinates (x, y), the width w, and the height h of each rectangular box in the picture. Meanwhile, the labels (x, y, w, h) in each picture are size-normalized: keeping the aspect ratio of the original image unchanged, they are mapped from the original image coordinate system to a sub-region under 64 × 64 coordinates, and the transformed coordinate information is stored in the xml file as input for the subsequent training of the target-domain model.
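As an illustration, a minimal Python sketch of this annotation step; the xml tag names ("object", "bndbox", "x", "y", "w", "h") are assumptions, since the patent stores center-format boxes rather than the corner format of standard VOC files:

    import xml.etree.ElementTree as ET

    def load_normalized_boxes(xml_path, img_w, img_h, target=64):
        """Read center-format boxes (x, y, w, h) from an xml label file and map them
        to 64x64 coordinates while keeping the original aspect ratio unchanged."""
        scale = target / max(img_w, img_h)  # one uniform scale preserves the aspect ratio
        boxes = []
        for obj in ET.parse(xml_path).getroot().iter("object"):
            b = obj.find("bndbox")
            x, y, w, h = (float(b.find(t).text) for t in ("x", "y", "w", "h"))
            boxes.append((x * scale, y * scale, w * scale, h * scale))
        return boxes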
And S3, training the constructed depth network model according to the training picture set. As shown in fig. 3, it is a flowchart of the steps of training the deep network model according to the training picture set in the present invention, and the method includes the following steps:
and S31, obtaining the prior frame position of the target to be detected in the training picture by using a k-means clustering method according to the training data file. As shown in fig. 4, it is a flowchart of the step of obtaining the prior frame position by using the k-means clustering method of the present invention, and the method includes the following steps:
s311, randomly selecting k marking frames from the training picture as initial prior frames.
S312, the IoU value of each labeled box against each prior box in the training picture is calculated, where IoU is the ratio of the intersection of the areas of the prior box and the labeled box to the union of those areas. Usually, when computing the IoU value, the center point of the labeled box is made to coincide with the center point of the prior box; the intersection and union of the areas are then computed, and finally their ratio, as sketched below.
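A minimal Python sketch of this IoU computation; because the center points are made to coincide, only the widths and heights enter the calculation:

    def iou_centered(w1, h1, w2, h2):
        """IoU of two boxes whose center points coincide (as in S312)."""
        inter = min(w1, w2) * min(h1, h2)   # intersection of the center-aligned boxes
        union = w1 * h1 + w2 * h2 - inter   # union of the two areas
        return inter / union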
S313, calculating the distance between each labeling frame and each clustering center, wherein the clustering center is the center point of the prior frame, and the distance calculation formula is as follows:
d=1-IoU[(xi,yi,wi,hi),(xj,yj,Wj,Hj)]
wherein i belongs to {1, 2.. eta.,. k }, j belongs to {1, 2.. eta.,. N }, k is the number of prior frames, N is the number of marking frames, and x belongs to {1, 2.. eta.,. K }, wherein k is the number of prior frames, N is the number of marking frames, and x belongs toiAbscissa, y, representing the center point of the ith prior boxiOrdinate, w, representing the centre point of the ith prior frameiIs the width of the ith prior frame, hiIs the height, x, of the ith prior boxjAbscissa, y, representing the center point of the jth mark boxjOrdinate, W, representing the centre point of the jth mark boxjFor the jth width of the box, HjThe height of the box is marked for the jth.
And S314, allocating each marking frame to the nearest cluster center, and dividing the N marking frames into k clusters.
S315, after all the label boxes are distributed, recalculating the clustering center for each cluster, and updating the width and the height of each prior box, wherein the calculation method is as follows:
W'_i = (1/N_i) · Σ_{j=1}^{N_i} w_ij
H'_i = (1/N_i) · Σ_{j=1}^{N_i} h_ij
wherein W'_i is the updated width of the i-th prior box, H'_i is the updated height of the i-th prior box, i ∈ {1, 2, ..., k}, k is the number of prior boxes (i.e., the number of clusters), N_i is the number of labeled boxes in the i-th cluster, w_ij and h_ij are the width and height of the j-th labeled box in the i-th cluster, and j ∈ {1, 2, ..., N_i}.
Steps S312 to S315 are repeated, reassigning the labeled boxes and updating the widths and heights of the prior boxes, until the distance d between each labeled box and its cluster center is smaller than a set threshold, giving the final prior box positions.
Obtaining the prior box positions of the targets to be detected in the training pictures with k-means clustering ensures more accurate grouping and improves the accuracy of subsequent target detection. A sketch of the full clustering loop follows.
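A minimal Python sketch of steps S311 to S315, reusing iou_centered from the sketch above; the stopping rule here (every prior box moving by less than the threshold) is one reading of the patent's "distance smaller than a set threshold" criterion:

    import random

    def kmeans_priors(boxes, k, tol=1e-4):
        """Cluster labeled boxes (w, h) into k prior boxes under distance d = 1 - IoU."""
        priors = random.sample(boxes, k)          # S311: k random boxes as initial priors
        while True:
            clusters = [[] for _ in range(k)]     # S312-S314: assign each labeled box
            for w, h in boxes:                    # to the nearest cluster center
                d = [1 - iou_centered(w, h, pw, ph) for pw, ph in priors]
                clusters[d.index(min(d))].append((w, h))
            new_priors = [                        # S315: mean width/height of each cluster
                (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
                if c else priors[i]
                for i, c in enumerate(clusters)
            ]
            moved = max(1 - iou_centered(pw, ph, qw, qh)
                        for (pw, ph), (qw, qh) in zip(priors, new_priors))
            if moved < tol:                       # stop once the priors settle
                return new_priors
            priors = new_priors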
And S32, extracting the network features of the training picture by using a weighted bidirectional feature pyramid method.
The network features of the training pictures are extracted with the weighted bidirectional feature pyramid, which realizes multi-scale feature fusion quickly and simply, repeatedly applying bottom-up and top-down multi-scale fusion; fig. 5 and fig. 6 show the multi-scale feature fusion diagram and the classification prediction network structure model. In this process, the input is a list of features at different scales:
P^in = (P^in_{l1}, P^in_{l2}, ...)
wherein P denotes the fused feature scale and l_i denotes the input at a different level, i.e., a different network depth. The aim is a transformation f that converts this series of features of different scales into an output by multi-scale feature fusion:
P^out = f(P^in)
In multi-scale feature fusion, different input features have different importance and contribute differently to the output feature. The method therefore introduces an additional weight during feature fusion to reflect the importance of each input; the main approach is fast normalized fusion, and the weight calculation formula of the features is:
O = Σ_{i=1}^{M} ( w_i / (ε + Σ_{j=1}^{M} w_j) ) · I_i
wherein O is the fused output feature, w_i is a weight obtained by learning, I_i is the output of the preceding network layer, i ∈ {1, 2, ..., M}, j ∈ {1, 2, ..., M}, and M is the number of inputs to the neural network. Each weight w_i passes through a ReLU activation to ensure w_i ≥ 0, and ε is a very small value (0.0001) that keeps the denominator stable; this normalization also bounds each final weight between 0 and 1.
By introducing this weight calculation into the network features extracted with the weighted bidirectional feature pyramid, different weight values can be assigned according to the importance of different features, reflecting their respective contributions and guaranteeing the accuracy of the extracted network features.
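A minimal sketch of this fast normalized fusion as a module, assuming PyTorch (the patent does not name a framework):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FastNormalizedFusion(nn.Module):
        """O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with ReLU keeping each w_i >= 0."""
        def __init__(self, num_inputs, eps=1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(num_inputs))  # one learnable weight per input
            self.eps = eps

        def forward(self, inputs):  # inputs: list of M feature maps of the same shape
            w = F.relu(self.w)                  # ensure w_i >= 0
            w = w / (self.eps + w.sum())        # normalize the weights into [0, 1]
            return sum(wi * x for wi, x in zip(w, inputs))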
S33, predicting the target to be detected in the training picture according to the prior box positions and the network features of the training picture to obtain a prediction result. In a specific embodiment, the prediction result includes: the center coordinates of the prediction box, the width and height of the prediction box, the probability that the prediction box contains a target to be detected, and the probability that the target in the box belongs to each object class.
And S34, comparing the prediction result with the labeled training picture and adjusting the model parameters according to the difference to optimize the deep network model; for example, parameters such as the learning rate of the neural network are adjusted to obtain a better prediction result. In addition, to prevent overfitting, the method regularizes the classifier by estimating the marginalized effect of label dropout during training. For each training sample x, the probability that the model predicts label k ∈ {1, ..., K} is:
p(k|x) = exp(z_k) / Σ_{i=1}^{K} exp(z_i)
wherein z_i are the unnormalized logits. With q(k|x) denoting the ground-truth label distribution used as the target for the prediction p(k|x), the cross-entropy loss function is defined as:
ℓ = - Σ_{k=1}^{K} q(k|x) · log p(k|x)
Minimizing this loss function is equivalent to maximizing the likelihood, and its gradient with respect to z_k, which is bounded between -1 and 1, is:
∂ℓ/∂z_k = p(k|x) - q(k|x)
the above method fits the true probability with the predicted probability, two problems may occur: (1): model overfitting (2): the predicted value is encouraged to be far larger than the label value, and the combination of gradient bounding property can weaken the model generalization, so the method provided by the application encourages the model to reduce the prediction, namely label smoothing. Let us assume a label distribution u (k), introduce a smoothing parameter θ for a training sample x, and calculate q' (k | x) instead of the actual sample x, the calculation formula is as follows:
q′(k|x)=(1-θ)δk,y+θu(k)
wherein u (k) is label distribution, x is a training sample, θ is a smoothing parameter, y is an input label, δk,yAccording to the method, a mechanism for regularizing the classifier by pre-estimating marginal effect of label discarding in the training process is provided, a label smoothing mechanism is provided to encourage the model to reduce prediction, and overfitting can be effectively prevented.
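A minimal PyTorch sketch of training against the smoothed target q'(k|x); it assumes a uniform label distribution u(k) = 1/K (the patent only says "label distribution") and treats δ_{k,y} as the one-hot indicator of the true label, a simplification of the learned parameter described above:

    import torch
    import torch.nn.functional as F

    def label_smoothing_loss(logits, target, theta=0.1):
        """Cross entropy against q'(k|x) = (1 - theta) * delta_{k,y} + theta * u(k)."""
        K = logits.size(-1)
        log_p = F.log_softmax(logits, dim=-1)      # log p(k|x)
        q = torch.full_like(log_p, theta / K)      # theta * u(k) with u uniform
        q.scatter_(-1, target.unsqueeze(-1), 1 - theta + theta / K)  # mass at k = y
        return -(q * log_p).sum(dim=-1).mean()     # l = -sum_k q'(k|x) log p(k|x)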
Steps S31 to S34 are repeated until all training pictures in the training picture set have been processed, giving the deep network model after the first round of training.
And S4, inputting the picture set to be detected into the deep network model for identification to obtain an identification result set consisting of identification results of each picture to be detected. And inputting the framed area suspected to contain the detection target into the trained deep network model for detection, and judging whether the suspected area contains the target to be detected. At this time, the result output by the deep network model detection also includes the center coordinate of the prediction frame, the width and the height of the prediction frame, the probability that the prediction frame contains the target to be detected, and the probability that the target to be detected in the prediction frame belongs to the object class.
And S5, classifying the recognition result set into a recognition success set and a recognition failure set. According to the results output by the deep network model, pictures in which a detection target is recognized are placed in the recognition success set, and pictures in which no target can be recognized are placed in the recognition failure set.
And S6, outputting the identification success set.
And S7, taking the pictures to be detected corresponding to the recognition failure set as a new training picture set: they are sorted by the predicted target probability, manually checked and labeled, used again as a training sample set, and execution returns to S2 so that the model is retrained and adjusted.
The deep network model trained for the first time only needs to detect the input video frames reasonably well, so the time and effort spent on manually labeling samples are reduced as much as possible; the method does not occupy large amounts of manual or computing resources and suits scenes with different resource constraints. In actual production practice, human input is added a second time, and new training samples are generated semi-automatically to readjust and optimize the network model parameters.
Preferably, the present application further provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of any of the above autonomous-learning-strategy-based safety protection equipment identification methods.
The above embodiments are merely illustrative of the technical solutions of the present invention, and the present invention is not limited to the above embodiments, and any modifications or alterations according to the principles of the present invention should be within the protection scope of the present invention.

Claims (8)

1. A safety protection device identification method based on an autonomous learning strategy is characterized by comprising the following steps:
s1, collecting a training picture set;
s2, preprocessing the training picture set;
s3, training the constructed deep network model according to the training picture set;
s4, inputting the picture set to be detected into the deep network model for identification to obtain an identification result set consisting of identification results of each picture to be detected;
s5, classifying the recognition result set into a recognition success set and a recognition failure set;
s6, outputting a recognition success set;
and S7, taking the pictures to be detected corresponding to the recognition failure set as a new training picture set and returning to S2 to continue execution.
2. The method for identifying safety protection equipment based on the autonomous learning strategy as claimed in claim 1, wherein the preprocessing the training picture set comprises the following steps:
s21, carrying out normalization processing on each training picture;
s22, marking the region containing the target to be detected in each training picture;
and S23, generating a corresponding training data file for each marked training picture.
3. The method for identifying safety protection equipment based on the autonomous learning strategy as claimed in claim 2, wherein the training of the constructed deep network model according to the training picture set comprises the following steps:
s31, obtaining the prior frame position of the target to be detected in the training picture by using a k-means clustering method according to the training data file;
s32, extracting the network features of the training pictures by using a weighted bidirectional feature pyramid method;
s33, predicting the target to be detected in the training picture according to the position of the prior frame and the network characteristics of the training picture to obtain a prediction result;
s34, comparing the prediction result with the labeled training picture, and adjusting the model parameters according to the difference to optimize the deep network model;
S31 to S34 above are repeated until all training pictures in the training picture set have been processed.
4. The safety protection device identification method based on the autonomous learning strategy as claimed in claim 3, wherein the obtaining of the prior frame position of the object to be detected in the training picture by using a k-means clustering method according to the training data file comprises the following steps:
s311, randomly selecting k marking frames from the training picture as initial prior frames;
s312, calculating IoU values of each labeling box and each prior box in the training picture; the IoU is the ratio of the intersection of the prior box and the labeled box to the union of the prior box and the labeled box;
s313, calculating the distance between each labeling frame and each clustering center, wherein the clustering center is the center point of the prior frame, and the distance calculation formula is as follows:
d = 1 - IoU[(x_i, y_i, w_i, h_i), (x_j, y_j, W_j, H_j)]
wherein i ∈ {1, 2, ..., k}, j ∈ {1, 2, ..., N}, k is the number of prior boxes, N is the number of labeled boxes, x_i and y_i are the abscissa and ordinate of the center point of the i-th prior box, w_i and h_i are the width and height of the i-th prior box, x_j and y_j are the abscissa and ordinate of the center point of the j-th labeled box, and W_j and H_j are the width and height of the j-th labeled box;
s314, allocating each marking frame to the nearest clustering center;
s315, after all the label boxes are distributed, recalculating the clustering center for each cluster, and updating the width and the height of each prior box, wherein the calculation method is as follows:
W'_i = (1/N_i) · Σ_{j=1}^{N_i} w_ij
H'_i = (1/N_i) · Σ_{j=1}^{N_i} h_ij
wherein W'_i is the updated width of the i-th prior box, H'_i is the updated height of the i-th prior box, i ∈ {1, 2, ..., k}, k is the number of prior boxes (i.e., the number of clusters), N_i is the number of labeled boxes in the i-th cluster, w_ij and h_ij are the width and height of the j-th labeled box in the i-th cluster, and j ∈ {1, 2, ..., N_i};
And S312 to S315 are repeated until the distance d between each labeled box and its cluster center is smaller than a set threshold, so as to obtain the final prior box positions.
5. The method for identifying safety protection equipment based on the autonomous learning strategy as claimed in claim 3, wherein the network features of the training picture are extracted by using a weighted bidirectional feature pyramid method, and the weight calculation mode of the features is as follows:
O = Σ_{i=1}^{M} ( w_i / (ε + Σ_{j=1}^{M} w_j) ) · I_i
wherein O is the fused output feature, w_i is a weight obtained by learning, I_i is the output of the preceding network layer, i ∈ {1, 2, ..., M}, j ∈ {1, 2, ..., M}, M is the number of inputs to the neural network, and ε is a very small constant that keeps the denominator non-zero.
6. The autonomous learning strategy based security device identification method of claim 3, wherein the prediction result comprises: the center coordinates of the prediction box, the width and height of the prediction box, the probability that the prediction box contains a target to be detected, and the probability that the target in the box belongs to each object class.
7. The method for identifying safety equipment based on an autonomous learning strategy according to claim 3, wherein the adjusting model parameters according to the difference to optimize the deep network model comprises: regularizing the classifier by estimating the marginalized effect of label dropout during training, and computing q'(k|x) in place of the actual one-hot target for sample x, with the calculation formula:
q'(k|x) = (1 - θ)·δ_{k,y} + θ·u(k)
wherein u(k) is the label distribution, x is a training sample, θ is the smoothing parameter, y is the input label, and δ_{k,y} is a parameter obtained by learning.
8. A computer readable storage medium, storing one or more programs, which are executable by one or more processors to perform the steps of the autonomous-learning-strategy-based safety protection equipment identification method according to any one of claims 1-7.
CN202111270186.9A 2021-10-29 2021-10-29 Security protection equipment identification method based on autonomous learning strategy and storage medium Pending CN113920475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111270186.9A CN113920475A (en) 2021-10-29 2021-10-29 Security protection equipment identification method based on autonomous learning strategy and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111270186.9A CN113920475A (en) 2021-10-29 2021-10-29 Security protection equipment identification method based on autonomous learning strategy and storage medium

Publications (1)

Publication Number Publication Date
CN113920475A true CN113920475A (en) 2022-01-11

Family

ID=79243588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111270186.9A Pending CN113920475A (en) 2021-10-29 2021-10-29 Security protection equipment identification method based on autonomous learning strategy and storage medium

Country Status (1)

Country Link
CN (1) CN113920475A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410196A (en) * 2022-10-31 2022-11-29 南昌理工学院 Small target identification system based on complex background
CN115410196B (en) * 2022-10-31 2023-02-07 南昌理工学院 Small target identification system based on complex background

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination