CN111832443A

CN111832443A - Construction method and application of construction violation detection model

Info

Publication number: CN111832443A
Application number: CN202010601260.XA
Authority: CN
Inventors: 韩守东; 陈国荣; 马迪; 刘巾英; 陈阳
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2020-06-28
Filing date: 2020-06-28
Publication date: 2020-10-27
Anticipated expiration: 2040-06-28
Also published as: CN111832443B

Abstract

The invention relates to the field of illegal behavior detection, and particularly discloses a construction method and application of a construction illegal behavior detection model, wherein the construction method comprises the following steps: carrying out similarity matching on images in the original image sample set, carrying out foreground exchange on each group of image pairs and carrying out style rendering on all exchanged images to obtain an expanded image sample set; training a single-stage target detection network based on an original image sample set and an extended image sample set, wherein network parameters are optimized according to image information and prediction results of a class prediction branch and a position prediction branch in the detection network during training, and finally, detecting violation behaviors based on a constructed violation behavior detection model. The method enhances the matching similarity of the original samples, greatly improves the scale of the training samples and the training precision of the model, considers the context information during training, and regularizes the network by adding the context information to the network, thereby greatly improving the accuracy of target detection.

Description

Construction method and application of construction violation detection model

Technical Field

The invention belongs to the field of illegal behavior detection, and particularly relates to a construction method and application of a construction illegal behavior detection model.

Background

In the construction process, the construction safety guarantee is an important link for improving the production efficiency, improving the enterprise benefit and guaranteeing the safety of staff. A set of intelligent monitoring system is designed and produced, so that automatic real-time and accurate identification, judgment and positioning of workers can be completed in a complex and changeable industrial environment, and the intelligent monitoring system has important significance. Specifically, the system requires that workers in the lens range can be detected in real time through the camera, whether behaviors of not wearing safety helmets and not wearing safety belts exist or not is judged, and if the workers with illegal behaviors are found, an alarm is immediately sent through the alarm system, so that reliable information is provided for background monitoring personnel. The method can improve the safety of workers in the production environment and can improve the guarantee of safety production of enterprises.

The current schemes for violation detection are mainly: if the input is set as a video segment, a method based on multi-frame processing, such as C3D, I3D, a dual stream method, skeleton-based behavior detection, or the like, may be employed; if the input is set as a single frame, methods based on a single frame image, such as image classification, object detection, may be employed. For the current mainstream video processing method based on multiple frames, such as a double-flow method, simultaneously inputting a single-frame image and calculating a light flow graph based on a multi-frame video, extracting characteristics through a convolutional neural network, and fusing the single-frame characteristics and the light flow characteristics as characteristic expressions of a video segment, wherein the light flow extraction time is long, and the requirement on real-time performance cannot be met; the 3D convolution-based method, such as C3D and I3D, endows the neural network with the capability of capturing time sequence behaviors by adding the dimension of time to the input of the neural network, requires the input video to be convolved in time and space at the same time, has high computational complexity, is difficult to train, and may not be used in some monitoring systems with limitations on computational resources. In addition, many mainstream methods based on video segment behavior detection at present can only give frame level prediction, that is, only detect the time period in which the violation occurs, and cannot give frame location of the specific violation personnel. As the human body key point localization technology matures, researchers have begun to attempt behavior recognition through human body skeletons. Most of methods based on human body skeletons extract key points from people in videos, the key points form the human body skeletons, and the features of the human body skeletons are extracted through a convolutional neural network, so that behavior recognition is performed. However, the input of such methods is a human body key point, and the interaction between the human body and the object cannot be modeled. However, in the illegal behavior, a large number of human-object interaction scenes exist, and if the safety helmet is worn, the method is not suitable for detecting the illegal behavior of the monitoring view angle. (2) The image processing method based on the single frame is characterized in that pure image classification can only give a single frame prediction result and cannot give the occurrence position of the illegal action.

Disclosure of Invention

The invention provides a construction violation detection model construction method and application thereof, which are used for solving the technical problem that the type and position of a violation cannot be accurately and simultaneously detected in the conventional construction violation detection method.

The technical scheme for solving the technical problems is as follows: a construction method of a construction violation detection model comprises the following steps:

acquiring an original image sample set of construction violation behaviors;

carrying out similarity matching on the images in the original image sample set to obtain a plurality of groups of image pairs, carrying out foreground exchange on each group of image pairs, and carrying out style rendering on all the exchanged images to eliminate edge effect caused by the exchange to obtain an extended image sample set;

and training a single-stage target detection network based on the original image sample set and the extended image sample set to obtain a violation behavior detection model, wherein network parameters are optimized according to the information of the image and the prediction results of the class prediction branch and the position prediction branch in the detection network during training.

The invention has the beneficial effects that: the method firstly performs similarity matching on images in an original image sample set, performs foreground exchange on each group of image pairs, and performs style rendering on all exchanged images, so as to realize expansion of the original image sample set, is a matching similarity data enhancement method for detection of construction violation, enhances sample diversity of the violation data set, is suitable for most excellent deep learning networks, and has remarkable effect and high robustness; the invention further provides a violation behavior detection model based on the context information, the context information is considered during training, the network is regularized by adding the context information, the trained detection model can provide the context information for the violation behavior identification stage, the accuracy of target detection is obviously improved, the problem that the classification accuracy is not high due to the fact that the feature extraction of the full-image information is omitted in the prior art is solved, and the violation behavior detection model is compatible with most target detection networks, simple in construction method, strong in manufacturability and high in robustness. The construction violation detection model construction method integrating the context information and the pairing similarity data enhancement only increases the training time, does not lose the testing speed of the network or the actual violation detection speed at all, and meets the requirement of industrial application on the speed.

On the basis of the technical scheme, the invention can be further improved as follows.

Further, the similarity matching is realized in the following manner:

training a violation multi-classification network by adopting a part of samples in the original image sample set;

randomly extracting two images from the original image sample set, adopting a trained feature extraction unit in the multi-classification network to respectively extract features of the two images and calculate the cosine similarity between the features of the two images; if the similarity is greater than the threshold value, the pairing is successful; otherwise, the matching fails, new two images are randomly extracted from the original image sample set again, and the process is repeated until the stop requirement is met, so that a plurality of groups of successfully matched image pairs are obtained.

The invention has the further beneficial effects that: by randomly extracting samples, pictures in various scenes can be sampled for matching, and the diversity of the samples can be enriched.

Further, the foreground exchange is realized in the following manner:

respectively cutting the target foreground of each image in each group of image pairs, and filling up the cut holes in each image to obtain the image background image of each image;

calculating a target foreground A of the image A₁Pasting the image background image B to the image B matched with the image A₂On pasting confidence map B₃(ii) a Randomly selecting the pasting confidence coefficient map B₃Any pixel point with the middle confidence value exceeding the threshold value is arranged in the image background image B₂The target foreground A₁And pasting the image A in an area with the pixel point as the center, wherein the image A is each image in the multiple groups of image pairs.

Further, the cavity is a rectangular frame which encloses the target contour.

The invention has the further beneficial effects that: the time cost of the rectangular frame labeling is lower than that of the fine mask labeling, and the advantage of the rectangular frame labeling cost is more obvious when the number of the pictures is larger, so that the method is more in line with the requirement of industrial application on the cost.

Further, the image background image B₂The target foreground A₁Pasting the foreground object in an area with the pixel point as the center, and then, implementing the foreground exchange further comprises:

pasting a target foreground A₁The image background image B₂In the middle, the pasted edge is processed with Gaussian blur to weaken the edge effect caused by pasting.

The invention has the following further beneficial effects: by the aid of the Gaussian fuzzy edge, the pasted image is not too abrupt, and reasonability of a training sample for training the detection network is guaranteed.

Further, the style rendering is realized by:

all images after foreground exchange are regarded as a domain I and the original image sample set is regarded as a domain II and used as input of an annular generation countermeasure network, and the annular generation countermeasure network is trained, so that the annular generation countermeasure network can transfer the images of the domain I into the style of the images of the domain II, and the edge effect caused by exchange is reduced;

and inputting each image after foreground exchange into the trained annular countermeasure network again to obtain an image after style migration, wherein all the images after style migration form the extended image sample set.

The invention has the further beneficial effects that: the image can be better integrated into the background after being generated into the countermeasure network through the ring, so that the influence of the edge effect caused by the rectangular frame can be further reduced, and the colors of the foreground and the background can be more uniform and more real.

Further, the network parameters are adjusted and optimized according to the information on the image and the prediction results of the category prediction branch and the position prediction branch in the detection network, and the implementation mode is as follows:

the feature graph output by each layer of convolution layer of the FPN feature extraction unit in the single-stage target detection network is connected with a category prediction branch and a position prediction branch, and the feature graph output by the last layer of convolution layer is also connected with a context information branch which is used for carrying out whole-graph classification prediction;

and training the FPN characteristic extraction unit parameters based on the addition and the back transmission of the prediction loss values of all the branches.

The invention has the further beneficial effects that: the feature graph output by the last layer of the convolutional layer of the FPN feature extraction unit is connected with a context information branch, and the context information branch is used for carrying out full-graph classification prediction during network training, so that the FPN feature extraction unit is enabled to adaptively retain effective context information until the last layer, and the useful information can help the class prediction branch to be better classified. The accuracy of the class prediction and the position prediction of the detection model is greatly improved.

Further, the context information branch is a full-graph multi-classification branch, and is specifically configured to:

respectively carrying out global average pooling and global maximum pooling on the feature map output by the last layer of convolutional layer to obtain two feature vectors;

taking the sum of the two feature vectors as a final feature vector, or taking the feature vector obtained by the global average pooling as the final feature vector, or taking the spliced vector of the two feature vectors as the final feature vector;

and enabling the final feature vector to sequentially pass through two full-connection layers and one sigmoid layer to obtain a full-graph classification prediction result.

The invention also provides a construction violation detection method, which comprises the following steps:

and acquiring a construction image to be detected, inputting the construction image to be detected into the construction violation detection model constructed by the construction method of the construction violation detection model, and obtaining whether the construction violation exists and the category and the position of the construction violation based on the output results of the category prediction branch and the position prediction branch.

The invention has the beneficial effects that: the construction violation detection method integrating the context information and the pairing similarity data enhancement simultaneously detects the type and the occurrence position of the violation, can obviously improve the violation detection precision, realizes real-time and effective alarm on the violation of workers, and meets the requirement of industrial application on speed.

The present invention also provides a computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to implement a construction violation detection model construction method as described above and/or a construction violation detection method as described above.

Drawings

Fig. 1 is a flowchart of a construction violation detection model according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of image similarity matching according to an embodiment of the present invention;

FIG. 3 is a drawing showing a step D according to an embodiment of the present invention_AAnd D_BA schematic diagram;

fig. 4 is a general flowchart of a method for enhancing matching similarity data according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating a Net-Context network structure according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a branch structure of context information according to an embodiment of the present invention;

fig. 7 is a general flowchart of construction violation detection model construction and application thereof according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Example one

A method 100 for constructing a construction violation detection model, as shown in fig. 1, includes:

step 110, obtaining an original image sample set of construction violation behaviors;

step 120, performing similarity matching on the images in the original image sample set to obtain a plurality of groups of image pairs, performing foreground exchange on each group of image pairs, and performing style rendering on all the exchanged images to eliminate edge effects caused by the exchange to obtain an extended image sample set;

and step 130, training the single-stage target detection network based on the original image sample set and the extended image sample set to obtain a violation detection model, wherein during training, network parameters are optimized according to the information of the image and the prediction results of the class prediction branch and the position prediction branch in the detection network.

It should be noted that, in step 110, a construction scene monitoring video may be specifically obtained, a construction violation video segment is intercepted and frame extraction is performed to obtain a construction violation original image sample set, and in addition, as required by subsequent training, a label marking tool needs to be used to mark targets related to violations, such as a safety helmet, a safety belt, a mobile phone, and the like, so as to obtain a label value for training. For example, first, the construction violation category and number are set as: the safety helmet-1 is not worn, the mobile phone-2 is carried, the goggles-3 are not worn, the bicycle is ridden on a construction site-4, the safety valve operation-5 is stepped, and the number can be used for code representation of the class of the illegal behaviors in the actual training. Then, relevant illegal action images and videos in the construction scene are collected, 4330 illegal action images in the construction scene are marked (marking type and position) by a labelImg marking tool, and an illegal action original image sample set in the construction scene and label data corresponding to each sample can be obtained.

Secondly, because the deep learning algorithm is a big data-based learning algorithm, the larger the scale and the higher the quality of the training data are, the better the recognition and generalization of the model are. Therefore, a large amount of labeled data is needed for training a violation detection model, but the method is limited by acquisition conditions and labeling cost, the scale of the current data set is not large enough to cover the problems caused by illumination, visual angle and scene change in video monitoring, and data enhancement has very important significance in the detection task with the serious shortage of sample size.

Data enhancement is a collective term for a range of efficient methods for expanding the size of data samples. Data enhancement can not only increase the data volume of training, improves the generalization ability of model, can also increase noise data simultaneously, promotes the robustness of model. The common basic data enhancement method comprises the steps of turning, rotating, zooming, cutting, translating, adding noise and randomly erasing, and although basic data enhancement can bring some effect promotion to a target detection algorithm, the promotion is very limited, the problem of too few samples cannot be solved, and the requirement of actual production cannot be met. The industry explored a new method for enhancing target detection data in the last two years: a data enhancement method based on cut-paste (cut-paste). The method specifically comprises the steps of cutting out the foreground in the image, randomly pasting the foreground in the image to other images under the condition that the foreground in other images cannot be covered, and automatically generating new labels. Specifically, for example, the instaboost algorithm takes the pixel mean value of each pixel point in a certain range on a graph as a measurement standard, picks out all pixel points with similarity greater than a certain threshold with the original center coordinate of the foreground, randomly selects one of the pixel points as a new center coordinate, and then pastes the foreground to a new position and fills the original center position. The install boost method is based on fine mask (mask) labeling for clipping and pasting, although the mask labeling will be well promoted. However, the mask marking cost is very high, and is often tens of times of the frame marking cost, and the effect of the target detection algorithm based on the frame level is not good.

Therefore, aiming at the problem of detection of the illegal behavior of the construction scene, the method improves the existing cutting-pasting method in step 120, provides a matching similarity data enhancement method facing to the detection of the construction illegal behavior, pairs the images in the original image sample set, exchanges the foreground through cutting-pasting, generates a new image, and achieves the purpose of enhancing the diversity of the sample. Because the image generated by direct cutting-pasting is not real enough, the generated image is further rendered, so that more vivid training data is generated, and the sample diversity is successfully enhanced.

In addition, the method is based on a typical single-stage target detection network, a violation behavior detection model Net-Context based on Context information is designed, the network is regularized by adding the Context information to the network, the network precision is improved, and a more accurate current frame violation target detection result is obtained.

Therefore, in the embodiment, similarity matching is performed on images in an original image sample set, foreground exchange is performed on each group of image pairs, and style rendering is performed on all exchanged images, so that expansion of the original image sample set is realized, the method is a matching similarity data enhancement method for detection of construction violation, sample diversity of the violation data set is enhanced, the method is suitable for most excellent deep learning networks, and the effect is remarkable and the robustness is high; the invention further provides a violation behavior detection model based on the context information, the context information is considered during training, the network is regularized by adding the context information, the trained detection model can provide the context information for the violation behavior identification stage, the accuracy of target detection is obviously improved, the problem that the classification accuracy is not high due to the fact that the feature extraction of the full-image information is omitted in the prior art is solved, and the violation behavior detection model is compatible with most target detection networks, simple in construction method, strong in manufacturability and high in robustness. The construction violation detection model construction method integrating the context information and the pairing similarity data enhancement only increases the training time, does not lose the testing speed of the network or the actual violation detection speed at all, and meets the requirement of industrial application on the speed.

The key point of the embodiment is that firstly, a construction scene violation detection data set is manufactured in consideration of the lack of a large public construction scene violation data set at present; secondly, a matching similarity data enhancement method facing illegal behavior detection is designed for expanding a data set, sample diversity is improved, and detection precision of the model for illegal behaviors is generally and effectively improved; thirdly, a violation detection network based on context information is designed, the context information is considered during training to optimize parameters, the context information is introduced into feature extraction of a target detection network, prediction of category prediction and position prediction is guaranteed based on the context information, and detection capability of a model is enhanced; and fourthly, the construction violation behavior detection model construction method integrating the context information and the pairing similarity enhancement only increases the training time, greatly improves the detection capability of the constructed detection model, and does not damage the detection speed of the network.

Preferably, the similarity matching is implemented by:

firstly, a multi-classification network is trained, violations are classified, and the network is used as a feature extraction network (in order to prove the generalization of the feature extraction network and prevent overfitting, the multi-classification network can be trained by only using 50% of images in an original image sample set). Then, two images (marked as image A and image B) are randomly extracted from the original image sample set to form an input image pair, and features are extracted from the two images based on a feature extraction network trained before and the cosine similarity is calculated. If the similarity is larger than a certain threshold value, the pairing is successful, otherwise, the pairing is failed. If the pairing fails, a new image pair is extracted from the original image sample set again, and the above process is repeated until a stop requirement (for example, the number of image pairs requirement) is met, and the pairing unit is as shown in fig. 2.

Preferably, the foreground exchange is implemented by:

After the feature extraction and the similarity comparison are completed, if the images are successfully matched, the foreground (the foreground, namely the target and the worker) of the two images is respectively cut. This leaves a hole in the original image. For the holes, the holes can be filled by adopting the existing image filling method.

A pair of images consisting of image A and image B is formed as follows, and the object foreground of image A is pasted to the image background image B₂For example, the paste process is explained in detail:

using the hole-filled image B (i.e. image B background, image background image B as described above)₂) Calculating a pasting confidence map, specifically, in pasting, in order to ensure consistency of local patterns in pasting so as not to be too disobeyed (for example, pasting a dark object to a bright light place), pixel-by-pixel calculation is requiredFilled-in image B background B₂And the target foreground A in the image A₁Local similarity of the position to obtain a background B₂Equal-size pasting confidence map B₃. Paste confidence map B₃The calculation method comprises the following steps:

a. from the target prospect A₁Starting from the boundary, the image is expanded outward three times, each time the image is expanded by K (K is 5) pixel widths, three boundary regions belonging to the current original image are formed, and the region is recorded as D_A。

b. Using the midpoint p of the background of the image B as the center, the boundary to be pasted is expanded with a to obtain a sum D_ARegion D of the same size_B。

c.d(D_A,D_B) Representing a distance measure of the two regions, then:

wherein C is_AiTo mean D_AThe ith (i ═ 1,2,3), C, of the three regions from the inside out_BiTo mean D_BThe ith region (i ═ 1,2,3), w, of the three regions from the inside out_iTo mean D_AAnd D_BWeight of ith of three regions (I ═ 1,2,3), I_j(x_j,y_j) Finger D_jIs located in (x)_j,y_j) RGB pixel values (j ═ a, B), Δ denotes I_A(x_A,y_A) And I_B(x_B,y_B) Any distance in any metric. D_AAnd D_BThe schematic diagram is shown in fig. 3.

d. And c, solving the distance measure of each pixel point in the background of the image B to form a distance heatmap H.

e. Normalizing heatmap H, where x denotes the current value of a point on heatmap, the normalized value H (x) is expressed as:

where M denotes the maximum value on heatmap H, and the value of each point on H after normalization represents the position of the point on BAnd (5) normalizing the calculated heatmap H to 0-255 to obtain a final pasting confidence coefficient graph B₃。

Obtaining a pasting confidence coefficient map B through calculation₃Then, a position q (as shown in fig. 3) with a confidence greater than the threshold T2(T2 ═ 200) is randomly selected for pasting. Specifically, A is₁With the diagram B₂The point q in (1) is placed on the background of the image B as the center, wherein the image A after cutting (marked as A) can be obtained by cutting the image and supplementing 0 to the background₄) The same size as figure B. And generating a mask (mask) by using the cut image A, so that the mask value of the position of the foreground at the moment is 1, and the rest positions are 0.

Preferably, in the paste A₁To picture B background B₂In the middle time, the pasted edge is processed with Gaussian blur to weaken the edge effect brought by pasting. Specifically, in order to weaken the edge effect during pasting, a gaussian kernel with a gaussian radius Δ is further generated, and the mask is filtered. The following formula is adopted during pasting: i ═ I_A×m+I_BX (1-m), wherein I represents a new synthesized image, I_AIs an image A₄，I_BIs an image B₂And m denotes a mask after gaussian filtering.

Preferably, the style rendering is implemented by: all images after foreground exchange are regarded as a domain I and an original image sample set is regarded as a domain II, the domain I and the original image sample set are used as input of an annular generation countermeasure network, and the annular generation countermeasure network is trained, so that the annular generation countermeasure network can transfer the images of the domain I into the style of the images of the domain II, and the edge effect caused by exchange is further eliminated; and inputting each image after foreground exchange into the trained annular countermeasure network again to obtain the image after style migration, wherein all the images after style migration form the extended image sample set.

The scheme adopts a ring to generate an image set generated after the antithetical couplet cycleGAN renders, cuts and sticks. And (3) taking the image set generated based on the cropping-pasting and the original image sample set as 2 domains, recording the domains as a domain I and a domain II, and taking the domains as input of the cycleGAN to train the ring generation countermeasure network. To fool the classifier, cycleGAN would migrate the image of domain I into the style of domain II as realistic as possible. After the training is completed, the image set generated based on the cropping-pasting is tested through the cycleGAN to obtain a final rendered image, namely the extended image sample set.

Preferably, as shown in fig. 3, the hollow is a rectangular frame enclosing the contour of the target.

The whole pairing similarity enhancement flow is shown in fig. 4. The matching similarity data enhancement method provided in this embodiment matches images through a matching algorithm and detects an optimal matching position, exchanges a foreground to generate a new image, and further renders the generated image through a generation countermeasure network (GAN), so as to generate more realistic training data, which significantly improves the accuracy of the existing target detection method.

Preferably, the network parameters are optimized according to the information on the image and the prediction results of the category prediction branch and the location prediction branch in the detection network, and the implementation manner is as follows:

the feature graph output by each layer of convolution layer of an FPN feature extraction unit in the single-stage target detection network is connected with a category prediction branch and a position prediction branch, and the feature graph output by the last layer of convolution layer is also connected with a context information branch which is used for carrying out whole-graph classification prediction; and training the FPN characteristic extraction unit parameters based on the addition and the reverse transmission of the prediction loss values of the branches.

Based on a typical single-stage target detection network, designing a violation behavior detection network Net-Context based on Context information, and using the violation behavior detection network Net-Context to enhance the network identification capability of violation behaviors and obtain a current frame violation target detection result. The structure of the Net-Context model is shown in fig. 5.

As can be seen from fig. 5, the feature extraction module of the network Net-Context for detecting an illegal behavior adopts an FPN network structure, performs prediction by adopting a plurality of feature maps which are reversely fused, each feature map is connected with a category prediction branch and a position prediction branch, the last layer of feature map is also connected with a Context information branch, and the Context information branch is realized by adopting a full-map multi-classification branch and is used for predicting a full-map category. The general target detection network may lose context information due to limited receptive field, and the full-graph class prediction proposed by the scheme can prompt the FPN network to adaptively retain effective context information until the last layer, and the useful information can help the class prediction branch to be better classified.

Preferably, as shown in fig. 6, the context information branches into full-graph multi-classification branches, which include global average pooling gap (gap) and global maximum pooling gmp (global max pooling);

respectively carrying out GAP and GMP on the feature graph output by the last layer of convolution layer to obtain two feature vectors; using the sum of the two feature vectors as a final feature vector, or using a separate GAP vector as the final feature vector, or using a GAP and GMP stitching vector as the final feature vector; and enabling the final feature vector to sequentially pass through two FC layers and a sigmoid layer to obtain a whole-image classification prediction result.

For example, two feature vectors are obtained after gap (global average potential) and gmp (global max potential) are respectively performed on the final layer of feature map of FPN, and the sum of the two feature vectors is used as the final feature vector. And finally, obtaining a full-image multi-classification prediction result by the characteristic vector through two FC layers and a sigmoid layer. The classification prediction result adopts a BCE (binary Cross Engine) loss function to calculate loss, and the loss is added into total loss (sum of prediction loss of a class prediction branch and a position prediction branch) through a certain weight to be reversely trained. Wherein the BCE loss function expression is as follows:

L_BCE(x_i,y_i)＝-w_i[y_ilogx_i+(1-y_i)log(1-x_i)]

in the formula, x_iFor multi-class prediction, y_iAs a multi-class label, w_iFor class weights, i is the element code.

The loss function of the final inspection network model is expressed as follows:

L_Net-Context＝(L_cls+L_reg)+λL_Context

wherein L is_Net-ContextRefers to the Net-Context network total loss, L_clsRefers to class prediction branch loss, L_regReferences to location predicted branch loss, L_ContextDenotes the loss of a branch of context information, λ is L_ContextAnd (4) weighting.

In summary, in the method of the embodiment, a construction scene monitoring video is obtained, a construction violation picture data set is obtained by intercepting a construction violation video segment and performing frame extraction, and targets related to the violation are labeled by using a labelimg picture labeling tool, wherein the targets include a safety helmet, goggles, a mobile phone and the like; based on an improved cutting-pasting method, the pictures are matched through a matching algorithm, the optimal matching position is detected through the algorithm, a new picture is generated by exchanging the foreground, the sample diversity is enhanced, and because the image generated by directly cutting-pasting is often not true enough, the generated image is further rendered through generating a countermeasure network (GAN), so that more vivid training data is generated; and finally, a traditional single-stage target detection algorithm is improved, and the model is enabled to better meet the detection requirement of targets related to violation behaviors in a construction scene by adding a context information branch. The method is widely tested on the construction violation data set, can prove that the accuracy of the existing target detection method is obviously improved, a series of construction violations of workers can be effectively detected in real time, and guarantee is provided for construction safety of construction sites.

Example two

A construction violation detection method comprises the following steps:

and acquiring a construction image to be detected, inputting the construction image to be detected into the construction violation detection model constructed by the construction method of the construction violation detection model according to the first embodiment, and obtaining whether the construction violation exists and the type and the position of the construction violation based on the output results of the type prediction branch and the position prediction branch. The related technical solution is the same as the first embodiment, and is not described herein again.

In the existing image processing method based on the single frame, pure image classification can only give a single frame prediction result and cannot give the occurrence position of the violation. In the embodiment, a behavior detection method based on a single-frame image is adopted, and the model construction method of the first embodiment is adopted, so that a certain violation in an input image is defined as a target to be detected, and the occurrence position (position in the image) of the target is marked.

EXAMPLE III

A computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to implement a method of constructing a construction violation detection model as described in embodiment one above and/or a construction violation detection method as described in embodiment two above. The related technical solutions are the same as those of the first embodiment and the second embodiment, and are not described herein again.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A construction method of a construction violation detection model is characterized by comprising the following steps:

acquiring an original image sample set of construction violation behaviors;

2. The construction method of the construction violation detection model according to claim 1, wherein the similarity matching is realized in a manner that:

3. The construction method of the detection model for the construction violation behavior according to claim 1, wherein the foreground exchange is implemented by:

4. The method for constructing the construction violation detection model according to claim 3, wherein the cavity is a rectangular box enclosing a target contour.

5. The construction method of the construction violation detection model according to claim 3, wherein the image background image B is a merged image₂The target foreground A₁Pasting the foreground object in an area with the pixel point as the center, and then, implementing the foreground exchange further comprises:

pasting a target foreground A₁The image background image B₂In pair A₁The pasted edge is processed with Gaussian blur to weaken the edge effect caused by pasting.

6. The construction method of the construction violation detection model according to claim 1, wherein the style rendering is implemented by:

all images after foreground exchange are regarded as a domain I and the original image sample set is regarded as a domain II and used as input of an annular generation countermeasure network, and the annular generation countermeasure network is trained, so that the annular generation countermeasure network can transfer the images of the domain I into the image style of the domain II, and the edge effect caused by exchange is eliminated;

7. The construction method of the construction violation detection model according to any one of claims 1 to 6, wherein the network parameters are optimized according to the information on the image and the prediction results of the category prediction branch and the position prediction branch in the detection network, and the implementation manner is as follows:

8. The method for constructing the detection model for the construction violation behavior according to claim 7, wherein the context information branch is a full-graph multi-classification branch, and is specifically configured to:

9. A construction violation detection method is characterized by comprising the following steps:

collecting a construction image to be detected, inputting the construction image into a construction violation detection model constructed by the construction method of the construction violation detection model according to any one of claims 1 to 8, and obtaining whether a construction violation exists and the type and position of the construction violation based on the output results of the type prediction branch and the position prediction branch.

10. A computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to implement a construction violation detection model building method according to any one of claims 1-8 and/or a construction violation detection method according to claim 9.