CN111832443A - Construction method and application of construction violation detection model - Google Patents



Publication number
CN111832443A
CN111832443A (application CN202010601260.XA; granted as CN111832443B)
Authority
CN
China
Prior art keywords
image
construction
network
violation
sample set
Prior art date
Legal status
Granted
Application number
CN202010601260.XA
Other languages
Chinese (zh)
Other versions
CN111832443B (en)
Inventor
韩守东
陈国荣
马迪
刘巾英
陈阳
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202010601260.XA priority Critical patent/CN111832443B/en
Publication of CN111832443A publication Critical patent/CN111832443A/en
Application granted granted Critical
Publication of CN111832443B publication Critical patent/CN111832443B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V20/40 — Scenes; scene-specific elements in video content
    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22 — Pattern recognition; matching criteria, e.g. proximity measures
    • G06N3/08 — Neural networks; learning methods
    • G06V10/40 — Extraction of image or video features

Abstract

The invention relates to the field of violation detection, and specifically discloses a construction method and application of a construction violation detection model. The construction method comprises: performing similarity matching on images in an original image sample set, performing foreground exchange within each matched image pair, and applying style rendering to all exchanged images to obtain an extended image sample set; then training a single-stage object detection network on the original and extended image sample sets, where during training the network parameters are optimized according to the full-image information together with the prediction results of the class prediction branch and the position prediction branch of the detection network; finally, violations are detected with the constructed violation detection model. The method enhances the original samples through pairing similarity, greatly enlarges the scale of the training samples and improves the training precision of the model; it also takes context information into account during training and regularizes the network by adding context information to it, thereby markedly improving the accuracy of object detection.

Description

Construction method and application of construction violation detection model
Technical Field
The invention belongs to the field of violation detection, and particularly relates to a construction method and application of a construction violation detection model.
Background
During construction, safety assurance is an important link in improving production efficiency, raising enterprise benefits and protecting the safety of staff. Designing and building an intelligent monitoring system that can automatically, in real time and accurately identify, judge and locate workers in a complex and changeable industrial environment is therefore of great significance. Specifically, the system is required to detect workers within the camera's field of view in real time, judge whether behaviors such as not wearing a safety helmet or not wearing a safety belt exist, and, if a worker committing a violation is found, immediately raise an alarm through the alarm system, providing reliable information for the monitoring personnel in the background. This improves the safety of workers in the production environment and strengthens the enterprise's assurance of safe production.
Current schemes for violation detection are mainly as follows. If the input is a video segment, methods based on multi-frame processing may be employed, such as C3D, I3D, the two-stream method, or skeleton-based behavior detection; if the input is a single frame, methods based on a single image may be employed, such as image classification and object detection. Among the currently mainstream multi-frame video processing methods, the two-stream method takes a single-frame image together with an optical flow map computed from multiple video frames, extracts features through a convolutional neural network, and fuses the single-frame features with the optical flow features as the feature representation of the video segment; however, optical flow extraction is time-consuming and cannot meet real-time requirements. Methods based on 3D convolution, such as C3D and I3D, give the neural network the ability to capture temporal behavior by adding a time dimension to its input; they require the input video to be convolved in time and space simultaneously, have high computational complexity, are difficult to train, and may be unusable in monitoring systems with limited computational resources. In addition, many current mainstream methods for video-segment behavior detection can only give frame-level predictions, that is, they only detect the time period in which the violation occurs and cannot give a bounding box locating the specific violating worker. As human keypoint localization technology has matured, researchers have begun to attempt behavior recognition through human skeletons.
Most skeleton-based methods extract keypoints from the people in a video, assemble the keypoints into human skeletons, and extract skeleton features through a convolutional neural network to perform behavior recognition. However, the input of such methods consists only of human keypoints, so the interaction between humans and objects cannot be modeled; since violations contain a large number of human-object interaction scenes, such as wearing a safety helmet, these methods are unsuitable for violation detection from a surveillance viewpoint. As for single-frame image processing methods, pure image classification can only give a per-frame prediction result and cannot give the position at which the violation occurs.
Disclosure of Invention
The invention provides a construction method of a construction violation detection model and its application, for solving the technical problem that existing construction violation detection methods cannot accurately detect the category and the position of a violation at the same time.
The technical scheme for solving the technical problems is as follows: a construction method of a construction violation detection model comprises the following steps:
acquiring an original image sample set of construction violation behaviors;
performing similarity matching on the images in the original image sample set to obtain multiple groups of image pairs, performing foreground exchange within each image pair, and applying style rendering to all exchanged images to eliminate the edge effect caused by the exchange, obtaining an extended image sample set;
and training a single-stage object detection network on the original image sample set and the extended image sample set to obtain a violation detection model, where during training the network parameters are optimized according to the full-image information and the prediction results of the class prediction branch and the position prediction branch in the detection network.
The invention has the beneficial effects that: the method first performs similarity matching on the images in the original image sample set, then performs foreground exchange within each matched image pair and applies style rendering to all exchanged images, thereby expanding the original image sample set. This is a pairing-similarity data enhancement method oriented to construction violation detection; it enhances the sample diversity of the violation data set, is applicable to most state-of-the-art deep learning networks, and is effective and robust. The invention further provides a violation detection model based on context information: context information is taken into account during training, and the network is regularized by adding context information to it, so that the trained detection model can supply context information to the violation recognition stage. This markedly improves the accuracy of object detection and solves the problem in the prior art that classification accuracy suffers because feature extraction ignores the full-image information; the model is compatible with most object detection networks, simple to construct, highly manufacturable and robust. The construction method, which fuses context information and pairing-similarity data enhancement, only increases training time; it does not reduce the network's test speed or the actual violation detection speed at all, and thus meets the speed requirements of industrial application.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the similarity matching is realized as follows:
training a violation multi-classification network with a portion of the samples in the original image sample set;
randomly extracting two images from the original image sample set, using the trained feature extraction unit of the multi-classification network to extract features from the two images, and computing the cosine similarity between the two feature vectors; if the similarity is greater than a threshold, the pairing succeeds; otherwise the pairing fails, two new images are randomly extracted from the original image sample set, and the process is repeated until a stopping criterion is met, yielding multiple groups of successfully matched image pairs.
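As a minimal sketch of this pairing step (the feature vectors, similarity threshold and pair-count stopping criterion below are illustrative assumptions, not the patent's exact settings):

```python
import numpy as np

def cosine_similarity(f_a, f_b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(f_a, f_b) / (np.linalg.norm(f_a) * np.linalg.norm(f_b) + 1e-12))

def pair_images(features, threshold=0.8, n_pairs=100, max_attempts=10000, seed=0):
    """Randomly draw index pairs; keep those whose feature similarity exceeds the threshold."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(max_attempts):
        if len(pairs) >= n_pairs:
            break
        i, j = rng.choice(len(features), size=2, replace=False)
        if cosine_similarity(features[i], features[j]) > threshold:
            pairs.append((int(i), int(j)))
    return pairs
```

A `max_attempts` cap is added so the loop terminates even when too few images are similar enough to pair.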
The invention has the further beneficial effects that: by randomly extracting samples, pictures from various scenes can be sampled for matching, enriching the diversity of the samples.
Further, the foreground exchange is realized in the following manner:
respectively cutting the target foreground of each image in each group of image pairs, and filling up the cut holes in each image to obtain the image background image of each image;
computing, for the target foreground A1 of an image A, a pasting confidence map B3 over the image background map B2 of the image B paired with image A; randomly selecting any pixel point in the pasting confidence map B3 whose confidence value exceeds a threshold, and pasting the target foreground A1 into the region of the image background map B2 centred on that pixel point, where image A stands for each image in the multiple groups of image pairs.
Further, the hole is the rectangular box enclosing the target contour.
The invention has the further beneficial effects that: the time cost of rectangular-box annotation is far lower than that of fine mask annotation, and this cost advantage grows with the number of images, so the method better meets the cost requirements of industrial application.
Further, after the target foreground A1 is pasted into the region of the image background map B2 centred on the selected pixel point, the foreground exchange further comprises:
applying Gaussian blur to the pasted edges to weaken the edge effect caused by the pasting.
The invention has the following further beneficial effects: by Gaussian-blurring the edges, the pasted image does not look too abrupt, which keeps the training samples reasonable for training the detection network.
Further, the style rendering is realized by:
treating all images after foreground exchange as domain I and the original image sample set as domain II, using both as input to a cycle-consistent generative adversarial network (cycle GAN), and training the cycle GAN so that it can transfer the images of domain I into the style of the images of domain II, reducing the edge effect caused by the exchange;
and inputting each image after foreground exchange into the trained cycle GAN again to obtain the style-transferred image, all the style-transferred images forming the extended image sample set.
The invention has the further beneficial effects that: after passing through the cycle GAN, the pasted foreground blends better into the background, which further reduces the edge effect caused by the rectangular box and makes the colours of foreground and background more uniform and realistic.
Further, the network parameters are optimized according to the full-image information and the prediction results of the class prediction branch and the position prediction branch in the detection network, realized as follows:
the feature map output by each convolutional layer of the FPN feature extraction unit in the single-stage object detection network is connected to a class prediction branch and a position prediction branch, and the feature map output by the last convolutional layer is additionally connected to a context information branch, which performs full-image classification prediction;
and the parameters of the FPN feature extraction unit are trained by summing the prediction loss values of all branches and back-propagating the result.
The invention has the further beneficial effects that: the feature map output by the last convolutional layer of the FPN feature extraction unit is connected to a context information branch that performs full-image classification prediction during network training, forcing the FPN feature extraction unit to adaptively retain effective context information up to the last layer; this useful information helps the class prediction branch classify better, greatly improving the accuracy of both the class prediction and the position prediction of the detection model.
Further, the context information branch is a full-image multi-classification branch, specifically configured to:
perform global average pooling and global max pooling respectively on the feature map output by the last convolutional layer to obtain two feature vectors;
take the sum of the two feature vectors as the final feature vector, or take the feature vector obtained by global average pooling as the final feature vector, or take the concatenation of the two feature vectors as the final feature vector;
and pass the final feature vector through two fully connected layers and one sigmoid layer in sequence to obtain the full-image classification prediction result.
The invention also provides a construction violation detection method, which comprises the following steps:
acquiring a construction image to be detected and inputting it into the construction violation detection model constructed by the above construction method; based on the output results of the class prediction branch and the position prediction branch, obtaining whether a construction violation exists together with its category and position.
The invention has the beneficial effects that: this construction violation detection method, which fuses context information and pairing-similarity data enhancement, detects the category and the location of a violation simultaneously, markedly improves violation detection precision, realizes real-time and effective alarms on workers' violations, and meets the speed requirements of industrial application.
The present invention also provides a computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to implement a construction violation detection model construction method as described above and/or a construction violation detection method as described above.
Drawings
Fig. 1 is a flowchart of a construction violation detection model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of image similarity matching according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the regions D_A and D_B according to an embodiment of the present invention;
fig. 4 is a general flowchart of a method for enhancing matching similarity data according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a Net-Context network structure according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a branch structure of context information according to an embodiment of the present invention;
fig. 7 is a general flowchart of construction violation detection model construction and application thereof according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example one
A method 100 for constructing a construction violation detection model, as shown in fig. 1, includes:
step 110, obtaining an original image sample set of construction violation behaviors;
step 120, performing similarity matching on the images in the original image sample set to obtain a plurality of groups of image pairs, performing foreground exchange on each group of image pairs, and performing style rendering on all the exchanged images to eliminate edge effects caused by the exchange to obtain an extended image sample set;
and step 130, training the single-stage target detection network based on the original image sample set and the extended image sample set to obtain a violation detection model, wherein during training, network parameters are optimized according to the information of the image and the prediction results of the class prediction branch and the position prediction branch in the detection network.
It should be noted that, in step 110, a construction-scene surveillance video may be obtained, violation video segments intercepted and frames extracted, yielding an original image sample set of construction violations; in addition, for subsequent training, an annotation tool is used to label the targets related to violations, such as safety helmets, safety belts and mobile phones, to obtain label values for training. For example, the construction violation categories are first numbered as: not wearing a safety helmet — 1; using a mobile phone — 2; not wearing goggles — 3; riding a bicycle on the construction site — 4; stepping on a safety valve — 5; these numbers serve as the codes of the violation classes in actual training. Then, relevant violation images and videos in construction scenes are collected, and 4330 construction-scene violation images are annotated (category and position) with the labelImg tool, yielding the original image sample set of construction-scene violations together with the label data of each sample.
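For illustration, this category numbering might be represented in code as follows (the English class names are translations used here for readability, not the patent's identifiers):

```python
# Hypothetical label map mirroring the numbering scheme described above.
VIOLATION_CLASSES = {
    1: "not wearing a safety helmet",
    2: "using a mobile phone",
    3: "not wearing goggles",
    4: "riding a bicycle on the construction site",
    5: "stepping on a safety valve",
}

def encode_label(name):
    """Look up the numeric code used for a violation-category name during training."""
    for code, cls in VIOLATION_CLASSES.items():
        if cls == name:
            return code
    raise KeyError(name)
```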
Secondly, since deep learning algorithms learn from big data, the larger the scale and the higher the quality of the training data, the better the recognition and generalization of the model. Training a violation detection model therefore requires a large amount of labeled data; limited by acquisition conditions and annotation cost, however, the current data set is not large enough to cover the variations caused by illumination, viewing angle and scene changes in video surveillance, so data enhancement is of great significance for a detection task with such a severe shortage of samples.
Data enhancement is the collective term for a range of efficient methods for expanding the size of a data sample set. It not only increases the amount of training data and improves the generalization ability of the model, but also adds noise data and thus improves the model's robustness. Common basic data enhancement methods include flipping, rotation, scaling, cropping, translation, noise addition and random erasing; although basic data enhancement can bring some improvement to an object detection algorithm, the improvement is very limited, cannot solve the problem of having too few samples, and cannot meet the needs of actual production. In the last two years, the industry has explored a new object-detection data enhancement method: cut-paste data enhancement. It cuts the foreground out of an image, pastes it randomly into other images without covering the foregrounds already there, and automatically generates new labels. For example, the InstaBoost algorithm takes the pixel mean within a certain range around each pixel point as the measurement standard, picks out all pixel points whose similarity with the original centre coordinate of the foreground exceeds a certain threshold, randomly selects one of them as the new centre coordinate, pastes the foreground at the new position and fills in the original position. InstaBoost cuts and pastes based on fine mask annotations; although mask annotations do bring good improvement, their cost is very high, often tens of times that of bounding-box annotation, and the improvement for box-level object detection algorithms is poor.
Therefore, for the problem of construction-scene violation detection, step 120 improves the existing cut-paste method into a pairing-similarity data enhancement method oriented to construction violation detection: images in the original image sample set are paired, the foregrounds are exchanged by cut-paste to generate new images, and sample diversity is thereby enhanced. Because images generated by direct cut-paste are not realistic enough, the generated images are further rendered to produce more lifelike training data, successfully enhancing sample diversity.
In addition, starting from a typical single-stage object detection network, the method designs a context-information-based violation detection model, Net-Context; adding context information to the network regularizes it, improves network precision, and yields a more accurate violation detection result for the current frame.
Therefore, in this embodiment, similarity matching is performed on the images in the original image sample set, foreground exchange is performed within each matched image pair, and style rendering is applied to all exchanged images, thereby expanding the original image sample set. This pairing-similarity data enhancement method oriented to construction violation detection enhances the sample diversity of the violation data set, is applicable to most state-of-the-art deep learning networks, and is effective and robust. The embodiment further provides a violation detection model based on context information: context information is taken into account during training, and the network is regularized by adding context information to it, so that the trained detection model can supply context information to the violation recognition stage. This markedly improves the accuracy of object detection, solves the prior-art problem of low classification accuracy caused by feature extraction ignoring the full-image information, and is compatible with most object detection networks, simple to construct, highly manufacturable and robust. The construction method, which fuses context information and pairing-similarity data enhancement, only increases training time; it does not reduce the network's test speed or the actual violation detection speed, and meets the speed requirements of industrial application.
The key points of this embodiment are: first, given the current lack of a large public construction-scene violation data set, such a data set is produced; second, a pairing-similarity data enhancement method oriented to violation detection is designed to expand the data set, improving sample diversity and generally and effectively improving the model's detection precision for violations; third, a violation detection network based on context information is designed, with context information taken into account when optimizing parameters during training; introducing context information into the feature extraction of the object detection network grounds both class prediction and position prediction on context, enhancing the detection capability of the model; fourth, the construction method fusing context information and pairing-similarity enhancement only increases training time while greatly improving the detection capability of the constructed model, without harming the network's detection speed.
Preferably, the similarity matching is implemented by:
firstly, a multi-classification network is trained, violations are classified, and the network is used as a feature extraction network (in order to prove the generalization of the feature extraction network and prevent overfitting, the multi-classification network can be trained by only using 50% of images in an original image sample set). Then, two images (marked as image A and image B) are randomly extracted from the original image sample set to form an input image pair, and features are extracted from the two images based on a feature extraction network trained before and the cosine similarity is calculated. If the similarity is larger than a certain threshold value, the pairing is successful, otherwise, the pairing is failed. If the pairing fails, a new image pair is extracted from the original image sample set again, and the above process is repeated until a stop requirement (for example, the number of image pairs requirement) is met, and the pairing unit is as shown in fig. 2.
Preferably, the foreground exchange is implemented by:
respectively cutting the target foreground of each image in each group of image pairs, and filling up the cut holes in each image to obtain the image background image of each image;
computing, for the target foreground A1 of an image A, a pasting confidence map B3 over the image background map B2 of the image B paired with image A; randomly selecting any pixel point in the pasting confidence map B3 whose confidence value exceeds a threshold, and pasting the target foreground A1 into the region of the image background map B2 centred on that pixel point, where image A stands for each image in the multiple groups of image pairs.
After feature extraction and similarity comparison are completed, if the two images are successfully paired, the foreground (i.e., the target and the worker) of each image is cut out. This leaves a hole in each original image, which can be filled with an existing image inpainting method.
Taking an image pair consisting of image A and image B, the pasting process is explained in detail below for the case where the target foreground of image A is pasted onto the image background map B2:
Using the hole-filled image B (i.e., the background of image B, the image background map B2 described above), a pasting confidence map is computed. Specifically, to keep local patterns consistent during pasting so that the result is not too incongruous (for example, a dark object pasted into a brightly lit place), the local similarity between the filled background B2 of image B and the position of the target foreground A1 in image A must be computed pixel by pixel, yielding a pasting confidence map B3 of the same size as the background B2. The pasting confidence map B3 is computed as follows:
a. from the target prospect A1Starting from the boundary, the image is expanded outward three times, each time the image is expanded by K (K is 5) pixel widths, three boundary regions belonging to the current original image are formed, and the region is recorded as DA
b. Centered at a candidate point p in the background of image B, expand the boundary to be pasted in the same way as in step a, obtaining a region D_B of the same size as D_A.
c. Let d(D_A, D_B) denote the distance measure between the two regions; then:
d(D_A, D_B) = Σ_{i=1}^{3} w_i · Σ_{(x_A,y_A)∈C_Ai, (x_B,y_B)∈C_Bi} Δ(I_A(x_A, y_A), I_B(x_B, y_B))
where C_Ai denotes the i-th (i = 1, 2, 3) of the three rings of D_A counted from the inside out, C_Bi denotes the i-th (i = 1, 2, 3) of the three rings of D_B counted from the inside out, w_i denotes the weight of the i-th (i = 1, 2, 3) ring pair of D_A and D_B, I_j(x_j, y_j) denotes the RGB pixel value of D_j at position (x_j, y_j) (j = A, B), and Δ denotes a distance between I_A(x_A, y_A) and I_B(x_B, y_B) under any metric. A schematic of D_A and D_B is shown in fig. 3.
d. Using step c, compute the distance measure for every pixel point in the background of image B, forming a distance heatmap H.
e. Normalize heatmap H. Let x denote the current value of a point on the heatmap; the normalized value h(x) is:
h(x) = 1 − x / M
where M denotes the maximum value on heatmap H. After normalization, the value at each point of H represents the pasting confidence at the corresponding position in image B. The normalized heatmap H is then scaled to 0–255 to obtain the final pasting confidence map B3.
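Steps a–e above can be sketched as follows. This is a minimal numpy illustration under stated assumptions: Δ is taken as the Euclidean RGB distance (the patent allows any metric), the ring weights are placeholders, and the normalization is the reconstructed h(x) = 1 − x/M.

```python
import numpy as np

def dilate(mask, k):
    """4-connected binary dilation by k pixels (pure numpy)."""
    out = mask.astype(bool)
    for _ in range(k):
        p = np.pad(out, 1)
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:])
    return out

def ring_regions(mask, k=5, n=3):
    """Steps a/b: expand a foreground mask outward n times by k pixels,
    returning the n boundary rings from the inside out."""
    rings, prev = [], mask.astype(bool)
    for _ in range(n):
        grown = dilate(prev, k)
        rings.append(grown & ~prev)   # keep only the newly added band
        prev = grown
    return rings

def ring_distance(img_a, img_b, rings_a, rings_b, weights=(0.5, 0.3, 0.2)):
    """Step c: weighted sum over the three ring pairs of the mean RGB
    distance between pixels. Δ is taken as Euclidean and the weights
    are placeholders -- both are assumptions, not fixed by the patent."""
    d = 0.0
    for w, ca, cb in zip(weights, rings_a, rings_b):
        pa, pb = img_a[ca].astype(float), img_b[cb].astype(float)
        m = min(len(pa), len(pb))          # pair up corresponding pixels
        d += w * np.linalg.norm(pa[:m] - pb[:m], axis=1).mean()
    return d

def confidence_map(heatmap):
    """Steps d/e: normalize the distance heatmap H with h(x) = 1 - x/M
    (a reconstruction: low distance -> high confidence) and scale to
    0..255, giving the pasting confidence map B3."""
    M = heatmap.max()
    h = 1.0 - heatmap / M if M > 0 else np.ones_like(heatmap, float)
    return np.rint(h * 255).astype(np.uint8)
```

For each candidate point p in the background of image B, `ring_distance` would be evaluated once to fill in one pixel of the heatmap passed to `confidence_map`.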
After the pasting confidence map B3 is obtained, a position q (as shown in fig. 3) with confidence greater than the threshold T2 (T2 = 200) is randomly selected for pasting. Specifically, A1 is placed on the background of image B centered at the point q of map B2. The cropped image A (denoted A4) is obtained by cropping the image and zero-padding the background so that it has the same size as image B. A mask is then generated from the cropped image A, with mask value 1 at the foreground positions and 0 elsewhere.
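The random selection of a paste point q above the threshold can be sketched as follows (numpy, with T2 = 200 as in the embodiment):

```python
import numpy as np

def pick_paste_point(conf_map, t2=200, rng=None):
    """Randomly pick one pixel of the pasting confidence map whose
    confidence exceeds threshold T2 (T2 = 200 in the embodiment).
    Returns (row, col), or None if no pixel qualifies."""
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.nonzero(conf_map > t2)
    if len(ys) == 0:
        return None                 # no sufficiently similar position
    i = rng.integers(len(ys))       # uniform choice among candidates
    return int(ys[i]), int(xs[i])
```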
Preferably, when pasting A1 into the background B2 of image B, Gaussian blur is applied to the pasted edge to weaken the edge effect introduced by pasting. Specifically, a Gaussian kernel with Gaussian radius Δ is generated and used to filter the mask. The pasting then uses the formula I = I_A × m + I_B × (1 − m), where I denotes the newly synthesized image, I_A is image A4, I_B is image B2, and m denotes the mask after Gaussian filtering.
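The blended paste I = I_A·m + I_B·(1 − m) with a Gaussian-filtered mask can be sketched in pure numpy as follows; the kernel radius and sigma are illustrative values, not taken from the patent:

```python
import numpy as np

def gaussian_blur(m, radius=3, sigma=1.5):
    """Separable Gaussian filter on a 2D float mask (pure numpy)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()                                   # kernel sums to 1
    pad = np.pad(m, radius, mode="edge")
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, "valid"), 0, tmp)

def blend(img_a4, img_b2, mask, radius=3):
    """I = I_A * m + I_B * (1 - m), with the binary foreground mask
    softened by Gaussian filtering so the pasted edge fades smoothly."""
    m = gaussian_blur(mask.astype(float), radius)[..., None]
    return np.rint(img_a4 * m + img_b2 * (1 - m)).astype(np.uint8)
```

Inside the foreground m ≈ 1 so A4 dominates; outside it B2 dominates; the blurred transition band mixes the two.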
Preferably, the style rendering is implemented as follows: all images after foreground exchange are treated as domain I and the original image sample set as domain II, and the two are used as input to train a cycle-consistent generative adversarial network (CycleGAN), so that the network learns to transfer images of domain I into the style of the images of domain II, further eliminating the edge effect caused by the exchange. Each foreground-exchanged image is then fed into the trained network again to obtain a style-transferred image; all style-transferred images form the extended image sample set.
This scheme uses a cycle-consistent GAN (CycleGAN) to render the image set generated by cropping-and-pasting. The crop-paste image set and the original image sample set are taken as two domains, denoted domain I and domain II, and used as the input of CycleGAN to train the cycle-consistent generative adversarial network. To fool the discriminator, CycleGAN transfers images of domain I into the style of domain II as realistically as possible. After training, the crop-paste image set is passed through the trained CycleGAN to obtain the final rendered images, i.e., the extended image sample set.
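The cycle-consistency term that gives CycleGAN its name can be sketched as below. The generators G (domain I → II) and F (domain II → I) are arbitrary callables here, and the adversarial losses of the full CycleGAN objective are omitted; this is only a sketch of the constraint, not the patent's training code.

```python
import numpy as np

def cycle_consistency_loss(G, F, batch_I, batch_II):
    """L1 cycle losses ||F(G(x)) - x|| and ||G(F(y)) - y||: an image
    translated to the other domain and back should return to itself.
    Combined (with weights) with the adversarial losses during training."""
    loss_I = np.abs(F(G(batch_I)) - batch_I).mean()     # I -> II -> I
    loss_II = np.abs(G(F(batch_II)) - batch_II).mean()  # II -> I -> II
    return loss_I + loss_II
```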
Preferably, as shown in fig. 3, the hole is the rectangular box enclosing the contour of the target.
The whole pairing-similarity enhancement flow is shown in fig. 4. The pairing-similarity data enhancement method of this embodiment pairs images with a matching algorithm, detects the optimal pasting position, exchanges foregrounds to generate new images, and further renders the generated images with a generative adversarial network (GAN), thereby producing more realistic training data and significantly improving the accuracy of existing target detection methods.
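The pairing test at the heart of this flow — cosine similarity between the feature vectors extracted from two candidate images, compared against a threshold — can be sketched as follows. The threshold value 0.8 is an assumption for illustration; the patent does not fix it.

```python
import numpy as np

def cosine_sim(f1, f2):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def try_pair(feat_a, feat_b, threshold=0.8):
    """Pairing rule: succeed when the cosine similarity of the two
    images' features exceeds the threshold (0.8 is an assumed value)."""
    return cosine_sim(feat_a, feat_b) > threshold
```

In the full flow, `feat_a` and `feat_b` would come from the trained multi-classification network's feature extraction unit, and failed pairs trigger re-sampling from the original image set.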
Preferably, the network parameters are optimized according to information of the whole image together with the prediction results of the category prediction branch and the position prediction branch of the detection network, implemented as follows:
the feature map output by each convolutional layer of the FPN feature extraction unit in the single-stage target detection network is connected to a category prediction branch and a position prediction branch, and the feature map output by the last convolutional layer is additionally connected to a context information branch used for whole-image classification prediction; the parameters of the FPN feature extraction unit are trained by summing the prediction loss values of all branches and back-propagating.
Based on a typical single-stage target detection network, a context-based violation behavior detection network, Net-Context, is designed to enhance the network's ability to identify violation behaviors and to obtain the violation target detection result for the current frame. The structure of the Net-Context model is shown in fig. 5.
As can be seen from fig. 5, the feature extraction module of the violation detection network Net-Context adopts an FPN structure and performs prediction on multiple top-down fused feature maps. Each feature map is connected to a category prediction branch and a position prediction branch, and the last feature map is additionally connected to a context information branch, implemented as a whole-image multi-classification branch that predicts the categories present in the whole image. A general target detection network may lose context information because of its limited receptive field; the whole-image category prediction proposed here prompts the FPN to adaptively retain effective context information through to the last layer, and this retained information helps the category prediction branch classify better.
Preferably, as shown in fig. 6, the context information branch is a whole-image multi-classification branch comprising global average pooling (GAP) and global max pooling (GMP);
GAP and GMP are applied separately to the feature map output by the last convolutional layer to obtain two feature vectors. The sum of the two vectors is taken as the final feature vector; alternatively, the GAP vector alone, or the concatenation of the GAP and GMP vectors, may be used. The final feature vector is passed through two FC layers and a sigmoid layer in turn to obtain the whole-image classification prediction result.
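A numpy sketch of this context branch is given below, with randomly initialized placeholder FC weights and an assumed ReLU between the two FC layers (the patent does not name the activation):

```python
import numpy as np

def context_branch(feat, w1, b1, w2, b2, mode="sum"):
    """Whole-image multi-classification head over the last FPN feature
    map `feat` of shape (C, H, W): GAP and GMP, combined by sum /
    gap-only / concat, then two FC layers and a sigmoid. The weight
    matrices are placeholders and the ReLU is an assumption."""
    gap = feat.mean(axis=(1, 2))           # global average pooling -> (C,)
    gmp = feat.max(axis=(1, 2))            # global max pooling -> (C,)
    if mode == "sum":
        v = gap + gmp
    elif mode == "gap":
        v = gap
    else:                                  # "concat"
        v = np.concatenate([gap, gmp])
    h = np.maximum(w1 @ v + b1, 0)         # FC 1 + ReLU (assumed)
    z = w2 @ h + b2                        # FC 2
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid -> per-class scores
```

Each output element is an independent probability that the corresponding violation-related category appears somewhere in the image, which suits the multi-label nature of whole-image prediction.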
For example, GAP (global average pooling) and GMP (global max pooling) are applied to the last-layer feature map of the FPN to obtain two feature vectors, and their sum is taken as the final feature vector. The final feature vector is passed through two FC layers and a sigmoid layer to obtain the whole-image multi-classification prediction. The loss of this classification prediction is computed with a BCE (binary cross-entropy) loss function and added, with a certain weight, to the total loss (the sum of the prediction losses of the category prediction branch and the position prediction branch) for back-propagation. The BCE loss function is:
L_BCE(x_i, y_i) = −w_i [ y_i·log(x_i) + (1 − y_i)·log(1 − x_i) ]
where x_i is the multi-class prediction, y_i is the multi-class label, w_i is the class weight, and i is the element index.
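The weighted BCE loss above can be sketched as follows; the clipping epsilon is a numerical-stability addition, not part of the patent's formula:

```python
import numpy as np

def bce_loss(x, y, w):
    """Per-element weighted binary cross-entropy,
    L_i = -w_i [ y_i log x_i + (1 - y_i) log(1 - x_i) ],
    averaged over elements; clipping guards against log(0)."""
    x = np.clip(x, 1e-7, 1 - 1e-7)
    return float(np.mean(-w * (y * np.log(x) + (1 - y) * np.log(1 - x))))
```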
The loss function of the final detection network model is:
L_Net-Context = (L_cls + L_reg) + λ·L_Context
where L_Net-Context denotes the total loss of the Net-Context network, L_cls the loss of the category prediction branch, L_reg the loss of the position prediction branch, and L_Context the loss of the context information branch; λ is the weight of L_Context.
In summary, the method of this embodiment obtains construction-scene surveillance video, intercepts construction-violation video segments and extracts frames to build a construction-violation picture data set, and labels the violation-related targets (safety helmet, goggles, mobile phone, etc.) with the labelImg annotation tool. Based on an improved crop-and-paste method, pictures are paired by a matching algorithm, the optimal pasting position is detected, and new pictures are generated by exchanging foregrounds, which enhances sample diversity; because images produced by direct crop-and-paste are often not realistic enough, the generated images are further rendered with a generative adversarial network (GAN) to produce more realistic training data. Finally, a conventional single-stage target detection algorithm is improved by adding a context information branch, making the model better suited to detecting violation-related targets in construction scenes. Extensive tests on the construction violation data set show that the method significantly improves the accuracy of existing target detection methods and can effectively detect a series of worker construction violations in real time, providing a guarantee for construction-site safety.
Example two
A construction violation detection method comprises the following steps:
A construction image to be detected is acquired and input into the construction violation detection model constructed by the construction method of embodiment one; whether a construction violation exists, together with its type and position, is obtained from the output results of the category prediction branch and the position prediction branch. The related technical solution is the same as in embodiment one and is not repeated here.
Existing single-frame image processing methods based on pure image classification can only give a per-frame prediction result and cannot give the location where the violation occurs. This embodiment adopts a behavior detection method based on single-frame images: using the model construction method of embodiment one, a given violation in the input image is defined as a target to be detected, and the position where it occurs (its position in the image) is marked.
EXAMPLE III
A computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to implement a method of constructing a construction violation detection model as described in embodiment one above and/or a construction violation detection method as described in embodiment two above. The related technical solutions are the same as those of the first embodiment and the second embodiment, and are not described herein again.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A construction method of a construction violation detection model is characterized by comprising the following steps:
acquiring an original image sample set of construction violation behaviors;
carrying out similarity matching on the images in the original image sample set to obtain a plurality of groups of image pairs, carrying out foreground exchange on each group of image pairs, and carrying out style rendering on all the exchanged images to eliminate edge effect caused by the exchange to obtain an extended image sample set;
and training a single-stage target detection network based on the original image sample set and the extended image sample set to obtain a violation behavior detection model, wherein network parameters are optimized according to the information of the image and the prediction results of the class prediction branch and the position prediction branch in the detection network during training.
2. The construction method of the construction violation detection model according to claim 1, wherein the similarity matching is implemented as follows:
training a violation multi-classification network by adopting a part of samples in the original image sample set;
randomly extracting two images from the original image sample set, using the trained feature extraction unit of the multi-classification network to extract features of the two images respectively, and calculating the cosine similarity between the features of the two images; if the similarity is greater than a threshold, the pairing succeeds; otherwise the pairing fails, two new images are randomly extracted from the original image sample set, and the process is repeated until a stop condition is met, obtaining a plurality of groups of successfully paired image pairs.
3. The construction method of the detection model for the construction violation behavior according to claim 1, wherein the foreground exchange is implemented by:
respectively cutting out the target foreground of each image in each group of image pairs, and filling the cut hole in each image to obtain the image background map of each image;
calculating a pasting confidence map B3 for pasting the target foreground A1 of image A onto the image background map B2 of the image B paired with image A; randomly selecting any pixel point in the pasting confidence map B3 whose confidence value exceeds a threshold, and pasting the target foreground A1 into the region of the image background map B2 centered on that pixel point, where image A stands for each image in the plurality of image pairs.
4. The construction method of the construction violation detection model according to claim 3, wherein the hole is a rectangular box enclosing the target contour.
5. The construction method of the construction violation detection model according to claim 3, wherein after pasting the target foreground A1 into the region of the image background map B2 centered on the pixel point, the implementation of the foreground exchange further comprises:
after pasting the target foreground A1 onto the image background map B2, applying Gaussian blur to the pasted edge of A1 to weaken the edge effect caused by pasting.
6. The construction method of the construction violation detection model according to claim 1, wherein the style rendering is implemented by:
treating all images after foreground exchange as domain I and the original image sample set as domain II, using the two as input of a cycle-consistent generative adversarial network, and training the network so that it can transfer images of domain I into the image style of domain II, eliminating the edge effect caused by the exchange;
and inputting each image after foreground exchange into the trained cycle-consistent generative adversarial network again to obtain a style-transferred image, all style-transferred images forming the extended image sample set.
7. The construction method of the construction violation detection model according to any one of claims 1 to 6, wherein the network parameters are optimized according to the information on the image and the prediction results of the category prediction branch and the position prediction branch in the detection network, and the implementation manner is as follows:
the feature map output by each convolutional layer of the FPN feature extraction unit in the single-stage target detection network is connected with a category prediction branch and a position prediction branch, and the feature map output by the last convolutional layer is further connected with a context information branch used for whole-image classification prediction;
and training the parameters of the FPN feature extraction unit by summing the prediction loss values of all the branches and back-propagating.
8. The construction method of the construction violation detection model according to claim 7, wherein the context information branch is a whole-image multi-classification branch, specifically configured for:
respectively carrying out global average pooling and global maximum pooling on the feature map output by the last convolutional layer to obtain two feature vectors;
taking the sum of the two feature vectors as the final feature vector, or taking the feature vector obtained by the global average pooling as the final feature vector, or taking the concatenation of the two feature vectors as the final feature vector;
and passing the final feature vector through two fully connected layers and one sigmoid layer in turn to obtain the whole-image classification prediction result.
9. A construction violation detection method is characterized by comprising the following steps:
collecting a construction image to be detected, inputting the construction image into a construction violation detection model constructed by the construction method of the construction violation detection model according to any one of claims 1 to 8, and obtaining whether a construction violation exists and the type and position of the construction violation based on the output results of the type prediction branch and the position prediction branch.
10. A computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to implement a construction violation detection model building method according to any one of claims 1-8 and/or a construction violation detection method according to claim 9.
CN202010601260.XA 2020-06-28 2020-06-28 Construction method and application of construction violation detection model Active CN111832443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010601260.XA CN111832443B (en) 2020-06-28 2020-06-28 Construction method and application of construction violation detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010601260.XA CN111832443B (en) 2020-06-28 2020-06-28 Construction method and application of construction violation detection model

Publications (2)

Publication Number Publication Date
CN111832443A true CN111832443A (en) 2020-10-27
CN111832443B CN111832443B (en) 2022-04-12

Family

ID=72898997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010601260.XA Active CN111832443B (en) 2020-06-28 2020-06-28 Construction method and application of construction violation detection model

Country Status (1)

Country Link
CN (1) CN111832443B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347916A (en) * 2020-11-05 2021-02-09 安徽继远软件有限公司 Power field operation safety monitoring method and device based on video image analysis
CN112633159A (en) * 2020-12-22 2021-04-09 北京迈格威科技有限公司 Human-object interaction relation recognition method, model training method and corresponding device
CN112990378A (en) * 2021-05-08 2021-06-18 腾讯科技(深圳)有限公司 Scene recognition method and device based on artificial intelligence and electronic equipment
CN112989085A (en) * 2021-01-29 2021-06-18 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113011476A (en) * 2021-03-05 2021-06-22 桂林电子科技大学 User behavior safety detection method based on self-adaptive sliding window GAN
CN113688947A (en) * 2021-10-11 2021-11-23 国网智能科技股份有限公司 Infrared image fault identification method and system for power distribution equipment
CN115170894A (en) * 2022-09-05 2022-10-11 深圳比特微电子科技有限公司 Smoke and fire detection method and device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN108470187A (en) * 2018-02-26 2018-08-31 华南理工大学 A kind of class imbalance question classification method based on expansion training dataset
EP3506081A1 (en) * 2017-12-27 2019-07-03 Nokia Technologies Oy Audio copy-paste function
CN110766660A (en) * 2019-09-25 2020-02-07 上海众壹云计算科技有限公司 Integrated circuit defect image recognition and classification system based on fusion depth learning model
US20200090028A1 (en) * 2018-09-19 2020-03-19 Industrial Technology Research Institute Neural network-based classification method and classification device thereof
CN111091151A (en) * 2019-12-17 2020-05-01 大连理工大学 Method for generating countermeasure network for target detection data enhancement
CN111178283A (en) * 2019-12-31 2020-05-19 哈尔滨工业大学(深圳) Unmanned aerial vehicle image-based ground object identification and positioning method for established route
CN111191695A (en) * 2019-12-19 2020-05-22 杭州安恒信息技术股份有限公司 Website picture tampering detection method based on deep learning

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
EP3506081A1 (en) * 2017-12-27 2019-07-03 Nokia Technologies Oy Audio copy-paste function
CN108470187A (en) * 2018-02-26 2018-08-31 华南理工大学 A kind of class imbalance question classification method based on expansion training dataset
US20200090028A1 (en) * 2018-09-19 2020-03-19 Industrial Technology Research Institute Neural network-based classification method and classification device thereof
CN110766660A (en) * 2019-09-25 2020-02-07 上海众壹云计算科技有限公司 Integrated circuit defect image recognition and classification system based on fusion depth learning model
CN111091151A (en) * 2019-12-17 2020-05-01 大连理工大学 Method for generating countermeasure network for target detection data enhancement
CN111191695A (en) * 2019-12-19 2020-05-22 杭州安恒信息技术股份有限公司 Website picture tampering detection method based on deep learning
CN111178283A (en) * 2019-12-31 2020-05-19 哈尔滨工业大学(深圳) Unmanned aerial vehicle image-based ground object identification and positioning method for established route

Non-Patent Citations (4)

Title
HAO-SHU FANG ET AL.: "InstaBoost: boosting instance segmentation via probability map guided copy-pasting", 《ARXIV》 *
NIKITA DVORNIK ET AL.: "On the importance of visual context for data augmentation in scene understanding", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
STEFAN HINTERSTOISSER ET AL.: "On pre-trained image features and synthetic images for deep learning", 《THE COMPUTER VISION FOUNDATION》 *
WAN BING ET AL.: "Robust hashing algorithm based on color vector angle histogram and DCT compression", 《PACKAGING ENGINEERING》 *

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN112347916A (en) * 2020-11-05 2021-02-09 安徽继远软件有限公司 Power field operation safety monitoring method and device based on video image analysis
CN112347916B (en) * 2020-11-05 2023-11-17 安徽继远软件有限公司 Video image analysis-based power field operation safety monitoring method and device
CN112633159A (en) * 2020-12-22 2021-04-09 北京迈格威科技有限公司 Human-object interaction relation recognition method, model training method and corresponding device
CN112633159B (en) * 2020-12-22 2024-04-12 北京迈格威科技有限公司 Human-object interaction relation identification method, model training method and corresponding device
CN112989085A (en) * 2021-01-29 2021-06-18 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN112989085B (en) * 2021-01-29 2023-07-25 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN113011476A (en) * 2021-03-05 2021-06-22 桂林电子科技大学 User behavior safety detection method based on self-adaptive sliding window GAN
CN112990378A (en) * 2021-05-08 2021-06-18 腾讯科技(深圳)有限公司 Scene recognition method and device based on artificial intelligence and electronic equipment
CN113688947A (en) * 2021-10-11 2021-11-23 国网智能科技股份有限公司 Infrared image fault identification method and system for power distribution equipment
CN113688947B (en) * 2021-10-11 2024-03-15 国网智能科技股份有限公司 Method and system for identifying faults of infrared image of power distribution equipment
CN115170894A (en) * 2022-09-05 2022-10-11 深圳比特微电子科技有限公司 Smoke and fire detection method and device

Also Published As

Publication number Publication date
CN111832443B (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN111832443B (en) Construction method and application of construction violation detection model
CN109829443B (en) Video behavior identification method based on image enhancement and 3D convolution neural network
CN104050471B (en) Natural scene character detection method and system
CN111611874B (en) Face mask wearing detection method based on ResNet and Canny
CN107133943A (en) A kind of visible detection method of stockbridge damper defects detection
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN110909690A (en) Method for detecting occluded face image based on region generation
CN111582095B (en) Light-weight rapid detection method for abnormal behaviors of pedestrians
CN109558806A (en) The detection method and system of high score Remote Sensing Imagery Change
CN111582092B (en) Pedestrian abnormal behavior detection method based on human skeleton
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN110298297A (en) Flame identification method and device
CN112307886A (en) Pedestrian re-identification method and device
CN115393596B (en) Garment image segmentation method based on artificial intelligence
CN107944403A (en) Pedestrian's attribute detection method and device in a kind of image
CN114092793A (en) End-to-end biological target detection method suitable for complex underwater environment
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
CN110210561B (en) Neural network training method, target detection method and device, and storage medium
CN110334703B (en) Ship detection and identification method in day and night image
Tan et al. Automobile component recognition based on deep learning network with coarse-fine-grained feature fusion
CN107403192A (en) A kind of fast target detection method and system based on multi-categorizer
Guo et al. Robust and automatic skyline detection algorithm based on mssdn
CN111160262A (en) Portrait segmentation method fusing human body key point detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant