CN117197746A

CN117197746A - Safety monitoring system and method based on deep learning

Info

Publication number: CN117197746A
Application number: CN202311204262.5A
Authority: CN
Inventors: 沈国生
Original assignee: Zhisen Computer Technology Huzhou Zhejiang Co ltd
Current assignee: Zhisen Computer Technology Huzhou Zhejiang Co ltd
Priority date: 2023-09-18
Filing date: 2023-09-18
Publication date: 2023-12-08

Abstract

The application relates to the technical field of intelligent monitoring, and particularly discloses a safety monitoring system and method based on deep learning. Therefore, the efficiency and timeliness of safety monitoring management can be effectively improved, and the workload of manual inspection is reduced.

Description

Safety monitoring system and method based on deep learning

Technical Field

The application relates to the technical field of intelligent monitoring, in particular to a safety monitoring system and method based on deep learning.

Background

With the rapid development of Chinese society, government departments have placed higher demands on the supervision of construction engineering quality, safety, and civilized construction. Digital transformation has been a prominent topic in the construction industry in recent years, and accelerating project construction requires the empowerment and support of new technologies. The smart construction site brings the Internet to the worksite to enable intelligent management of engineering construction, raising the informatization level of engineering management and gradually achieving green and ecological construction.

Safety helmets are essential to worker safety, yet some workers take chances, and failure to wear a safety helmet is frequently observed on construction sites. Because the construction area of a worksite is large and its environment is complex, the traditional worksite management approach relies on manual on-site inspection, which is time-consuming, tedious, and repetitive; violations and abnormal conditions are not discovered promptly, and comprehensive, whole-process safety inspection and control cannot be achieved.

Accordingly, a deep learning based safety monitoring system and method are desired.

Disclosure of Invention

The present application has been made to solve the above technical problems. Embodiments of the application provide a safety monitoring system and method based on deep learning, in which a monitoring image of the worksite is captured by a camera, a deep-learning-based image processing technique frames the head area of each worker in the monitoring image, and, after the framed image is sharpened, high-dimensional implicit feature information of the worker's head area is extracted to determine whether the person is not wearing a safety helmet. In this way, the efficiency and timeliness of safety monitoring management can be effectively improved, and the workload of manual inspection is reduced.

Accordingly, according to one aspect of the present application, there is provided a deep learning based safety monitoring system comprising:

the monitoring module is used for acquiring a worksite personnel monitoring image captured by a camera deployed at the worksite;

the personnel target detection module is used for passing the worksite personnel monitoring image through a target detection network to obtain at least one region of interest;

the region of interest pixel enhancement module is used for passing the region of interest through an image sharpness enhancer based on a generative adversarial network to obtain a sharpened region of interest;

the feature extraction module is used for passing the sharpened region of interest through a convolutional neural network model containing a salient object detector to obtain a region of interest feature map;

the feature enhancement module is used for enabling the region-of-interest feature map to pass through a residual dual-attention mechanism model to obtain an enhanced region-of-interest feature map;

the optimizing module is used for carrying out feature manifold modulation on the enhanced region of interest feature map so as to obtain a classification feature map;

and the safety monitoring result generation module is used for passing the classification feature map through a classifier to obtain a classification result, wherein the classification result indicates whether a person is not wearing a safety helmet.

In the above deep learning based safety monitoring system, the target detection network is an anchor-window-based target detection network, and the anchor-window-based target detection network is Fast R-CNN, Faster R-CNN, or RetinaNet.

In the above safety monitoring system based on deep learning, the personnel object detection module includes: the convolution coding unit is used for enabling the monitoring image of the site personnel to pass through a plurality of convolution layers of the target detection network so as to obtain a target detection feature map; the target detection unit is used for processing the target detection characteristic map by using the target detection network based on the anchor window according to the following target detection formula so as to obtain the at least one region of interest;

wherein the target detection formula is:

$$\mathrm{ROI} = H(\psi_{det}, B) = \left(\mathrm{cls}(\psi_{det}, B),\ \mathrm{Regr}(\psi_{det}, B)\right)$$

wherein $\psi_{det}$ represents the target detection feature map, $B$ represents an anchor window, $\mathrm{ROI}$ represents the region of interest, $\mathrm{cls}(\psi_{det}, B)$ represents a classification function, and $\mathrm{Regr}(\psi_{det}, B)$ represents a regression function.
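As a rough illustration of the target detection formula, the following sketch treats the classification function and the regression function as two per-anchor outputs: a score for each anchor window and a set of box offsets. The offset parameterization (dx, dy, dw, dh), the score threshold, and the function names `decode_boxes` and `detect_rois` are assumptions made for illustration; the application does not specify them.

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Apply regression offsets (dx, dy, dw, dh) to anchors given as (x1, y1, x2, y2).

    The offset parameterization is an assumption (the one commonly used by
    R-CNN-family detectors); it is not stated in the source.
    """
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    cx = anchors[:, 0] + 0.5 * w
    cy = anchors[:, 1] + 0.5 * h
    new_cx = cx + deltas[:, 0] * w
    new_cy = cy + deltas[:, 1] * h
    new_w = w * np.exp(deltas[:, 2])
    new_h = h * np.exp(deltas[:, 3])
    return np.stack(
        [new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
         new_cx + 0.5 * new_w, new_cy + 0.5 * new_h],
        axis=1,
    )

def detect_rois(scores, deltas, anchors, score_threshold=0.5):
    """Keep anchors whose classification score passes the threshold (the cls
    branch) and refine them with the regression offsets (the Regr branch)."""
    keep = scores >= score_threshold
    return decode_boxes(anchors[keep], deltas[keep]), scores[keep]
```

In a full detector the scores and offsets would be predicted from the target detection feature map; here they are passed in directly so that the anchor-window step can be read in isolation.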

In the above deep learning based safety monitoring system, the generative adversarial network includes a discriminator and a generator, wherein the region of interest pixel enhancement module is configured to input the region of interest into the generator of the image sharpness enhancer based on the generative adversarial network, so that the generator performs deconvolution processing on the region of interest to obtain the sharpened region of interest.

In the above safety monitoring system based on deep learning, the feature extraction module is configured to: perform, in each layer of the convolutional neural network model containing the salient object detector, the following operations on the input data during the forward pass of that layer: performing convolution processing on the input data using a first convolution kernel to obtain a first convolution feature map; performing convolution processing on the first convolution feature map using a second convolution kernel to obtain a second convolution feature map, wherein the size of the first convolution kernel is larger than that of the second convolution kernel; performing mean pooling based on a local feature matrix on the second convolution feature map to obtain a pooled feature map; and performing activation processing on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the convolutional neural network model is the region of interest feature map, and the input of the first layer of the convolutional neural network model is the sharpened region of interest.
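The per-layer operations described above can be sketched as follows. This is a minimal NumPy illustration, not the applicant's implementation: the kernel sizes (for example 3x3 followed by 1x1), the 2x2 pooling window, the ReLU activation, and the names `conv2d`, `mean_pool`, and `salient_layer` are all assumptions, since the application only requires that the first kernel be larger than the second.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid cross-correlation. x: (C_in, H, W); kernels: (C_out, C_in, k, k)."""
    c_out, _, k, _ = kernels.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C_in, k, k)
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def mean_pool(x, size=2):
    """Non-overlapping mean pooling over size x size local windows."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]
    return x.reshape(c, h2, size, w2, size).mean(axis=(2, 4))

def salient_layer(x, large_kernels, small_kernels):
    """One layer: large-kernel conv -> small-kernel conv -> mean pooling -> activation."""
    first = conv2d(x, large_kernels)       # first convolution feature map
    second = conv2d(first, small_kernels)  # second convolution feature map
    pooled = mean_pool(second)             # pooled feature map
    return np.maximum(pooled, 0.0)         # ReLU is an assumed activation
```

Stacking several such layers, with the sharpened region of interest as the input to the first, would yield the region of interest feature map at the output of the last.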

In the above safety monitoring system based on deep learning, the feature enhancement module includes: a spatial attention unit, configured to input the region of interest feature map into a spatial attention module of the residual dual-attention mechanism model to obtain a spatial attention map; a channel attention unit, configured to input the region of interest feature map into a channel attention module of the residual dual-attention mechanism model to obtain a channel attention map; an attention fusion unit, configured to fuse the spatial attention map and the channel attention map to obtain a fused attention map; an activation unit, configured to input the fused attention map into a Sigmoid activation function to obtain a fused attention feature map; an attention applying unit, configured to multiply the fused attention feature map and the region of interest feature map position-wise to obtain a weighted feature map; and a residual fusion unit, configured to fuse the weighted feature map and the region of interest feature map to obtain the enhanced region of interest feature map.

In the above deep learning based safety monitoring system, the spatial attention unit includes: a spatial perception subunit, configured to perform convolutional encoding on the region of interest feature map using a convolution layer of the spatial attention module of the residual dual-attention mechanism model to obtain an initial convolution feature map; a probability subunit, configured to pass the initial convolution feature map through a Softmax function to obtain a spatial attention score map; and a spatial attention applying subunit, configured to multiply the spatial attention score map and the region of interest feature map position-wise to obtain the spatial attention map.

In the above deep learning based safety monitoring system, the channel attention unit includes: a channel dimension pooling subunit, configured to perform global average pooling on each feature matrix of the region of interest feature map along the channel dimension to obtain a channel feature vector; a nonlinear activation subunit, configured to pass the channel feature vector through a Softmax activation function to obtain a channel weight feature vector; and a channel attention applying subunit, configured to weight each feature matrix of the region of interest feature map along the channel dimension, using the feature value at each position of the channel weight feature vector as the weight, to obtain the channel attention map.
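The residual dual-attention steps above can be sketched in NumPy as follows. Several details are assumptions, because the application leaves them open: the spatial convolution is reduced to a 1x1 convolution that collapses the channels into one map (`conv_weights`), the Softmax of the spatial branch runs over all spatial positions, and both fusion steps are implemented as element-wise addition. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=None):
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(f, conv_weights):
    """f: (C, H, W); conv_weights: (C,), an assumed 1x1 convolution to one channel."""
    initial = np.tensordot(conv_weights, f, axes=(0, 0))  # initial convolution feature map (H, W)
    scores = softmax(initial)                             # spatial attention score map
    return f * scores[None, :, :]                         # position-wise multiplication

def channel_attention(f):
    channel_vec = f.mean(axis=(1, 2))   # global average pooling per channel -> (C,)
    weights = softmax(channel_vec)      # channel weight feature vector
    return f * weights[:, None, None]   # weight each feature matrix

def residual_dual_attention(f, conv_weights):
    fused = spatial_attention(f, conv_weights) + channel_attention(f)  # fusion by addition (assumption)
    gate = sigmoid(fused)   # fused attention feature map
    weighted = gate * f     # weighted feature map
    return weighted + f     # residual fusion by addition (assumption)
```

The residual term keeps the original region of interest features intact, so the attention gate can only add emphasis rather than erase information.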

In the above safety monitoring system based on deep learning, the optimizing module includes: a feature descriptor construction unit, configured to take, for each pixel point of the enhanced region of interest feature map, the channel feature vector corresponding to that pixel point as the feature descriptor of that pixel point; a KL divergence calculating unit, configured to calculate KL divergence values between the feature descriptors of the respective pixel points to obtain a pixel-level topological association matrix composed of a plurality of KL divergence values; a topological feature extraction unit, configured to pass the pixel-level topological association matrix through a topological feature extractor based on a convolution layer to obtain a pixel-level topological association feature matrix; a probability unit, configured to input the pixel-level topological association feature matrix into a Softmax activation function to obtain a probabilistic pixel-level topological association feature matrix; and a weight applying unit, configured to use the probabilistic pixel-level topological association feature matrix as a weight matrix and multiply it position-wise with each feature matrix of the enhanced region of interest feature map along the channel dimension to obtain the classification feature map.
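A minimal sketch of the feature manifold modulation steps is given below. It rests on assumptions that the application does not state: each pixel's channel vector is turned into a probability distribution with a Softmax before the KL divergence is computed, and the convolution-based topological feature extractor is stood in for by a single 1xN kernel (`kernel`) that reduces the N-by-N pairwise matrix to one value per pixel so that it can be reshaped to H x W. The function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def manifold_modulation(f, kernel):
    """f: enhanced ROI feature map (C, H, W); kernel: (H*W,) assumed 1xN conv kernel."""
    c, h, w = f.shape
    n = h * w
    descriptors = f.reshape(c, n).T   # one channel feature vector per pixel, (N, C)
    p = softmax(descriptors, axis=1)  # descriptors as distributions (assumption)
    log_p = np.log(p)
    # KL(p_i || p_j) = sum_c p_i[c] * (log p_i[c] - log p_j[c])
    neg_entropy = np.sum(p * log_p, axis=1)  # (N,)
    cross = p @ log_p.T                      # (N, N)
    kl = neg_entropy[:, None] - cross        # pixel-level topological association matrix
    topo = kl @ kernel                       # stand-in for the conv-based extractor, (N,)
    weights = softmax(topo, axis=0).reshape(h, w)  # probabilistic weight matrix
    return f * weights[None, :, :]           # position-wise weighting of every channel
```

When all pixel descriptors are identical, every KL divergence is zero and the weights become uniform, which is a convenient sanity check for the computation.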

According to another aspect of the present application, there is provided a safety monitoring method based on deep learning, comprising:

acquiring a worksite personnel monitoring image captured by a camera deployed at the worksite;

passing the worksite personnel monitoring image through a target detection network to obtain at least one region of interest;

passing the region of interest through an image sharpness enhancer based on a generative adversarial network to obtain a sharpened region of interest;

passing the sharpened region of interest through a convolutional neural network model containing a salient object detector to obtain a region of interest feature map;

passing the region of interest feature map through a residual dual-attention mechanism model to obtain an enhanced region of interest feature map;

performing feature manifold modulation on the enhanced region of interest feature map to obtain a classification feature map;

and passing the classification feature map through a classifier to obtain a classification result, wherein the classification result indicates whether a person is not wearing a safety helmet.
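The order of the method steps can be summarized by the following sketch, in which each stage is supplied as a callable. The function name `run_pipeline` and its parameter names are hypothetical and stand in for the trained models described in the application.

```python
def run_pipeline(image, detect, sharpen, extract, enhance, modulate, classify):
    """Run each detected region of interest through the stages in the claimed order.

    Every stage is a caller-supplied callable (a hypothetical stand-in for the
    corresponding trained model); one classification result is returned per ROI.
    """
    results = []
    for roi in detect(image):    # target detection network -> regions of interest
        x = sharpen(roi)         # GAN-based image sharpness enhancer
        x = extract(x)           # CNN containing the salient object detector
        x = enhance(x)           # residual dual-attention mechanism model
        x = modulate(x)          # feature manifold modulation
        results.append(classify(x))  # e.g. True if no helmet is worn
    return results
```

Keeping the stages as separate callables mirrors the module structure of the system, so any single stage can be replaced without touching the others.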

Compared with the prior art, the safety monitoring system and method based on deep learning provided by the application capture a monitoring image of the worksite with a camera, use a deep-learning-based image processing technique to frame the head area of each worker in the monitoring image, and, after the framed image is sharpened, extract high-dimensional implicit feature information of the worker's head area to determine whether the person is not wearing a safety helmet. In this way, the efficiency and timeliness of safety monitoring management can be effectively improved, and the workload of manual inspection is reduced.

Drawings

The above and other objects, features and advantages of the present application will become more apparent from the following more detailed description of embodiments of the present application with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification; together with the embodiments, they serve to explain the application and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.

FIG. 1 is a block diagram of a deep learning based safety monitoring system according to an embodiment of the present application.

FIG. 2 is a schematic architecture diagram of a deep learning based safety monitoring system according to an embodiment of the present application.

FIG. 3 is a block diagram of a feature enhancement module in a deep learning based safety monitoring system according to an embodiment of the present application.

FIG. 4 is a block diagram of a spatial attention unit in a deep learning based safety monitoring system according to an embodiment of the present application.

FIG. 5 is a block diagram of a channel attention unit in a deep learning based safety monitoring system according to an embodiment of the present application.

FIG. 6 is a flow chart of a deep learning based safety monitoring method according to an embodiment of the present application.

Detailed Description

Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.

Summary of the application

As described above, safety helmets are essential to worker safety on the worksite, yet some workers still take chances, and the violation of not wearing a safety helmet remains common on the worksite. Because the construction area of a worksite is large and its environment is complex, the traditional worksite management approach relies on manual on-site inspection, which is time-consuming, tedious, and repetitive; violations and abnormal conditions are not discovered promptly, and comprehensive, whole-process safety inspection and control cannot be achieved. Accordingly, a deep learning based safety monitoring system and method are desired.

At present, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, speech signal processing, and the like. In addition, deep learning and neural networks have also shown levels approaching and even exceeding humans in the fields of image classification, object detection, semantic segmentation, text translation, and the like.

In recent years, the development of deep learning and neural networks has provided new ideas and solutions for safety monitoring on worksites.

At present, whether workers on a worksite are wearing safety helmets is usually checked by installing cameras on the worksite, but this approach still requires staff to watch the surveillance video in real time and make the judgment themselves, so it is inefficient and supervision may be inadequate. On this basis, in the technical scheme of the application, a monitoring image of the worksite is captured by a camera, a deep-learning-based image processing technique frames the head area of each worker in the monitoring image, and, after the framed image is sharpened, high-dimensional implicit feature information of the worker's head area is extracted to determine whether the person is not wearing a safety helmet. In this way, the efficiency and timeliness of safety monitoring management can be effectively improved, and the workload of manual inspection is reduced.

Specifically, in the technical scheme of the application, first, a worksite personnel monitoring image captured by a camera deployed at the worksite is acquired. Then, because the image contains a large number of irrelevant environmental features that would interfere with recognition of the workers' heads, the worksite personnel monitoring image is passed through the target detection network to frame the head area of each person in the image, thereby obtaining at least one region of interest. It should be appreciated that target detection is a common computer vision technique capable of detecting and locating particular objects of interest in an image. In the technical scheme of the application, the target detection network detects the personnel targets in the worksite personnel monitoring image and locates the head area of each person. By framing the region of interest, attention can be focused on the region associated with the safety helmet, which reduces the computation required by subsequent processing and improves processing efficiency.

Next, because worksite environmental conditions are complex, the image may suffer from blurring, noise, or unclear details. These problems may affect the feature extraction and classification of the region of interest in subsequent steps. Therefore, the region of interest is further processed by an image sharpness enhancer based on a generative adversarial network to improve the image quality and sharpness of the region of interest, thereby obtaining a sharpened region of interest. It should be appreciated that an image sharpness enhancer based on a generative adversarial network (GAN) is a machine learning method that can improve the sharpness and visual quality of an image by learning the statistical properties of images and deblurring techniques, so that details and features in the region of interest are better expressed. In particular, the generative adversarial network here includes a generator and a discriminator: the generator produces the sharpness-enhanced image, and the discriminator calculates the difference between the sharpness-enhanced image and a real image; the network parameters of the generator are updated by backpropagation with gradient descent to obtain a generator with an image sharpness enhancement function.
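For reference, the adversarial training described above is commonly written as the standard GAN minimax objective. The application does not state its loss function, so the form below is an assumption: it is the original GAN objective, adapted so that the generator $G$ takes the region of interest $x$ as input and the discriminator $D$ compares its output with real sharp images $y$.

```latex
\min_{G}\max_{D}\;
\mathbb{E}_{y \sim p_{\mathrm{real}}}\left[\log D(y)\right]
+ \mathbb{E}_{x \sim p_{\mathrm{ROI}}}\left[\log\left(1 - D(G(x))\right)\right]
```

Under this formulation the discriminator is trained to separate real sharp images from generated ones, while the generator is updated by gradient descent to make its sharpened output harder to distinguish.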

Then, feature mining is performed on the sharpened region of interest by a convolutional neural network model containing a salient object detector to obtain a region of interest feature map. A convolutional neural network (CNN) is a deep learning model widely used in computer vision that can effectively learn image features and extract high-level representations of an image through layer-by-layer convolution and pooling operations. However, a conventional CNN has a large receptive field, the patterns it extracts are coarse, and fine-resolution details in the feature map are easily overlooked. The salient object detector instead uses two convolution kernels in each layer, one large and one small. Adding a small-scale convolution kernel after the conventional convolution is, from the perspective of cross-channel pooling, equivalent to applying cascaded cross-channel weighted pooling to a normal convolution layer, so that the model can learn the relationships between channels and better model local information.
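The cross-channel pooling interpretation can be made concrete with a small sketch. Assuming the small kernel is 1x1 (the application only says it is smaller than the first kernel), the second convolution reduces to a weighted sum over channels at every pixel; the function name `pointwise_conv` is hypothetical.

```python
import numpy as np

def pointwise_conv(x, weights):
    """1x1 convolution as cross-channel weighted pooling.

    x: (C_in, H, W); weights: (C_out, C_in). Each output channel is a weighted
    sum of the input channels, computed independently at every pixel.
    """
    return np.einsum("oc,chw->ohw", weights, x)

# With equal weights, the 1x1 convolution is exactly a mean over channels.
x = np.arange(8, dtype=float).reshape(2, 2, 2)
pooled = pointwise_conv(x, np.array([[0.5, 0.5]]))
```

Because the weights are learned, such a layer can emphasize informative channels instead of averaging them uniformly, which is how the model learns relationships between channels.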

Then, the information related to the safety helmet in the region of interest feature map is further enhanced by a residual dual-attention mechanism model to improve its representation capability and discriminability, thereby obtaining an enhanced region of interest feature map. The residual dual-attention mechanism model is an attention model for enhancing feature representation. It can automatically learn the importance of each part of the region of interest feature map and adaptively apply different weights to different features, strengthening the representation of important features and suppressing unimportant background noise, so that feature information related to the safety helmet is better captured and the quality and expressive power of the region of interest feature map are improved.

In particular, it is contemplated that feature manifold modulation may enhance correlation between features by introducing a nonlinear transformation. In the region of interest there may be a plurality of relevant features, but they may not have obvious relevance in the original feature map. By means of feature manifold modulation, the relevant features can be mapped into a space with higher dimension, so that the relevant features are more closely related in a new feature map, and the expression capacity of the features is improved. At the same time, feature manifold modulation may also enhance the distinguishability of features. In the region of interest there may be some subtle but important feature differences that may not be sufficiently apparent in the original feature map. These subtle differences can be mapped into higher dimensional spaces by feature manifold modulation, making them more visible and separable in the new feature map, thereby improving the distinguishability of the features. And, feature manifold modulation can improve the expressive power of features. In the region of interest there may be some complex feature patterns or structures that may not be well represented in the original feature map. By means of feature manifold modulation, these complex feature patterns can be mapped into a higher dimensional space, making them easier to capture and represent in new feature maps, thereby improving the expressive power of the features.

Specifically, performing feature manifold modulation on the enhanced region of interest feature map to obtain a classification feature map includes: for each pixel point of the enhanced region of interest feature map, taking the channel feature vector corresponding to that pixel point as the feature descriptor of that pixel point; calculating KL divergence values between the feature descriptors of the pixel points to obtain a pixel-level topological association matrix composed of a plurality of KL divergence values; passing the pixel-level topological association matrix through a topological feature extractor based on a convolution layer to obtain a pixel-level topological association feature matrix; inputting the pixel-level topological association feature matrix into a Softmax activation function to obtain a probabilistic pixel-level topological association feature matrix; and using the probabilistic pixel-level topological association feature matrix as a weight matrix and multiplying it position-wise with each feature matrix of the enhanced region of interest feature map along the channel dimension to obtain the classification feature map.

In the technical scheme of the application, channel feature vectors of all pixels of the enhanced region-of-interest feature map are used as feature descriptors of all pixel points, KL divergence values among the feature descriptors are used as feature level implicit association information of two pixels of the enhanced region-of-interest feature map, and then high-dimensional implicit association mode features among pixel level feature association information of the enhanced region-of-interest feature map are captured through convolution coding and activation processing, so that the pixel level feature association information of the enhanced region-of-interest feature map is fully utilized to optimize granularity and certainty of feature expression of the enhanced region-of-interest feature map.

Finally, the classification feature map is passed through a classifier to determine whether a person is not wearing a safety helmet. In this way, a machine learning method automatically determines whether workers on the worksite are wearing safety helmets, so that manual checking one by one is unnecessary, the efficiency of worksite safety management is greatly improved, and cases of workers not wearing safety helmets can be discovered and resolved promptly, thereby preventing potential safety risks and accidents.
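The application does not describe the internal structure of the classifier. As one plausible sketch, the classification feature map can be flattened, passed through a single fully connected layer, and normalized with a Softmax over two classes. The function name `classify`, the label order in `LABELS`, and the parameters `weights` and `bias` are assumptions for illustration only.

```python
import numpy as np

LABELS = ("helmet worn", "helmet not worn")  # assumed label order

def classify(feature_map, weights, bias):
    """Flatten -> fully connected layer -> Softmax over two classes.

    feature_map: (C, H, W); weights: (2, C*H*W); bias: (2,).
    Returns the class probabilities in the order given by LABELS.
    """
    x = feature_map.reshape(-1)
    logits = weights @ x + bias
    z = logits - logits.max()  # subtract the max for numerical stability
    return np.exp(z) / np.exp(z).sum()
```

The predicted label would then be `LABELS[int(np.argmax(probs))]`, with the second class corresponding to the alert condition described in the application.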

Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.

Exemplary System

FIG. 1 is a block diagram of a deep learning based safety monitoring system according to an embodiment of the present application. As shown in FIG. 1, a deep learning based safety monitoring system 100 according to an embodiment of the present application includes: a monitoring module 110, configured to acquire a worksite personnel monitoring image captured by a camera deployed at the worksite; a personnel target detection module 120, configured to pass the worksite personnel monitoring image through a target detection network to obtain at least one region of interest; a region of interest pixel enhancement module 130, configured to pass the region of interest through an image sharpness enhancer based on a generative adversarial network to obtain a sharpened region of interest; a feature extraction module 140, configured to pass the sharpened region of interest through a convolutional neural network model containing a salient object detector to obtain a region of interest feature map; a feature enhancement module 150, configured to pass the region of interest feature map through a residual dual-attention mechanism model to obtain an enhanced region of interest feature map; an optimization module 160, configured to perform feature manifold modulation on the enhanced region of interest feature map to obtain a classification feature map; and a safety monitoring result generating module 170, configured to pass the classification feature map through a classifier to obtain a classification result, where the classification result indicates whether a person is not wearing a safety helmet.

FIG. 2 is a schematic architecture diagram of a deep learning based safety monitoring system according to an embodiment of the present application. As shown in FIG. 2, first, a worksite personnel monitoring image captured by a camera deployed at the worksite is acquired. The worksite personnel monitoring image is then passed through a target detection network to obtain at least one region of interest. Next, the region of interest is passed through an image sharpness enhancer based on a generative adversarial network to obtain a sharpened region of interest. The sharpened region of interest is then passed through a convolutional neural network model containing a salient object detector to obtain a region of interest feature map. The region of interest feature map is then passed through a residual dual-attention mechanism model to obtain an enhanced region of interest feature map. Feature manifold modulation is then performed on the enhanced region of interest feature map to obtain a classification feature map. Finally, the classification feature map is passed through a classifier to obtain a classification result, wherein the classification result indicates whether a person is not wearing a safety helmet.

In the deep learning based safety monitoring system 100, the monitoring module 110 is configured to acquire a worksite personnel monitoring image captured by a camera deployed at the worksite. As described in the background above, safety helmets are essential to worker safety on the worksite, yet some workers still take chances, and the violation of not wearing a safety helmet remains common on the worksite. Because the construction area of a worksite is large and its environment is complex, the traditional worksite management approach relies on manual on-site inspection, which is time-consuming, tedious, and repetitive; violations and abnormal conditions are not discovered promptly, and comprehensive, whole-process safety inspection and control cannot be achieved. It is therefore desirable to be able to intelligently monitor whether a worker is wearing a safety helmet.

At present, whether workers on a worksite are wearing safety helmets is usually checked by installing cameras on the worksite, but this approach still requires staff to watch the surveillance video in real time and make the judgment themselves, so it is inefficient and supervision may be inadequate. On this basis, in the technical scheme of the application, a monitoring image of the worksite is captured by a camera, a deep-learning-based image processing technique frames the head area of each worker in the monitoring image, and, after the framed image is sharpened, high-dimensional implicit feature information of the worker's head area is extracted, so that whether the person is not wearing a safety helmet is determined automatically. In this way, the efficiency and timeliness of safety monitoring management can be effectively improved, and the workload of manual inspection is reduced. Specifically, in the technical scheme of the application, first, a worksite personnel monitoring image captured by a camera deployed at the worksite is acquired.

In the deep learning based safety monitoring system 100, the personnel target detection module 120 is configured to pass the worksite personnel monitoring image through a target detection network to obtain at least one region of interest. Because the image contains a large number of irrelevant environmental features that would interfere with recognition of the workers' heads, the worksite personnel monitoring image is passed through the target detection network to frame the head area of each person in the image, thereby obtaining at least one region of interest. It should be appreciated that target detection is a common computer vision technique capable of detecting and locating particular objects of interest in an image. In the technical scheme of the application, the target detection network detects the personnel targets in the worksite personnel monitoring image and locates the head area of each person. By framing the region of interest, attention can be focused on the region associated with the safety helmet, which reduces the computation required by subsequent processing and improves processing efficiency. Specifically, the target detection network is an anchor-window-based target detection network, and the anchor-window-based target detection network is Fast R-CNN, Faster R-CNN, or RetinaNet.

Accordingly, in one specific example, the personnel object detection module 120 includes: a convolution coding unit, configured to pass the worksite personnel monitoring image through a plurality of convolution layers of the object detection network to obtain an object detection feature map; and an object detection unit, configured to process the object detection feature map with the anchor window-based object detection network according to the following object detection formula to obtain the at least one region of interest;

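wherein the object detection formula is:

$$\mathrm{ROI} = H(\psi_{det}, B) = \left(\mathrm{cls}(\psi_{det}, B),\ \mathrm{Regr}(\psi_{det}, B)\right)$$

wherein $\psi_{det}$ represents the object detection feature map, $B$ represents an anchor window, $\mathrm{ROI}$ represents the region of interest, $\mathrm{cls}(\psi_{det}, B)$ represents a classification function, and $\mathrm{Regr}(\psi_{det}, B)$ represents a regression function.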
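As an illustration only, the pairing of a classification function and a regression function over anchor windows can be sketched in NumPy as follows. The anchor encoding (center x, center y, width, height), the offset parameterization, the score threshold and all function names are assumptions made for this sketch and are not specified by the application; a practical detector such as Faster R-CNN or RetinaNet would also apply non-maximum suppression, which is omitted here.

```python
import numpy as np

def decode_anchors(anchors, deltas):
    """Apply regression offsets (dx, dy, dw, dh) to anchors given as (cx, cy, w, h)."""
    cx = anchors[:, 0] + deltas[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(deltas[:, 2])
    h = anchors[:, 3] * np.exp(deltas[:, 3])
    # Convert to corner format (x1, y1, x2, y2).
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def detect_rois(cls_scores, deltas, anchors, score_threshold=0.5):
    """Combine the two heads: keep anchors scored as heads and refine their boxes."""
    boxes = decode_anchors(anchors, deltas)
    keep = cls_scores >= score_threshold
    return boxes[keep], cls_scores[keep]

# Two hypothetical anchor windows B.
anchors = np.array([[50.0, 50.0, 20.0, 20.0],
                    [120.0, 80.0, 40.0, 40.0]])
# Hypothetical outputs of cls(psi_det, B) and Regr(psi_det, B).
cls_scores = np.array([0.9, 0.2])
deltas = np.array([[0.1, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])

rois, scores = detect_rois(cls_scores, deltas, anchors)
print(rois)
```

Only the first anchor passes the threshold, and its box is shifted by the predicted offset before being returned as a region of interest.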

In the deep learning based safety monitoring system 100, the region of interest pixel enhancement module 130 is configured to pass the region of interest through an image sharpness enhancer based on a generative adversarial network to obtain a sharpened region of interest. Given the complexity of the worksite environment, the image may be blurred, noisy or lacking in detail, which can degrade feature extraction and classification of the region of interest in subsequent steps. Therefore, the region of interest is processed by the image sharpness enhancer based on the generative adversarial network (GAN) to improve its image quality and sharpness, yielding the sharpened region of interest. It should be appreciated that a GAN-based image sharpness enhancer is a machine learning method that improves the sharpness and visual quality of an image by learning the statistical properties of images and how to deblur them, so that details and features in the region of interest are expressed better. Specifically, the generative adversarial network includes a generator and a discriminator: the generator performs deconvolution processing on the region of interest to obtain the sharpened region of interest, and the discriminator calculates the difference between the sharpness-enhanced image and the real image; the network parameters of the generator are updated by a back propagation algorithm of gradient descent, so as to obtain a generator having an image sharpness enhancement function.
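The deconvolution step and the gradient-descent update of the generator can be illustrated with a deliberately small NumPy sketch. It assumes a single-channel, single-kernel generator and replaces the learned discriminator with a plain mean squared difference against the real image; both simplifications, and every name below, are assumptions for illustration rather than the network of the application.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Single-channel transposed convolution (deconvolution) without padding."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

def difference(enhanced, real):
    """Stand-in for the discriminator: mean squared difference (assumption)."""
    return float(np.mean((enhanced - real) ** 2))

def kernel_gradient(x, kernel, real, stride=2):
    """Gradient of the mean squared difference with respect to the generator kernel."""
    out = transposed_conv2d(x, kernel, stride)
    residual = 2.0 * (out - real) / out.size
    k = kernel.shape[0]
    grad = np.zeros_like(kernel)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            grad += x[i, j] * residual[i * stride:i * stride + k, j * stride:j * stride + k]
    return grad

x = np.array([[1.0, 2.0], [3.0, 4.0]])            # low-resolution region of interest
real = transposed_conv2d(x, np.ones((2, 2)))      # hypothetical "real" sharp image
kernel = np.full((2, 2), 0.5)                     # initial generator parameters

loss_before = difference(transposed_conv2d(x, kernel), real)
kernel = kernel - 0.1 * kernel_gradient(x, kernel, real)  # one gradient descent step
enhanced = transposed_conv2d(x, kernel)
loss_after = difference(enhanced, real)
print(loss_before, loss_after)
```

After one update the generated image is closer to the real one, which is the behaviour the training procedure relies on; a real GAN would obtain the training signal from a trained discriminator network rather than from a fixed distance.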

In the deep learning based safety monitoring system 100, the feature extraction module 140 is configured to pass the sharpened region of interest through a convolutional neural network model including a salient object detector to obtain a region of interest feature map. A convolutional neural network (CNN) is a deep learning model widely used in computer vision that learns image features effectively and extracts high-level representations of an image through layer-by-layer convolution and pooling. However, a conventional convolutional layer has a large receptive field, so the patterns it extracts are coarse and fine-resolution details in the feature map are easily overlooked. The salient object detector instead uses two convolution kernels per layer, one large and one small. Adding a small-scale convolution kernel after the traditional convolution is, from the viewpoint of cross-channel pooling, equivalent to applying cascaded cross-channel weighted pooling to a normal convolution layer, so that the model can learn the relationships between channels and better model local information.

Accordingly, in one specific example, the feature extraction module 140 is configured to: perform, in each layer of the convolutional neural network model including the salient object detector, the following on input data in the forward pass of the layer: performing convolution processing on the input data with a first convolution kernel to obtain a first convolution feature map; performing convolution processing on the first convolution feature map with a second convolution kernel to obtain a second convolution feature map, wherein the size of the first convolution kernel is larger than that of the second convolution kernel; performing mean pooling based on a local feature matrix on the second convolution feature map to obtain a pooled feature map; and performing activation processing on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the convolutional neural network model is the region of interest feature map, and the input of the first layer of the convolutional neural network model is the sharpened region of interest.
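One forward pass of such a layer can be sketched in NumPy as below. The kernel sizes (3x3 followed by 1x1), the 2x2 pooling window, the ReLU activation and the random weights are assumptions chosen for the sketch; the application only requires that the first kernel be larger than the second.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid convolution (cross-correlation). x: (C_in, H, W); kernels: (C_out, C_in, k, k)."""
    c_out, _, k, _ = kernels.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def mean_pool(x, size=2):
    """Mean pooling over non-overlapping size x size local feature matrices."""
    c, h, w = x.shape
    h, w = h - h % size, w - w % size
    blocks = x[:, :h, :w].reshape(c, h // size, size, w // size, size)
    return blocks.mean(axis=(2, 4))

def salient_layer(x, large_kernels, small_kernels):
    first = conv2d(x, large_kernels)       # first (larger) convolution kernel
    second = conv2d(first, small_kernels)  # second (smaller) kernel mixes channels
    pooled = mean_pool(second)
    return np.maximum(pooled, 0.0)         # ReLU activation (assumed)

rng = np.random.default_rng(0)
roi = rng.random((3, 12, 12))                       # hypothetical sharpened region of interest
large = rng.standard_normal((8, 3, 3, 3)) * 0.1     # 3x3 kernels
small = rng.standard_normal((8, 8, 1, 1)) * 0.1     # 1x1 kernels
features = salient_layer(roi, large, small)
print(features.shape)
```

The 1x1 kernel acts only across channels at each position, which is the cross-channel weighted pooling described above; stacking several such layers would give the full model.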

In the deep learning based safety monitoring system 100, the feature enhancement module 150 is configured to pass the region of interest feature map through a residual dual-attention mechanism model to obtain an enhanced region of interest feature map. The residual dual-attention mechanism model is an attention mechanism model for enhancing feature representation, and can automatically learn the importance degree of each part of features in the feature map of the region of interest, adaptively apply different weights to different features to enhance the representation of important features and inhibit unimportant background noise, so that feature information related to a safety helmet is better captured, and the quality and the expression capability of the feature map of the region of interest are improved.

FIG. 3 is a block diagram of the feature enhancement module in the deep learning based safety monitoring system according to an embodiment of the present application. As shown in FIG. 3, the feature enhancement module 150 includes: a spatial attention unit 151, configured to input the region of interest feature map into a spatial attention module of the residual dual-attention mechanism model to obtain a spatial attention map; a channel attention unit 152, configured to input the region of interest feature map into a channel attention module of the residual dual-attention mechanism model to obtain a channel attention map; an attention fusion unit 153, configured to fuse the spatial attention map and the channel attention map to obtain a fused attention map; an activation unit 154, configured to input the fused attention map into a Sigmoid activation function to obtain a fused attention feature map; an attention applying unit 155, configured to multiply the fused attention feature map and the region of interest feature map position-wise to obtain a weighted feature map; and a residual fusion unit 156, configured to fuse the weighted feature map and the region of interest feature map to obtain the enhanced region of interest feature map.

FIG. 4 is a block diagram of the spatial attention unit in the deep learning based safety monitoring system according to an embodiment of the present application. As shown in FIG. 4, the spatial attention unit 151 includes: a spatial perception subunit 1511, configured to convolutionally encode the region of interest feature map using a convolution layer of the spatial attention module of the residual dual-attention mechanism model to obtain an initial convolution feature map; a probabilization subunit 1512, configured to pass the initial convolution feature map through a Softmax function to obtain a spatial attention score map; and a spatial attention applying subunit 1513, configured to multiply the spatial attention score map and the region of interest feature map position-wise to obtain the spatial attention map.

FIG. 5 is a block diagram of the channel attention unit in the deep learning based safety monitoring system according to an embodiment of the present application. As shown in FIG. 5, the channel attention unit 152 includes: a channel dimension pooling subunit 1521, configured to perform global average pooling on each feature matrix of the region of interest feature map along the channel dimension to obtain a channel feature vector; a nonlinear activation subunit 1522, configured to pass the channel feature vector through a Softmax activation function to obtain a channel weight feature vector; and a channel attention applying subunit 1523, configured to weight each feature matrix of the region of interest feature map along the channel dimension, using the feature value at each position of the channel weight feature vector as the weight, to obtain the channel attention map.
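The units 151 to 156 and their subunits can be traced end to end in a short NumPy sketch. Several details are not fixed by the description and are assumed here: the convolution layer of the spatial attention module is taken to be a 1x1 convolution that collapses the channels into one map, the Softmax is taken over all spatial positions, and both fusion steps are taken to be element-wise addition.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def spatial_attention(f, w):
    """f: (C, H, W); w: (C,) weights of an assumed 1x1 convolution."""
    initial = np.tensordot(w, f, axes=(0, 0))                   # initial convolution feature map (H, W)
    scores = softmax(initial.ravel()).reshape(initial.shape)    # spatial attention score map
    return f * scores                                           # position-wise product, broadcast over channels

def channel_attention(f):
    channel_vector = f.mean(axis=(1, 2))      # global average pooling per channel
    weights = softmax(channel_vector)         # channel weight feature vector
    return f * weights[:, None, None]         # weight each feature matrix

def residual_dual_attention(f, w):
    fused = spatial_attention(f, w) + channel_attention(f)  # fusion by addition (assumed)
    gate = 1.0 / (1.0 + np.exp(-fused))                     # Sigmoid activation
    weighted = gate * f                                     # position-wise weighting
    return weighted + f                                     # residual fusion (assumed addition)

rng = np.random.default_rng(0)
f = rng.random((4, 6, 6)) + 0.1          # hypothetical region of interest feature map
w = rng.standard_normal(4)               # hypothetical 1x1 convolution weights
enhanced = residual_dual_attention(f, w)
print(enhanced.shape)
```

The residual addition keeps the original features intact, so the attention branch can only re-weight them and cannot erase information that the classifier may still need.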

In the deep learning based safety monitoring system 100, the optimization module 160 is configured to perform feature manifold modulation on the enhanced region of interest feature map to obtain a classification feature map. In particular, feature manifold modulation introduces a nonlinear transformation that serves three purposes. First, it strengthens the correlation between features: the region of interest may contain several related features whose relationship is not obvious in the original feature map, and mapping them into a higher-dimensional space ties them more closely together in the new feature map. Second, it improves the distinguishability of features: subtle but important differences that are not apparent in the original feature map become more visible and separable after the mapping. Third, it improves the expressive power of features: complex feature patterns or structures that are poorly represented in the original feature map become easier to capture and represent in the new feature map.

Specifically, the optimization module 160 includes: a feature descriptor construction unit, configured to take, for each pixel of the enhanced region of interest feature map, the channel feature vector corresponding to that pixel as the feature descriptor of the pixel; a KL divergence calculation unit, configured to calculate KL divergence values between the feature descriptors of the respective pixels to obtain a pixel-level topological association matrix composed of the plurality of KL divergence values; a topological feature extraction unit, configured to pass the pixel-level topological association matrix through a convolution layer-based topological feature extractor to obtain a pixel-level topological association feature matrix; a probabilization unit, configured to input the pixel-level topological association feature matrix into a Softmax activation function to obtain a probabilistic pixel-level topological association feature matrix; and a weight applying unit, configured to take the probabilistic pixel-level topological association feature matrix as a weight matrix and multiply it position-wise with each feature matrix of the enhanced region of interest feature map along the channel dimension to obtain the classification feature map.
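The five units can be followed in a rough NumPy sketch, subject to assumptions that the description leaves open. Each pixel descriptor is normalized with a Softmax over channels so that the KL divergence is well defined, and the convolution layer-based topological feature extractor is replaced by a fixed averaging over each row of the association matrix, which yields one value per pixel and therefore a weight matrix of the same height and width as the feature map. Both choices, and all names, are illustrative only.

```python
import numpy as np

def softmax(v, axis=-1):
    e = np.exp(v - np.max(v, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_manifold_modulation(f):
    """f: enhanced region of interest feature map of shape (C, H, W)."""
    c, h, w = f.shape
    n = h * w
    # Feature descriptor of each pixel: its channel feature vector, shape (N, C).
    descriptors = f.reshape(c, n).T
    # Assumption: normalize each descriptor into a probability distribution.
    p = softmax(descriptors, axis=1)
    log_p = np.log(p)
    # KL(p_i || p_j) = sum_c p_i[c] * (log p_i[c] - log p_j[c]), shape (N, N).
    kl = np.sum(p * log_p, axis=1)[:, None] - p @ log_p.T
    # Stand-in for the conv-based topological feature extractor (assumption):
    # average each row, giving one topological value per pixel.
    topo = kl.mean(axis=1).reshape(h, w)
    # Probabilization over all pixel positions.
    weights = softmax(topo.ravel(), axis=0).reshape(h, w)
    # Weight every feature matrix along the channel dimension position-wise.
    return f * weights, weights, kl

rng = np.random.default_rng(0)
f = rng.random((4, 5, 5))                      # hypothetical enhanced feature map
classification_map, weights, kl = feature_manifold_modulation(f)
print(classification_map.shape, kl.shape)
```

Pixels whose descriptors diverge most from the rest receive the largest weights under this stand-in; a learned convolutional extractor could produce a different weighting.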

In the deep learning based safety monitoring system 100, the safety monitoring result generation module 170 is configured to pass the classification feature map through a classifier to obtain a classification result, where the classification result indicates whether a person is not wearing a safety helmet. In this way, whether workers on the worksite are wearing safety helmets is judged automatically by a machine learning method without checking them manually one by one, which greatly improves the efficiency of worksite safety management and allows cases of workers not wearing safety helmets to be found and handled in time, thereby preventing potential safety risks and accidents.
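The description does not fix the form of the classifier. A minimal sketch, assuming a flatten operation followed by one fully connected layer and a Softmax over two labels (0 for a helmet being worn, 1 for no helmet), could look like this; the weights are random placeholders, not trained parameters.

```python
import numpy as np

def classify(feature_map, weight, bias):
    """Flatten the classification feature map, apply one dense layer, then Softmax."""
    vector = feature_map.ravel()
    logits = weight @ vector + bias
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    return int(np.argmax(probs)), probs

rng = np.random.default_rng(0)
feature_map = rng.random((4, 5, 5))                         # hypothetical classification feature map
weight = rng.standard_normal((2, feature_map.size)) * 0.1   # placeholder parameters
bias = np.zeros(2)

label, probs = classify(feature_map, weight, bias)
print(label, probs)
```

In a deployed system a label of 1 would trigger the alert for a worker without a safety helmet.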

In summary, the deep learning based safety monitoring system according to the embodiment of the application has been described. It acquires a monitoring image of the construction site through a camera, frames the head area of each worker in the monitoring image using a deep-learning-based image processing technique, and, after sharpening the framed area, extracts high-dimensional implicit feature information of the head area to judge whether a person is not wearing a safety helmet. Therefore, the efficiency and timeliness of safety monitoring management can be effectively improved, and the workload of manual inspection is reduced.

Exemplary method

FIG. 6 is a flow chart of a deep learning based safety monitoring method according to an embodiment of the application. As shown in FIG. 6, the deep learning based safety monitoring method according to the embodiment of the application includes the following steps: S110, acquiring a worksite personnel monitoring image captured by a camera deployed at the worksite; S120, passing the worksite personnel monitoring image through an object detection network to obtain at least one region of interest; S130, passing the region of interest through an image sharpness enhancer based on a generative adversarial network to obtain a sharpened region of interest; S140, passing the sharpened region of interest through a convolutional neural network model including a salient object detector to obtain a region of interest feature map; S150, passing the region of interest feature map through a residual dual-attention mechanism model to obtain an enhanced region of interest feature map; S160, performing feature manifold modulation on the enhanced region of interest feature map to obtain a classification feature map; and S170, passing the classification feature map through a classifier to obtain a classification result, wherein the classification result indicates whether a person is not wearing a safety helmet.

In a specific example, in the above deep learning based safety monitoring method, the object detection network is an anchor window-based object detection network, and the anchor window-based object detection network is Fast R-CNN, Faster R-CNN or RetinaNet.

In a specific example, in the above deep learning based safety monitoring method, the step S120 of passing the worksite personnel monitoring image through an object detection network to obtain at least one region of interest includes: passing the worksite personnel monitoring image through a plurality of convolution layers of the object detection network to obtain an object detection feature map; and processing the object detection feature map with the anchor window-based object detection network according to the following object detection formula to obtain the at least one region of interest;

wherein the object detection formula is:

$$\mathrm{ROI} = H(\psi_{det}, B) = \left(\mathrm{cls}(\psi_{det}, B),\ \mathrm{Regr}(\psi_{det}, B)\right)$$

In a specific example, in the above deep learning based safety monitoring method, the generative adversarial network includes a discriminator and a generator, wherein the step S130 includes: inputting the region of interest into the generator of the image sharpness enhancer based on the generative adversarial network, so that the generator performs deconvolution processing on the region of interest to obtain the sharpened region of interest.

In a specific example, in the above deep learning based safety monitoring method, the step S140 of passing the sharpened region of interest through a convolutional neural network model including a salient object detector to obtain a region of interest feature map includes: performing, in each layer of the convolutional neural network model including the salient object detector, the following on input data in the forward pass of the layer: performing convolution processing on the input data with a first convolution kernel to obtain a first convolution feature map; performing convolution processing on the first convolution feature map with a second convolution kernel to obtain a second convolution feature map, wherein the size of the first convolution kernel is larger than that of the second convolution kernel; performing mean pooling based on a local feature matrix on the second convolution feature map to obtain a pooled feature map; and performing activation processing on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the convolutional neural network model is the region of interest feature map, and the input of the first layer of the convolutional neural network model is the sharpened region of interest.

In a specific example, in the above deep learning based safety monitoring method, the step S150 of passing the region of interest feature map through a residual dual-attention mechanism model to obtain an enhanced region of interest feature map includes: inputting the region of interest feature map into a spatial attention module of the residual dual-attention mechanism model to obtain a spatial attention map; inputting the region of interest feature map into a channel attention module of the residual dual-attention mechanism model to obtain a channel attention map; fusing the spatial attention map and the channel attention map to obtain a fused attention map; inputting the fused attention map into a Sigmoid activation function to obtain a fused attention feature map; multiplying the fused attention feature map and the region of interest feature map position-wise to obtain a weighted feature map; and fusing the weighted feature map and the region of interest feature map to obtain the enhanced region of interest feature map.

In a specific example, in the above deep learning based safety monitoring method, inputting the region of interest feature map into a spatial attention module of the residual dual-attention mechanism model to obtain a spatial attention map includes: convolutionally encoding the region of interest feature map using a convolution layer of the spatial attention module of the residual dual-attention mechanism model to obtain an initial convolution feature map; passing the initial convolution feature map through a Softmax function to obtain a spatial attention score map; and multiplying the spatial attention score map and the region of interest feature map position-wise to obtain the spatial attention map.

In a specific example, in the above deep learning based safety monitoring method, inputting the region of interest feature map into a channel attention module of the residual dual-attention mechanism model to obtain a channel attention map includes: performing global average pooling on each feature matrix of the region of interest feature map along the channel dimension to obtain a channel feature vector; passing the channel feature vector through a Softmax activation function to obtain a channel weight feature vector; and weighting each feature matrix of the region of interest feature map along the channel dimension, using the feature value at each position of the channel weight feature vector as the weight, to obtain the channel attention map.

In a specific example, in the above deep learning based safety monitoring method, the step S160 of performing feature manifold modulation on the enhanced region of interest feature map to obtain a classification feature map includes: for each pixel of the enhanced region of interest feature map, taking the channel feature vector corresponding to that pixel as the feature descriptor of the pixel; calculating KL divergence values between the feature descriptors of the respective pixels to obtain a pixel-level topological association matrix composed of the plurality of KL divergence values; passing the pixel-level topological association matrix through a convolution layer-based topological feature extractor to obtain a pixel-level topological association feature matrix; inputting the pixel-level topological association feature matrix into a Softmax activation function to obtain a probabilistic pixel-level topological association feature matrix; and taking the probabilistic pixel-level topological association feature matrix as a weight matrix and multiplying it position-wise with each feature matrix of the enhanced region of interest feature map along the channel dimension to obtain the classification feature map.

Here, it will be understood by those skilled in the art that the specific operations of the respective steps in the above-described deep learning-based safety monitoring method have been described in detail in the above description of the deep learning-based safety monitoring system with reference to fig. 1 to 5, and thus, repetitive descriptions thereof will be omitted.

Claims

1. A deep learning based safety monitoring system, comprising:

2. The deep learning based safety monitoring system of claim 1, wherein the object detection network is an anchor window-based object detection network, and the anchor window-based object detection network is Fast R-CNN, Faster R-CNN or RetinaNet.

3. The deep learning based safety monitoring system of claim 2, wherein the personnel object detection module comprises:

a convolution coding unit, configured to pass the worksite personnel monitoring image through a plurality of convolution layers of the object detection network to obtain an object detection feature map;

an object detection unit, configured to process the object detection feature map with the anchor window-based object detection network according to the following object detection formula to obtain the at least one region of interest;

wherein the object detection formula is:

$$\mathrm{ROI} = H(\psi_{det}, B) = \left(\mathrm{cls}(\psi_{det}, B),\ \mathrm{Regr}(\psi_{det}, B)\right)$$

wherein $\psi_{det}$ represents the object detection feature map, $B$ represents an anchor window, $\mathrm{ROI}$ represents the region of interest, $\mathrm{cls}(\psi_{det}, B)$ represents a classification function, and $\mathrm{Regr}(\psi_{det}, B)$ represents a regression function.

4. The deep learning based safety monitoring system of claim 3, wherein the generative adversarial network comprises a discriminator and a generator, and wherein the region of interest pixel enhancement module is configured to input the region of interest into the generator of the image sharpness enhancer based on the generative adversarial network, so that the generator performs deconvolution processing on the region of interest to obtain the sharpened region of interest.

5. The deep learning based safety monitoring system of claim 4, wherein the feature extraction module is configured to: perform, in each layer of the convolutional neural network model including the salient object detector, the following on input data in the forward pass of the layer:

performing convolution processing on the input data by using a first convolution kernel to obtain a first convolution feature map;

performing convolution processing on the first convolution feature map by using a second convolution kernel to obtain a second convolution feature map, wherein the size of the first convolution kernel is larger than that of the second convolution kernel;

carrying out mean value pooling processing based on a local feature matrix on the second convolution feature map to obtain a pooled feature map;

performing activation processing on the pooled feature map to obtain an activated feature map;

wherein the output of the last layer of the convolutional neural network model is the region of interest feature map, and the input of the first layer of the convolutional neural network model is the sharpened region of interest.

6. The deep learning based safety monitoring system of claim 5, wherein the feature enhancement module comprises:

a spatial attention unit, configured to input the region of interest feature map into a spatial attention module of the residual dual-attention mechanism model to obtain a spatial attention map;

a channel attention unit, configured to input the region of interest feature map into a channel attention module of the residual dual-attention mechanism model to obtain a channel attention map;

an attention fusion unit, configured to fuse the spatial attention map and the channel attention map to obtain a fused attention map;

an activation unit, configured to input the fused attention map into a Sigmoid activation function to obtain a fused attention feature map;

an attention applying unit, configured to multiply the fused attention feature map and the region of interest feature map position-wise to obtain a weighted feature map;

and a residual fusion unit, configured to fuse the weighted feature map and the region of interest feature map to obtain the enhanced region of interest feature map.

7. The deep learning based safety monitoring system of claim 6, wherein the spatial attention unit comprises:

a spatial perception subunit, configured to convolutionally encode the region of interest feature map using a convolution layer of the spatial attention module of the residual dual-attention mechanism model to obtain an initial convolution feature map;

a probabilization subunit, configured to pass the initial convolution feature map through a Softmax function to obtain a spatial attention score map;

and a spatial attention applying subunit, configured to multiply the spatial attention score map and the region of interest feature map position-wise to obtain the spatial attention map.

8. The deep learning based safety monitoring system of claim 7, wherein the channel attention unit comprises:

a channel dimension pooling subunit, configured to perform global average pooling on each feature matrix of the region of interest feature map along the channel dimension to obtain a channel feature vector;

a nonlinear activation subunit, configured to pass the channel feature vector through a Softmax activation function to obtain a channel weight feature vector;

and a channel attention applying subunit, configured to weight each feature matrix of the region of interest feature map along the channel dimension, using the feature value at each position of the channel weight feature vector as the weight, to obtain the channel attention map.

9. The deep learning based safety monitoring system of claim 8, wherein the optimization module comprises:

a feature descriptor construction unit, configured to take, for each pixel of the enhanced region of interest feature map, the channel feature vector corresponding to that pixel as the feature descriptor of the pixel;

a KL divergence calculation unit, configured to calculate KL divergence values between the feature descriptors of the respective pixels to obtain a pixel-level topological association matrix composed of the plurality of KL divergence values;

a topological feature extraction unit, configured to pass the pixel-level topological association matrix through a convolution layer-based topological feature extractor to obtain a pixel-level topological association feature matrix;

a probabilization unit, configured to input the pixel-level topological association feature matrix into a Softmax activation function to obtain a probabilistic pixel-level topological association feature matrix;

and a weight applying unit, configured to take the probabilistic pixel-level topological association feature matrix as a weight matrix and multiply it position-wise with each feature matrix of the enhanced region of interest feature map along the channel dimension to obtain the classification feature map.

10. A safety monitoring method based on deep learning, comprising: