CN108921159B - Method and device for detecting wearing condition of safety helmet - Google Patents

Method and device for detecting wearing condition of safety helmet

Info

Publication number
CN108921159B
CN108921159B · CN201810836779.9A
Authority
CN
China
Prior art keywords
detected
constructor
video frame
safety helmet
helmet
Prior art date
Legal status
Active
Application number
CN201810836779.9A
Other languages
Chinese (zh)
Other versions
CN108921159A (en)
Inventor
孟泉
周淼
王蔚
范竣翔
陈科第
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810836779.9A priority Critical patent/CN108921159B/en
Publication of CN108921159A publication Critical patent/CN108921159A/en
Application granted granted Critical
Publication of CN108921159B publication Critical patent/CN108921159B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/245 Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a method and a device for detecting wearing conditions of a safety helmet. One embodiment of the method comprises: acquiring a video obtained by shooting a construction site; selecting a video frame from the video as a first video frame to be detected, wherein the first video frame to be detected comprises a constructor to be detected and a helmet to be detected; inputting a first video frame to be detected into a pre-trained target detection model to obtain position information of a constructor to be detected and a safety helmet to be detected in the first video frame to be detected, wherein the target detection model is used for detecting the position information of the constructor and the safety helmet; and generating the wearing condition of the safety helmet to be detected based on the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected. This embodiment improves the detection accuracy of the wearing condition of the helmet.

Description

Method and device for detecting wearing condition of safety helmet
Technical Field
The embodiment of the application relates to the technical field of image recognition, in particular to a method and a device for detecting wearing conditions of a safety helmet.
Background
In work sites such as construction sites or maintenance bases, objects sometimes fall from a height. If a constructor is struck directly by a falling object, casualties can result. As a piece of personal head-protection equipment, the safety helmet can effectively cushion the impact of a falling object on a constructor's head, thereby eliminating or reducing the injury that the impact would otherwise cause and safeguarding the constructor's personal safety to a certain extent. Accordingly, constructors entering a construction site are usually required to wear safety helmets. However, some constructors do not wear helmets as required because of weak safety awareness. To eliminate this potential safety hazard, real-time monitoring of whether safety helmets are worn has become important.
Traditional approaches to detecting whether safety helmets are worn mainly fall into two categories: first, a safety officer spot-checks constructors on the construction site to determine whether helmets are worn; second, the construction site is filmed and the color of the head region of each constructor in the video frames is analyzed to determine whether the constructor is wearing a helmet.
Disclosure of Invention
The embodiment of the application provides a method and a device for detecting wearing conditions of a safety helmet.
In a first aspect, an embodiment of the present application provides a method for detecting a wearing condition of a safety helmet, including: acquiring a video obtained by shooting a construction site; selecting a video frame from the video as a first video frame to be detected, wherein the first video frame to be detected comprises a constructor to be detected and a helmet to be detected; inputting a first video frame to be detected into a pre-trained target detection model to obtain position information of a constructor to be detected and a safety helmet to be detected in the first video frame to be detected, wherein the target detection model is used for detecting the position information of the constructor and the safety helmet; and generating the wearing condition of the to-be-detected safety helmet based on the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame.
In some embodiments, the target detection model includes a deep neural network and a set of convolutional layers; inputting the first video frame to be detected into a pre-trained target detection model to obtain the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected, and the method comprises the following steps: inputting a first video frame to be detected into a deep neural network to obtain a first feature map corresponding to the first video frame to be detected; inputting a first feature map corresponding to a first video frame to be detected into a convolutional layer set to obtain a second feature map output from a convolutional layer in the convolutional layer set, and generating a second feature map set; and generating the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected based on the first characteristic diagram and the second characteristic diagram set.
In some embodiments, generating the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame based on the first feature map and the second feature map set comprises: carrying out convolution on the first characteristic diagram and a second characteristic diagram in a second characteristic diagram set by using convolution kernel, and generating position information of the constructor to be detected and the safety helmet to be detected, which correspond to the first characteristic diagram, and position information of the constructor to be detected and the safety helmet to be detected, which correspond to the second characteristic diagram in the second characteristic diagram set, based on the convolution result; and combining the position information of the constructor to be detected and the safety helmet to be detected corresponding to the first characteristic diagram and the position information of the constructor to be detected and the safety helmet to be detected corresponding to the second characteristic diagram in the second characteristic diagram set to generate the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected.
In some embodiments, the target detection model is trained by: acquiring a training sample set, wherein the training sample comprises a sample image and position information of a sample constructor and a sample safety helmet in the sample image; and taking the sample images in the training samples in the training sample set as input, taking the position information of sample constructors and sample safety caps in the input sample images as output, and training to obtain a target detection model.
In some embodiments, generating the wearing condition of the to-be-detected safety helmet based on the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame includes: determining whether the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame meets a preset condition; and generating the wearing condition of the to-be-detected safety helmet in response to the fact that the preset condition is met, wherein the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor wears the to-be-detected safety helmet.
In some embodiments, generating the wearing condition of the to-be-detected helmet based on the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame further includes: in response to the fact that the preset condition is not met, selecting a video frame associated with the first video frame to be detected from the video, and generating a second video frame set to be detected, wherein the second video frame to be detected comprises a constructor to be detected and a safety helmet to be detected; inputting a second video frame to be detected in a second video frame set to be detected into a target detection model to obtain position information of a constructor to be detected and a safety helmet to be detected in the second video frame to be detected; determining whether the position information of the to-be-detected constructor and the to-be-detected safety helmet in the second to-be-detected video frame meets a preset condition; counting the proportion of a second video frame to be detected which does not meet the preset condition; and generating the wearing condition of the safety helmet to be detected based on the statistical result.
In some embodiments, generating the wearing condition of the helmet to be detected based on the statistical result includes: comparing the proportion of second to-be-detected video frames that do not meet the preset condition with a preset proportion; if the proportion of second to-be-detected video frames that do not meet the preset condition is not smaller than the preset proportion, the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor does not wear the to-be-detected safety helmet; and if the proportion of second to-be-detected video frames that do not meet the preset condition is smaller than the preset proportion, the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor wears the to-be-detected safety helmet.
In some embodiments, the location information includes upper, lower, left, and right boundary coordinates of a bounding box; and the preset condition includes at least one of the following: the left boundary coordinate of the bounding box of the safety helmet to be detected is greater than the left boundary coordinate of the bounding box of the constructor to be detected; the right boundary coordinate of the bounding box of the safety helmet to be detected is smaller than the right boundary coordinate of the bounding box of the constructor to be detected; the upper boundary coordinate of the bounding box of the safety helmet to be detected is greater than the difference between the upper boundary coordinate of the bounding box of the constructor to be detected multiplied by a first preset value and the lower boundary coordinate of the bounding box of the constructor to be detected multiplied by a second preset value; and the lower boundary coordinate of the bounding box of the safety helmet to be detected is smaller than the sum of the upper boundary coordinate of the bounding box of the constructor to be detected multiplied by a third preset value and the lower boundary coordinate of the bounding box of the constructor to be detected multiplied by a fourth preset value.
In a second aspect, the present application provides an apparatus for detecting a wearing condition of a safety helmet, including: an acquisition unit configured to acquire a video obtained by shooting a construction site; the selecting unit is configured to select a video frame from the video as a first video frame to be detected, wherein the first video frame to be detected comprises a constructor to be detected and a safety helmet to be detected; the detection unit is configured to input a first video frame to be detected to a pre-trained target detection model to obtain position information of a constructor to be detected and a safety helmet to be detected in the first video frame to be detected, wherein the target detection model is used for detecting the position information of the constructor and the safety helmet; the generating unit is configured to generate wearing conditions of the safety helmet to be detected based on the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected.
In some embodiments, the target detection model includes a deep neural network and a set of convolutional layers; and the detection unit includes: the first detection subunit is configured to input the first video frame to be detected to the deep neural network, so as to obtain a first feature map corresponding to the first video frame to be detected; the second detection subunit is configured to input the first feature map corresponding to the first video frame to be detected into the convolutional layer set, obtain a second feature map output from a convolutional layer in the convolutional layer set, and generate a second feature map set; and the first generation subunit is configured to generate the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected based on the first characteristic diagram and the second characteristic diagram set.
In some embodiments, the first generating subunit comprises: the convolution module is configured to perform convolution on the first characteristic diagram and a second characteristic diagram in the second characteristic diagram set by utilizing convolution kernel, and generate position information of the constructor to be detected and the helmet to be detected, which correspond to the first characteristic diagram, and position information of the constructor to be detected and the helmet to be detected, which correspond to the second characteristic diagram in the second characteristic diagram set, based on the convolution result; and the merging module is configured to merge the position information of the constructor to be detected and the safety helmet to be detected, which correspond to the first characteristic diagram, and the position information of the constructor to be detected and the safety helmet to be detected, which correspond to the second characteristic diagram in the second characteristic diagram set, so as to generate the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected.
In some embodiments, the target detection model is trained by: acquiring a training sample set, wherein the training sample comprises a sample image and position information of a sample constructor and a sample safety helmet in the sample image; and taking the sample image in the training sample set as input, taking the position information of the sample constructor and the sample safety helmet in the input sample image as output, and training to obtain the target detection model.
In some embodiments, the generating unit comprises: a first determining subunit configured to determine whether the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame satisfies a preset condition; the second generation subunit is configured to generate wearing conditions of the to-be-detected safety helmet in response to determining that the preset condition is met, wherein the wearing conditions of the to-be-detected safety helmet indicate that the to-be-detected constructor wears the to-be-detected safety helmet.
In some embodiments, the generating unit further comprises: the third generation subunit is configured to select a video frame associated with the first video frame to be detected from the video in response to determining that the preset condition is not met, and generate a second video frame set to be detected, wherein the second video frame to be detected contains a constructor to be detected and a safety helmet to be detected; the second determining subunit is configured to input a second video frame to be detected in the second video frame set to be detected into the target detection model, so as to obtain position information of the constructor to be detected and the safety helmet to be detected in the second video frame to be detected; determining whether the position information of the to-be-detected constructor and the to-be-detected safety helmet in the second to-be-detected video frame meets a preset condition; the statistic subunit is configured to count the proportion of the second video frame to be detected which does not meet the preset condition; and the fourth generation subunit is configured to generate the wearing condition of the helmet to be detected based on the statistical result.
In some embodiments, the fourth generation subunit is further configured to: compare the proportion of second to-be-detected video frames that do not meet the preset condition with a preset proportion; if the proportion of second to-be-detected video frames that do not meet the preset condition is not smaller than the preset proportion, the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor does not wear the to-be-detected safety helmet; and if the proportion of second to-be-detected video frames that do not meet the preset condition is smaller than the preset proportion, the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor wears the to-be-detected safety helmet.
In some embodiments, the location information includes upper, lower, left, and right boundary coordinates of a bounding box; and the preset condition includes at least one of the following: the left boundary coordinate of the bounding box of the safety helmet to be detected is greater than the left boundary coordinate of the bounding box of the constructor to be detected; the right boundary coordinate of the bounding box of the safety helmet to be detected is smaller than the right boundary coordinate of the bounding box of the constructor to be detected; the upper boundary coordinate of the bounding box of the safety helmet to be detected is greater than the difference between the upper boundary coordinate of the bounding box of the constructor to be detected multiplied by a first preset value and the lower boundary coordinate of the bounding box of the constructor to be detected multiplied by a second preset value; and the lower boundary coordinate of the bounding box of the safety helmet to be detected is smaller than the sum of the upper boundary coordinate of the bounding box of the constructor to be detected multiplied by a third preset value and the lower boundary coordinate of the bounding box of the constructor to be detected multiplied by a fourth preset value.
In a third aspect, an embodiment of the present application provides a server, where the server includes: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and the device for detecting the wearing condition of the safety helmet, a first video frame to be detected is selected from videos obtained by shooting a construction site; then inputting the first video frame to be detected into the target detection model to obtain the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected; and finally, analyzing the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame to determine the wearing condition of the to-be-detected safety helmet. The wearing condition of the safety helmet to be detected is determined based on the position information of the constructor to be detected and the safety helmet to be detected, which are detected by the target detection model, so that the detection accuracy of the wearing condition of the safety helmet is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for detecting the wearing condition of a safety helmet according to the present application;
FIG. 3 is a flow chart of yet another embodiment of a method for detecting the wearing condition of a safety helmet according to the present application;
FIG. 4A is a schematic diagram of a standard convolutional layer;
FIG. 4B is a schematic diagram of a depthwise convolutional layer;
FIG. 4C is a schematic diagram of a pointwise convolutional layer;
FIG. 4D is a schematic diagram of an object detection model;
FIG. 5 is a flow chart of yet another embodiment of a method for detecting the wearing condition of a safety helmet according to the present application;
FIG. 6 is a schematic structural diagram of one embodiment of an apparatus for detecting the wearing condition of a safety helmet according to the present application;
FIG. 7 is a block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present method for detecting the wearing condition of a safety helmet or the apparatus for detecting the wearing condition of a safety helmet may be applied.
As shown in fig. 1, the system architecture 100 may include photographing devices 101, 102, 103, a network 104, and a server 105. The network 104 is a medium to provide a communication link between the photographing apparatuses 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The photographing devices 101, 102, 103 may interact with the server 105 through the network 104 to receive or transmit messages or the like. The photographing apparatuses 101, 102, and 103 may be hardware or software. When the photographing apparatuses 101, 102, 103 are hardware, they may be various electronic apparatuses supporting video photographing, including, but not limited to, a video camera, a smart phone, and the like. When the photographing apparatuses 101, 102, 103 are software, they can be installed in the above-described electronic apparatuses. It may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may provide various services, for example, the server 105 may analyze and process data, such as videos obtained by shooting the construction site, acquired from the shooting devices 101, 102, and 103, and generate a processing result (for example, wearing conditions of the to-be-detected helmet).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for detecting the wearing condition of the safety helmet provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for detecting the wearing condition of the safety helmet is generally disposed in the server 105.
It should be understood that the number of photographing devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of cameras, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for detecting the wearing condition of a safety helmet according to the present application is shown. The method for detecting the wearing condition of the safety helmet comprises the following steps:
step 201, acquiring a video obtained by shooting a construction site.
In the present embodiment, an execution subject (for example, the server 105 shown in fig. 1) of the method for detecting the wearing condition of the helmet may acquire a video obtained by shooting a construction site from a shooting device (for example, the shooting devices 101, 102, 103 shown in fig. 1) by a wired connection manner or a wireless connection manner. The shooting device may be various electronic devices having a video shooting function, including but not limited to a video camera, a still camera, a smart phone, and the like. Here, it is possible to dispose a camera or other photographing device at a critical position such as a high point in a construction site, an entrance of an elevator, or the like, collect a video of the construction site, and transmit the video to the execution main body in real time or periodically (for example, every 5 seconds).
Step 202, selecting a video frame from the video as a first video frame to be detected.
In this embodiment, the execution subject may select a video frame from the video as the first video frame to be detected. A video is typically composed of a sequence of video frames. In a video obtained by shooting a construction site, some video frames may contain constructors while others may not, and only the frames in which constructors are present need to be analyzed. Likewise, among the video frames in which a constructor is present, some may contain a safety helmet and some may not. For a video frame in which a constructor is present but no helmet is present, it can be determined directly that the constructor is not wearing a helmet. The video frames that need to be analyzed to determine whether a constructor wears a helmet are therefore those in which both the constructor and the helmet are present. The constructor to be detected may be any constructor on the construction site, and the safety helmet to be detected may be the safety helmet of the constructor to be detected. The selected first video frame to be detected may be a video frame in which both the constructor to be detected and the helmet to be detected are present.
As an example, if the video is sent to the execution main body in real time, it may be determined whether the constructor and the helmet to be detected exist in the currently received video frame. And if so, taking the video frame as a first video frame to be detected. And if not, continuously determining whether the constructor to be detected and the helmet to be detected exist in the next frame of video frame until determining that the video frames of the constructor to be detected and the helmet to be detected exist as the first video frame to be detected.
Step 203, inputting the first video frame to be detected to a pre-trained target detection model to obtain the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected.
In this embodiment, the executing body may input the first video frame to be detected to a pre-trained target detection model to obtain the position information of the to-be-detected constructor and the to-be-detected helmet in the first video frame to be detected. The position information of the constructor to be detected and the safety helmet to be detected can be the position information of a boundary frame containing the constructor to be detected and the position information of a boundary frame containing the safety helmet to be detected. The bounding box may be the smallest rectangular box of the first video frame to be detected containing the object. The position information of the bounding box may include upper, lower, left, and right boundary coordinates of the bounding box.
In this embodiment, the target detection model may be used to detect the position information of the constructor and the safety helmet, and represent the correspondence between the image and the position information of the constructor and the safety helmet in the image.
In some optional implementations of this embodiment, the target detection model may be a correspondence table, compiled by a person skilled in the art through statistical analysis of a large number of sample images together with the position information of the sample constructors and sample safety helmets in those images, that stores a plurality of sample images and the corresponding position information of the sample constructors and sample safety helmets. In this case, the executing body may calculate the similarity between the first video frame to be detected and each sample image in the correspondence table, and obtain the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected from the table based on the similarity results. For example, the sample image with the highest similarity to the first video frame to be detected is determined, and the position information of the sample constructor and the sample safety helmet in that sample image is then looked up in the correspondence table and used as the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected.
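As a rough illustration of this lookup-table variant, the following sketch assumes grayscale frames of identical size and a toy pixel-difference similarity; the similarity measure and the table layout are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np

def pixel_similarity(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Toy similarity: higher is more similar (negative mean absolute pixel difference)."""
    return -float(np.mean(np.abs(img_a.astype(np.float32) - img_b.astype(np.float32))))

def lookup_positions(frame: np.ndarray, correspondence_table):
    """correspondence_table: iterable of (sample_image, constructor_box, helmet_box).
    Returns the boxes stored for the most similar sample image."""
    best_entry = max(correspondence_table, key=lambda e: pixel_similarity(frame, e[0]))
    _, constructor_box, helmet_box = best_entry
    return constructor_box, helmet_box
```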
In some optional implementations of the present embodiment, the target detection model may be obtained by performing supervised training on an existing machine learning model (e.g., various artificial neural networks, etc.) by using various machine learning methods and training samples. The target detection model can include, but is not limited to, R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, and the like, and is obtained by training through the following steps:
first, a set of training samples is obtained.
Wherein each training sample in the set of training samples may include a sample image and location information of a sample constructor and a sample helmet in the sample image. Here, a person skilled in the art can analyze the sample image to obtain location information of the sample constructor and the sample helmet in the sample image. For example, the corresponding bounding boxes may be manually marked in the area where the sample constructor and the sample helmet are located in the sample image, so as to obtain the position information of the bounding box containing the sample constructor and the position information of the bounding box containing the sample helmet in the sample image.
And secondly, taking the sample images in the training samples in the training sample set as input, taking the position information of the sample constructors and the sample safety caps in the input sample images as output, and training to obtain a target detection model.
Here, an initial target detection model may be trained using the training sample set, thereby obtaining a target detection model for detecting the position information of constructors and safety helmets. The initial target detection model may be a target detection model (e.g., an SSD) that is either untrained or not yet fully trained. For an untrained target detection model, each parameter (e.g., the weight parameters and bias parameters) is initialized with small, mutually different random numbers. Using small values ensures that overly large weights do not drive the model into saturation and cause training to fail, and using different values ensures that the model can learn normally. For a target detection model that is not yet fully trained, the parameters may already have been adjusted, but the detection performance of the model generally does not yet meet the preset constraint condition.
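To make the training-sample format above concrete, the sketch below shows one possible annotation layout in Python; the file names, coordinates, and two-sample set are made-up placeholders. Any off-the-shelf detector (e.g., an SSD implementation) could then be fit on such image/box pairs, using the sample images as inputs and the annotated boxes as the expected outputs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) pixel coordinates of a bounding box
Box = Tuple[float, float, float, float]

@dataclass
class TrainingSample:
    image_path: str               # path to the sample image
    constructor_boxes: List[Box]  # manually marked boxes of sample constructors
    helmet_boxes: List[Box]       # manually marked boxes of sample safety helmets

# Hypothetical two-sample training set.
training_samples = [
    TrainingSample("site_0001.jpg", [(120, 80, 260, 420)], [(150, 60, 230, 130)]),
    TrainingSample("site_0002.jpg", [(300, 100, 420, 460)], []),  # helmet absent
]
```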
And 204, generating wearing conditions of the to-be-detected safety helmet based on the to-be-detected constructor and the to-be-detected safety helmet position information in the first to-be-detected video frame.
In this embodiment, the execution main body may analyze the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame, so as to generate the wearing condition of the to-be-detected helmet. As an example, the execution main body may calculate the relative position information between the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame based on the position information between the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame; and determining the wearing condition of the safety helmet to be detected according to the relative position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected. For example, if the boundary frame containing the to-be-detected safety helmet is located above the boundary frame containing the to-be-detected constructor and at least partially overlapped with the boundary frame containing the to-be-detected constructor, determining that the to-be-detected constructor wears the to-be-detected safety helmet; otherwise, determining that the person to be detected does not wear the safety helmet to be detected.
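A minimal sketch of the example rule above (the helmet box lies above the constructor box and at least partially overlaps it), assuming boxes given as (left, top, right, bottom) with the y axis pointing down; an actual deployment may use a different rule.

```python
def boxes_overlap(a, b) -> bool:
    """True if two (left, top, right, bottom) boxes share any area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])

def helmet_worn(constructor_box, helmet_box) -> bool:
    """Helmet counts as worn if its box sits higher than the constructor box and overlaps it."""
    helmet_above = helmet_box[1] < constructor_box[1]  # smaller y means higher in the image
    return helmet_above and boxes_overlap(constructor_box, helmet_box)
```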
According to the method for detecting the wearing condition of the safety helmet, a first video frame to be detected is selected from videos obtained by shooting a construction site; then inputting the first video frame to be detected into the target detection model to obtain the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected; and finally, analyzing the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame to determine the wearing condition of the to-be-detected safety helmet. The wearing condition of the safety helmet to be detected is determined based on the position information of the constructor to be detected and the safety helmet to be detected, which are detected by the target detection model, so that the detection accuracy of the wearing condition of the safety helmet is improved.
With further reference to fig. 3, a flow 300 of yet another embodiment of a method for detecting the wear of a headgear according to the present application is shown. The method for detecting the wearing condition of the safety helmet comprises the following steps:
and 301, acquiring a video obtained by shooting a construction site.
Step 302, selecting a video frame from the video as a first video frame to be detected.
In this embodiment, the specific operations of steps 301 and 302 are substantially the same as the operations of steps 201 and 202 in the embodiment shown in fig. 2, and are not described again here.
Step 303, inputting the first video frame to be detected to the deep neural network to obtain a first feature map corresponding to the first video frame to be detected.
In this embodiment, the executing body for detecting the wearing condition of the safety helmet may input the first video frame to be detected to a Deep Neural Network (DNN) so as to convert the first video frame to be detected into a more expressive and abstract first Feature Map (Feature Map) through a series of operations such as convolution and pooling.
Here, the object detection model may include an underlying network and a plurality of feature extraction layers, and is generally constructed by adding a plurality of feature extraction layers after the underlying network. Wherein the underlying network may be a deep neural network. A deep neural network may be used to extract the first feature map. The feature extraction layer may be a convolutional layer. The convolutional layer may be used to extract a second feature map.
However, a conventional deep neural network has a large number of parameters owing to its deep network structure, so a dedicated server has to be provisioned for it. Video monitoring scenarios can rarely accommodate the servers required by a complex deep neural network, so a smaller and faster target detection model can be built on a lightweight deep neural network (such as a MobileNet). For example, the target detection model may be constructed by appending a set of convolutional layers after the MobileNet, where the sizes of the convolutional layers in the set decrease progressively with their distance from the MobileNet.
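The following sketch, assuming PyTorch, shows one way to append a set of progressively smaller convolutional layers after a MobileNet-style backbone; the channel counts, strides, and the random stand-in for the backbone output are illustrative choices, not the exact configuration of the model described here.

```python
import torch
import torch.nn as nn

class ExtraFeatureLayers(nn.Module):
    """Convolutional layer set appended after the backbone; each stride-2 layer
    halves the spatial size, so the feature maps shrink with distance from the backbone."""
    def __init__(self, in_channels: int = 1024):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_channels, 512, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(512, 256, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(256, 256, 3, stride=2, padding=1), nn.ReLU()),
        ])

    def forward(self, first_feature_map: torch.Tensor):
        second_feature_maps = []
        x = first_feature_map
        for layer in self.layers:
            x = layer(x)
            second_feature_maps.append(x)
        return second_feature_maps  # the "second feature map set"

# Example: a 19x19, 1024-channel first feature map from a MobileNet-like backbone.
first_map = torch.randn(1, 1024, 19, 19)
second_maps = ExtraFeatureLayers()(first_map)  # spatial sizes 10x10, 5x5, 3x3
```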
In practice, most of the computation in a deep neural network takes place in the convolutional layers. Consider a convolutional layer whose input is a feature map F of size D_F × D_F × M and whose output is a feature map G of size D_F × D_F × N, where D_F is the width and height of the feature maps F and G, M is the number of channels (input depth) of F, and N is the number of channels (output depth) of G.
As shown in fig. 4A, which is a schematic diagram of a standard convolutional layer, the convolution kernel of a standard convolutional layer has size D_K × D_K × M × N, where D_K is the height and width of the kernel. Because the convolution requires a dot product of the kernel with every position of the input feature map, the computational cost of one standard convolutional layer is:
D_K × D_K × M × N × D_F × D_F
MobileNet decomposes a standard convolutional layer into a depthwise convolutional layer and a pointwise convolutional layer.
As shown in fig. 4B, which is a schematic diagram of the depthwise convolutional layer, the kernel of the depthwise convolutional layer has size D_K × D_K × M, and the computational cost of one depthwise convolutional layer is:
D_K × D_K × M × D_F × D_F
As shown in fig. 4C, which is a schematic diagram of the pointwise convolutional layer, the kernel of the pointwise convolutional layer has size 1 × 1 × M × N, and the computational cost of one pointwise convolutional layer is:
M × N × D_F × D_F
The total cost of the MobileNet decomposition is therefore:
D_K × D_K × M × D_F × D_F + M × N × D_F × D_F
Thus, the ratio of the MobileNet cost to the standard convolutional layer cost is:
(D_K × D_K × M × D_F × D_F + M × N × D_F × D_F) / (D_K × D_K × M × N × D_F × D_F) = 1/N + 1/(D_K × D_K)
Typically, MobileNet uses 3 × 3 convolution kernels, so its computation is roughly 8 to 9 times smaller than that of standard convolutional layers, with only a small drop in accuracy.
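A quick numeric check of the ratio above for one illustrative layer size (the dimensions are made up):

```python
D_K, M, N, D_F = 3, 512, 512, 14  # illustrative kernel size, channel counts, feature map size

standard_cost = D_K * D_K * M * N * D_F * D_F
mobilenet_cost = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F

print(standard_cost / mobilenet_cost)  # about 8.8, i.e. roughly an 8-9x reduction
print(1 / N + 1 / (D_K * D_K))         # mobilenet_cost / standard_cost, about 0.113
```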
Step 304, inputting the first feature map corresponding to the first video frame to be detected into the convolutional layer set to obtain a second feature map output from the convolutional layer in the convolutional layer set, and generating a second feature map set.
In this embodiment, the execution subject may input a first feature map corresponding to a first video frame to be detected into the convolutional layer set, and the first feature map is subjected to a series of convolution operations to obtain a second feature map output from a convolutional layer in the convolutional layer set, so as to generate a second feature map set. For example, the execution subject may obtain a second feature map output from each convolutional layer in the set of convolutional layers to generate a second feature map set. For another example, the execution subject may obtain a second signature graph output from a portion of the convolutional layers in the convolutional layer set to generate a second signature graph set.
Step 305, convolving the first feature map and the second feature maps in the second feature map set with convolution kernels, and generating, based on the convolution results, the position information of the constructor to be detected and the helmet to be detected corresponding to the first feature map and the position information of the constructor to be detected and the helmet to be detected corresponding to each second feature map in the second feature map set.
In this embodiment, the execution subject may analyze the first feature map and the second feature map set to generate the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected. Specifically, the execution subject may convolve the first feature map with a convolution kernel (e.g., a 3 × 3 kernel) to obtain a convolution result, and then perform a classification operation (e.g., producing confidences) and a regression operation (e.g., producing localizations) on the convolution result to generate the position information of the constructor to be detected and the helmet to be detected corresponding to the first feature map. The classification operation determines which detections correspond to the constructor to be detected and which to the safety helmet to be detected. Meanwhile, the execution subject may likewise convolve each second feature map in the second feature map set with a convolution kernel to obtain a convolution result, and perform the classification and regression operations on that result to generate the position information of the constructor to be detected and the helmet to be detected corresponding to each second feature map.
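A sketch, again assuming PyTorch, of the per-feature-map detection head described above: one 3 × 3 convolution produces class confidences and another produces box regressions for every feature-map location. The number of default boxes per location and the three classes (background, constructor, helmet) are illustrative assumptions.

```python
import torch
import torch.nn as nn

num_classes = 3      # background, constructor, safety helmet (illustrative)
boxes_per_loc = 4    # default boxes predicted per feature-map location (illustrative)

def make_detection_head(channels: int):
    cls_conv = nn.Conv2d(channels, boxes_per_loc * num_classes, kernel_size=3, padding=1)
    reg_conv = nn.Conv2d(channels, boxes_per_loc * 4, kernel_size=3, padding=1)
    return cls_conv, reg_conv

# Apply a head to the first feature map and to each second feature map.
feature_maps = [torch.randn(1, 1024, 19, 19), torch.randn(1, 512, 10, 10)]
for fmap in feature_maps:
    cls_conv, reg_conv = make_detection_head(fmap.shape[1])
    confidences = cls_conv(fmap)    # (1, boxes_per_loc * num_classes, H, W)
    localizations = reg_conv(fmap)  # (1, boxes_per_loc * 4, H, W)
```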
And step 306, combining the position information of the constructor to be detected and the safety helmet to be detected corresponding to the first characteristic diagram and the position information of the constructor to be detected and the safety helmet to be detected corresponding to the second characteristic diagram in the second characteristic diagram set to generate the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected.
In this embodiment, the execution main body may combine the position information of the to-be-detected constructor and the to-be-detected helmet corresponding to the first feature diagram and the position information of the to-be-detected constructor and the to-be-detected helmet corresponding to the second feature diagram in the second feature diagram set, so as to generate the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame. It should be understood that the combination here is a channel combination, not a numerical addition.
As an example, a Non-Maximum Suppression (Non-Maximum Suppression) algorithm may be used to merge the bounding box corresponding to the first feature map and the bounding box corresponding to each second feature map in the second feature map set. The merging process is an iteration, traversal and elimination process. Specifically, the merging may be performed by:
first, the scores of all bounding boxes are sorted, and the highest score and the bounding box corresponding to the highest score are selected.
And then traversing the rest bounding boxes, and deleting the bounding box if the overlapping area of the bounding box with the current highest score is larger than a certain threshold.
Finally, one bounding box with the highest score is continuously selected from the unprocessed bounding boxes, and the steps are repeated.
As shown in fig. 4D, a schematic diagram of the object detection model is shown. The target detection model is constructed by adding 3 convolutional layers after MobileNet 411. The 3 convolutional layers are convolutional layer 421, convolutional layer 422, and convolutional layer 423 in this order from the near to the far from MobileNet 411. The video frame 431 is input to the MobileNet411, and a first feature map 441 is output after a series of operations such as convolution and pooling. The first feature map 441 is input to the convolution layer 421, and the second feature map 451 is output through a convolution operation. The second signature 451 is input to the convolution layer 422, and is subjected to convolution operation to output a second signature 452. The second feature map 452 is input to the convolution layer 423, and subjected to convolution operation, and the second feature map 453 is output. The first feature map 441, the second feature map 451, the second feature map 452, and the second feature map 453 are input to the classification detection network 461, and the position information 471 corresponding to the first feature map 441, the position information 472 corresponding to the second feature map 451, the position information 473 corresponding to the second feature map 452, and the position information 474 corresponding to the second feature map 453 are obtained. The position information 471, 472, 473, 474 is input to the non-maximum suppression target detection network 481, and the position information 491 is output.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the method for detecting the wearing condition of the safety helmet in this embodiment highlights the step of processing the first video frame to be detected with a deep neural network and a set of convolutional layers. In the scheme described in this embodiment, the position information of the constructor to be detected and the safety helmet to be detected is therefore obtained through multi-scale detection, which further improves the detection accuracy of the wearing condition of the safety helmet. Meanwhile, when a MobileNet is used as the base network of the target detection model to extract the first feature map, the computation required to extract the first feature map can be reduced, which improves the detection speed of the wearing condition of the safety helmet.
With further reference to fig. 5, a flow 500 of yet another embodiment of a method for detecting the wearing condition of a safety helmet according to the present application is shown. The method for detecting the wearing condition of the safety helmet comprises the following steps:
step 501, obtaining a video obtained by shooting a construction site.
Step 502, selecting a video frame from a video as a first video frame to be detected.
Step 503, inputting the first video frame to be detected to a pre-trained target detection model, and obtaining the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected.
Step 504, determining whether the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected meets a preset condition.
In the present embodiment, an execution subject (e.g., the server 105 shown in fig. 1) of the method for detecting the wearing condition of the safety helmet may determine whether the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame satisfies a preset condition; if the preset condition is satisfied, go to step 505'; if the predetermined condition is not satisfied, go to step 505. The preset conditions may be various conditions preset according to requirements.
In some optional implementations of the embodiment, in the case that the position information includes an upper boundary coordinate, a lower boundary coordinate, a left boundary coordinate, and a right boundary coordinate of the bounding box, the preset condition may include, but is not limited to, at least one of the following:
1. The left boundary coordinate of the bounding box of the safety helmet to be detected is greater than the left boundary coordinate of the bounding box of the constructor to be detected.
2. The right boundary coordinate of the bounding box of the safety helmet to be detected is smaller than the right boundary coordinate of the bounding box of the constructor to be detected.
3. The upper boundary coordinate of the bounding box of the safety helmet to be detected is greater than the difference between the upper boundary coordinate of the bounding box of the constructor to be detected multiplied by a first preset value and the lower boundary coordinate of the bounding box of the constructor to be detected multiplied by a second preset value.
4. The lower boundary coordinate of the bounding box of the safety helmet to be detected is smaller than the sum of the upper boundary coordinate of the bounding box of the constructor to be detected multiplied by a third preset value and the lower boundary coordinate of the bounding box of the constructor to be detected multiplied by a fourth preset value.
For example, let the upper, lower, left and right boundary coordinates of the bounding box of the safety helmet to be detected be top1, bottom1, left1 and right1, and let the upper, lower, left and right boundary coordinates of the bounding box of the constructor to be detected be top2, bottom2, left2 and right2. Denoting the first to fourth preset values by a, b, c and d, the preset condition may include, but is not limited to, at least one of:
1. left1 > left2;
2. right1 < right2;
3. top1 > a × top2 - b × bottom2;
4. bottom1 < c × top2 + d × bottom2.
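A sketch of the preset-condition check listed above, with a, b, c and d standing in for the first to fourth preset values; the default values below are illustrative assumptions only, and an implementation may also test just a subset of the four conditions.

```python
def position_condition_met(helmet, constructor, a=1.0, b=0.0, c=0.75, d=0.25) -> bool:
    """helmet / constructor: (left, top, right, bottom) boxes, y increasing downward.
    a, b, c, d play the role of the first to fourth preset values (illustrative defaults)."""
    left1, top1, right1, bottom1 = helmet
    left2, top2, right2, bottom2 = constructor
    return (left1 > left2 and
            right1 < right2 and
            top1 > a * top2 - b * bottom2 and
            bottom1 < c * top2 + d * bottom2)
```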
and 505', generating the wearing condition of the safety helmet to be detected.
In this embodiment, the execution main body may generate a wearing condition of the safety helmet to be detected when the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected satisfy a preset condition. Here, the wearing condition of the safety helmet to be detected indicates that a constructor to be detected wears the safety helmet to be detected. At this time, the executing body may continue to return to step 503 by using the next frame of the first video frame to be detected as the first video frame to be detected until it is detected that the constructor to be detected does not wear the helmet to be detected or all the video frames in the video are detected.
And 505, selecting a video frame associated with the first video frame to be detected from the video, and generating a second video frame set to be detected.
In this embodiment, when the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected does not satisfy the preset condition, the execution subject may select video frames associated with the first video frame to be detected from the video to generate the second set of video frames to be detected. A video frame associated with the first video frame to be detected may be a video frame among the several frames following the first video frame to be detected in which the constructor to be detected and the helmet to be detected are present, or a video frame among the several frames preceding it in which they are present.
In some optional implementations of this embodiment, when the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected does not satisfy the preset condition, the execution subject may also store information about the constructor to be detected, so as to determine whether the constructor to be detected and the helmet to be detected are present in the several video frames following or preceding the first video frame to be detected.
Because a constructor can move only a limited distance between consecutive video frames, the bounding boxes of the same constructor overlap substantially across consecutive frames. Suppose the upper, lower, left and right boundary coordinates of the constructor's bounding box in the first video frame are top1', bottom1', left1' and right1', and the corresponding coordinates in the second video frame are top2', bottom2', left2' and right2'. Then the constructor in the first video frame and the constructor in the second video frame are the same person when the following conditions are satisfied:
1、min(right1',right2')>max(left1',left2');
2、min(bottom1',bottom2')>max(top1',top2')。
Step 506, inputting a second video frame to be detected in the second video frame set to be detected to the target detection model to obtain the position information of the constructor to be detected and the safety helmet to be detected in the second video frame to be detected.
In this embodiment, for each second video frame to be detected in the second video frame set to be detected, the executing body may input the second video frame to be detected to the target detection model, so as to obtain the position information of the constructor to be detected and the safety helmet to be detected in the second video frame to be detected.
It should be noted that the process of the target detection model for processing the second video frame to be detected is similar to the process of the target detection model for processing the first video frame to be detected in the embodiment shown in fig. 2, and details are not repeated here.
Step 507, determining whether the position information of the constructor to be detected and the safety helmet to be detected in the second video frame to be detected meets preset conditions.
In this embodiment, the executing body may determine whether the position information of the constructor to be detected and the safety helmet to be detected in the second video frame to be detected meets the preset condition.
It should be noted that the process of analyzing the position information of the constructor to be detected and the helmet to be detected in the second video frame to be detected is similar to the process of analyzing the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected in step 504, and details are not repeated here.
Step 508, counting the proportion of second video frames to be detected that do not meet the preset condition.
In this embodiment, the executing body may count the proportion of second video frames to be detected that do not satisfy the preset condition, based on the determination result in step 507.
Step 509, generating the wearing condition of the safety helmet to be detected based on the statistical result.
In this embodiment, the executing body may generate the wearing condition of the safety helmet to be detected based on the statistical result. As an example, the executing body may compare the proportion of second video frames to be detected that do not satisfy the preset condition with a preset proportion (e.g., 80%); if that proportion is not smaller than the preset proportion, the wearing condition of the safety helmet to be detected indicates that the constructor to be detected does not wear the safety helmet to be detected; if that proportion is smaller than the preset proportion, the wearing condition indicates that the constructor to be detected wears the safety helmet to be detected.
Generally, when none of the second video frames to be detected in the second video frame set to be detected meets the preset condition, it can be considered that the constructor to be detected does not wear the safety helmet to be detected. The preset proportion is therefore usually set to 100%.
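A minimal sketch of steps 508-509 follows, assuming the per-frame check results are already available as booleans; the function name and the default threshold of 100% are illustrative only.

def helmet_not_worn(frame_violations, preset_proportion=1.0):
    """frame_violations holds one boolean per second video frame to be
    detected, True when the frame does not satisfy the preset condition.
    The constructor is judged not to wear the helmet when the proportion of
    violating frames is not smaller than the preset proportion."""
    if not frame_violations:
        return False
    proportion = sum(frame_violations) / len(frame_violations)
    return proportion >= preset_proportion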
In some optional implementations of this embodiment, when the wearing condition indicates that the constructor to be detected does not wear the safety helmet to be detected, the executing body may send an alarm instruction to an alarm device. The alarm device may then issue alarm information.
As can be seen from fig. 5, compared with the embodiment corresponding to fig. 2, the flow 500 of the method for detecting the wearing condition of the safety helmet in this embodiment highlights the step of generating the wearing condition of the safety helmet to be detected. The scheme described in this embodiment therefore generates the wearing condition by combining contextual information from the video, which further improves the detection accuracy and avoids inaccurate results caused by a single video frame being affected by external factors such as illumination, shadow, and occlusion.
With further reference to fig. 6, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for detecting a wearing condition of a helmet, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the apparatus 600 for detecting the wearing condition of the safety helmet of the present embodiment may include: an acquisition unit 601, a selection unit 602, a detection unit 603, and a generation unit 604. The acquisition unit 601 is configured to acquire a video obtained by shooting a construction site; the selection unit 602 is configured to select a video frame from the video as a first video frame to be detected, where the first video frame to be detected includes a constructor to be detected and a safety helmet to be detected; the detection unit 603 is configured to input the first video frame to be detected to a pre-trained target detection model to obtain position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected, where the target detection model is used for detecting the position information of constructors and safety helmets; the generation unit 604 is configured to generate the wearing condition of the safety helmet to be detected based on the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected.
In the present embodiment, in the apparatus 600 for detecting the wearing condition of a helmet: for specific processing of the obtaining unit 601, the selecting unit 602, the detecting unit 603, and the generating unit 604 and technical effects thereof, reference may be made to relevant descriptions of step 201, step 202, step 203, and step 204 in the corresponding embodiment of fig. 2, and details are not repeated here.
In some optional implementations of this embodiment, the target detection model may include a deep neural network and a set of convolutional layers; and the detection unit 603 may include: a first detection subunit (not shown in the figure), configured to input the first video frame to be detected to the deep neural network, so as to obtain a first feature map corresponding to the first video frame to be detected; a second detection subunit (not shown in the figure), configured to input the first feature map corresponding to the first video frame to be detected into the convolutional layer set, obtain a second feature map output from a convolutional layer in the convolutional layer set, and generate a second feature map set; and the first generating subunit (not shown in the figures) is configured to generate the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected based on the first feature map and the second feature map set.
In some optional implementations of this embodiment, the first generating subunit may include: a convolution module (not shown in the figure) configured to perform convolution on the first feature map and the second feature map in the second feature map set by using a convolution kernel, and generate position information of the constructor to be detected and the helmet to be detected corresponding to the first feature map and position information of the constructor to be detected and the helmet to be detected corresponding to the second feature map in the second feature map set based on a convolution result; and the merging module (not shown in the figure) is configured to merge the position information of the to-be-detected constructor and the to-be-detected helmet corresponding to the first characteristic diagram and the position information of the to-be-detected constructor and the to-be-detected helmet corresponding to the second characteristic diagram in the second characteristic diagram set, so as to generate the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame.
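The patent does not name a concrete network architecture beyond a deep neural network followed by a set of convolutional layers, so the PyTorch sketch below is only an illustration of the data flow described above: a backbone produces the first feature map, extra convolutional layers produce progressively smaller second feature maps, a 3x3 convolution kernel turns each map into per-location box offsets and class scores, and the results are merged. All layer sizes, layer counts, and the choice of PyTorch are assumptions made for this sketch.

import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Illustrative multi-scale detection sketch; not the patented model."""

    def __init__(self, num_classes=3, boxes_per_cell=4):
        super().__init__()
        out_channels = boxes_per_cell * (4 + num_classes)  # offsets + scores
        # "Deep neural network": backbone yielding the first feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # "Convolutional layer set": each layer yields a second feature map.
        self.extra_layers = nn.ModuleList([
            nn.Conv2d(64, 64, 3, stride=2, padding=1),
            nn.Conv2d(64, 64, 3, stride=2, padding=1),
        ])
        # One 3x3 prediction convolution per feature map.
        self.heads = nn.ModuleList([
            nn.Conv2d(64, out_channels, 3, padding=1) for _ in range(3)
        ])

    def forward(self, images):
        feature_maps = [self.backbone(images)]              # first feature map
        for layer in self.extra_layers:
            feature_maps.append(torch.relu(layer(feature_maps[-1])))
        predictions = [
            head(fm).flatten(start_dim=2)                   # per-map positions
            for head, fm in zip(self.heads, feature_maps)
        ]
        return torch.cat(predictions, dim=2)                # merged output

Given a batch of 3-channel images, the forward pass returns a single tensor that concatenates the predictions produced from the first feature map and from every second feature map, which mirrors the merging step described above.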
In some optional implementations of this embodiment, the target detection model may be obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises a sample image and position information of a sample constructor and a sample safety helmet in the sample image; and taking the sample images in the training samples in the training sample set as input, taking the position information of sample constructors and sample safety caps in the input sample images as output, and training to obtain a target detection model.
In some optional implementations of this embodiment, the generating unit 604 may include: a first determining subunit (not shown in the figures) configured to determine whether the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame satisfies a preset condition; a second generating subunit (not shown in the figures) configured to generate a wearing condition of the to-be-detected helmet in response to determining that the preset condition is satisfied, wherein the wearing condition of the to-be-detected helmet indicates that the to-be-detected worker wears the to-be-detected helmet.
In some optional implementations of this embodiment, the generating unit 604 may further include: a third generating subunit (not shown in the figures) configured to, in response to determining that the preset condition is not satisfied, select a video frame associated with the first video frame to be detected from the video, and generate a second set of video frames to be detected, where the second video frame to be detected includes a constructor to be detected and a helmet to be detected; a second determining subunit (not shown in the figure), configured to, for a second video frame to be detected in the second set of video frames to be detected, input the second video frame to be detected to the target detection model, and obtain position information of the constructor to be detected and the helmet to be detected in the second video frame to be detected; determining whether the position information of the to-be-detected constructor and the to-be-detected safety helmet in the second to-be-detected video frame meets a preset condition; a statistics subunit (not shown in the figure) configured to count a proportion of the second video frame to be detected that does not satisfy a preset condition; and a fourth generation subunit (not shown in the figure) configured to generate the wearing condition of the helmet to be detected based on the statistical result.
In some optional implementations of this embodiment, the fourth generating subunit may be further configured to: compare the proportion of second video frames to be detected that do not meet the preset condition with a preset proportion; if that proportion is not smaller than the preset proportion, generate a wearing condition indicating that the constructor to be detected does not wear the safety helmet to be detected; and if that proportion is smaller than the preset proportion, generate a wearing condition indicating that the constructor to be detected wears the safety helmet to be detected.
In some optional implementations of this embodiment, the position information may include an upper boundary coordinate, a lower boundary coordinate, a left boundary coordinate, and a right boundary coordinate of the bounding box; and the preset condition may include, but is not limited to, at least one of the following: the left boundary coordinate of the boundary frame of the safety helmet to be detected is greater than the left boundary coordinate of the boundary frame of the constructor to be detected; the right boundary coordinate of the boundary frame of the safety helmet to be detected is smaller than the right boundary coordinate of the boundary frame of the constructor to be detected; the upper boundary coordinate of the boundary frame of the safety helmet to be detected is greater than the difference between a first preset multiple of the upper boundary coordinate of the boundary frame of the constructor to be detected and a second preset multiple of the lower boundary coordinate of the boundary frame of the constructor to be detected; and the lower boundary coordinate of the boundary frame of the safety helmet to be detected is smaller than the sum of a third preset multiple of the upper boundary coordinate of the boundary frame of the constructor to be detected and a fourth preset multiple of the lower boundary coordinate of the boundary frame of the constructor to be detected.
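Under the assumption that all four conditions are checked together (the text only requires at least one of them) and with placeholder values for the four preset multiples, the check can be sketched as follows.

def preset_condition_satisfied(helmet_box, worker_box,
                               k1=1.0, k2=0.0, k3=0.5, k4=0.5):
    """Boxes are (top, bottom, left, right); k1-k4 stand in for the first to
    fourth preset numerical values, which are not specified in the text."""
    h_top, h_bottom, h_left, h_right = helmet_box
    w_top, w_bottom, w_left, w_right = worker_box
    return (h_left > w_left                          # helmet inside on the left
            and h_right < w_right                    # and on the right
            and h_top > k1 * w_top - k2 * w_bottom   # helmet near the top
            and h_bottom < k3 * w_top + k4 * w_bottom)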
Referring now to FIG. 7, a block diagram of a computer system 700 suitable for use in implementing a server (e.g., server 105 shown in FIG. 1) according to embodiments of the present application is shown. The server shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that the computer program read out therefrom is mounted in the storage section 708 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a selection unit, a detection unit, and a generation unit. The names of the units do not in some cases constitute a limitation on the units themselves, and for example, the acquisition unit may also be described as a "unit that acquires a video obtained by shooting a construction site".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the server described in the above embodiments; or may exist separately and not be assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: acquiring a video obtained by shooting a construction site; selecting a video frame from the video as a first video frame to be detected, wherein the first video frame to be detected comprises a constructor to be detected and a safety helmet to be detected; inputting a first video frame to be detected into a pre-trained target detection model to obtain position information of a constructor to be detected and a safety helmet to be detected in the first video frame to be detected, wherein the target detection model is used for detecting the position information of the constructor and the safety helmet; and generating the wearing condition of the to-be-detected safety helmet based on the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame.
The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (18)

1. A method for detecting a wearing condition of a safety helmet, comprising:
acquiring a real-time video obtained by shooting a construction site;
selecting a video frame from the video as a first video frame to be detected, wherein the first video frame to be detected comprises a constructor to be detected and a safety helmet to be detected;
inputting the first video frame to be detected to a pre-trained target detection model to obtain position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected, wherein the target detection model is used for detecting the position information of the constructor and the helmet, the target detection model comprises a miniaturized deep neural network and a convolutional layer set, the deep neural network is used for extracting a first feature map, the convolutional layer set is used for extracting a second feature map, and the first feature map is subjected to a series of convolution operations to obtain the second feature map;
generating wearing conditions of the to-be-detected safety helmet based on the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame and the position information of the to-be-detected constructor and the to-be-detected safety helmet in the second to-be-detected video frame associated with the first to-be-detected video frame; and determining the boundary frame coordinates of the constructors in the second video frame to be detected based on the first video frame to be detected and the constructors in the second video frame to be detected.
2. The method of claim 1, wherein the inputting the first video frame to be detected into a pre-trained target detection model to obtain the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected comprises:
inputting the first video frame to be detected to the deep neural network to obtain a first feature map corresponding to the first video frame to be detected;
inputting a first feature map corresponding to the first video frame to be detected into the convolutional layer set to obtain a second feature map output from a convolutional layer in the convolutional layer set, and generating a second feature map set;
and generating the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected based on the first feature map and the second feature map set.
3. The method according to claim 2, wherein the generating of the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame based on the first feature map and the second feature map set comprises:
performing convolution on the first feature diagram and a second feature diagram in the second feature diagram set by using convolution kernel, and generating position information of the constructor to be detected and the helmet to be detected, which correspond to the first feature diagram, and position information of the constructor to be detected and the helmet to be detected, which correspond to the second feature diagram in the second feature diagram set, based on a convolution result;
and combining the position information of the constructor to be detected and the helmet to be detected corresponding to the first characteristic diagram and the position information of the constructor to be detected and the helmet to be detected corresponding to the second characteristic diagram in the second characteristic diagram set to generate the position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected.
4. The method according to one of claims 1 to 3, wherein the object detection model is trained by:
acquiring a training sample set, wherein the training sample comprises a sample image and position information of a sample constructor and a sample safety helmet in the sample image;
and taking the sample image in the training sample set as input, taking the position information of the sample constructor and the sample safety helmet in the input sample image as output, and training to obtain the target detection model.
5. The method according to claim 1, wherein the generating of the wearing condition of the to-be-detected helmet based on the to-be-detected constructor and the position information of the to-be-detected helmet in the first to-be-detected video frame comprises:
determining whether the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame meets a preset condition;
and responding to the situation that the preset condition is met, and generating the wearing situation of the safety helmet to be detected, wherein the wearing situation of the safety helmet to be detected indicates the constructor to be detected to wear the safety helmet to be detected.
6. The method according to claim 5, wherein the generating of the wearing condition of the safety helmet to be detected based on the position information of the constructor to be detected and the safety helmet to be detected in the first video frame to be detected further comprises:
in response to the fact that the preset condition is not met, selecting a video frame associated with the first video frame to be detected from the video, and generating a second video frame set to be detected, wherein the second video frame to be detected comprises the constructor to be detected and the safety helmet to be detected;
inputting a second video frame to be detected in the second video frame set to be detected to the target detection model to obtain position information of the constructor to be detected and the safety helmet to be detected in the second video frame to be detected; determining whether the position information of the to-be-detected constructor and the to-be-detected safety helmet in the second to-be-detected video frame meets the preset condition;
counting the proportion of a second video frame to be detected which does not meet the preset condition;
and generating the wearing condition of the safety helmet to be detected based on the statistical result.
7. The method of claim 6, wherein the generating the wearing condition of the helmet to be detected based on the statistical result comprises:
comparing the proportion of the second video frame to be detected which does not meet the preset condition with a preset proportion;
if the proportion of the second to-be-detected video frames which do not meet the preset condition is not smaller than the preset proportion, the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor does not wear the to-be-detected safety helmet;
and if the proportion of the second to-be-detected video frames which do not meet the preset condition is smaller than the preset proportion, the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor wears the to-be-detected safety helmet.
8. The method of one of claims 5-7, wherein the location information comprises upper, lower, left, and right boundary coordinates of the bounding box; and
the preset condition comprises at least one of the following conditions:
the left boundary coordinate of the boundary frame of the safety helmet to be detected is larger than the left boundary coordinate of the boundary frame of the constructor to be detected;
the right boundary coordinates of the boundary frame of the safety helmet to be detected are smaller than the right boundary coordinates of the boundary frame of the constructor to be detected;
the upper boundary coordinate of the boundary frame of the safety helmet to be detected is greater than the difference between the upper boundary coordinate of the boundary frame of the constructor to be detected, which is multiplied by a first preset numerical value, and the lower boundary coordinate of the boundary frame of the constructor to be detected, which is multiplied by a second preset numerical value;
and the lower boundary coordinate of the boundary frame of the safety helmet to be detected is smaller than the sum of the upper boundary coordinate of the boundary frame of the constructor to be detected, which is multiplied by a third preset numerical value, and the lower boundary coordinate of the boundary frame of the constructor to be detected, which is multiplied by a fourth preset numerical value.
9. An apparatus for detecting a wearing condition of a helmet, comprising:
an acquisition unit configured to acquire a real-time video obtained by shooting a construction site;
the selecting unit is configured to select a video frame from the video as a first video frame to be detected, wherein the first video frame to be detected comprises a constructor to be detected and a safety helmet to be detected;
the detection unit is configured to input the first video frame to be detected into a pre-trained target detection model to obtain position information of the constructor to be detected and the helmet to be detected in the first video frame to be detected, wherein the target detection model is used for detecting the position information of the constructor and the helmet to be detected, the target detection model comprises a miniaturized deep neural network and a convolutional layer set, the deep neural network is used for extracting a first feature map, the convolutional layer set is used for extracting a second feature map, and the first feature map is subjected to a series of convolution operations to obtain the second feature map;
the generating unit is configured to generate wearing conditions of the to-be-detected safety helmet based on the position information of the to-be-detected constructor and the to-be-detected safety helmet in the first to-be-detected video frame and the position information of the to-be-detected constructor and the to-be-detected safety helmet in the second to-be-detected video frame associated with the first to-be-detected video frame; and determining the boundary frame coordinates of the constructors in the second video frame to be detected based on the first video frame to be detected and the constructors in the second video frame to be detected.
10. The apparatus of claim 9, wherein the detection unit comprises:
the first detection subunit is configured to input the first video frame to be detected to the deep neural network, so as to obtain a first feature map corresponding to the first video frame to be detected;
a second detection subunit, configured to input a first feature map corresponding to the first video frame to be detected into the convolutional layer set, obtain a second feature map output from a convolutional layer in the convolutional layer set, and generate a second feature map set;
a first generating subunit configured to generate position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame based on the first feature map and the second feature map set.
11. The apparatus of claim 10, wherein the first generating subunit comprises:
the convolution module is configured to perform convolution on the first feature map and a second feature map in the second feature map set by utilizing convolution kernel, and generate position information of the constructor to be detected and the safety helmet to be detected, which correspond to the first feature map, and position information of the constructor to be detected and the safety helmet to be detected, which correspond to the second feature map in the second feature map set, based on a convolution result;
the merging module is configured to merge the position information of the to-be-detected constructor and the to-be-detected helmet corresponding to the first feature map and the position information of the to-be-detected constructor and the to-be-detected helmet corresponding to a second feature map in the second feature map set, so as to generate the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame.
12. The apparatus of one of claims 9-11, wherein the object detection model is trained by:
acquiring a training sample set, wherein the training sample comprises a sample image and position information of a sample constructor and a sample safety helmet in the sample image;
and taking the sample image in the training sample set as input, taking the position information of the sample constructor and the sample safety helmet in the input sample image as output, and training to obtain the target detection model.
13. The apparatus of claim 9, wherein the generating unit comprises:
a first determining subunit configured to determine whether the position information of the to-be-detected constructor and the to-be-detected helmet in the first to-be-detected video frame satisfies a preset condition;
the second generation subunit is configured to generate the wearing condition of the to-be-detected safety helmet in response to determining that the preset condition is met, wherein the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor wears the to-be-detected safety helmet.
14. The apparatus of claim 13, wherein the generating unit further comprises:
a third generating subunit, configured to select, in response to determining that the preset condition is not satisfied, a video frame associated with the first video frame to be detected from the video, and generate a second set of video frames to be detected, where the second video frame to be detected includes the constructor to be detected and the helmet to be detected;
a second determining subunit, configured to, for a second video frame to be detected in the second set of video frames to be detected, input the second video frame to be detected to the target detection model, so as to obtain position information of the to-be-detected constructor and the to-be-detected helmet in the second video frame to be detected; determining whether the position information of the to-be-detected constructor and the to-be-detected safety helmet in the second to-be-detected video frame meets the preset condition;
a statistic subunit configured to count a proportion of a second video frame to be detected that does not satisfy the preset condition;
and the fourth generation subunit is configured to generate the wearing condition of the helmet to be detected based on the statistical result.
15. The apparatus of claim 14, wherein the fourth generation subunit is further configured to:
comparing the proportion of the second video frame to be detected which does not meet the preset condition with a preset proportion;
if the proportion of the second to-be-detected video frames which do not meet the preset condition is not smaller than the preset proportion, the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor does not wear the to-be-detected safety helmet;
if the proportion of the second to-be-detected video frames which do not meet the preset condition is smaller than the preset proportion, the wearing condition of the to-be-detected safety helmet indicates that the to-be-detected constructor wears the to-be-detected safety helmet.
16. The apparatus of one of claims 13-15, wherein the location information comprises upper, lower, left, and right boundary coordinates of the bounding box; and
the preset condition comprises at least one of the following conditions:
the left boundary coordinate of the boundary frame of the to-be-detected safety helmet is larger than the left boundary coordinate of the boundary frame of the to-be-detected constructor;
the right boundary coordinate of the boundary frame of the to-be-detected safety helmet is smaller than the right boundary coordinate of the boundary frame of the to-be-detected constructor;
the upper boundary coordinate of the boundary frame of the safety helmet to be detected is greater than the difference between the upper boundary coordinate of the boundary frame of the constructor to be detected, which is multiplied by a first preset numerical value, and the lower boundary coordinate of the boundary frame of the constructor to be detected, which is multiplied by a second preset numerical value;
and the lower boundary coordinate of the boundary frame of the safety helmet to be detected is smaller than the sum of the upper boundary coordinate of the boundary frame of the constructor to be detected, which is multiplied by a third preset numerical value, and the lower boundary coordinate of the boundary frame of the constructor to be detected, which is multiplied by a fourth preset numerical value.
17. A server, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
18. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN201810836779.9A 2018-07-26 2018-07-26 Method and device for detecting wearing condition of safety helmet Active CN108921159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810836779.9A CN108921159B (en) 2018-07-26 2018-07-26 Method and device for detecting wearing condition of safety helmet

Publications (2)

Publication Number Publication Date
CN108921159A CN108921159A (en) 2018-11-30
CN108921159B true CN108921159B (en) 2022-09-27

Family

ID=64416078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810836779.9A Active CN108921159B (en) 2018-07-26 2018-07-26 Method and device for detecting wearing condition of safety helmet

Country Status (1)

Country Link
CN (1) CN108921159B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784327B (en) * 2018-12-04 2021-04-16 北京达佳互联信息技术有限公司 Boundary box determining method and device, electronic equipment and storage medium
CN109784364A (en) * 2018-12-06 2019-05-21 南京蓝思信息科技有限公司 Head movement data processing method and device
CN109697430A (en) * 2018-12-28 2019-04-30 成都思晗科技股份有限公司 The detection method that working region safety cap based on image recognition is worn
CN109886205B (en) * 2019-02-25 2023-08-08 苏州清研微视电子科技有限公司 Real-time safety belt monitoring method and system
CN110046574A (en) * 2019-04-15 2019-07-23 北京易达图灵科技有限公司 Safety cap based on deep learning wears recognition methods and equipment
CN110135290B (en) * 2019-04-28 2020-12-08 中国地质大学(武汉) Safety helmet wearing detection method and system based on SSD and AlphaPose
CN110287804A (en) * 2019-05-30 2019-09-27 广东电网有限责任公司 A kind of electric operating personnel's dressing recognition methods based on mobile video monitor
CN110188724B (en) * 2019-06-05 2023-02-28 中冶赛迪信息技术(重庆)有限公司 Method and system for helmet positioning and color recognition based on deep learning
CN110458075B (en) * 2019-08-05 2023-08-25 北京泰豪信息科技有限公司 Method, storage medium, device and system for detecting wearing of safety helmet
CN110472586B (en) * 2019-08-17 2022-10-14 四创科技有限公司 Transfer learning-based safety helmet detection method
CN110705479A (en) * 2019-09-30 2020-01-17 北京猎户星空科技有限公司 Model training method, target recognition method, device, equipment and medium
CN110852170A (en) * 2019-10-12 2020-02-28 北京文安智能技术股份有限公司 Personnel safety helmet detection method, device and system
CN112949354A (en) * 2019-12-10 2021-06-11 顺丰科技有限公司 Method and device for detecting wearing of safety helmet, electronic equipment and computer-readable storage medium
CN113128273B (en) * 2019-12-30 2024-05-03 深圳云天励飞技术有限公司 Method and related device for detecting wearing of safety helmet
CN111079722B (en) * 2020-03-23 2023-01-03 杭州鲁尔物联科技有限公司 Hoisting process personnel safety monitoring method and system
CN113536842A (en) * 2020-04-15 2021-10-22 普天信息技术有限公司 Electric power operator safety dressing identification method and device
CN112966618B (en) * 2021-03-11 2024-02-09 京东科技信息技术有限公司 Dressing recognition method, apparatus, device and computer readable medium
CN113128351A (en) * 2021-03-25 2021-07-16 中电建电力检修工程有限公司 Wearing detection method for safety helmet
CN113592014A (en) * 2021-08-06 2021-11-02 广联达科技股份有限公司 Method and device for identifying safety helmet, computer equipment and storage medium
CN114241513A (en) * 2021-11-12 2022-03-25 佛山中科云图智能科技有限公司 Safety helmet detection method, detection device, storage medium and computer equipment
CN113822242B (en) * 2021-11-19 2022-02-25 中化学交通建设集团有限公司 Image recognition technology-based helmet wearing recognition method and device
CN114332738B (en) * 2022-01-18 2023-08-04 浙江高信技术股份有限公司 Safety helmet detection system for intelligent construction site
CN115082864B (en) * 2022-07-25 2022-12-02 青岛亨通建设有限公司 Building construction safety monitoring system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036575A (en) * 2014-07-01 2014-09-10 江苏省南京市公路管理处公路科学研究所 Safety helmet wearing condition monitoring method on construction site
CN105956150A (en) * 2016-05-12 2016-09-21 张家港索奥通信科技有限公司 Method and apparatus for generating hair style and makeup matching suggestions of a user
CN106372662A (en) * 2016-08-30 2017-02-01 腾讯科技(深圳)有限公司 Helmet wearing detection method and device, camera, and server

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102592076B1 (en) * 2015-12-14 2023-10-19 삼성전자주식회사 Appartus and method for Object detection based on Deep leaning, apparatus for Learning thereof
CN108229302A (en) * 2017-11-10 2018-06-29 深圳市商汤科技有限公司 Feature extracting method, device, computer program, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN108921159A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN108921159B (en) Method and device for detecting wearing condition of safety helmet
CN108875732B (en) Model training and instance segmentation method, device and system and storage medium
CN110738101B (en) Behavior recognition method, behavior recognition device and computer-readable storage medium
CN110807429A (en) Construction safety detection method and system based on tiny-YOLOv3
CN108256404B (en) Pedestrian detection method and device
CN111815577B (en) Helmet wearing detection model processing method, device, equipment and storage medium
CN108875534B (en) Face recognition method, device, system and computer storage medium
CN108009466B (en) Pedestrian detection method and device
WO2022041830A1 (en) Pedestrian re-identification method and device
CN110263680B (en) Image processing method, device and system and storage medium
WO2021031954A1 (en) Object quantity determination method and apparatus, and storage medium and electronic device
CN113807276A (en) Smoking behavior identification method based on optimized YOLOv4 model
CN116152863B (en) Personnel information identification method and device, electronic equipment and storage medium
CN111860140A (en) Target event detection method and device, computer equipment and storage medium
CN110458794B (en) Quality detection method and device for accessories of rail train
Zhafran et al. Computer vision system based for personal protective equipment detection, by using convolutional neural network
CN110808995A (en) Safety protection method and device
CN102867214B (en) Counting management method for people within area range
CN111950507B (en) Data processing and model training method, device, equipment and medium
CN117726991A (en) High-altitude hanging basket safety belt detection method and terminal
CN117612249A (en) Underground miner dangerous behavior identification method and device based on improved OpenPose algorithm
CN114764895A (en) Abnormal behavior detection device and method
CN112528825A (en) Station passenger recruitment service method based on image recognition
CN111753796A (en) Method and device for identifying key points in image, electronic equipment and storage medium
CN113947795B (en) Mask wearing detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant