CN111626276A

CN111626276A - Two-stage neural network-based work shoe wearing detection method and device

Info

Publication number: CN111626276A
Application number: CN202010750662.6A
Authority: CN
Inventors: 张逸; 徐晓刚; 王军; 徐芬; 张文广; 何鹏飞
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2020-07-30
Filing date: 2020-07-30
Publication date: 2020-09-04
Anticipated expiration: 2040-07-30
Also published as: CN111626276B

Abstract

The invention discloses a method and a device for detecting whether work shoes are worn, based on a two-stage neural network. The method comprises the following steps: acquiring a picture data set of a surveillance video; labeling the shoe targets and the human body targets contained in the picture data set to obtain a labeled data set; constructing a two-stage neural network model formed by cascading a first-stage human body detection network model and a second-stage shoe detection network model, wherein the input of the second-stage shoe detection network model is the output of the first-stage human body detection network model; inputting a picture to be detected into the two-stage neural network model, and outputting the position of the human body box, the offset of the shoe position relative to the human body box, and the confidence that work shoes are worn; and calculating the position of the shoe from the position of the human body box and the offset of the shoe position relative to the human body box, and judging from the confidence whether work shoes are worn. The method addresses the low detection recall caused by the small size of workers' shoes in video, and can be used to detect whether workers in a factory are wearing work shoes.

Description

Two-stage neural network-based work shoe wearing detection method and device

Technical Field

The invention belongs to the technical field of artificial intelligence and computer vision, and particularly relates to a method and a device for detecting the wearing of work shoes based on a two-stage neural network.

Background

As intelligent technologies develop, safety in production and daily life has drawn increasing attention and demand. Cameras have been installed at industrial production sites and in many corners of cities, which creates favorable conditions for automated monitoring with computer vision techniques.

In industrial production, items worn on the human body are often used as detection targets. At an industrial production site, wearing a safety helmet greatly reduces personal injury accidents, so helmet-wearing detection at the head relies on human head detection technology. Likewise, in a factory, wearing work shoes largely prevents the hazards caused by slipping soles, so target detection technology is needed to detect shoes and determine whether work shoes are worn.

In recent years, with the development of deep learning, the performance of target detection has improved greatly. Target detection techniques represented by YOLO are widely applied in industry, and human body detection in particular is maturing. However, small-target detection remains one of the difficulties of target detection. Because cameras in factory buildings and on urban roads are mounted relatively high, worn items appear as small targets in the video image, so detecting items such as shoes directly on the video image often fails to reach a satisfactory recall rate. A detection technique for worn items with high recall, high accuracy and limited computational overhead is therefore urgently needed.

Disclosure of Invention

The embodiments of the invention aim to provide a method and a device for detecting the wearing of work shoes based on a two-stage neural network, so as to solve the problem that a satisfactory recall rate cannot be obtained when worn items such as shoes are detected directly in a video image.

To achieve the above object, the embodiments of the invention adopt the following technical solutions:

In a first aspect, an embodiment of the invention provides a two-stage neural network-based method for detecting the wearing of work shoes, including:

acquiring a picture data set of a surveillance video;

labeling the shoe targets and the human body targets contained in the picture data set to obtain a labeled data set;

constructing a two-stage neural network model, wherein the two-stage neural network model is formed by cascading a first-stage human body detection network model and a second-stage shoe detection network model, the first-stage human body detection network model is obtained by training a human body detection network with the human body box part of the labeled data set, the second-stage shoe detection network model is obtained by training a second-stage shoe detection network jointly with the human body box, shoe box and shoe category parts of the labeled data set, and the input of the second-stage shoe detection network model is the output of the first-stage human body detection network model;

inputting the picture to be detected into the two-stage neural network model, and outputting the position of the human body box, the offset of the shoe position relative to the human body box, and the confidence that work shoes are worn;

and calculating the position of the shoe from the position of the human body box and the offset of the shoe position relative to the human body box, and judging whether work shoes are worn by fusing, over multiple frames, the results of comparing the work-shoe confidence with a threshold.

Further, acquiring a picture data set of the surveillance video comprises:

acquiring a surveillance video and splitting it into individual pictures to obtain the picture data set.

Further, labeling the shoe targets and the human body targets contained in the picture data set includes:

labeling both the persons wearing work shoes and the persons not wearing work shoes contained in the picture data set, and labeling the human body position, the shoe position and the shoe category for each person. Because a person's two shoes may occlude each other while at least one shoe remains visible, only one shoe is detected to simplify the problem: the position of one fully visible shoe is labeled, and the category is labeled 1 if the shoe is a work shoe and 0 otherwise.

Further, obtaining the first-stage human body detection network model by training a human body detection network with the human body box part of the labeled data set includes:

training a human body detection network with the human body box part of the labeled data set to obtain the first-stage human body detection network model, wherein a branch is led out before the last convolution module of the first-stage human body detection network, so that the feature map is output together with the detection results of the first-stage human body detection network.

Further, training the second-stage shoe detection network jointly with the human body box, shoe box and shoe category parts of the labeled data set comprises:

Step S3.1: setting the structure of the second-stage shoe detection network, comprising the following substeps:

Step S3.1.1: the feature map and the human body detection boxes output by the first-stage human body detection network model are used as the input of the second-stage shoe detection network. The dimension of the feature map is $N \times C \times H \times W$, where $N$ is the batch size, $C$ is the number of channels, and $H$ and $W$ are respectively the height and width of the input image after network downsampling. The dimension of the human body detection boxes is $N \times M \times 4$, where $M$ represents the number of human body detection boxes corresponding to a single image. Based on the feature map output by the first-stage human body detection network, the features of each human body detection box region are pooled to obtain features of dimension $N \times M \times C \times S \times S$, where the width and height of the pooled features are represented by $S$;

Step S3.1.2: flattening the obtained features so that the dimension becomes $N \times M \times (C \cdot S \cdot S)$;
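The shape bookkeeping of steps S3.1.1 and S3.1.2 can be traced with a small helper. This is only a sketch: the function name, the ordering of the axes, and the example channel count of 256 are assumptions made for illustration, not values stated in the text.

```python
def second_stage_shapes(n, c, h, w, m, s):
    """Trace tensor shapes through the second-stage input pipeline.

    n: batch size, c: channels, h/w: feature-map height/width,
    m: human body boxes per image, s: pooled width and height.
    The axis ordering used here is an assumption for illustration.
    """
    return {
        "feature_map": (n, c, h, w),    # first-stage branch output
        "body_boxes": (n, m, 4),        # 4 values per human body box
        "pooled": (n, m, c, s, s),      # one c x s x s feature per box
        "flattened": (n, m, c * s * s), # input to the fully connected layers
    }


# Hypothetical example: a 608 x 608 input downsampled 8x gives a 76 x 76 map.
shapes = second_stage_shapes(n=2, c=256, h=76, w=76, m=3, s=7)
print(shapes["flattened"])
```

Each human body box contributes one flattened feature vector, so the cost of the second stage grows with the number of detected persons rather than with the image size.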

Step S3.1.3: feeding the features sequentially into two fully connected layers, each followed during training by a drop_out layer to avoid network overfitting;

Step S3.1.4: feeding the features into a fully connected layer with 5 neurons to obtain a five-dimensional output;

Step S3.1.5: normalizing the five-dimensional output to obtain a five-dimensional predicted value (P_bias0, P_bias1, P_bias2, P_bias3, P_label), where P_bias0 is the predicted ratio of the horizontal offset of the shoe center point from the lower left corner of the human body box to the width of the human body box; P_bias1 is the predicted ratio of the vertical offset of the shoe center point from the lower left corner of the human body box to the height of the human body box; P_bias2 is the predicted ratio of the shoe width to the width of the human body box; P_bias3 is the predicted ratio of the shoe height to the height of the human body box; and P_label is the predicted confidence that work shoes are worn;
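The normalization in step S3.1.5 can be sketched as follows, assuming the sigmoid function that the detailed embodiment names for this step. The function names and the dictionary layout are hypothetical and chosen only for readability.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def normalize_head_output(raw):
    """Map the five raw outputs of the last fully connected layer to [0, 1]."""
    if len(raw) != 5:
        raise ValueError("expected a five-dimensional output")
    names = ("P_bias0", "P_bias1", "P_bias2", "P_bias3", "P_label")
    return {name: sigmoid(value) for name, value in zip(names, raw)}


print(normalize_head_output([0.0, 0.0, 0.0, 0.0, 0.0])["P_label"])
```

Because every component is a ratio relative to the human body box or a confidence, squashing all five outputs into the unit interval keeps the predicted shoe inside the range the regression targets can take.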

Step S3.2: setting the loss function of the second-stage shoe detection network;

The offset Gt_bias of the shoe center point position $(x_0, y_0)$ relative to the lower left corner position $(x_1, y_1)$ of the human body box, together with the shoe category label, comprises five components (Gt_bias0, Gt_bias1, Gt_bias2, Gt_bias3, Gt_label): Gt_bias0 is the ratio of the horizontal offset of the shoe center point from the lower left corner of the human body box to the width of the human body box; Gt_bias1 is the ratio of the vertical offset of the shoe center point from the lower left corner of the human body box to the height of the human body box; Gt_bias2 is the ratio of the shoe width to the width of the human body box; Gt_bias3 is the ratio of the shoe height to the height of the human body box; and Gt_label is the shoe category label. (Gt_bias0, Gt_bias1, Gt_bias2, Gt_bias3, Gt_label) serves as the ground truth, i.e. the regression target, during training of the second-stage shoe detection network, and the offset components are expressed as:

$$\mathrm{Gt\_bias0} = \frac{\Delta x}{W_b}, \quad \mathrm{Gt\_bias1} = \frac{\Delta y}{H_b}, \quad \mathrm{Gt\_bias2} = \frac{W_s}{W_b}, \quad \mathrm{Gt\_bias3} = \frac{H_s}{H_b}$$

where $\Delta x$ is the offset of the shoe center point from the lower left corner of the human body box along the horizontal axis, taken as a positive value; $\Delta y$ is the offset of the shoe center point from the lower left corner of the human body box along the vertical axis, taken as a positive value; $W_b$ is the width of the human body detection box; $H_b$ is the height of the human body detection box; $H_s$ is the height of the shoe; and $W_s$ is the width of the shoe;

The lower left corner of the human body detection box is taken as the reference point with coordinates (0,0), and the width and height of the human body detection box, $W_b$ and $H_b$, are both taken as unit 1; the shoe corresponding to the ground truth then has center point coordinates (Gt_bias0, Gt_bias1), width Gt_bias2 and height Gt_bias3;
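The encoding of the regression target can be sketched as below. The function name and argument layout are hypothetical. Absolute differences are used for the offsets because the text only states that they are taken as positive values, which keeps the sketch independent of whether the image y axis points up or down.

```python
def encode_shoe_target(shoe_center, shoe_size, body_lower_left, body_size, is_work_shoe):
    """Build (Gt_bias0, Gt_bias1, Gt_bias2, Gt_bias3, Gt_label) for one person.

    shoe_center: (x0, y0); shoe_size: (shoe width, shoe height);
    body_lower_left: (x1, y1); body_size: (body box width, body box height).
    """
    x0, y0 = shoe_center
    shoe_w, shoe_h = shoe_size
    x1, y1 = body_lower_left
    body_w, body_h = body_size
    if body_w <= 0 or body_h <= 0:
        raise ValueError("human body box must have positive width and height")
    dx = abs(x0 - x1)  # horizontal offset, taken as positive
    dy = abs(y0 - y1)  # vertical offset, taken as positive
    label = 1 if is_work_shoe else 0
    return (dx / body_w, dy / body_h, shoe_w / body_w, shoe_h / body_h, label)


# Hypothetical pixel values: a 100 x 200 body box and a 20 x 10 shoe.
print(encode_shoe_target((130, 390), (20, 10), (100, 400), (100, 200), True))
```

Normalizing by the size of the human body box makes the targets independent of how large the person appears in the image.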

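From this, with offsets measured rightward and upward from the reference point, the ground-truth coordinates of the upper left corner of the shoe target box, $(x_t^{gt}, y_t^{gt})$, are:

$$x_t^{gt} = \mathrm{Gt\_bias0} - \frac{\mathrm{Gt\_bias2}}{2}, \quad y_t^{gt} = \mathrm{Gt\_bias1} + \frac{\mathrm{Gt\_bias3}}{2}$$

The ground-truth coordinates of the lower right corner of the shoe target box, $(x_b^{gt}, y_b^{gt})$, are:

$$x_b^{gt} = \mathrm{Gt\_bias0} + \frac{\mathrm{Gt\_bias2}}{2}, \quad y_b^{gt} = \mathrm{Gt\_bias1} - \frac{\mathrm{Gt\_bias3}}{2}$$

Similarly, the predicted coordinates of the upper left corner of the shoe target box, $(x_t^{p}, y_t^{p})$, are:

$$x_t^{p} = \mathrm{P\_bias0} - \frac{\mathrm{P\_bias2}}{2}, \quad y_t^{p} = \mathrm{P\_bias1} + \frac{\mathrm{P\_bias3}}{2}$$

The predicted coordinates of the lower right corner of the shoe target box, $(x_b^{p}, y_b^{p})$, are:

$$x_b^{p} = \mathrm{P\_bias0} + \frac{\mathrm{P\_bias2}}{2}, \quad y_b^{p} = \mathrm{P\_bias1} - \frac{\mathrm{P\_bias3}}{2}$$

The points $(x_t^{gt}, y_t^{gt})$ and $(x_b^{gt}, y_b^{gt})$ define the ground-truth shoe target box box_gt, and the points $(x_t^{p}, y_t^{p})$ and $(x_b^{p}, y_b^{p})$ define the predicted shoe target box box_p;

The degree of overlap GIoU between box_gt and box_p is computed, and the target box loss is Loss = 1 - GIoU;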

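The GIoU is calculated as follows:

for box_gt and box_p, first find the smallest box C that encloses both, and denote its area by c_area;

Let the union area be

$$U = \mathrm{area}(box\_gt) + \mathrm{area}(box\_p) - \mathrm{area}(box\_gt \cap box\_p),$$

then

$$IoU = \frac{\mathrm{area}(box\_gt \cap box\_p)}{U};$$

$$GIoU = IoU - \frac{c\_area - U}{c\_area};$$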
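The GIoU computation can be sketched in plain Python using the standard definition of generalized intersection over union. The function names are hypothetical, and boxes are assumed to be given as (x_min, y_min, x_max, y_max) so that the code does not depend on the direction of the vertical axis.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    # Intersection is empty when the boxes do not overlap along either axis.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest box C enclosing both boxes.
    c_area = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    if union <= 0 or c_area <= 0:
        raise ValueError("boxes must have positive area")
    return inter / union - (c_area - union) / c_area


def box_loss(box_gt, box_p):
    """Target box loss, Loss = 1 - GIoU."""
    return 1.0 - giou(box_gt, box_p)


print(giou((0, 0, 1, 1), (0, 0, 1, 1)))  # identical boxes
print(giou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes give a negative value
```

Unlike plain IoU, the enclosing-box term still provides a useful training signal when the predicted and ground-truth shoe boxes do not overlap at all, which is common for very small targets.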

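For the category loss Loss_label, a cross-entropy loss function is adopted, calculated as:

$$Loss\_label = -\left[\mathrm{Gt\_label} \cdot \log(\mathrm{P\_label}) + (1 - \mathrm{Gt\_label}) \cdot \log(1 - \mathrm{P\_label})\right]$$

The total loss function is:

$$Loss\_total = \lambda_1 \cdot Loss + \lambda_2 \cdot Loss\_label$$

where $\lambda_1$ is the weight of the shoe position loss and $\lambda_2$ is the weight of the shoe classification loss;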

Step S3.3: training the second-stage shoe detection network;

based on the network structure of step S3.1 and the loss function of step S3.2, the second-stage shoe detection network is trained jointly with the human body box, shoe box and shoe category parts of the labeled data set to obtain the second-stage shoe detection network model.

Further, calculating the position of the shoe from the position of the human body box and the offset of the shoe position relative to the human body box comprises:

obtaining the target box of the shoe from the position of the human body box and the offset of the shoe position relative to the human body box, wherein the lower left corner of the human body box output by the first-stage human body detection network has coordinates $(x_1, y_1)$, and the coordinates of the upper left corner $(x_t, y_t)$ and the lower right corner $(x_b, y_b)$ of the shoe are calculated as follows:
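One reading consistent with the offset definitions in step S3.1.5, assuming image coordinates in which $y$ increases downward and writing $W_b$ and $H_b$ (names chosen here for illustration) for the width and height of the human body box, is:

$$x_t = x_1 + \left(\mathrm{P\_bias0} - \frac{\mathrm{P\_bias2}}{2}\right) W_b, \quad y_t = y_1 - \left(\mathrm{P\_bias1} + \frac{\mathrm{P\_bias3}}{2}\right) H_b$$

$$x_b = x_1 + \left(\mathrm{P\_bias0} + \frac{\mathrm{P\_bias2}}{2}\right) W_b, \quad y_b = y_1 - \left(\mathrm{P\_bias1} - \frac{\mathrm{P\_bias3}}{2}\right) H_b$$

Under the opposite convention, with $y$ increasing upward, the vertical terms would be added to $y_1$ rather than subtracted.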

Further, judging whether work shoes are worn by fusing, over multiple frames, the results of comparing the work-shoe confidence with a threshold comprises:

judging whether work shoes are worn in the current frame from a filtered result, with the filter length parameter set to x: the work-shoe confidence for the current frame is taken as the mean of the confidence predictions output by the second-stage shoe detection network over the preceding x frames. If the mean is greater than or equal to a set threshold, work shoes are judged to be worn; otherwise they are judged not to be worn. This makes the shoe category result more robust and avoids frequent jumps in the prediction.

In a second aspect, an embodiment of the invention further provides a two-stage neural network-based device for detecting the wearing of work shoes, including:

an acquisition module, used for acquiring a picture data set of a surveillance video;

a labeling module, used for labeling the shoe targets and the human body targets contained in the picture data set to obtain a labeled data set;

a building module, used for building a two-stage neural network model, wherein the two-stage neural network model is formed by cascading a first-stage human body detection network model and a second-stage shoe detection network model, the first-stage human body detection network model is obtained by training a human body detection network with the human body box part of the labeled data set, the second-stage shoe detection network model is obtained by training a second-stage shoe detection network jointly with the human body box, shoe box and shoe category parts of the labeled data set, and the input of the second-stage shoe detection network model is the output of the first-stage human body detection network model;

an output module, used for inputting the picture to be detected into the two-stage neural network model and outputting the position of the human body box, the offset of the shoe position relative to the human body box, and the confidence that work shoes are worn;

and a calculation and judgment module, used for calculating the position of the shoe from the position of the human body box and the offset of the shoe position relative to the human body box, and judging whether work shoes are worn by fusing, over multiple frames, the results of comparing the work-shoe confidence with a threshold.

According to the above technical solutions, the invention has the following beneficial effects:

(1) Because shoe targets are very small in the video image, detecting shoes directly yields a low recall rate. The invention therefore cascades two networks, converting shoe detection into human body detection followed by regression of the shoe position and classification of the shoe based on the detected human body, which is simple to implement. Compared with detection by a single-stage neural network, the second-stage shoe detection network in the cascade is dedicated to locating and classifying shoes within the human body box, which overcomes the difficulty of directly detecting small shoe targets in factory surveillance video and clearly improves detection recall and accuracy.

(2) In industrial applications, the inference time of the detection algorithm strongly affects practical usability. The second-stage shoe detection network takes only the features and detection box results of the first-stage human body detection network as input, with no large memory copies, so its inference time is low, which favors deployment in industrial scenarios.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart of a two-stage neural network-based method for detecting the wearing of work shoes according to an embodiment of the invention;

FIG. 2 is a network model structure diagram of the two-stage neural network-based method for detecting the wearing of work shoes according to an embodiment of the invention;

FIG. 3 is a schematic diagram of calculating the offset of a shoe target box relative to a human body target box according to an embodiment of the invention;

FIG. 4 is a block diagram of a two-stage neural network-based device for detecting the wearing of work shoes according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Example 1:

FIG. 1 is a flow chart of a two-stage neural network-based method for detecting the wearing of work shoes according to an embodiment of the invention. The method comprises the following steps:

Step S1: acquiring a picture data set of a surveillance video;

specifically, a surveillance video is acquired and split into pictures by extracting one frame every 200 frames, which yields the picture data set and ensures that the images in the data set differ sufficiently from one another.
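The sampling rule of step S1 can be sketched as an index selection. The function name is hypothetical, and starting at frame 0 is an assumption; the text only fixes the interval of 200 frames. Decoding the selected frames would be done with whatever video library is in use.

```python
def sample_frame_indices(total_frames, step=200):
    """Indices of the frames kept when one frame is extracted every `step` frames."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, total_frames, step))


# A hypothetical 1000-frame clip keeps five frames.
print(sample_frame_indices(1000))
```

Sampling sparsely avoids filling the data set with near-duplicate consecutive frames, which would add labeling cost without adding much variety.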

Step S2: labeling the shoe targets and the human body targets contained in the picture data set to obtain a labeled data set;

specifically, both the persons wearing work shoes and the persons not wearing work shoes contained in the picture data set are labeled, and for each person the human body position, the shoe position and the shoe category are labeled. Because a person's two shoes may occlude each other while at least one shoe remains visible, only one shoe is detected to simplify the problem: the position of one fully visible shoe is labeled, and the category is labeled 1 if the shoe is a work shoe and 0 otherwise.
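One possible in-memory form of a single labeled person is sketched below. The class name, field names and the corner-based box layout are hypothetical; the text only specifies which three items are labeled and the 1/0 category coding.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class PersonAnnotation:
    body_box: Box    # human body position
    shoe_box: Box    # position of one fully visible shoe
    shoe_label: int  # 1 = work shoe, 0 = otherwise

    def __post_init__(self):
        if self.shoe_label not in (0, 1):
            raise ValueError("shoe_label must be 0 or 1")


example = PersonAnnotation(
    body_box=(100.0, 200.0, 200.0, 400.0),
    shoe_box=(120.0, 385.0, 140.0, 395.0),
    shoe_label=1,
)
print(example)
```

Keeping the human body box and the shoe box in the same record is what later allows the shoe position to be expressed relative to the human body box.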

Step S3: constructing a two-stage neural network model. FIG. 2 is a network model structure diagram of the two-stage neural network-based method for detecting the wearing of work shoes according to an embodiment of the invention. The two-stage neural network model is formed by cascading a first-stage human body detection network model and a second-stage shoe detection network model, the first-stage human body detection network model is obtained by training a human body detection network with the human body box part of the labeled data set, the second-stage shoe detection network model is obtained by training a second-stage shoe detection network jointly with the human body box, shoe box and shoe category parts of the labeled data set, and the input of the second-stage shoe detection network model is the output of the first-stage human body detection network model;

specifically, obtaining the first-stage human body detection network model by training a human body detection network with the human body box part of the labeled data set includes:

training a yolov3 human body detection network with the human body box part of the labeled data set to obtain the first-stage human body detection network model, using the adam optimizer with the initial learning rate set to 0.0005, the number of epochs set to 90, and the batchsize set to 64. The backbone network is darknet53; in the subsequent network, the 32x and 16x downsampling branches are fused and then fused with the 8x downsampling branch. Finally, a branch is led out before the last convolution module of the first-stage human body detection network; this branch is the feature map of the image after 8x downsampling, and it is output together with the detection results of the first-stage human body detection network.

Training the second-stage shoe detection network jointly with the human body box, shoe box and shoe category parts of the labeled data set includes:

Step S3.1: setting the structure of the second-stage shoe detection network, comprising the following substeps:

Step S3.1.1: the feature map and the human body detection boxes output by the first-stage human body detection network model are used as the input of the second-stage shoe detection network. The dimension of the feature map is $N \times C \times H \times W$, where $N$ is the batch size, $C$ is the number of channels, and $H$ and $W$ are respectively the height and width of the input image after network downsampling. The dimension of the human body detection boxes is $N \times M \times 4$, where $M$ represents the number of human body detection boxes corresponding to a single image, and 4 corresponds to the center point coordinates and the width and height of each human body detection box. Based on the feature map output by the first-stage human body detection network, the features of each human body detection box region are pooled by roi-align to obtain features of dimension $N \times M \times C \times S \times S$, where the width and height of the features after roi-align pooling are represented by $S$; in this embodiment $S$ is 7. The features of each human body box region are thus obtained, so that the subsequent steps can focus on shoe detection within the human body box region;

Step S3.1.2: flattening the obtained features so that the dimension becomes $N \times M \times (C \cdot S \cdot S)$;

Step S3.1.3: feeding the features sequentially into two fully connected layers with 2048 neurons each, each followed during training by a drop_out layer to avoid overfitting;

Step S3.1.4: feeding the features into a fully connected layer with 5 neurons to obtain a five-dimensional output;

Step S3.1.5: applying sigmoid normalization to the five-dimensional output to obtain a five-dimensional predicted value (P_bias0, P_bias1, P_bias2, P_bias3, P_label), where P_bias0 is the predicted ratio of the horizontal offset of the shoe center point from the lower left corner of the human body box to the width of the human body box; P_bias1 is the predicted ratio of the vertical offset of the shoe center point from the lower left corner of the human body box to the height of the human body box; P_bias2 is the predicted ratio of the shoe width to the width of the human body box; P_bias3 is the predicted ratio of the shoe height to the height of the human body box; and P_label is the predicted confidence that work shoes are worn;

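Step S3.2: setting the loss function of the second-stage shoe detection network;

FIG. 3 is a schematic diagram of calculating the offset of a shoe target box relative to a human body target box according to an embodiment of the invention. The offset Gt_bias of the shoe center point position $(x_0, y_0)$ relative to the lower left corner position $(x_1, y_1)$ of the human body box, together with the shoe category label, comprises five components (Gt_bias0, Gt_bias1, Gt_bias2, Gt_bias3, Gt_label): Gt_bias0 is the ratio of the horizontal offset of the shoe center point from the lower left corner of the human body box to the width of the human body box; Gt_bias1 is the ratio of the vertical offset of the shoe center point from the lower left corner of the human body box to the height of the human body box; Gt_bias2 is the ratio of the shoe width to the width of the human body box; Gt_bias3 is the ratio of the shoe height to the height of the human body box; and Gt_label is the shoe category label. (Gt_bias0, Gt_bias1, Gt_bias2, Gt_bias3, Gt_label) serves as the ground truth, i.e. the regression target, during training of the second-stage shoe detection network, and the offset components are expressed as:

$$\mathrm{Gt\_bias0} = \frac{\Delta x}{W_b}, \quad \mathrm{Gt\_bias1} = \frac{\Delta y}{H_b}, \quad \mathrm{Gt\_bias2} = \frac{W_s}{W_b}, \quad \mathrm{Gt\_bias3} = \frac{H_s}{H_b}$$

where $\Delta x$ and $\Delta y$ are the offsets of the shoe center point from the lower left corner of the human body box along the horizontal and vertical axes respectively, both taken as positive values; $W_b$ is the width of the human body detection box; $H_b$ is the height of the human body detection box; $H_s$ is the height of the shoe; and $W_s$ is the width of the shoe;

The lower left corner of the human body detection box is taken as the reference point with coordinates (0,0), and the width and height of the human body detection box, $W_b$ and $H_b$, are both taken as unit 1; the shoe corresponding to the ground truth then has center point coordinates (Gt_bias0, Gt_bias1), width Gt_bias2 and height Gt_bias3;

From this, with offsets measured rightward and upward from the reference point, the ground-truth coordinates of the upper left corner of the shoe target box, $(x_t^{gt}, y_t^{gt})$, are:

$$x_t^{gt} = \mathrm{Gt\_bias0} - \frac{\mathrm{Gt\_bias2}}{2}, \quad y_t^{gt} = \mathrm{Gt\_bias1} + \frac{\mathrm{Gt\_bias3}}{2}$$

The ground-truth coordinates of the lower right corner of the shoe target box, $(x_b^{gt}, y_b^{gt})$, are:

$$x_b^{gt} = \mathrm{Gt\_bias0} + \frac{\mathrm{Gt\_bias2}}{2}, \quad y_b^{gt} = \mathrm{Gt\_bias1} - \frac{\mathrm{Gt\_bias3}}{2}$$

Similarly, the predicted coordinates of the upper left corner of the shoe target box, $(x_t^{p}, y_t^{p})$, are:

$$x_t^{p} = \mathrm{P\_bias0} - \frac{\mathrm{P\_bias2}}{2}, \quad y_t^{p} = \mathrm{P\_bias1} + \frac{\mathrm{P\_bias3}}{2}$$

The predicted coordinates of the lower right corner of the shoe target box, $(x_b^{p}, y_b^{p})$, are:

$$x_b^{p} = \mathrm{P\_bias0} + \frac{\mathrm{P\_bias2}}{2}, \quad y_b^{p} = \mathrm{P\_bias1} - \frac{\mathrm{P\_bias3}}{2}$$

The points $(x_t^{gt}, y_t^{gt})$ and $(x_b^{gt}, y_b^{gt})$ define the ground-truth shoe target box box_gt, and the points $(x_t^{p}, y_t^{p})$ and $(x_b^{p}, y_b^{p})$ define the predicted shoe target box box_p;

The degree of overlap GIoU between box_gt and box_p is computed, and the target box loss is Loss = 1 - GIoU;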

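The GIoU is calculated as follows:

for box_gt and box_p, first find the smallest box C that encloses both, and denote its area by c_area;

Let the union area be

$$U = \mathrm{area}(box\_gt) + \mathrm{area}(box\_p) - \mathrm{area}(box\_gt \cap box\_p),$$

then

$$IoU = \frac{\mathrm{area}(box\_gt \cap box\_p)}{U};$$

$$GIoU = IoU - \frac{c\_area - U}{c\_area};$$

For the category loss Loss_label, a cross-entropy loss function is adopted, calculated as:

$$Loss\_label = -\left[\mathrm{Gt\_label} \cdot \log(\mathrm{P\_label}) + (1 - \mathrm{Gt\_label}) \cdot \log(1 - \mathrm{P\_label})\right]$$

The total loss function is:

$$Loss\_total = \lambda_1 \cdot Loss + \lambda_2 \cdot Loss\_label$$

where $\lambda_1$ is the weight of the shoe position loss and $\lambda_2$ is the weight of the shoe classification loss; in this embodiment $\lambda_1$ is set to 0.6 and $\lambda_2$ is set to 0.4.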
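The weighted combination of the two losses can be sketched as below, using the weights 0.6 and 0.4 of this embodiment. The function names and the clamping constant eps are assumptions added for numerical safety, and the GIoU value is taken as an input so that the sketch stays independent of how the overlap is computed.

```python
import math


def binary_cross_entropy(p_label, gt_label, eps=1e-7):
    """Cross-entropy between the predicted confidence and the 0/1 category label."""
    p = min(max(p_label, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(gt_label * math.log(p) + (1 - gt_label) * math.log(1.0 - p))


def total_loss(giou_value, p_label, gt_label, w_box=0.6, w_label=0.4):
    """Weighted sum of the target box loss (1 - GIoU) and the category loss."""
    loss_box = 1.0 - giou_value
    loss_label = binary_cross_entropy(p_label, gt_label)
    return w_box * loss_box + w_label * loss_label


# A perfectly placed box with an undecided classifier leaves only the category term.
print(total_loss(giou_value=1.0, p_label=0.5, gt_label=1))
```

Weighting the position term more heavily than the category term reflects the setting given in this embodiment; other weightings would be a tuning choice.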

Step S3.3: training the second-stage shoe detection network;

based on the network structure of step S3.1 and the loss function of step S3.2, the second-stage shoe detection network is trained jointly with the human body box, shoe box and shoe category parts of the labeled data set to obtain the second-stage shoe detection network model, using the adam optimizer with the initial learning rate set to 0.0002, the number of epochs set to 50, and the batchsize set to 64.

Step S4: inputting the picture to be detected into the two-stage neural network model, and outputting the position of the human body box, the offset of the shoe position relative to the human body box, and the predicted confidence that work shoes are worn;

specifically, the video is downsampled frame by frame with nearest-neighbor interpolation so that the longest edge is scaled to 608 pixels, and the short edge is padded so that the size becomes 608 × 608 pixels. After normalization, the image is input into the first-stage human body detection network model to obtain the human body detection boxes and the corresponding 8x downsampled features, which are used as the input of the second-stage shoe detection network to obtain the offset of the shoe position relative to the human body box and the predicted confidence that work shoes are worn.
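The resize-and-pad geometry of step S4 can be sketched without any image library. The function name is hypothetical, and the sketch returns only the total padding per axis because the text does not say on which side of the short edge the padding is placed.

```python
def letterbox_geometry(width, height, target=608):
    """Scaled size and total padding that bring a frame to target x target.

    The longest edge is scaled to `target`; the other edge is scaled by the
    same factor and the remainder is filled with padding.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")
    longest = max(width, height)
    new_w = max(1, round(width * target / longest))
    new_h = max(1, round(height * target / longest))
    return (new_w, new_h), (target - new_w, target - new_h)


# A hypothetical 1920 x 1080 surveillance frame.
print(letterbox_geometry(1920, 1080))
```

Scaling both edges by the same factor preserves the aspect ratio of persons and shoes, which matters because the second stage regresses shoe size as a ratio of the human body box.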

Step S5: calculating the position of the shoe from the position of the human body box and the offset of the shoe position relative to the human body box, and judging whether work shoes are worn by fusing, over multiple frames, the results of comparing the work-shoe confidence with a threshold.

Specifically, the target box of the shoe is obtained from the position of the human body box and the predicted offset of the shoe position relative to the human body box.
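A sketch of this mapping is given below. It assumes image coordinates in which y increases downward, so that offsets measured upward from the lower left corner of the human body box are subtracted from its y coordinate; the function name and argument layout are hypothetical.

```python
def decode_shoe_box(body_lower_left, body_size, pred):
    """Map normalized predictions back to image coordinates.

    body_lower_left: (x1, y1); body_size: (body box width, body box height);
    pred: (P_bias0, P_bias1, P_bias2, P_bias3, ...).
    Returns the upper left and lower right corners of the shoe box.
    Assumes the image y axis points downward.
    """
    x1, y1 = body_lower_left
    body_w, body_h = body_size
    b0, b1, b2, b3 = pred[:4]
    x_t = x1 + (b0 - b2 / 2) * body_w
    y_t = y1 - (b1 + b3 / 2) * body_h
    x_b = x1 + (b0 + b2 / 2) * body_w
    y_b = y1 - (b1 - b3 / 2) * body_h
    return (x_t, y_t), (x_b, y_b)


# Hypothetical values: a 100 x 200 body box whose lower left corner is (100, 400).
print(decode_shoe_box((100, 400), (100, 200), (0.3, 0.05, 0.2, 0.05, 0.9)))
```

If the coordinate system in use has y increasing upward instead, the vertical terms would be added to y1 rather than subtracted.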

Judging whether work shoes are worn by fusing, over multiple frames, the results of comparing the work-shoe confidence with a threshold includes:

judging whether work shoes are worn in the current frame from a filtered result, with the filter length parameter set to 10: the work-shoe confidence for the current frame is taken as the mean of the confidence predictions output by the second-stage shoe detection network over the preceding 10 frames. If the mean is greater than or equal to the set threshold of 0.5, work shoes are judged to be worn; otherwise they are judged not to be worn. This makes the shoe category result more robust and avoids frequent jumps in the prediction.
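The multi-frame fusion can be sketched with a fixed-length window. The class name is hypothetical, and two details are assumptions: the window includes the current frame, and during the first frames the mean is taken over however many values are available.

```python
from collections import deque


class WorkShoeSmoother:
    """Mean of the most recent per-frame confidences, compared with a threshold."""

    def __init__(self, window=10, threshold=0.5):
        if window <= 0:
            raise ValueError("window must be positive")
        self.threshold = threshold
        self.history = deque(maxlen=window)  # oldest value is dropped automatically

    def update(self, confidence):
        """Add one confidence value and return True if work shoes are judged worn."""
        self.history.append(confidence)
        mean = sum(self.history) / len(self.history)
        return mean >= self.threshold


smoother = WorkShoeSmoother(window=10, threshold=0.5)
for conf in (0.9, 0.8, 0.2, 0.85):
    print(smoother.update(conf))
```

In a scene with several workers, one smoother per tracked person would be needed so that confidences from different people are not averaged together.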

Example 2:

Referring to FIG. 4, this embodiment provides a two-stage neural network-based device for detecting the wearing of work shoes. The device is a virtual device corresponding to the two-stage neural network-based method for detecting the wearing of work shoes provided in embodiment 1, has the functional modules for executing the method and the corresponding beneficial effects, and includes:

an acquisition module 91, configured to acquire a picture data set of a surveillance video;

a labeling module 92, configured to label the shoe targets and the human body targets contained in the picture data set to obtain a labeled data set;

a building module 93, configured to build a two-stage neural network model, where the two-stage neural network model is formed by cascading a first-stage human body detection network model and a second-stage shoe detection network model, the first-stage human body detection network model is obtained by training a human body detection network with the human body box part of the labeled data set, the second-stage shoe detection network model is obtained by training a second-stage shoe detection network jointly with the human body box, shoe box and shoe category parts of the labeled data set, and the input of the second-stage shoe detection network model is the output of the first-stage human body detection network model;

an output module 94, configured to input the picture to be detected into the two-stage neural network model and output the position of the human body box, the offset of the shoe position relative to the human body box, and the confidence that work shoes are worn;

and a calculation and judgment module 95, configured to calculate the position of the shoe from the position of the human body box and the offset of the shoe position relative to the human body box, and to judge whether work shoes are worn by fusing, over multiple frames, the results of comparing the work-shoe confidence with a threshold.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described device embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for detecting wearing of a pair of industrial shoes based on a two-stage neural network is characterized by comprising the following steps:

acquiring a picture data set of a monitoring video;

2. The two-stage neural network-based method for detecting wearing of the industrial shoes according to claim 1, wherein the step of obtaining a picture data set of the monitoring video comprises the following steps:

3. The method for detecting wearing of the work shoe based on the two-stage neural network as claimed in claim 1 or 2, wherein the step of labeling the shoe target and the human body target included in the picture data set comprises the steps of:

labeling the persons wearing work shoes and the persons not wearing work shoes contained in the picture data set: the human body position of each person is labeled, and only the position of one completely visible shoe of that person is labeled; if the shoe category is work shoe, the category is labeled 1, otherwise it is labeled 0.

4. The method for detecting wearing of a pair of industrial shoes based on a two-stage neural network as claimed in claim 1, wherein obtaining the first-stage human body detection network model by training a human body detection network on the human body frame part of the labeled data set comprises:

5. The method for detecting wearing of a work shoe based on the two-stage neural network as claimed in claim 1, wherein training the second-stage shoe detection network on the human body frame, shoe frame and shoe category parts of the labeled data set comprises:

where

represents the number of human body detection frames corresponding to a single image; based on the feature map output by the first-stage human body detection network, the features of each human body detection frame region are pooled to obtain features of dimension

where the width and height of the pooled features are represented by

;

step S3.1.2: flattening the obtained features so that their dimension becomes

;

step S3.1.5: normalizing the five-dimensional output to obtain a five-dimensional predicted value (P_bias0, P_bias1, P_bias2, P_bias3, P_label), where P_bias0 is the predicted value of the ratio of the horizontal offset of the shoe center point relative to the lower-left corner of the human body frame to the width of the human body frame; P_bias1 is the predicted value of the ratio of the vertical offset of the shoe center point relative to the lower-left corner of the human body frame to the height of the human body frame; P_bias2 is the predicted value of the ratio of the shoe width to the width of the human body frame; P_bias3 is the predicted value of the ratio of the shoe height to the height of the human body frame; and P_label is the predicted confidence that work shoes are worn;

step S3.2: setting a loss function of a secondary shoe detection network;

the offset Gt_bias of the shoe center point position (x₀, y₀) relative to the lower-left corner position (x₁, y₁) of the human body frame comprises five components (Gt_bias0, Gt_bias1, Gt_bias2, Gt_bias3, Gt_label), where Gt_bias0 is the ratio of the horizontal offset of the shoe center point relative to the lower-left corner of the human body frame to the width of the human body frame; Gt_bias1 is the ratio of the vertical offset of the shoe center point relative to the lower-left corner of the human body frame to the height of the human body frame; Gt_bias2 is the ratio of the shoe width to the width of the human body frame; Gt_bias3 is the ratio of the shoe height to the height of the human body frame; and the shoe category label is Gt_label;

(Gt_bias0, Gt_bias1, Gt_bias2, Gt_bias3, Gt_label) serve as the true values, namely the regression targets, in the training of the second-stage shoe detection network, and are expressed respectively as follows:

where

is the width of the human body detection frame,

is the height of the human body detection frame,

is the height of the shoe, and

is the width of the shoe;

and

are both taken as unit 1; the true value thus corresponds to a shoe whose center point has coordinates (Gt_bias0, Gt_bias1), whose width is Gt_bias2 and whose height is Gt_bias3;
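As a worked illustration of the regression targets defined in words above, the following minimal sketch computes the five components from a labeled shoe and its human body frame. The function and argument names are hypothetical, and the sign convention (shoe center minus lower-left corner) is an assumption read from the verbal definitions, since the original expressions are not reproduced in this text.

```python
def encode_shoe_targets(shoe_center, shoe_size, body_corner, body_size, label):
    """Return (Gt_bias0, Gt_bias1, Gt_bias2, Gt_bias3, Gt_label).

    shoe_center: (x0, y0), center point of the labeled shoe
    shoe_size:   (width, height) of the labeled shoe
    body_corner: (x1, y1), lower-left corner of the human body frame
    body_size:   (width, height) of the human body frame
    All argument names are hypothetical.
    """
    x0, y0 = shoe_center
    x1, y1 = body_corner
    shoe_w, shoe_h = shoe_size
    body_w, body_h = body_size
    if body_w <= 0 or body_h <= 0:
        raise ValueError("body frame must have positive width and height")
    return (
        (x0 - x1) / body_w,  # horizontal offset / body frame width
        (y0 - y1) / body_h,  # vertical offset / body frame height
        shoe_w / body_w,     # shoe width / body frame width
        shoe_h / body_h,     # shoe height / body frame height
        int(label),          # 1 for work shoe, 0 otherwise
    )
```

Normalizing by the body frame size keeps the targets independent of how large the person appears in the image.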

)：

True value of coordinates of lower right corner of shoe target frame (

)：

)：

Predicted value of coordinates of lower right corner of shoe target frame (

)：

the points (

) and (

) define the true-value shoe target frame box_gt, and the points (

) and (

) define the predicted shoe target frame box_p;

The GIoU is calculated as follows:

let the area

,

then

；

；
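For orientation, the standard generalized intersection over union of two axis-aligned boxes can be sketched as below. This is the commonly used definition (IoU minus the fraction of the smallest enclosing box not covered by the union), offered as an assumption about what the omitted formulas express; the function name and the `(x_min, y_min, x_max, y_max)` box layout are illustrative choices, not taken from the source.

```python
def giou(box_a, box_b):
    """Standard GIoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    # Intersection is zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    if union <= 0 or enclose <= 0:
        raise ValueError("boxes must have positive area")
    return inter / union - (enclose - union) / enclose
```

Unlike plain IoU, this value stays informative for non-overlapping boxes: it decreases toward -1 as the boxes move apart, which is why a position loss built on it (often 1 minus GIoU, assumed here rather than confirmed by the text) still provides a gradient when the predicted shoe frame misses the true one entirely.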

the total loss function is:

where

is the weight of the shoe position loss, and

is the weight of the shoe classification loss;

step S3.3: performing secondary shoe detection network training;

6. The method for detecting wearing of the work shoe based on the two-stage neural network as claimed in claim 5, wherein the step of calculating the position of the work shoe according to the position of the human body frame and the deviation of the position of the work shoe relative to the human body frame comprises the following steps:

obtaining the target frame of the shoe from the position of the human body frame and the offset of the shoe position relative to the human body frame, where the position, width and height of the human body frame are output by the first-stage human body detection network, and the coordinates of the upper-left corner (x_t, y_t) and of the lower-right corner (x_b, y_b) of the shoe are calculated as follows:

.
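A minimal sketch of this decoding step is given below, assuming the predicted offsets (P_bias0, P_bias1, P_bias2, P_bias3) are simply the inverse of the normalized definitions given in claim 5. The function and argument names are hypothetical. The result is returned as `(x_min, y_min, x_max, y_max)`; which of these corners is the "upper-left" (x_t, y_t) depends on whether the vertical axis points up or down, which this text does not fix.

```python
def decode_shoe_box(body_corner, body_size, p_bias):
    """Recover the shoe box from the body frame and predicted offsets.

    body_corner: (x1, y1), lower-left corner of the human body frame
    body_size:   (width, height) of the human body frame
    p_bias:      (P_bias0, P_bias1, P_bias2, P_bias3)
    Returns (x_min, y_min, x_max, y_max) of the shoe box.
    """
    x1, y1 = body_corner
    body_w, body_h = body_size
    b0, b1, b2, b3 = p_bias
    center_x = x1 + b0 * body_w  # undo horizontal normalization
    center_y = y1 + b1 * body_h  # undo vertical normalization
    shoe_w = b2 * body_w
    shoe_h = b3 * body_h
    return (
        center_x - shoe_w / 2,
        center_y - shoe_h / 2,
        center_x + shoe_w / 2,
        center_y + shoe_h / 2,
    )
```

For example, a body frame with lower-left corner (100, 0), width 200 and height 400, combined with offsets (0.25, 0.25, 0.25, 0.125), gives a shoe centered at (150, 100) with width 50 and height 50.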

7. The method for detecting wearing of a work shoe based on a two-stage neural network as claimed in claim 5, wherein judging whether work shoes are worn according to the result of comparing the work-shoe confidence with the threshold comprises the steps of:

judging whether work shoes are worn in the current frame using a median filtering result, with the filter length parameter set to x; that is, the confidence that work shoes are worn in the current frame is the mean of the confidence predictions output by the second-stage shoe detection network over the preceding x frames; if the mean is greater than or equal to a set threshold, the work shoes are determined to be worn, otherwise they are determined not to be worn.

8. A two-stage neural network-based work shoe wearing detection device, characterized by comprising: