CN114170677A - Network model training method and equipment for detecting smoking behavior - Google Patents

Network model training method and equipment for detecting smoking behavior Download PDF

Info

Publication number
CN114170677A
Authority
CN
China
Prior art keywords
network model
image
data set
smoking
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111342312.7A
Other languages
Chinese (zh)
Inventor
杨之乐
杨猛
郭媛君
王尧
冯伟
吴承科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202111342312.7A priority Critical patent/CN114170677A/en
Priority to PCT/CN2021/138039 priority patent/WO2023082407A1/en
Publication of CN114170677A publication Critical patent/CN114170677A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a network model training method and device for detecting smoking behavior. The network model training method comprises the following steps: establishing an initial network model; acquiring a training data set related to smoking behavior, and labeling the training data set to obtain labeling information for the training data set; inputting the training data set and its labeling information into a one-stage network model to generate a detection frame for each image in the training data set; inputting each image in the training data set and its detection frame into the two-stage network model to generate a judgment result about smoking behavior in each image; and training the one-stage network model and the two-stage network model based on the judgment result to obtain a preset network model for detecting smoking behavior. In this way, the network model training method obtains a network model for detecting smoking behavior that overcomes the limitations of the traditional method and achieves good accuracy.

Description

Network model training method and equipment for detecting smoking behavior
Technical Field
The application relates to the technical field of image recognition, in particular to a network model training method and equipment for detecting smoking behaviors.
Background
Smoking is one of the ten prohibitions on construction sites; if smoking behavior is not controlled, the possibility of fire increases, potentially causing serious property loss and casualties. At present, smoking behavior on construction sites is mostly controlled through smoke detectors and manual monitoring. However, a smoke sensor is limited by space: in an outdoor scene the space is very large and the detection accuracy of the smoke sensor decreases, while manual monitoring wastes manpower and cannot achieve real-time detection. Therefore, detecting smoking behavior on construction sites accurately and in real time is an urgent problem.
The traditional method for detecting unsafe worker behavior relies on manual inspection, whose accuracy depends entirely on the attention of the inspecting personnel; as the coverage area of construction sites grows larger and larger, relying on manpower to detect whether workers on a site exhibit unsafe behavior becomes increasingly difficult.
Disclosure of Invention
The application provides a network model training method and equipment for detecting smoking behaviors.
The application provides a network model training method for detecting smoking behaviors, which comprises the following steps:
establishing an initial network model, wherein the initial network model comprises a one-stage network model and a two-stage network model;
acquiring a training data set related to the smoking behavior, and labeling the training data set to obtain labeling information related to the training data set;
inputting the training data set and the labeling information thereof into the one-stage network model to generate a detection frame of each image in the training data set;
inputting each image and the detection frame thereof in the training data set into the two-stage network model to generate a judgment result about smoking behavior in each image in the training data set;
and training the one-stage network model and the two-stage network model based on the judgment result to obtain a preset network model for detecting smoking behavior.
Wherein said obtaining a training data set relating to said smoking behaviour comprises:
under different external conditions, carrying out image acquisition on smoking behaviors in a preset scene to obtain a smoking behavior image;
under different external conditions, carrying out image acquisition on the non-smoking behavior in a preset scene to obtain a non-smoking behavior image;
establishing the training data set based on the smoking behavior image and the non-smoking behavior image.
Wherein the labeling of the training data set comprises:
labeling a worker detection box and a worker skeletal point in each image in the training dataset;
wherein the worker skeletal points comprise basic skeletal points and hand skeletal points.
Wherein the inputting the training data set and its labeling information into the one-stage network model comprises:
normalizing all images in the training data set;
and inputting the normalized image and the labeling information thereof into the one-stage network model.
The loss function of the first-stage network model comprises a detection frame classification loss function and a detection frame coordinate loss function;
the detection frame classification loss function is used for learning the relation between the confidence coefficient of the human classification in the prediction detection frame and the confidence coefficient of the human classification in the labeling detection frame, and the detection frame coordinate loss function is used for learning the relation between the coordinate position of the prediction detection frame and the coordinate position of the labeling detection frame.
Before each image and its detection box in the training data set are input into the two-stage network model, the network model training method further includes:
screening the detection frames in the image by adopting a preset algorithm;
and fusing the screened detection frame with the corresponding image.
Wherein the generating of the determination result about smoking behavior in each image in the training dataset comprises:
acquiring a pre-marked real smoking action;
inputting each image and the detection frame thereof in the training data set into the two-stage network model to obtain a preset smoking action in the detection frame of each image;
calculating the attitude distance between the real smoking action and the preset smoking action;
and when the gesture distance is smaller than a preset threshold value, confirming that the smoking behavior exists in the range of the detection frame of the image.
Wherein, the calculating the gesture distance between the real smoking action and the preset smoking action comprises:
acquiring the number of joints of the real smoking action matched with the preset smoking action;
calculating the space distance of the same joint in the real smoking action and the preset smoking action;
and calculating the posture distance between the real smoking action and the preset smoking action according to the matched joint number and the space distance of the same joint.
Wherein, the network model training method further comprises:
acquiring a real-time monitoring image;
inputting the real-time monitoring image into the preset network model to obtain a judgment result output by the preset network model;
and confirming whether the smoking behavior exists in the real-time monitoring image based on the judgment result.
The application also provides a terminal device comprising a memory and a processor, wherein the memory is coupled to the processor;
the memory is used for storing program data, and the processor is used for executing the program data to realize the network model training method.
The present application also provides a computer storage medium for storing program data which, when executed by a processor, is used to implement the network model training method described above.
The beneficial effect of this application is: the terminal device establishes an initial network model; acquires a training data set related to smoking behavior and labels it to obtain labeling information for the training data set; inputs the training data set and its labeling information into a one-stage network model to generate a detection frame for each image in the training data set; inputs each image in the training data set and its detection frame into the two-stage network model to generate a judgment result about smoking behavior in each image; and trains the one-stage network model and the two-stage network model based on the judgment result to obtain a preset network model for detecting smoking behavior. In this way, the network model training method obtains a network model for detecting smoking behavior that overcomes the limitations of the traditional method and achieves good accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a network model training method for detecting smoking behavior provided herein;
FIG. 2 is a block diagram of an embodiment of an overall network model provided herein;
fig. 3 is a schematic structural diagram of an embodiment of a terminal device provided in the present application;
fig. 4 is a schematic structural diagram of an embodiment of a terminal device provided in the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To solve the above technical problems, the application discloses a deep learning-based two-stage detection method for unsafe behaviors of construction site workers, which is suitable for worker behavior detection on construction sites in major cities. The method is based on a deep learning framework, has good autonomous learning capability, fault tolerance and generalization over the various external factors that are difficult to quantify, and has good extensibility and high prediction precision. As bans such as the comprehensive prohibition of worker smoking become increasingly widespread on construction sites, the method can realize automatic monitoring and alarming of unsafe worker behaviors, reduce site management costs, and improve the safety and reliability of construction.
Referring to fig. 1 in detail, fig. 1 is a schematic flowchart of an embodiment of a network model training method for detecting smoking behavior provided in the present application.
The network model training method is applied to a terminal device. The terminal device may be a server, or a system in which a server and a mobile terminal cooperate with each other. Accordingly, the various parts included in the terminal device, such as its units, sub-units, modules and sub-modules, may all be disposed in the server, or may be disposed in the server and the mobile terminal respectively.
Further, the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing distributed servers, or as a single software or software module, and is not limited herein. In some possible implementations, the network model training method of the embodiments of the present application may be implemented by a processor calling computer readable instructions stored in a memory.
Specifically, as shown in fig. 1, the network model training method in the embodiment of the present application specifically includes the following steps:
step S11: and establishing an initial network model, wherein the initial network model comprises a one-stage network model and a two-stage network model.
In the embodiment of the present application, the terminal device establishes an initial network model as shown in fig. 2, where the initial network model includes a one-stage network model and a two-stage network model, the one-stage network model is responsible for generating a worker detection box, and the two-stage network model is responsible for detecting the behavior of a worker.
Specifically, the one-stage network model may be based on Faster R-CNN, and the two-stage network model may be based on RMPE (Regional Multi-Person Pose Estimation). In other embodiments, other possible network models may be used to build the initial network model.
Step S12: and acquiring a training data set related to smoking behavior, and labeling the training data set to obtain labeling information related to the training data set.
In the embodiment of the application, the terminal device collects images of workers on the construction site and establishes a training data set. Annotators use a labeling tool to mark the specific position of each worker in the images, namely labeling the worker detection frame, the worker's basic skeleton points, and, in finer detail, the hand skeleton points. The basic skeleton points comprise trunk skeleton points, hand skeleton points, leg skeleton points, foot skeleton points and the like; the hand skeleton points comprise the shoulder, elbow, wrist and palm skeleton points plus the five finger skeleton points, nine in total. Further, annotators need to review each image in the training data set and label whether the behavior of the worker in the image belongs to smoking behavior.
Specifically, when collecting images for the training data set of worker smoking behavior in the construction site scene, images containing smoking behavior at different scales as well as non-smoking behavior need to be collected under different lighting conditions, so as to enrich the content of the training data set.
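The patent names the annotation fields for each training image — the worker detection frame, the basic skeleton points, the nine hand-related skeleton points, and the smoking label — but does not specify a concrete storage format. A hypothetical Python record illustrating one possible schema (the file name, coordinates, and point names are invented for illustration):

```python
# Hypothetical annotation record for one training image. Field names and
# values are illustrative only; the patent does not define a file format.
annotation = {
    "image": "site_cam01_000123.jpg",          # assumed file name
    "worker_box": [412, 96, 588, 540],         # worker detection frame [x1, y1, x2, y2]
    "basic_skeleton": {                        # a few of the basic skeleton points
        "trunk": [500, 220],
        "left_leg": [470, 430],
    },
    "hand_skeleton": {                         # the nine hand-related skeleton points
        "shoulder": [455, 180], "elbow": [430, 260], "wrist": [440, 320],
        "palm": [448, 335], "finger_1": [452, 345], "finger_2": [456, 347],
        "finger_3": [460, 348], "finger_4": [463, 346], "finger_5": [466, 342],
    },
    "smoking": True,                           # whether the image shows smoking behavior
}
```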
Step S13: and inputting the training data set and the labeling information thereof into the one-stage network model to generate a detection frame of each image in the training data set.
In the embodiment of the present application, as shown in fig. 2, the terminal device performs convolution on all images in the training data set through the convolutional layers to obtain a feature map of each image. The terminal device then normalizes each image feature map and inputs the normalized feature map, together with the manually marked labeling information, into the one-stage network model to obtain the prediction detection frames that the one-stage network model generates on the feature map.
Specifically, the terminal device generates a plurality of candidate regions on the image feature map through a candidate region network, and then selects a region of interest related to smoking behavior through a classifier to generate a prediction detection frame.
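The normalization step mentioned above is not specified in detail in the text. A minimal sketch, assuming per-channel standardization of pixel values (a common choice, but a hypothetical one here):

```python
import numpy as np

def normalize_image(img):
    """Scale pixel values to [0, 1], then standardize each channel to zero
    mean and unit variance. This is one common normalization; the patent
    does not state which scheme is used."""
    img = img.astype(np.float32) / 255.0
    mean = img.mean(axis=(0, 1), keepdims=True)   # per-channel mean
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mean) / std
```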
Further, the loss function set for the one-stage network model in the embodiment of the present application is defined as follows:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)

where i is the index of a prediction detection frame, i.e., an anchor frame generated by the one-stage network model, and p_i is the confidence that there is a worker in the i-th prediction detection frame. The true label p_i* = 1 indicates that a worker is present in the prediction detection frame, and the true label p_i* = 0 indicates that there is no worker in the prediction detection frame, only background. t_i* is the real position coordinate of the i-th labeled frame, and t_i is the position coordinate of the i-th prediction detection frame, comprising the four parameters [t_x, t_y, t_w, t_h]. N_cls is the adjustment parameter of the detection frame classification loss function, N_reg is the adjustment parameter of the detection frame coordinate loss function, and λ is a coefficient balancing the two terms. Therefore, the loss function of the one-stage network model is composed of a detection frame classification loss function and a detection frame coordinate loss function.

In the detection frame regression, the four parameters [t_x, t_y, t_w, t_h] are defined as follows:

t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)

where [x, y, w, h] are the center coordinates, width and height of the prediction detection frame, [x_a, y_a, w_a, h_a] are the center coordinates, width and height of the labeled detection frame, and the four parameters [t_x, t_y, t_w, t_h] characterize the offset between the prediction detection frame and the labeled detection frame.
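The offset parameterization above can be computed directly. A small sketch, with both boxes given as (center_x, center_y, width, height):

```python
import math

def encode_box(pred, ref):
    """Compute the offsets [t_x, t_y, t_w, t_h] between a predicted box and
    a reference (labeled) box, both as (center_x, center_y, width, height)."""
    x, y, w, h = pred
    xa, ya, wa, ha = ref
    return [(x - xa) / wa,        # t_x: x-offset, scaled by reference width
            (y - ya) / ha,        # t_y: y-offset, scaled by reference height
            math.log(w / wa),     # t_w: log width ratio
            math.log(h / ha)]     # t_h: log height ratio
```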
Before inputting the prediction detection frames into the two-stage network model, the terminal device may further screen the prediction detection frames generated by the one-stage network model using the NMS (Non-Maximum Suppression) algorithm, fuse the screened frames with the image feature map, and input the fusion result into the two-stage network model.

The NMS algorithm in the embodiment of the present application proceeds as follows. The terminal device takes as input an image feature map, all of its prediction detection frames B = {b_1, …, b_N}, and the confidence S = {s_1, …, s_N} corresponding to each prediction detection frame. The following steps are executed in a loop until all prediction detection frames have been traversed: obtain the prediction detection frame with the highest confidence in B, then calculate the intersection-over-union between this frame and each of the other prediction detection frames; when the intersection-over-union is greater than a preset threshold N_t, delete that prediction detection frame and its confidence; when the intersection-over-union is less than or equal to N_t, retain that prediction detection frame and its confidence.

After all the prediction detection frames have been traversed based on the NMS algorithm, the retained prediction detection frames are fused with the image feature map, and the fusion result is input into the two-stage network model. The two-stage network model mainly re-screens the prediction detection frames generated by the one-stage network model through a pose analysis method to generate the final result.
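The greedy NMS screening loop described above can be sketched as follows (boxes as (x1, y1, x2, y2) corners; the threshold plays the role of N_t):

```python
def nms(boxes, scores, iou_threshold):
    """Greedy Non-Maximum Suppression: repeatedly keep the highest-scoring
    box and delete remaining boxes whose IoU with it exceeds the threshold.
    Returns the indices of the retained boxes."""
    def iou(a, b):
        # intersection rectangle
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-confidence remaining box
        keep.append(best)
        # delete boxes overlapping the kept box beyond the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```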
Step S14: and inputting each image and the detection box thereof in the training data set into the two-stage network model to generate a judgment result about smoking behavior in each image in the training data set.
In the embodiment of the application, the terminal device defines a smoking action with m joints in the two-stage network model as

P = {(k^1, c^1), …, (k^m, c^m)}

where k^j and c^j are the position coordinate and the confidence of the j-th joint, respectively.

According to the prediction detection frames generated in step S13, the two-stage network model identifies the worker smoking action P_j in each prediction detection frame. Assuming a smoking action P_i whose detection frame is B_i, a soft matching function may be defined for calculating the number of joints matching between the two poses of smoking action P_i and smoking action P_j:

KSim(P_i, P_j | σ_1) = Σ_n tanh(c_i^n / σ_1) · tanh(c_j^n / σ_1), if joint k_j^n falls within the frame B(k_i^n); 0 otherwise

where σ_1 is a hyperparameter of the soft matching function.

Further, the spatial distance between each identical joint in the two poses of smoking action P_i and smoking action P_j is formulated as:

HSim(P_i, P_j | σ_2) = Σ_n exp(-(k_i^n - k_j^n)² / σ_2)

where σ_2 is a hyperparameter of the spatial distance formula.

Finally, the pose distance function is obtained:

d_pose(P_i, P_j | Λ) = KSim(P_i, P_j | σ_1) + λ·HSim(P_i, P_j | σ_2)

where Λ = {σ_1, σ_2, λ}. Therefore, the two-stage network model is responsible for generating the worker smoking action P_j and calculating the pose distance between the worker smoking action P_j and the labeled smoking action P_i; when the distance is smaller than a preset threshold, the action can be judged to be a smoking action.
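The pose distance d_pose = KSim + λ·HSim above can be sketched as follows. The joint-matching test uses a simple coordinate tolerance as a stand-in for the frame B(k_i^n) in the RMPE formulation, and σ_1, σ_2, λ are the hyperparameters from the text:

```python
import math

def pose_distance(pose_a, pose_b, sigma1, sigma2, lam, box_tol):
    """Sketch of d_pose(P_i, P_j | Λ) = KSim + λ·HSim. Each pose is a list of
    (x, y, confidence) joints in the same order; box_tol is a simplified
    stand-in for the matching frame B(k_i^n)."""
    ksim = 0.0
    hsim = 0.0
    for (xa, ya, ca), (xb, yb, cb) in zip(pose_a, pose_b):
        dist2 = (xa - xb) ** 2 + (ya - yb) ** 2
        # KSim term counts softly matched joints (only when joints are close)
        if abs(xa - xb) <= box_tol and abs(ya - yb) <= box_tol:
            ksim += math.tanh(ca / sigma1) * math.tanh(cb / sigma1)
        # HSim term measures the spatial proximity of identical joints
        hsim += math.exp(-dist2 / sigma2)
    return ksim + lam * hsim
```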
Step S15: training the one-stage network model and the two-stage network model based on the judgment result to obtain a preset network model for detecting smoking behavior.

In this embodiment of the application, the terminal device may train the one-stage network model and the two-stage network model based on their respective loss functions, so as to train the overall network model shown in fig. 2 and finally obtain the preset network model for detecting smoking behavior.
Further, after a preset network model for detecting smoking behavior is trained, the terminal equipment can acquire a real-time monitoring image; inputting the real-time monitoring image into the preset network model to obtain a judgment result output by the preset network model; and confirming whether the smoking behavior exists in the real-time monitoring image based on the judgment result.
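The deployment step above — feeding a real-time monitoring image to the trained model and thresholding the pose distance — can be sketched with a stand-in model callable (the real model would be the trained two-stage network; the return shape assumed here is hypothetical):

```python
def detect_smoking(frame, model, threshold):
    """Run the preset model on a monitoring frame and flag smoking behavior.
    `model` is a stand-in callable returning (worker_box, pose_distance)
    pairs; a distance below the threshold is judged as smoking."""
    results = model(frame)
    return [box for box, dist in results if dist < threshold]
```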
In the embodiment of the application, the terminal device establishes an initial network model; acquires a training data set related to smoking behavior and labels it to obtain labeling information for the training data set; inputs the training data set and its labeling information into a one-stage network model to generate a detection frame for each image in the training data set; inputs each image in the training data set and its detection frame into the two-stage network model to generate a judgment result about smoking behavior in each image; and trains the one-stage network model and the two-stage network model based on the judgment result to obtain a preset network model for detecting smoking behavior. In this way, the network model training method obtains a network model for detecting smoking behavior that overcomes the limitations of the traditional method and achieves good accuracy.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
To implement the network model training method of the foregoing embodiment, the present application further provides a terminal device, and specifically refer to fig. 3, where fig. 3 is a schematic structural diagram of an embodiment of the terminal device provided in the present application.
The terminal device 300 of the embodiment of the present application includes a model building module 31, a data obtaining module 32, a behavior judging module 33, and a network training module 34; wherein the content of the first and second substances,
a model building module 31 is configured to build an initial network model, where the initial network model includes a one-stage network model and a two-stage network model.
And the data acquisition module 32 is configured to acquire a training data set related to the smoking behavior, and label the training data set to obtain labeled information related to the training data set.
A behavior judging module 33, configured to input the training data set and the label information thereof into the one-stage network model, and generate a detection frame for each image in the training data set; and inputting each image and the detection frame thereof in the training data set into the two-stage network model to generate a judgment result about smoking behavior in each image in the training data set.
And the network training module 34 is configured to train the first-stage network model and the second-stage network model based on the determination result to obtain a preset network model for detecting smoking behavior.
To implement the network model training method of the foregoing embodiment, the present application further provides another terminal device, and specifically refer to fig. 4, where fig. 4 is a schematic structural diagram of another embodiment of the terminal device provided in the present application.
The terminal device 400 of the embodiment of the present application includes a memory 41 and a processor 42, wherein the memory 41 and the processor 42 are coupled.
The memory 41 is used for storing program data, and the processor 42 is used for executing the program data to implement the network model training method described in the above embodiments.
In the present embodiment, the processor 42 may also be referred to as a CPU (Central Processing Unit). The processor 42 may be an integrated circuit chip having signal processing capabilities. The processor 42 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor 42 may be any conventional processor or the like.
The present application further provides a computer storage medium, as shown in fig. 5, the computer storage medium 500 is used for storing program data 51, and the program data 51 is used for implementing the network model training method according to the above embodiment when being executed by a processor.
The present application further provides a computer program product, wherein the computer program product includes a computer program operable to cause a computer to execute the network model training method according to the embodiment of the present application. The computer program product may be a software installation package.
The network model training method according to the above embodiment of the present application may be stored in a device, for example, a computer-readable storage medium, when the network model training method is implemented in the form of a software functional unit and sold or used as an independent product. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (11)

1. A network model training method for detecting smoking behavior is characterized by comprising the following steps:
establishing an initial network model, wherein the initial network model comprises a one-stage network model and a two-stage network model;
acquiring a training data set related to the smoking behavior, and labeling the training data set to obtain labeling information related to the training data set;
inputting the training data set and the labeling information thereof into the one-stage network model to generate a detection frame of each image in the training data set;
inputting each image and the detection frame thereof in the training data set into the two-stage network model to generate a judgment result about smoking behavior in each image in the training data set;
and training the one-stage network model and the two-stage network model based on the judgment result to obtain a preset network model for detecting smoking behavior.
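The two-part pipeline of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: `one_stage` and `two_stage` are assumed callables standing in for the two trained models, where the first maps an image plus its labeling information to candidate detection frames and the second maps an image and a frame to a smoking-behavior judgment.

```python
def train_step(one_stage, two_stage, images, labels):
    """One hypothetical pass through the two-stage pipeline of claim 1.

    one_stage(image, label) -> list of detection frames (step of claim 1, part 3)
    two_stage(image, frame) -> smoking-behavior judgment (step of claim 1, part 4)
    Both callables are assumptions; the claim does not fix their architectures.
    """
    judgments = []
    for image, label in zip(images, labels):
        frames = one_stage(image, label)               # generate detection frames
        for frame in frames:
            judgments.append(two_stage(image, frame))  # judge smoking behavior per frame
    # In the full method, both stages would then be updated from these
    # judgments via their respective loss functions.
    return judgments
```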
2. The network model training method of claim 1,
the obtaining a training data set regarding the smoking behavior comprises:
acquiring images of smoking behavior in a preset scene under different external conditions to obtain smoking behavior images;
acquiring images of non-smoking behavior in the preset scene under different external conditions to obtain non-smoking behavior images;
establishing the training data set based on the smoking behavior image and the non-smoking behavior image.
3. The network model training method of claim 1,
the labeling of the training data set includes:
labeling a worker detection frame and worker skeletal points in each image in the training data set;
wherein the worker skeletal points comprise basic skeletal points and hand skeletal points.
4. The network model training method of claim 3,
the inputting the training data set and the labeling information thereof into the one-stage network model includes:
normalizing all images in the training data set;
and inputting the normalized image and the labeling information thereof into the one-stage network model.
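The normalization step of claim 4 might look like the sketch below. The target size and the [0, 1] value range are assumptions; the claim only states that all images are normalized before entering the one-stage network model. The resize uses plain index sampling to keep the sketch dependency-free.

```python
import numpy as np

def normalize_images(images, size=(416, 416)):
    """Rescale each image to a fixed size and map pixel values to [0, 1].

    `size` is an assumed hyperparameter; the claim does not specify one.
    """
    normalized = []
    for img in images:
        img = img.astype(np.float32) / 255.0   # pixel values into [0, 1]
        h, w = img.shape[:2]
        # nearest-neighbour resize via row/column index sampling
        rows = (np.arange(size[0]) * h // size[0]).clip(0, h - 1)
        cols = (np.arange(size[1]) * w // size[1]).clip(0, w - 1)
        normalized.append(img[rows][:, cols])
    return normalized
```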
5. The network model training method according to claim 1 or 4,
the loss function of the one-stage network model comprises a detection frame classification loss function and a detection frame coordinate loss function;
wherein the detection frame classification loss function is used for learning the relation between the confidence of the human classification in the predicted detection frame and the confidence of the human classification in the labeled detection frame, and the detection frame coordinate loss function is used for learning the relation between the coordinate position of the predicted detection frame and the coordinate position of the labeled detection frame.
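A composite loss of this shape could be written as below. The claim names the two terms but not their exact form, so binary cross-entropy for the classification confidence, squared error for the frame coordinates, and the `coord_weight` balance factor are all assumptions.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between predicted confidence p and label y."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def one_stage_loss(pred_conf, true_conf, pred_box, true_box, coord_weight=5.0):
    """Hypothetical one-stage loss: classification term + weighted coordinate term.

    The exact loss forms and `coord_weight` are assumptions, not taken
    from the patent.
    """
    cls_loss = bce(np.asarray(pred_conf, float), np.asarray(true_conf, float)).mean()
    coord_loss = ((np.asarray(pred_box, float) - np.asarray(true_box, float)) ** 2).mean()
    return cls_loss + coord_weight * coord_loss
```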
6. The network model training method of claim 1,
before each image in the training data set and its detection frame are input into the two-stage network model, the network model training method further comprises:
screening the detection frames in each image by adopting a preset algorithm;
and fusing each screened detection frame with the corresponding image.
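The "preset algorithm" for screening overlapping detection frames is not named in the claim; non-maximum suppression (NMS) is a plausible choice and is sketched below, together with a simple fusion step that crops the frame region from the image. Both are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) detection frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def screen_frames(frames, scores, iou_thresh=0.5):
    """Non-maximum suppression as one assumed 'preset algorithm'."""
    order = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:  # keep a frame only if it does not overlap a better one
        if all(iou(frames[i], frames[j]) < iou_thresh for j in kept):
            kept.append(i)
    return [frames[i] for i in kept]

def fuse(image, frame):
    """Fuse a screened frame with its image by cropping the frame region.

    Assumes `image` supports 2D slicing (e.g. a NumPy array, H x W).
    """
    x1, y1, x2, y2 = frame
    return image[y1:y2, x1:x2]
```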
7. The network model training method of claim 1,
the generating of the judgment result about smoking behavior in each image in the training data set includes:
acquiring a pre-marked real smoking action;
inputting each image and the detection frame thereof in the training data set into the two-stage network model to obtain a preset smoking action in the detection frame of each image;
calculating the pose distance between the real smoking action and the preset smoking action;
and when the pose distance is smaller than a preset threshold value, confirming that smoking behavior exists within the detection frame of the image.
8. The network model training method of claim 7,
the calculating of the pose distance between the real smoking action and the preset smoking action comprises:
acquiring the number of joints in which the real smoking action matches the preset smoking action;
calculating the spatial distance of the same joint between the real smoking action and the preset smoking action;
and calculating the pose distance between the real smoking action and the preset smoking action according to the number of matched joints and the spatial distance of each same joint.
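The pose-distance computation of claims 7 and 8 can be sketched as follows. Averaging the per-joint Euclidean distance over the matched joints is one plausible reading of "according to the matched joint number and the spatial distance"; the patent does not fix the exact combination rule, so this formulation is an assumption.

```python
import math

def pose_distance(real_joints, preset_joints):
    """Pose distance between a real and a preset smoking action.

    Both inputs are assumed to be dicts mapping joint name -> (x, y).
    The mean Euclidean distance over matched joints is an assumed rule.
    """
    matched = set(real_joints) & set(preset_joints)  # joints present in both actions
    if not matched:
        return float("inf")                          # nothing to compare
    total = sum(math.dist(real_joints[j], preset_joints[j]) for j in matched)
    return total / len(matched)

def is_smoking(real_joints, preset_joints, threshold=0.5):
    """Claim 7's decision: smoking is confirmed when the pose distance
    falls below a preset threshold (the value here is an assumption)."""
    return pose_distance(real_joints, preset_joints) < threshold
```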
9. The network model training method of claim 1,
the network model training method further comprises the following steps:
acquiring a real-time monitoring image;
inputting the real-time monitoring image into the preset network model to obtain a judgment result output by the preset network model;
and confirming whether the smoking behavior exists in the real-time monitoring image based on the judgment result.
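Applying the trained model to a real-time monitoring stream, as in claim 9, reduces to relaying each image's judgment into an alert. `preset_model` below is an assumed callable returning a boolean judgment per image; the sketch only shows the control flow, not the model itself.

```python
def monitor(images, preset_model):
    """Run the trained preset model over real-time monitoring images.

    Returns the indices of images in which smoking behavior is confirmed.
    `preset_model` is an assumed image -> bool callable.
    """
    alerts = []
    for idx, image in enumerate(images):
        if preset_model(image):   # judgment result output by the preset model
            alerts.append(idx)    # smoking behavior confirmed in this image
    return alerts
```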
10. A terminal device, comprising a memory and a processor, wherein the memory is coupled to the processor;
wherein the memory is configured to store program data and the processor is configured to execute the program data to implement the network model training method of any one of claims 1-9.
11. A computer storage medium for storing program data which, when executed by a processor, implements the network model training method of any one of claims 1-9.
CN202111342312.7A 2021-11-12 2021-11-12 Network model training method and equipment for detecting smoking behavior Pending CN114170677A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111342312.7A CN114170677A (en) 2021-11-12 2021-11-12 Network model training method and equipment for detecting smoking behavior
PCT/CN2021/138039 WO2023082407A1 (en) 2021-11-12 2021-12-14 Network model training method for detecting smoking behavior and device thereof


Publications (1)

Publication Number Publication Date
CN114170677A true CN114170677A (en) 2022-03-11

Family

ID=80479448


Country Status (2)

Country Link
CN (1) CN114170677A (en)
WO (1) WO2023082407A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071624A (en) * 2023-01-28 2023-05-05 南京云创大数据科技股份有限公司 Smoking detection data labeling method based on active learning

Family Cites Families (5)

CN112052815B (en) * 2020-09-14 2024-02-20 北京易华录信息技术股份有限公司 Behavior detection method and device and electronic equipment
CN112395978B (en) * 2020-11-17 2024-05-03 平安科技(深圳)有限公司 Behavior detection method, behavior detection device and computer readable storage medium
CN112464895B (en) * 2020-12-14 2023-09-01 深圳市优必选科技股份有限公司 Gesture recognition model training method and device, gesture recognition method and terminal equipment
CN113076903A (en) * 2021-04-14 2021-07-06 上海云从企业发展有限公司 Target behavior detection method and system, computer equipment and machine readable medium
CN113392706A (en) * 2021-05-13 2021-09-14 上海湃道智能科技有限公司 Device and method for detecting smoking and using mobile phone behaviors


Also Published As

Publication number Publication date
WO2023082407A1 (en) 2023-05-19

Similar Documents

Publication Publication Date Title
CN110807429B (en) Construction safety detection method and system based on tiny-YOLOv3
WO2021051601A1 (en) Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium
CN108256404B (en) Pedestrian detection method and device
CN109670441A (en) A kind of realization safety cap wearing knows method for distinguishing, system, terminal and computer readable storage medium
CN110781836A (en) Human body recognition method and device, computer equipment and storage medium
CN111738258A (en) Pointer instrument reading identification method based on robot inspection
CN114359974B (en) Human body posture detection method and device and storage medium
WO2021082112A1 (en) Neural network training method, skeleton diagram construction method, and abnormal behavior monitoring method and system
CN108960145A (en) Facial image detection method, device, storage medium and electronic equipment
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
CN111783716A (en) Pedestrian detection method, system and device based on attitude information
CN114170677A (en) Network model training method and equipment for detecting smoking behavior
B Nair et al. Machine vision based flood monitoring system using deep learning techniques and fuzzy logic on crowdsourced image data
CN114373162A (en) Dangerous area personnel intrusion detection method and system for transformer substation video monitoring
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN117475253A (en) Model training method and device, electronic equipment and storage medium
CN113591885A (en) Target detection model training method, device and computer storage medium
Anjum et al. A pull-reporting approach for floor opening detection using deep-learning on embedded devices
CN116959099A (en) Abnormal behavior identification method based on space-time diagram convolutional neural network
CN116597501A (en) Video analysis algorithm and edge device
CN115862138A (en) Personnel tumbling behavior detection method, device, equipment and storage medium
Sun et al. Automatic building age prediction from street view images
Dai et al. Trajectory outlier detection based on dbscan and velocity entropy
CN110969209B (en) Stranger identification method and device, electronic equipment and storage medium
CN110363162B (en) Deep learning target detection method for focusing key region

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination