CN115546894A - Behavior detection method based on lightweight OpenPose space-time diagram network - Google Patents

Behavior detection method based on lightweight OpenPose space-time diagram network

Info

Publication number
CN115546894A
CN115546894A
Authority
CN
China
Prior art keywords
space
openpose
network
time
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211245726.2A
Other languages
Chinese (zh)
Inventor
张小瑞
解其健
孙伟
张小娜
宋爱国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202211245726.2A priority Critical patent/CN115546894A/en
Publication of CN115546894A publication Critical patent/CN115546894A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements using classification, e.g. of video objects
    • G06V10/765 Arrangements using rules for classification or partitioning the feature space
    • G06V10/82 Arrangements using neural networks
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior detection method based on a lightweight OpenPose space-time diagram network, comprising the following steps: (1) collecting a data set and preprocessing the images; (2) feeding the data set into a lightweight OpenPose network to obtain human skeleton sequences; (3) feeding the human skeleton sequences into a DST-GCN network, which extracts spatial structure features and temporal trajectory features along the spatial and temporal dimensions to form high-level spatio-temporal features; (4) classifying the high-level spatio-temporal features into the corresponding action classes with a Softmax classifier; (5) judging the action type of the test image. The invention first makes OpenPose lightweight, improving the real-time performance of model detection, and at the same time improves the ST-GCN with a dense connection mechanism, strengthening the spatio-temporal convolutional layers' ability to extract long-range associated information and raising the judgment accuracy.

Description

Behavior detection method based on lightweight OpenPose space-time diagram network
Technical Field
The invention relates to the technical field of abnormal behavior detection in computer vision, and in particular to a behavior detection method based on a lightweight OpenPose space-time diagram network.
Background
In daily life, human falls have two main causes: tripping or slipping due to impaired mobility, and falls caused by illness. If a fallen person is not rescued in time, the injury worsens and may even cost a life; detecting human falls is therefore important.
Currently, common human fall detection technologies fall into three categories: wearable-based, environmental-sensor-based, and computer-vision-based. Wearable methods place a sensor in a belt, watch, or similar item, but elderly people may forget to wear the sensor due to failing memory, or refuse to wear it because it is uncomfortable. Environmental-sensor methods, such as infrared monitoring, can affect the health of elderly people who are sensitive to infrared radiation. Computer-vision-based fall detection collects video of the human body with camera devices, processes the images with image processing techniques, extracts human body features, and analyzes them to obtain the human motion state. In general, fall behavior can be recognized from a variety of modalities, such as appearance, optical flow, depth, and the human skeleton. The dynamic human skeleton usually conveys important information; skeleton extraction currently relies mostly on the OpenPose model, which achieves the best results on most detection performance indexes, yet when applied in real scenes it still suffers from poor real-time detection, excessive parameters, and an oversized model. Meanwhile, skeleton data are non-Euclidean, and conventional convolutional networks and recurrent neural networks ignore the vital association information between nodes, so their overall improvement is limited.
Disclosure of Invention
Purpose of the invention: the invention makes OpenPose lightweight, optimizes the ST-GCN network with a dense connection mechanism, and improves the accuracy and real-time performance of human behavior detection.
The technical scheme is as follows: the behavior detection method based on the lightweight OpenPose space-time diagram network comprises the following steps:
(1) Collecting a data set, and preprocessing an image;
(2) Sending the data set into a lightweight OpenPose network to obtain a human skeleton sequence;
(3) Sending the human skeleton sequence into a DST-GCN network, and extracting space structure characteristics and time trajectory characteristics from space and time dimensions to form high-level space-time characteristics;
(4) Classifying the high-level spatiotemporal features into corresponding action classes by using a Softmax classifier;
(5) Judging the action type of the test image.
The step (2) comprises the following steps:
(2.1) acquiring characteristics of data input into the lightweight OpenPose network;
(2.2) After the features are extracted, they are sent into the prediction layer of the OpenPose model to obtain heat maps of human key points and the affinities between different key points, which are fused into a human skeleton sequence. The 7x7 convolution structure in the prediction layer is replaced by three parallel convolutions (a 1x7 convolution, a 7x1 convolution, and a 7x7 convolution) whose outputs are fused after a BN operation; before the parallel convolution layers, a 1x1 convolution compresses the number of feature-map channels fed into them.
The lightweight OpenPose network replaces the VGG 19 network in the OpenPose model with a MobileNet V1 network, removes the stride of the conv4_2/dw layer in MobileNet V1 and sets its dilation parameter to 2, and uses only the first layer through the conv5_5 layer of the MobileNet V1 network.
In step (3), the DST-GCN network adopts a dense connection mechanism. The nine spatio-temporal graph convolutional layers are organized into two dense blocks: the first five layers form one dense block and the last four layers form the other. Within each dense block, every spatio-temporal graph convolutional layer is connected to all preceding spatio-temporal graph convolutional layers, and features are concatenated across layers along the channel dimension. A transition layer between the two dense blocks controls model complexity: a 1x1 convolutional layer reduces the number of channels, and an average pooling layer with stride 2 halves the height and width of the feature map.
In step (4), the Softmax classifier uses two fully connected layers: the first reduces the dimensionality from 256 to 64, with dropout applied to prevent overfitting, and the second reduces the dimensionality to the number of classes and outputs the behavior classification result.
A binary cross-entropy loss function with an added L2 regularization term is adopted, and an Adam optimizer is used to train the optimal model. The L2-regularized objective loss function is:

L = -(1/m) Σ_{a=1}^{m} [ŷ_a·log(y_a) + (1 − ŷ_a)·log(1 − y_a)] + λ‖θ‖²

where L is the objective loss function, a is the sample index, m is the number of samples, ŷ_a is the sample label (1 for the positive class, 0 for the negative class), y_a is the predicted probability that the sample is positive, λ‖θ‖² is the L2 regularization term, θ denotes the feature coefficients, and λ is a user-specified coefficient.
In step (5), a segment of surveillance video is selected. First, the skeleton sequence of the human target in the video is obtained through lightweight OpenPose and sent into the DST-GCN, which extracts a high-level spatio-temporal feature map of the skeleton sequence through graph convolution and temporal convolution. The spatio-temporal feature map is then sent into the classifier, which outputs the probabilities of falling and not falling; the class with the higher probability is the judgment result.
The behavior detection apparatus based on the lightweight OpenPose space-time diagram network comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the computer program is loaded into the processor, it implements the above behavior detection method based on the lightweight OpenPose space-time diagram network.
Advantageous effects: (1) The method adopts a binary cross-entropy loss function with an added L2 regularization term to further avoid model overfitting, and trains the optimal model with an Adam optimizer, which is well suited to problems with large-scale data or parameters, computes efficiently, and has low memory requirements. (2) OpenPose is made lightweight, improving the real-time performance of model detection. (3) The ST-GCN is improved with a dense connection mechanism, strengthening the spatio-temporal convolutional layers' ability to extract long-range associated information and raising the judgment accuracy.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a technical solution:
the behavior detection method based on the light OpenPose space-time diagram network comprises the following steps:
(1) Collecting a data set, and preprocessing an image;
(2) Sending the data set into a lightweight OpenPose network to obtain a human skeleton sequence;
(3) Sending the human skeleton sequence into a DST-GCN network, and extracting space structure characteristics and time trajectory characteristics from space and time dimensions to form high-level space-time characteristics;
(4) Classifying the high-level spatiotemporal features into corresponding action classes by using a Softmax classifier;
(5) Judging the action type of the test image.
The step (2) comprises the following steps:
(2.1) acquiring characteristics of data input into the lightweight OpenPose network;
(2.2) After the features are extracted, they are sent into the prediction layer of the OpenPose model to obtain heat maps of human key points and the affinities between different key points, which are fused into a human skeleton sequence. The 7x7 convolution structure in the prediction layer is replaced by three parallel convolutions (a 1x7 convolution, a 7x1 convolution, and a 7x7 convolution) whose outputs are fused after a BN operation; before the parallel convolution layers, a 1x1 convolution compresses the number of feature-map channels fed into them.
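The parallel-convolution replacement described in step (2.2) can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the patented implementation: the channel counts and the summation-based fusion of the three BN outputs are assumptions, since the text does not fix them.

```python
import torch
import torch.nn as nn

class ParallelConvBlock(nn.Module):
    """Sketch of the prediction-layer replacement: a 1x1 channel-compression
    convolution followed by parallel 1x7, 7x1, and 7x7 branches whose
    batch-normalized outputs are fused (here by summation, an assumption)."""

    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        # 1x1 convolution compresses the channels fed into the parallel branches
        self.compress = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.branch_1x7 = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, kernel_size=(1, 7), padding=(0, 3)),
            nn.BatchNorm2d(out_ch))
        self.branch_7x1 = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, kernel_size=(7, 1), padding=(3, 0)),
            nn.BatchNorm2d(out_ch))
        self.branch_7x7 = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, kernel_size=7, padding=3),
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        x = self.compress(x)
        # Fuse the three BN outputs; padding keeps all spatial sizes equal
        return self.branch_1x7(x) + self.branch_7x1(x) + self.branch_7x7(x)
```

The asymmetric 1x7 and 7x1 branches cost far fewer parameters than a full 7x7 kernel, which is how this structure reduces the prediction layer's size.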
The lightweight OpenPose network replaces the VGG 19 network in the OpenPose model with a MobileNet V1 network, removes the stride of the conv4_2/dw layer in MobileNet V1 and sets its dilation parameter to 2, and uses only the first layer through the conv5_5 layer of the MobileNet V1 network.
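A MobileNet V1 building block with the modification described above (stride removed, dilation set to 2) might look like the following. This is a generic depthwise-separable unit for illustration, not the actual backbone code; channel counts in the usage are assumptions.

```python
import torch
import torch.nn as nn

def dw_separable(in_ch, out_ch, stride=1, dilation=1):
    """One MobileNetV1-style depthwise-separable unit: a 3x3 depthwise
    convolution followed by a 1x1 pointwise convolution, each with BN+ReLU.
    For the modified conv4_2/dw layer described in the text, one would call
    this with stride=1 (stride removed) and dilation=2 to preserve the
    receptive field without downsampling."""
    pad = dilation  # for a 3x3 kernel at stride 1, padding == dilation keeps size
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=pad,
                  dilation=dilation, groups=in_ch, bias=False),  # depthwise
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),                 # pointwise
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

# Modified layer: no stride, dilation 2 -> spatial resolution is preserved
modified_dw = dw_separable(256, 512, stride=1, dilation=2)
```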
In step (3), the DST-GCN network adopts a dense connection mechanism. The nine spatio-temporal graph convolutional layers are organized into two dense blocks: the first five layers form one dense block and the last four layers form the other. Within each dense block, every spatio-temporal graph convolutional layer is connected to all preceding spatio-temporal graph convolutional layers, and features are concatenated across layers along the channel dimension. A transition layer between the two dense blocks controls model complexity: a 1x1 convolutional layer reduces the number of channels, and an average pooling layer with stride 2 halves the height and width of the feature map.
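A minimal sketch of the dense connectivity and the transition layer is given below. Each spatio-temporal graph convolution stage is stood in for by a plain 1x1 Conv2d over (N, C, T, V) skeleton tensors; the graph convolution itself is omitted for brevity, and the growth rate is an assumption not stated in the text.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense connectivity as described: each layer receives the channel-wise
    concatenation of the block input and all preceding layers' outputs."""

    def __init__(self, in_ch, growth, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            # Stand-in for one spatio-temporal graph convolutional layer
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=1)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # cross-layer concat
        return torch.cat(feats, dim=1)

def transition(in_ch, out_ch):
    # 1x1 convolution reduces channels; average pooling with stride 2 halves
    # the height (time) and width (joint) dimensions of the feature map
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1),
                         nn.AvgPool2d(2, stride=2))
```

With the five-layer and four-layer blocks from the text, the model would be `DenseBlock(C, g, 5)`, a `transition`, then `DenseBlock(C', g, 4)`.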
In step (4), the Softmax classifier uses two fully connected layers: the first reduces the dimensionality from 256 to 64, with dropout applied to prevent overfitting, and the second reduces the dimensionality to the number of classes and outputs the behavior classification result.
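The classifier head described above can be written directly. The dropout rate (0.5) is an assumption — the text only states that dropout is used; the two-class output corresponds to fall / non-fall as in step (5).

```python
import torch
import torch.nn as nn

# Two fully connected layers as described: 256 -> 64 with dropout,
# then 64 -> number of classes, followed by Softmax for class probabilities.
classifier = nn.Sequential(
    nn.Linear(256, 64),    # first fully connected layer
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),     # rate assumed; text only says dropout is used
    nn.Linear(64, 2),      # second layer sized to the number of classes
    nn.Softmax(dim=1),
)
```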
A binary cross-entropy loss function with an added L2 regularization term is adopted, and an Adam optimizer is used to train the optimal model. The L2-regularized objective loss function is:

L = -(1/m) Σ_{a=1}^{m} [ŷ_a·log(y_a) + (1 − ŷ_a)·log(1 − y_a)] + λ‖θ‖²

where L is the objective loss function, a is the sample index, m is the number of samples, ŷ_a is the sample label (1 for the positive class, 0 for the negative class), y_a is the predicted probability that the sample is positive, λ‖θ‖² is the L2 regularization term, θ denotes the feature coefficients, and λ is a user-specified coefficient.
In step (5), a segment of surveillance video is selected. First, the skeleton sequence of the human target in the video is obtained through lightweight OpenPose and sent into the DST-GCN, which extracts a high-level spatio-temporal feature map of the skeleton sequence through graph convolution and temporal convolution. The spatio-temporal feature map is then sent into the classifier, which outputs the probabilities of falling and not falling; the class with the higher probability is the judgment result.
The behavior detection apparatus based on the lightweight OpenPose space-time diagram network comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the computer program is loaded into the processor, it implements the above behavior detection method based on the lightweight OpenPose space-time diagram network.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. The behavior detection method based on the lightweight OpenPose space-time diagram network is characterized by comprising the following steps of:
(1) Collecting a data set, and preprocessing an image;
(2) Sending the data set into a lightweight OpenPose network to obtain a human skeleton sequence;
(3) Sending the human skeleton sequence into a DST-GCN network, and extracting space structure characteristics and time trajectory characteristics from space and time dimensions to form high-level space-time characteristics;
(4) Classifying the high-level spatiotemporal features into corresponding action classes by using a Softmax classifier;
(5) Judging the action type of the test image.
2. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein the step (2) comprises:
(2.1) acquiring characteristics of data input into the lightweight OpenPose network;
(2.2) after the features are extracted, sending them into a prediction layer of an OpenPose model to obtain heat maps of human key points and the affinities between different key points, and fusing them to obtain a human skeleton sequence; and replacing a 7x7 convolution structure in the prediction layer with three parallel convolutions, namely a 1x7 convolution, a 7x1 convolution and a 7x7 convolution, fusing the outputs of the three convolutions after a BN operation, and compressing the number of feature-map channels input into the parallel convolution layers with a 1x1 convolution placed before them.
3. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein the lightweight OpenPose network replaces the VGG 19 network in the OpenPose model with a MobileNet V1 network, removes the stride of the conv4_2/dw layer in the MobileNet V1 network and sets its dilation parameter to 2, and uses only the first layer through the conv5_5 layer of the MobileNet V1 network.
4. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein in step (3), the DST-GCN network employs a dense connection mechanism; nine spatio-temporal graph convolutional layers are organized into two dense blocks, the first five layers forming one dense block and the last four layers the other; in each dense block, every spatio-temporal graph convolutional layer is connected to all preceding spatio-temporal graph convolutional layers, and features are concatenated across layers along the channel dimension; a transition layer between the two dense blocks controls model complexity, reducing the number of channels through a 1x1 convolutional layer and halving the height and width of the feature map with an average pooling layer of stride 2.
5. The behavior detection method based on the light-weight OpenPose space-time graph network according to claim 1, wherein the Softmax classifier in the step (4) uses two fully connected layers, the first fully connected layer reduces the dimension from 256 to 64, meanwhile, dropout is used for preventing overfitting, and the second fully connected layer reduces the dimension to the number of classes, so as to output the behavior classification result.
6. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein a binary cross-entropy loss function with an added L2 regularization term is adopted, and an Adam optimizer is used to train the optimal model; the L2-regularized objective loss function is:

L = -(1/m) Σ_{a=1}^{m} [ŷ_a·log(y_a) + (1 − ŷ_a)·log(1 − y_a)] + λ‖θ‖²

wherein L is the objective loss function, a is the sample index, m is the number of samples, ŷ_a is the sample label (1 for the positive class, 0 for the negative class), y_a is the predicted probability that the sample is positive, λ‖θ‖² is the L2 regularization term, θ denotes the feature coefficients, and λ is a user-specified coefficient.
7. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein in the test stage of step (5), a segment of surveillance video is selected; the skeleton sequence of the human target in the video is first obtained through lightweight OpenPose and sent into the DST-GCN, which extracts a high-level spatio-temporal feature map of the skeleton sequence through graph convolution and temporal convolution; the spatio-temporal feature map is then sent into the classifier, which outputs the probabilities of falling and not falling, and the class with the higher probability is output as the judgment result.
8. Behavior detection apparatus based on a lightweight OpenPose space-time diagram network, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the behavior detection method based on the lightweight OpenPose space-time diagram network according to any one of claims 1 to 7.
CN202211245726.2A 2022-10-12 2022-10-12 Behavior detection method based on lightweight OpenPose space-time diagram network Pending CN115546894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211245726.2A CN115546894A (en) 2022-10-12 2022-10-12 Behavior detection method based on lightweight OpenPose space-time diagram network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211245726.2A CN115546894A (en) 2022-10-12 2022-10-12 Behavior detection method based on lightweight OpenPose space-time diagram network

Publications (1)

Publication Number Publication Date
CN115546894A true CN115546894A (en) 2022-12-30

Family

ID=84733734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211245726.2A Pending CN115546894A (en) 2022-10-12 2022-10-12 Behavior detection method based on lightweight OpenPose space-time diagram network

Country Status (1)

Country Link
CN (1) CN115546894A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958703A (en) * 2023-08-02 2023-10-27 德智鸿(上海)机器人有限责任公司 Identification method and device based on acetabulum fracture

Similar Documents

Publication Publication Date Title
Pan et al. Deepfake detection through deep learning
US20120106782A1 (en) Detector for chemical, biological and/or radiological attacks
CN108681712A (en) A kind of Basketball Match Context event recognition methods of fusion domain knowledge and multistage depth characteristic
Hsueh et al. Human behavior recognition from multiview videos
CN111582095B (en) Light-weight rapid detection method for abnormal behaviors of pedestrians
CN109871780B (en) Face quality judgment method and system and face identification method and system
CN113838034B (en) Quick detection method for surface defects of candy package based on machine vision
CN111582092B (en) Pedestrian abnormal behavior detection method based on human skeleton
CN113239801B (en) Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment
CN114283469A (en) Lightweight target detection method and system based on improved YOLOv4-tiny
TWI761813B (en) Video analysis method and related model training methods, electronic device and storage medium thereof
CN109446897B (en) Scene recognition method and device based on image context information
CN113283368B (en) Model training method, face attribute analysis method, device and medium
CN115546894A (en) Behavior detection method based on lightweight OpenPose space-time diagram network
Engoor et al. Occlusion-aware dynamic human emotion recognition using landmark detection
Burkapalli et al. TRANSFER LEARNING: INCEPTION-V3 BASED CUSTOM CLASSIFICATION APPROACH FOR FOOD IMAGES.
Sezavar et al. DCapsNet: Deep capsule network for human activity and gait recognition with smartphone sensors
Banerjee et al. CNN-SVM Model for Accurate Detection of Bacterial Diseases in Cucumber Leaves
CN116740808A (en) Animal behavior recognition method based on deep learning target detection and image classification
Park et al. Intensity classification background model based on the tracing scheme for deep learning based CCTV pedestrian detection
CN112052881B (en) Hyperspectral image classification model device based on multi-scale near-end feature splicing
CN116778214A (en) Behavior detection method, device, equipment and storage medium thereof
CN115393802A (en) Railway scene unusual invasion target identification method based on small sample learning
CN114359578A (en) Application method and system of pest and disease damage identification intelligent terminal
CN113361475A (en) Multi-spectral pedestrian detection method based on multi-stage feature fusion information multiplexing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination