CN115546894A - Behavior detection method based on lightweight OpenPose space-time diagram network - Google Patents

Behavior detection method based on lightweight OpenPose space-time diagram network

Info

Publication number
CN115546894A
CN115546894A
Authority
CN
China
Prior art keywords
space
openpose
network
time
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211245726.2A
Other languages
Chinese (zh)
Inventor
张小瑞
解其健
孙伟
张小娜
宋爱国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202211245726.2A priority Critical patent/CN115546894A/en
Publication of CN115546894A publication Critical patent/CN115546894A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements using classification, e.g. of video objects
    • G06V10/765 Arrangements using rules for classification or partitioning the feature space
    • G06V10/82 Arrangements using neural networks
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior detection method based on a lightweight OpenPose space-time diagram network, comprising the following steps: (1) collecting a data set and preprocessing the images; (2) feeding the data set into a lightweight OpenPose network to obtain human skeleton sequences; (3) feeding the human skeleton sequences into a DST-GCN network, which extracts spatial structure features and temporal trajectory features along the spatial and temporal dimensions to form high-level spatio-temporal features; (4) classifying the high-level spatio-temporal features into the corresponding action classes with a Softmax classifier; (5) judging the action type of the test image. The invention first makes OpenPose lightweight, improving the real-time performance of model detection, and at the same time improves the ST-GCN with a dense connection mechanism, strengthening the spatio-temporal convolutional layers' ability to extract long-range associated information and raising the judgment accuracy.

Description

Behavior detection method based on lightweight OpenPose space-time diagram network
Technical Field
The invention relates to the technical field of abnormal behavior detection in computer vision, and in particular to a behavior detection method based on a lightweight OpenPose space-time diagram network.
Background
In daily life, human falls have two main causes: tripping or slipping due to impaired mobility, and falls caused by illness. If a fallen person is not rescued in time, the injury worsens and may even cost a life; detecting human falls is therefore important.
Currently, common human fall detection technologies fall into three categories: wearable-based, environmental-sensor-based, and computer-vision-based. Wearable methods place a sensor in a belt, watch, or similar item, but elderly people may forget to wear the sensor due to failing memory, or refuse to wear it because it is uncomfortable. Environmental-sensor methods, such as infrared monitoring, can affect the health of elderly people who are sensitive to infrared radiation. Computer-vision-based fall detection collects video of the human body with camera devices, processes the images with image processing techniques, extracts human body features, and analyzes them to obtain the human motion state. In general, fall behavior can be recognized from a variety of modalities, such as appearance, optical flow, depth, and the human skeleton. The dynamic human skeleton usually conveys important information; skeleton extraction currently relies mostly on the OpenPose model, which achieves the best results on most detection performance indexes, yet when applied in real scenes it still suffers from poor real-time detection, excessive parameters, and an oversized model. Meanwhile, skeleton data are non-Euclidean, and conventional convolutional networks and recurrent neural networks ignore the vital association information between nodes, so their overall improvement is limited.
Disclosure of Invention
Purpose of the invention: the invention makes OpenPose lightweight, optimizes the ST-GCN network with a dense connection mechanism, and improves the accuracy and real-time performance of human behavior detection.
The technical scheme is as follows: the behavior detection method based on the lightweight OpenPose space-time diagram network comprises the following steps:
(1) Collecting a data set, and preprocessing an image;
(2) Sending the data set into a lightweight OpenPose network to obtain a human skeleton sequence;
(3) Sending the human skeleton sequence into a DST-GCN network, and extracting space structure characteristics and time trajectory characteristics from space and time dimensions to form high-level space-time characteristics;
(4) Classifying the high-level spatiotemporal features into corresponding action classes by using a Softmax classifier;
(5) Judging the action type of the test image.
The step (2) comprises the following steps:
(2.1) acquiring characteristics of data input into the lightweight OpenPose network;
(2.2) After the features are extracted, they are sent into the prediction layer of the OpenPose model to obtain heat maps of human key points and the affinities between different key points, which are fused into a human skeleton sequence. The 7x7 convolution structure in the prediction layer is replaced by three parallel convolutions (a 1x7 convolution, a 7x1 convolution, and a 7x7 convolution) whose outputs are fused after a BN operation; before the parallel convolution layers, a 1x1 convolution compresses the number of feature-map channels fed into them.
The lightweight OpenPose network replaces the VGG 19 network in the OpenPose model with a MobileNet V1 network, removes the stride of the conv4_2/dw layer in MobileNet V1 and sets its dilation parameter to 2, and uses only the first layer through the conv5_5 layer of the MobileNet V1 network.
In step (3), the DST-GCN network adopts a dense connection mechanism. The nine spatio-temporal graph convolutional layers are organized into two dense blocks: the first five layers form one dense block and the last four layers form the other. Within each dense block, every spatio-temporal graph convolutional layer is connected to all preceding spatio-temporal graph convolutional layers, and features are concatenated across layers along the channel dimension. A transition layer between the two dense blocks controls model complexity: a 1x1 convolutional layer reduces the number of channels, and an average pooling layer with stride 2 halves the height and width of the feature map.
In step (4), the Softmax classifier uses two fully connected layers: the first reduces the dimensionality from 256 to 64, with dropout applied to prevent overfitting, and the second reduces the dimensionality to the number of classes and outputs the behavior classification result.
A binary cross-entropy loss function with an added L2 regularization term is adopted, and an Adam optimizer is used to train the optimal model. The L2-regularized objective loss function is:

L = -(1/m) Σ_{a=1}^{m} [ŷ_a·log(y_a) + (1 − ŷ_a)·log(1 − y_a)] + λ‖θ‖²

where L is the objective loss function, a is the sample index, m is the number of samples, ŷ_a is the sample label (1 for the positive class, 0 for the negative class), y_a is the predicted probability that the sample is positive, λ‖θ‖² is the L2 regularization term, θ denotes the feature coefficients, and λ is a user-specified coefficient.
In step (5), a segment of surveillance video is selected. First, the skeleton sequence of the human target in the video is obtained through lightweight OpenPose and sent into the DST-GCN, which extracts a high-level spatio-temporal feature map of the skeleton sequence through graph convolution and temporal convolution. The spatio-temporal feature map is then sent into the classifier, which outputs the probabilities of falling and not falling; the class with the higher probability is the judgment result.
The behavior detection apparatus based on the lightweight OpenPose space-time diagram network comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the computer program is loaded into the processor, it implements the above behavior detection method based on the lightweight OpenPose space-time diagram network.
Advantageous effects: (1) The method adopts a binary cross-entropy loss function with an added L2 regularization term to further avoid model overfitting, and trains the optimal model with an Adam optimizer, which is well suited to problems with large-scale data or parameters, computes efficiently, and has low memory requirements. (2) OpenPose is made lightweight, improving the real-time performance of model detection. (3) The ST-GCN is improved with a dense connection mechanism, strengthening the spatio-temporal convolutional layers' ability to extract long-range associated information and raising the judgment accuracy.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a technical solution:
the behavior detection method based on the light OpenPose space-time diagram network comprises the following steps:
(1) Collecting a data set, and preprocessing an image;
(2) Sending the data set into a lightweight OpenPose network to obtain a human skeleton sequence;
(3) Sending the human skeleton sequence into a DST-GCN network, and extracting space structure characteristics and time trajectory characteristics from space and time dimensions to form high-level space-time characteristics;
(4) Classifying the high-level spatiotemporal features into corresponding action classes by using a Softmax classifier;
(5) Judging the action type of the test image.
The step (2) comprises the following steps:
(2.1) acquiring characteristics of data input into the lightweight OpenPose network;
(2.2) After the features are extracted, they are sent into the prediction layer of the OpenPose model to obtain heat maps of human key points and the affinities between different key points, which are fused into a human skeleton sequence. The 7x7 convolution structure in the prediction layer is replaced by three parallel convolutions (a 1x7 convolution, a 7x1 convolution, and a 7x7 convolution) whose outputs are fused after a BN operation; before the parallel convolution layers, a 1x1 convolution compresses the number of feature-map channels fed into them.
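The parallel-convolution replacement described in step (2.2) can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the patented implementation: the channel counts and the summation-based fusion of the three BN outputs are assumptions, since the text does not fix them.

```python
import torch
import torch.nn as nn

class ParallelConvBlock(nn.Module):
    """Sketch of the prediction-layer replacement: a 1x1 channel-compression
    convolution followed by parallel 1x7, 7x1, and 7x7 branches whose
    batch-normalized outputs are fused (here by summation, an assumption)."""

    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        # 1x1 convolution compresses the channels fed into the parallel branches
        self.compress = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.branch_1x7 = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, kernel_size=(1, 7), padding=(0, 3)),
            nn.BatchNorm2d(out_ch))
        self.branch_7x1 = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, kernel_size=(7, 1), padding=(3, 0)),
            nn.BatchNorm2d(out_ch))
        self.branch_7x7 = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, kernel_size=7, padding=3),
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        x = self.compress(x)
        # Fuse the three BN outputs; padding keeps all spatial sizes equal
        return self.branch_1x7(x) + self.branch_7x1(x) + self.branch_7x7(x)
```

The asymmetric 1x7 and 7x1 branches cost far fewer parameters than a full 7x7 kernel, which is how this structure reduces the prediction layer's size.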
The lightweight OpenPose network replaces the VGG 19 network in the OpenPose model with a MobileNet V1 network, removes the stride of the conv4_2/dw layer in MobileNet V1 and sets its dilation parameter to 2, and uses only the first layer through the conv5_5 layer of the MobileNet V1 network.
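A MobileNet V1 building block with the modification described above (stride removed, dilation set to 2) might look like the following. This is a generic depthwise-separable unit for illustration, not the actual backbone code; channel counts in the usage are assumptions.

```python
import torch
import torch.nn as nn

def dw_separable(in_ch, out_ch, stride=1, dilation=1):
    """One MobileNetV1-style depthwise-separable unit: a 3x3 depthwise
    convolution followed by a 1x1 pointwise convolution, each with BN+ReLU.
    For the modified conv4_2/dw layer described in the text, one would call
    this with stride=1 (stride removed) and dilation=2 to preserve the
    receptive field without downsampling."""
    pad = dilation  # for a 3x3 kernel at stride 1, padding == dilation keeps size
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=pad,
                  dilation=dilation, groups=in_ch, bias=False),  # depthwise
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),                 # pointwise
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

# Modified layer: no stride, dilation 2 -> spatial resolution is preserved
modified_dw = dw_separable(256, 512, stride=1, dilation=2)
```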
In step (3), the DST-GCN network adopts a dense connection mechanism. The nine spatio-temporal graph convolutional layers are organized into two dense blocks: the first five layers form one dense block and the last four layers form the other. Within each dense block, every spatio-temporal graph convolutional layer is connected to all preceding spatio-temporal graph convolutional layers, and features are concatenated across layers along the channel dimension. A transition layer between the two dense blocks controls model complexity: a 1x1 convolutional layer reduces the number of channels, and an average pooling layer with stride 2 halves the height and width of the feature map.
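A minimal sketch of the dense connectivity and the transition layer is given below. Each spatio-temporal graph convolution stage is stood in for by a plain 1x1 Conv2d over (N, C, T, V) skeleton tensors; the graph convolution itself is omitted for brevity, and the growth rate is an assumption not stated in the text.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense connectivity as described: each layer receives the channel-wise
    concatenation of the block input and all preceding layers' outputs."""

    def __init__(self, in_ch, growth, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            # Stand-in for one spatio-temporal graph convolutional layer
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=1)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # cross-layer concat
        return torch.cat(feats, dim=1)

def transition(in_ch, out_ch):
    # 1x1 convolution reduces channels; average pooling with stride 2 halves
    # the height (time) and width (joint) dimensions of the feature map
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1),
                         nn.AvgPool2d(2, stride=2))
```

With the five-layer and four-layer blocks from the text, the model would be `DenseBlock(C, g, 5)`, a `transition`, then `DenseBlock(C', g, 4)`.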
In step (4), the Softmax classifier uses two fully connected layers: the first reduces the dimensionality from 256 to 64, with dropout applied to prevent overfitting, and the second reduces the dimensionality to the number of classes and outputs the behavior classification result.
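The classifier head described above can be written directly. The dropout rate (0.5) is an assumption — the text only states that dropout is used; the two-class output corresponds to fall / non-fall as in step (5).

```python
import torch
import torch.nn as nn

# Two fully connected layers as described: 256 -> 64 with dropout,
# then 64 -> number of classes, followed by Softmax for class probabilities.
classifier = nn.Sequential(
    nn.Linear(256, 64),    # first fully connected layer
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),     # rate assumed; text only says dropout is used
    nn.Linear(64, 2),      # second layer sized to the number of classes
    nn.Softmax(dim=1),
)
```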
A binary cross-entropy loss function with an added L2 regularization term is adopted, and an Adam optimizer is used to train the optimal model. The L2-regularized objective loss function is:

L = -(1/m) Σ_{a=1}^{m} [ŷ_a·log(y_a) + (1 − ŷ_a)·log(1 − y_a)] + λ‖θ‖²

where L is the objective loss function, a is the sample index, m is the number of samples, ŷ_a is the sample label (1 for the positive class, 0 for the negative class), y_a is the predicted probability that the sample is positive, λ‖θ‖² is the L2 regularization term, θ denotes the feature coefficients, and λ is a user-specified coefficient.
In step (5), a segment of surveillance video is selected. First, the skeleton sequence of the human target in the video is obtained through lightweight OpenPose and sent into the DST-GCN, which extracts a high-level spatio-temporal feature map of the skeleton sequence through graph convolution and temporal convolution. The spatio-temporal feature map is then sent into the classifier, which outputs the probabilities of falling and not falling; the class with the higher probability is the judgment result.
The behavior detection apparatus based on the lightweight OpenPose space-time diagram network comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the computer program is loaded into the processor, it implements the above behavior detection method based on the lightweight OpenPose space-time diagram network.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. The behavior detection method based on the lightweight OpenPose space-time diagram network is characterized by comprising the following steps of:
(1) Collecting a data set, and preprocessing an image;
(2) Sending the data set into a lightweight OpenPose network to obtain a human skeleton sequence;
(3) Sending the human skeleton sequence into a DST-GCN network, and extracting space structure characteristics and time trajectory characteristics from space and time dimensions to form high-level space-time characteristics;
(4) Classifying the high-level spatiotemporal features into corresponding action classes by using a Softmax classifier;
(5) Judging the action type of the test image.
2. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein the step (2) comprises:
(2.1) acquiring characteristics of data input into the lightweight OpenPose network;
(2.2) after the features are extracted, sending them into a prediction layer of an OpenPose model to obtain heat maps of human key points and the affinities between different key points, and fusing them to obtain a human skeleton sequence; and replacing a 7x7 convolution structure in the prediction layer with three parallel convolutions, namely a 1x7 convolution, a 7x1 convolution and a 7x7 convolution, fusing the outputs of the three convolutions after a BN operation, and compressing the number of feature-map channels input into the parallel convolution layers with a 1x1 convolution placed before them.
3. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein the lightweight OpenPose network replaces the VGG 19 network in the OpenPose model with a MobileNet V1 network, removes the stride of the conv4_2/dw layer in the MobileNet V1 network and sets its dilation parameter to 2, and uses only the first layer through the conv5_5 layer of the MobileNet V1 network.
4. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein in step (3), the DST-GCN network employs a dense connection mechanism; nine spatio-temporal graph convolutional layers are organized into two dense blocks, the first five layers forming one dense block and the last four layers the other; in each dense block, every spatio-temporal graph convolutional layer is connected to all preceding spatio-temporal graph convolutional layers, and features are concatenated across layers along the channel dimension; a transition layer between the two dense blocks controls model complexity, reducing the number of channels through a 1x1 convolutional layer and halving the height and width of the feature map with an average pooling layer of stride 2.
5. The behavior detection method based on the light-weight OpenPose space-time graph network according to claim 1, wherein the Softmax classifier in the step (4) uses two fully connected layers, the first fully connected layer reduces the dimension from 256 to 64, meanwhile, dropout is used for preventing overfitting, and the second fully connected layer reduces the dimension to the number of classes, so as to output the behavior classification result.
6. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein a binary cross-entropy loss function with an added L2 regularization term is adopted, and an Adam optimizer is used to train the optimal model; the L2-regularized objective loss function is:

L = -(1/m) Σ_{a=1}^{m} [ŷ_a·log(y_a) + (1 − ŷ_a)·log(1 − y_a)] + λ‖θ‖²

wherein L is the objective loss function, a is the sample index, m is the number of samples, ŷ_a is the sample label (1 for the positive class, 0 for the negative class), y_a is the predicted probability that the sample is positive, λ‖θ‖² is the L2 regularization term, θ denotes the feature coefficients, and λ is a user-specified coefficient.
7. The behavior detection method based on the lightweight OpenPose space-time diagram network according to claim 1, wherein in the test stage of step (5), a segment of surveillance video is selected; the skeleton sequence of the human target in the video is first obtained through lightweight OpenPose and sent into the DST-GCN, which extracts a high-level spatio-temporal feature map of the skeleton sequence through graph convolution and temporal convolution; the spatio-temporal feature map is then sent into the classifier, which outputs the probabilities of falling and not falling, and the class with the higher probability is output as the judgment result.
8. Behavior detection apparatus based on a lightweight OpenPose space-time diagram network, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the behavior detection method based on the lightweight OpenPose space-time diagram network according to any one of claims 1 to 7.
CN202211245726.2A 2022-10-12 2022-10-12 Behavior detection method based on lightweight OpenPose space-time diagram network Pending CN115546894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211245726.2A CN115546894A (en) 2022-10-12 2022-10-12 Behavior detection method based on lightweight OpenPose space-time diagram network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211245726.2A CN115546894A (en) 2022-10-12 2022-10-12 Behavior detection method based on lightweight OpenPose space-time diagram network

Publications (1)

Publication Number Publication Date
CN115546894A true CN115546894A (en) 2022-12-30

Family

ID=84733734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211245726.2A Pending CN115546894A (en) 2022-10-12 2022-10-12 Behavior detection method based on lightweight OpenPose space-time diagram network

Country Status (1)

Country Link
CN (1) CN115546894A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958703A (en) * 2023-08-02 2023-10-27 德智鸿(上海)机器人有限责任公司 Identification method and device based on acetabulum fracture

Similar Documents

Publication Publication Date Title
Pan et al. Deepfake detection through deep learning
US20120106782A1 (en) Detector for chemical, biological and/or radiological attacks
CN108681712A (en) A kind of Basketball Match Context event recognition methods of fusion domain knowledge and multistage depth characteristic
Hsueh et al. Human behavior recognition from multiview videos
CN111582095B (en) Light-weight rapid detection method for abnormal behaviors of pedestrians
CN109871780B (en) Face quality judgment method and system and face identification method and system
CN113838034B (en) Quick detection method for surface defects of candy package based on machine vision
CN111582092B (en) Pedestrian abnormal behavior detection method based on human skeleton
CN113239801B (en) Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment
CN114283469A (en) Lightweight target detection method and system based on improved YOLOv4-tiny
TWI761813B (en) Video analysis method and related model training methods, electronic device and storage medium thereof
CN109446897B (en) Scene recognition method and device based on image context information
CN113283368B (en) Model training method, face attribute analysis method, device and medium
CN115546894A (en) Behavior detection method based on lightweight OpenPose space-time diagram network
Engoor et al. Occlusion-aware dynamic human emotion recognition using landmark detection
Burkapalli et al. TRANSFER LEARNING: INCEPTION-V3 BASED CUSTOM CLASSIFICATION APPROACH FOR FOOD IMAGES.
Sezavar et al. DCapsNet: Deep capsule network for human activity and gait recognition with smartphone sensors
Banerjee et al. CNN-SVM Model for Accurate Detection of Bacterial Diseases in Cucumber Leaves
CN116740808A (en) Animal behavior recognition method based on deep learning target detection and image classification
Park et al. Intensity classification background model based on the tracing scheme for deep learning based CCTV pedestrian detection
CN112052881B (en) Hyperspectral image classification model device based on multi-scale near-end feature splicing
CN116778214A (en) Behavior detection method, device, equipment and storage medium thereof
CN115393802A (en) Railway scene unusual invasion target identification method based on small sample learning
CN114359578A (en) Application method and system of pest and disease damage identification intelligent terminal
CN113361475A (en) Multi-spectral pedestrian detection method based on multi-stage feature fusion information multiplexing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination