CN108573232B - Human body action recognition method based on convolutional neural network - Google Patents

Human body action recognition method based on convolutional neural network

Info

Publication number
CN108573232B
CN108573232B (application CN201810345479.0A)
Authority
CN
China
Prior art keywords: convolutional neural network, dimensional, groups, layer
Prior art date
Legal status
Expired - Fee Related
Application number
CN201810345479.0A
Other languages
Chinese (zh)
Other versions
CN108573232A (en)
Inventor
张良 (Zhang Liang)
李玉鹏 (Li Yupeng)
刘婷婷 (Liu Tingting)
Current Assignee
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN201810345479.0A
Publication of CN108573232A
Application granted
Publication of CN108573232B
Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

A human body action recognition method based on a convolutional neural network. Part of the depth images in a data set are selected as training samples and the remaining depth images are used as test samples; the four-dimensional information of the depth images in the data set is mapped to a two-dimensional space by the spatial structure dynamic depth image technique to obtain two-dimensional images. A convolutional neural network is constructed and trained with the two-dimensional images of the training samples. The two-dimensional images of the test samples are then input into the trained convolutional neural network to obtain three groups of output vectors; intra-group fusion and then inter-group fusion are performed, and finally the human body action recognition is completed. The method can serve as a basis for pattern recognition and artificial intelligence and is of great significance for human action recognition.

Description

Human body action recognition method based on convolutional neural network
Technical Field
The invention belongs to the technical field of human body action recognition, and particularly relates to a human body action recognition method based on a convolutional neural network.
Background
At present, human body action recognition is widely applied in intelligent monitoring, human-computer interaction, video retrieval, virtual reality and other areas, so it has long been an active research direction in the field of computer vision. Much earlier research on human action recognition focused on conventional RGB color video. In recent years, the release of the Microsoft Kinect has brought new opportunities to the field. The Kinect device can acquire depth images in real time, and compared with traditional color images, depth images have many advantages: a depth image sequence is inherently a four-dimensional space (three spatial dimensions plus time), can contain richer motion information, is insensitive to changes in illumination conditions, and allows the human contour and skeleton to be estimated more reliably. Current research on human action recognition based on depth images mainly seeks to map the four-dimensional information of an action to a two-dimensional space by designing effective feature representations, in the hope of expressing the important features of the action in the two-dimensional space as completely as possible and thereby improving recognition accuracy. However, after the depth information of an action is mapped to a two-dimensional representation, actions are easily confused during classification, which limits the upper bound of the recognition rate of such methods.
Disclosure of Invention
In order to overcome the above problems, the object of the invention is to provide a human body action recognition method based on a convolutional neural network which can recognize human body actions, classify actions more accurately, and offers a high recognition rate and strong robustness.
In order to achieve the above object, the method for recognizing human body actions based on convolutional neural network provided by the present invention comprises the following steps performed in sequence:
(1) selecting part of the depth images in the data set as training samples and using the remaining depth images as test samples, and then mapping the four-dimensional information of the depth images in the data set to a two-dimensional space by adopting the spatial structure dynamic depth image technique to obtain two-dimensional images;
(2) constructing a convolutional neural network;
(3) training the convolutional neural network by using the two-dimensional image in the training sample;
(4) inputting the two-dimensional images in the test sample into the trained convolutional neural network to obtain three groups of output vectors, then performing intra-group fusion and inter-group fusion, and finally completing human body action recognition.
In step (1), the method for mapping the four-dimensional information of the depth images in the data set to the two-dimensional space by using the spatial structure dynamic depth image technique to obtain the two-dimensional images is as follows: each depth image in the data set is converted into 6 different two-dimensional images by the spatial structure dynamic depth image technique; the 6 two-dimensional images are divided into 3 groups, namely torso, limbs and joints, and each group consists of two two-dimensional images, DDIF and DDIB, respectively.
In step (2), the method for constructing the convolutional neural network is as follows: the network has 12 layers, which are, in sequence, convolutional layer conv1, pooling layer pool1, convolutional layer conv2, pooling layer pool2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, pooling layer pool5, fully connected layer fc6, fully connected layer fc7, fully connected layer fc8 and a classification layer; the classification layer adopts the combination of a cross-entropy loss function and a center loss function as a joint cost function, so as to add a distance constraint between the action sample features and their class centers.
In step (3), the method for training the convolutional neural network with the two-dimensional images in the training samples is as follows: the 3 groups of two-dimensional images obtained by converting the depth images in the training samples are respectively input into 3 convolutional neural networks, and the 3 convolutional neural networks are trained with these two-dimensional images respectively.
In step (4), the method of inputting the two-dimensional images in the test sample into the trained convolutional neural networks to obtain three groups of output vectors, performing intra-group fusion and then inter-group fusion, and finally completing human body action recognition is as follows: the 3 groups of 6 two-dimensional images obtained by converting each depth image in the test sample are respectively input into the corresponding 3 groups of trained convolutional neural networks, so that each group yields 2 output vectors; the corresponding dimensions of the 2 vectors of each group are averaged, which is the intra-group fusion; the corresponding dimensions of the 3 group vectors are then averaged, which is the inter-group fusion, giving the final vector; the index of the maximum value in the final vector is the human action category to be recognized.
Compared with the prior art, the human body action recognition method based on the convolutional neural network has the beneficial effects that:
In the present invention, a distance constraint between the action sample features and the class centers is added during network training, so that intra-class compactness and inter-class separation of the action features are both taken into account; this guides the convolutional neural network to learn more discriminative features and makes the subsequent classification more accurate. Experimental results on several data sets show that the method of the invention significantly improves the accuracy and robustness of human body action recognition. The method can serve as a basis for pattern recognition and artificial intelligence and is of great significance for human action recognition.
Drawings
Fig. 1 is a flowchart of a human body motion recognition method based on a convolutional neural network provided by the present invention.
Fig. 2 is a schematic diagram of a convolutional neural network structure.
Detailed Description
The present invention will be described in further detail with reference to examples.
As shown in fig. 1, the method for recognizing human body actions based on convolutional neural network provided by the present invention comprises the following steps performed in sequence:
(1) selecting part of the depth images in the data set as training samples and using the remaining depth images as test samples, then mapping the four-dimensional information of the depth images in the data set to a two-dimensional space with the spatial structure dynamic depth image technique to obtain two-dimensional images for subsequent classification, thereby converting the human body action recognition problem into an image classification problem;
each depth image can be converted into 6 different two-dimensional images using the spatial structure dynamic depth image (SSDDI) technique; the 6 two-dimensional images are divided into 3 groups, namely torso, limbs and joints, and each group is composed of two two-dimensional images, DDIF and DDIB, respectively, as shown in fig. 1.
In particular, the SSDDI technique relies on rank pooling (RP), which is briefly described below. Let a depth sequence of $k$ frames be denoted $X = \langle x_1, x_2, \ldots, x_t, \ldots, x_k \rangle$, let $v_t$ denote the feature vector extracted from frame $x_t$, and let

$$V_t = \frac{1}{t} \sum_{\tau=1}^{t} v_\tau$$

denote the mean of the feature vectors of the first $t$ frames. Each time step $t$ is assigned a score $r_t = w^T \cdot V_t$, and the score is required to increase with time, i.e. the score function $r_t$ satisfies the constraint $\forall i > j:\ r_i > r_j$.

The purpose of rank pooling is to find the parameter $w^*$ that satisfies the objective function (1):

$$w^* = \operatorname*{arg\,min}_{w}\ \frac{1}{2}\|w\|^2 + C \sum_{i>j} \varepsilon_{ij}$$

$$\text{s.t.}\quad w^T \cdot (V_i - V_j) \ge 1 - \varepsilon_{ij},\quad \varepsilon_{ij} \ge 0 \qquad (1)$$

where $\varepsilon_{ij}$ is a small non-negative slack variable. The parameter $w^*$ ranks $V_t$ before $V_{t+1}$ and thus captures the temporal ordering information of the image sequence, so it can be used as a feature descriptor of the depth sequence. It is a two-dimensional image of the same scale as the input frames that contains the spatio-temporal variation of the whole action in the forward direction, and is therefore called the forward dynamic depth image (DDIF); performing the RP operation after reversing the temporal order of the depth sequence yields the backward dynamic depth image (DDIB).
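As a concrete illustration, the following is a minimal rank-pooling sketch in Python, assuming per-frame feature vectors have already been extracted (for SSDDI, the flattened pixels of a depth region can serve this role). It learns $w^*$ by training a linear SVM on pairwise differences of the time-averaged features; the function and parameter names are illustrative, not from the patent.

```python
import numpy as np
from sklearn.svm import LinearSVC

def rank_pool(frames, C=1.0):
    """Rank pooling: learn w* that orders the time-averaged frame
    features V_1, ..., V_k by their temporal index, as in objective (1).

    frames: array of shape (k, d), one feature vector v_t per frame.
    Returns w* of shape (d,), the dynamic-image descriptor.
    """
    k, _ = frames.shape
    # V_t = mean of the first t frame feature vectors.
    V = np.cumsum(frames, axis=0) / np.arange(1, k + 1)[:, None]

    # Pairwise differences V_i - V_j for i > j, plus mirrored negatives
    # so the ranking problem becomes a binary classification problem.
    diffs, labels = [], []
    for i in range(k):
        for j in range(i):
            diffs.append(V[i] - V[j]); labels.append(+1)
            diffs.append(V[j] - V[i]); labels.append(-1)

    # Linear SVM (squared-hinge variant of the constraints in (1)).
    svm = LinearSVC(C=C, fit_intercept=False)
    svm.fit(np.asarray(diffs), np.asarray(labels))
    return svm.coef_.ravel()

# DDIF uses the original frame order; DDIB reverses it first:
#   ddif = rank_pool(features)         # forward dynamic depth image
#   ddib = rank_pool(features[::-1])   # backward dynamic depth image
```

In practice the resulting $w^*$ is reshaped back to the spatial dimensions of the input region to form the dynamic depth image.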
SSDDI applies the RP operation to local regions of the depth sequence at 3 levels, namely the torso, limbs and joints, and finally recombines the dynamic depth images of the individual levels.
(2) Constructing a convolutional neural network;
as shown in fig. 2, the convolutional neural network has 12 layers, and these 12 layers are convolutional layer conv1, pooling layer pool1, convolutional layer conv2, pooling layer pool2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, pooling layer pool5, full connection layer fc6, full connection layer fc7, full connection layer fc8 and classification layer in sequence.
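The patent text names the 12 layers but not their dimensions. The sketch below is one plausible PyTorch rendering, with AlexNet-style kernel sizes, channel counts and input size assumed for illustration; it returns both the fc7 depth feature (used by the center loss below) and the fc8 logits.

```python
import torch
import torch.nn as nn

class ActionCNN(nn.Module):
    """12-layer network named in the description: conv1, pool1, conv2,
    pool2, conv3, conv4, conv5, pool5, fc6, fc7, fc8, classification.
    Kernel sizes and channel counts are AlexNet-style assumptions (the
    patent text does not specify them); a 3-channel 227x227 input is
    assumed. The classification layer itself is realized by the loss
    functions during training and by argmax at test time."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(inplace=True),    # conv1
            nn.MaxPool2d(3, stride=2),                                # pool1
            nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(inplace=True),  # conv2
            nn.MaxPool2d(3, stride=2),                                # pool2
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(inplace=True), # conv3
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(inplace=True), # conv4
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(inplace=True), # conv5
            nn.MaxPool2d(3, stride=2),                                # pool5
        )
        self.fc6 = nn.Linear(256 * 6 * 6, 4096)
        self.fc7 = nn.Linear(4096, 4096)
        self.fc8 = nn.Linear(4096, num_classes)

    def forward(self, x):
        x = torch.flatten(self.features(x), 1)
        feat = torch.relu(self.fc7(torch.relu(self.fc6(x))))  # depth feature x_i
        return feat, self.fc8(feat)                           # (x_i, logits)
```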
In order to guide the convolutional neural network to learn more discriminative features and make the subsequent classification more accurate, the classification layer adopts the combination of a cross-entropy loss function and a center loss function as a joint cost function; this adds a distance constraint between the action sample features and their class centers during network training, so that intra-class compactness and inter-class separation of the action features are both taken into account. The cross-entropy loss function is expressed as:
$$L_S = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}} \qquad (2)$$

where $x_i \in \mathbb{R}^d$ denotes the $i$-th depth feature, belonging to class $y_i$; $d$ is the dimension of the depth feature; $W \in \mathbb{R}^{d \times n}$ is the weight matrix of the last fully connected layer fc8; $b \in \mathbb{R}^n$ is the bias term; $W_j \in \mathbb{R}^d$ denotes the $j$-th column of the weight matrix $W$; and $m$ and $n$ are the number of training samples per batch and the corresponding number of classes, respectively. The center loss function is expressed as:

$$L_C = \frac{1}{2} \sum_{i=1}^{m} \left\| x_i - c_{y_i} \right\|_2^2 \qquad (3)$$

$$\Delta c_j = \frac{\sum_{i=1}^{m} \delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m} \delta(y_i = j)} \qquad (4)$$

$$c_j^{t+1} = c_j^{t} - \alpha \cdot \Delta c_j^{t} \qquad (5)$$

In formula (3), $c_{y_i}$, which can be calculated by formulas (4) and (5), denotes the class center of the depth features of class $y_i$ and is updated as the depth features $x_i$ change; in formula (4), $\delta(\text{condition}) = 1$ if the condition holds and $0$ otherwise. The parameter $\alpha \in [0, 1]$ controls the update rate of the centers; the neural network can be further tuned by adjusting it, so it is a hyper-parameter.

The joint cost function is then expressed as:

$$L = L_S + \lambda L_C \qquad (6)$$

where $\lambda$ is a hyper-parameter introduced to control the ratio of the two loss functions; when $\lambda = 0$, the joint cost function degrades into the cross-entropy loss function.
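For illustration, a minimal PyTorch sketch of the center loss with the manual center update (4)-(5) and the joint cost (6) follows; the class and function names are illustrative. Note that `F.cross_entropy` averages over the batch rather than summing as in (2), which only rescales the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterLoss(nn.Module):
    """Center loss (3) with the manual center update (4)-(5)."""

    def __init__(self, num_classes, feat_dim, alpha=0.5):
        super().__init__()
        self.alpha = alpha  # hyper-parameter alpha in [0, 1], formula (5)
        # One center c_j per class, updated manually rather than by autograd.
        self.register_buffer("centers", torch.zeros(num_classes, feat_dim))

    def forward(self, feats, labels):
        # L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2        -- formula (3)
        return 0.5 * (feats - self.centers[labels]).pow(2).sum()

    @torch.no_grad()
    def update_centers(self, feats, labels):
        for j in labels.unique():
            mask = labels == j
            # delta c_j, averaged over the samples of class j -- formula (4)
            delta = (self.centers[j] - feats[mask]).sum(0) / (1 + mask.sum())
            self.centers[j] -= self.alpha * delta      # formula (5)

def joint_cost(logits, feats, labels, center_loss, lam=0.01):
    # L = L_S + lambda * L_C -- formula (6); F.cross_entropy realizes (2).
    return F.cross_entropy(logits, labels) + lam * center_loss(feats, labels)
```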
(3) Training the convolutional neural network by using the two-dimensional image in the training sample;
and respectively inputting 3 groups of two-dimensional images obtained after the conversion of the depth images in the training samples into 3 convolutional neural networks, and respectively training the 3 convolutional neural networks by using the two-dimensional images.
(4) Inputting the two-dimensional images in the test sample into the trained convolutional neural networks to obtain three groups of output vectors, then performing intra-group fusion and inter-group fusion, and finally completing human body action recognition.
During testing, the center loss in the classification layer is disabled, i.e. the hyper-parameter λ is set to 0. The 3 groups of 6 two-dimensional images obtained by converting each depth image in the test sample are respectively input into the corresponding 3 groups of trained convolutional neural networks, so that each group yields 2 output vectors; the corresponding dimensions of the 2 vectors of each group are averaged to obtain the intra-group fusion. Specifically, after the two two-dimensional images DDIF and DDIB of a torso-group test sample are input into the corresponding convolutional neural network, 2 output results in the form of multidimensional vectors are obtained; these two vectors are mean-fused as the output vector of the group, i.e. intra-group fusion, and the limbs and joints groups are treated in the same way. The corresponding dimensions of the 3 group vectors are then averaged, i.e. inter-group fusion, giving the final vector; the index of the maximum value in the final vector is the human action category to be recognized.
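A minimal sketch of this two-stage fusion, reusing the trained `nets` from the training sketch above; names and tensor shapes are illustrative.

```python
import torch

def classify(sample_images, nets):
    """Step (4): intra-group then inter-group mean fusion of the
    network outputs; returns the index of the recognized action.
    sample_images maps each group name ("torso", "limbs", "joints")
    to its (DDIF, DDIB) image tensor pair."""
    group_vecs = []
    for group, (ddif, ddib) in sample_images.items():
        net = nets[group].eval()
        with torch.no_grad():
            _, s_f = net(ddif.unsqueeze(0))    # output vector for DDIF
            _, s_b = net(ddib.unsqueeze(0))    # output vector for DDIB
        group_vecs.append((s_f + s_b) / 2)     # intra-group fusion
    final = torch.stack(group_vecs).mean(0)    # inter-group fusion
    return final.argmax().item()               # action category index
```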

Claims (4)

1. A human body action recognition method based on a convolutional neural network is characterized in that: the human body action recognition method based on the convolutional neural network comprises the following steps of sequentially carrying out:
(1) selecting partial depth images in the data set as training samples, using the rest depth images as test samples, and then mapping four-dimensional information of the depth images in the data set to a two-dimensional space by adopting a space structure dynamic depth image technology to obtain a two-dimensional image;
(2) constructing a convolutional neural network;
(3) training the convolutional neural network by using the two-dimensional image in the training sample;
(4) inputting two-dimensional images in a test sample into the trained convolutional neural network to obtain three groups of output vectors, then performing intra-group fusion, performing inter-group fusion, and finally completing human body action recognition;
in step (2), the method for constructing the convolutional neural network is as follows: the network has 12 layers, which are, in sequence, convolutional layer conv1, pooling layer pool1, convolutional layer conv2, pooling layer pool2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, pooling layer pool5, fully connected layer fc6, fully connected layer fc7, fully connected layer fc8 and a classification layer; the classification layer adopts the combination of a cross-entropy loss function and a center loss function as a joint cost function, so as to add a distance constraint between the action sample features and their class centers.
2. The convolutional neural network-based human motion recognition method of claim 1, wherein: in step (1), the method for mapping the four-dimensional information of the depth images in the data set to the two-dimensional space by using the spatial structure dynamic depth image technique to obtain the two-dimensional images is as follows: each depth image in the data set is converted into 6 different two-dimensional images by the spatial structure dynamic depth image technique; the 6 two-dimensional images are divided into 3 groups, namely torso, limbs and joints, and each group consists of two two-dimensional images, DDIF and DDIB, respectively.
3. The convolutional neural network-based human motion recognition method of claim 1, wherein: in step (3), the method for training the convolutional neural network with the two-dimensional images in the training samples is as follows: the 3 groups of two-dimensional images obtained by converting the depth images in the training samples are respectively input into 3 convolutional neural networks, and the 3 convolutional neural networks are trained with these two-dimensional images respectively.
4. The convolutional neural network-based human motion recognition method of claim 1, wherein: in step (4), the method of inputting the two-dimensional images in the test sample into the trained convolutional neural networks to obtain three groups of output vectors, performing intra-group fusion and then inter-group fusion, and finally completing human body action recognition is as follows: the 3 groups of 6 two-dimensional images obtained by converting each depth image in the test sample are respectively input into the corresponding 3 groups of trained convolutional neural networks, so that each group yields 2 output vectors; the corresponding dimensions of the 2 vectors of each group are averaged, which is the intra-group fusion; the corresponding dimensions of the 3 group vectors are then averaged, which is the inter-group fusion, giving the final vector; the index of the maximum value in the final vector is the human action category to be recognized.
CN201810345479.0A 2018-04-17 2018-04-17 Human body action recognition method based on convolutional neural network Expired - Fee Related CN108573232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810345479.0A CN108573232B (en) 2018-04-17 2018-04-17 Human body action recognition method based on convolutional neural network


Publications (2)

Publication Number Publication Date
CN108573232A CN108573232A (en) 2018-09-25
CN108573232B (en) 2021-07-23

Family

ID=63574962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810345479.0A Expired - Fee Related CN108573232B (en) 2018-04-17 2018-04-17 Human body action recognition method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN108573232B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871750B (en) * 2019-01-02 2023-08-18 东南大学 Gait recognition method based on skeleton diagram sequence abnormal joint repair
CN109801636A (en) * 2019-01-29 2019-05-24 北京猎户星空科技有限公司 Training method, device, electronic equipment and the storage medium of Application on Voiceprint Recognition model
CN110633630B (en) * 2019-08-05 2022-02-01 中国科学院深圳先进技术研究院 Behavior identification method and device and terminal equipment
CN111144348A (en) * 2019-12-30 2020-05-12 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201320716A (en) * 2011-11-01 2013-05-16 Acer Inc Dynamic depth adjusting apparatus and method thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103108199A (en) * 2011-11-09 2013-05-15 宏碁股份有限公司 Dynamic depth-of-field adjusting device and method thereof
CN107832700A (en) * 2017-11-03 2018-03-23 全悉科技(北京)有限公司 A kind of face identification method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Action Recognition with Dynamic Image Networks; Hakan Bilen et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2017-11-02; vol. 40, no. 12; full text *
Rank Pooling for Action Recognition; Basura Fernando et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2016-04-25; vol. 39, no. 4; full text *
Spatially and Temporally Structured Global to Local Aggregation of Dynamic Depth Information for Action Recognition; Yonghong Hou et al.; IEEE Access; 2017-11-11; full text *
Structured Images for RGB-D Action Recognition; Pichao Wang et al.; 2017 IEEE International Conference on Computer Vision Workshops (ICCVW); 2018-01-23; full text *

Also Published As

Publication number Publication date
CN108573232A (en) 2018-09-25

Similar Documents

Publication Publication Date Title
CN108573232B (en) Human body action recognition method based on convolutional neural network
CN108830296B (en) Improved high-resolution remote sensing image classification method based on deep learning
CN108460356B (en) Face image automatic processing system based on monitoring system
Zhou et al. Cad: Scale invariant framework for real-time object detection
CN108734208B (en) Multi-source heterogeneous data fusion system based on multi-mode deep migration learning mechanism
CN107609638B (en) method for optimizing convolutional neural network based on linear encoder and interpolation sampling
CN108509920B (en) CNN-based face recognition method for multi-patch multi-channel joint feature selection learning
CN110135277B (en) Human behavior recognition method based on convolutional neural network
CN109508686B (en) Human behavior recognition method based on hierarchical feature subspace learning
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN115100709B (en) Feature separation image face recognition and age estimation method
CN112070010B (en) Pedestrian re-recognition method for enhancing local feature learning by combining multiple-loss dynamic training strategies
CN113743544A (en) Cross-modal neural network construction method, pedestrian retrieval method and system
CN108573241B (en) Video behavior identification method based on fusion features
CN113255602A (en) Dynamic gesture recognition method based on multi-modal data
CN113343974A (en) Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
CN113505719B (en) Gait recognition model compression system and method based on local-integral combined knowledge distillation algorithm
CN111967326B (en) Gait recognition method based on lightweight multi-scale feature extraction
Jiang et al. Cross-level reinforced attention network for person re-identification
CN113095201A (en) AU degree estimation model establishment method based on self-attention and uncertainty weighted multi-task learning among different regions of human face
CN103793720B (en) A kind of eye locating method and system
CN113014923B (en) Behavior identification method based on compressed domain representation motion vector
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
CN114694174A (en) Human body interaction behavior identification method based on space-time diagram convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2021-07-23