CN108573232B - Human body action recognition method based on convolutional neural network - Google Patents
- Publication number
- CN108573232B CN108573232B CN201810345479.0A CN201810345479A CN108573232B CN 108573232 B CN108573232 B CN 108573232B CN 201810345479 A CN201810345479 A CN 201810345479A CN 108573232 B CN108573232 B CN 108573232B
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural network
- dimensional
- groups
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
A human body action recognition method based on a convolutional neural network. Part of the depth images in a data set are selected as training samples and the remaining depth images are used as test samples; the four-dimensional information of the depth images in the data set is mapped to a two-dimensional space by means of the spatial structure dynamic depth image technique to obtain two-dimensional images; a convolutional neural network is constructed; the network is trained with the two-dimensional images of the training samples; the two-dimensional images of the test samples are then input into the trained network to obtain three groups of output vectors, intra-group fusion and then inter-group fusion are performed, and human body action recognition is finally completed. The method can serve as a basis for pattern recognition and artificial intelligence and is of practical significance for human action recognition.
Description
Technical Field
The invention belongs to the technical field of human body action recognition, and particularly relates to a human body action recognition method based on a convolutional neural network.
Background
At present, human body action recognition is widely applied in intelligent surveillance, human-computer interaction, video retrieval, virtual reality and other areas, and has therefore long been an active research direction in the field of computer vision. Much of the earlier research on human action recognition focused on conventional RGB color video. In recent years, the release of the Microsoft Kinect has brought new opportunities to the field: the Kinect device can acquire depth images in real time, and compared with traditional color images, depth images have many advantages. A depth image sequence is in essence a four-dimensional space, can contain richer motion information, is insensitive to changes in illumination conditions, and allows the human contour and skeleton to be estimated more reliably. Current research on depth-image-based human action recognition mainly focuses on mapping the four-dimensional information of an action to a two-dimensional space by designing an effective feature representation, in the hope of expressing the important features of the action in that two-dimensional space as fully as possible and thereby improving recognition accuracy. However, once the depth information of an action has been mapped to a two-dimensional representation, actions are easily confused during classification, which limits the upper bound of the recognition rate of such methods.
Disclosure of Invention
In order to overcome the problems, the invention aims to provide a human body action recognition method based on a convolutional neural network, which can realize the recognition of human body actions, can perform more accurate action classification, and has the characteristics of high action recognition rate and strong robustness.
In order to achieve the above object, the method for recognizing human body actions based on convolutional neural network provided by the present invention comprises the following steps performed in sequence:
(1) selecting partial depth images in the data set as training samples, using the rest depth images as test samples, and then mapping four-dimensional information of the depth images in the data set to a two-dimensional space by adopting a space structure dynamic depth image technology to obtain a two-dimensional image;
(2) constructing a convolutional neural network;
(3) training the convolutional neural network by using the two-dimensional image in the training sample;
(4) and inputting the two-dimensional image in the test sample into the trained convolutional neural network to obtain three groups of output vectors, then performing intra-group fusion, performing inter-group fusion, and finally completing human body action recognition.
In step (1), the method for mapping the four-dimensional information of the depth images in the data set to a two-dimensional space with the spatial structure dynamic depth image technique to obtain two-dimensional images is as follows: each depth image in the data set is converted into 6 different two-dimensional images by the spatial structure dynamic depth image technique; the 6 two-dimensional images are divided into 3 groups, corresponding to the trunk, the four limbs and the joints respectively, and each group consists of two two-dimensional images, a DDIF and a DDIB.
In step (2), the method for constructing the convolutional neural network is as follows: the network has 12 layers, and the 12 layers are a convolutional layer conv1, a pooling layer pool1, a convolutional layer conv2, a pooling layer pool2, a convolutional layer conv3, a convolutional layer conv4, a convolutional layer conv5, a pooling layer pool5, a full connection layer fc6, a full connection layer fc7, a full connection layer fc8 and a classification layer in sequence, wherein the classification layer adopts a combination of a cross entropy loss function and a center loss function as a joint cost function to increase the distance constraint between a motion sample feature space and a class center.
In step (3), the method for training the convolutional neural network by using the two-dimensional image in the training sample is as follows: and respectively inputting 3 groups of two-dimensional images obtained after the conversion of the depth images in the training samples into 3 convolutional neural networks, and respectively training the 3 convolutional neural networks by using the two-dimensional images.
In step (4), the method of inputting the two-dimensional images in the test sample into the trained convolutional neural networks to obtain three groups of output vectors, performing intra-group fusion and then inter-group fusion, and finally completing human body action recognition is as follows: the 3 groups of 6 two-dimensional images obtained by converting each depth image in the test sample are input into the corresponding 3 groups of trained convolutional neural networks, so that each group obtains 2 corresponding vectors; the corresponding dimensions of the 2 vectors of each group are averaged, which is the intra-group fusion; the corresponding dimensions of the 3 group vectors are then averaged, which is the inter-group fusion, to obtain the final vector; the dimension index of the maximum value in the final vector is the human action category to be recognized.
Compared with the prior art, the human body action recognition method based on the convolutional neural network has the beneficial effects that:
according to the invention, the distance constraint between the characteristic space of the action sample and the class center is increased in the network training process, so that the intra-class aggregation and inter-class separation of the action characteristics are considered, the convolutional neural network is guided to learn more distinguishing characteristics, and the subsequent classification is more accurate. The experimental results of a plurality of data sets show that the accuracy and robustness of human body action recognition are obviously improved by using the method disclosed by the invention. The method can be used as the basis of pattern recognition and artificial intelligence and has important significance on human action recognition.
Drawings
Fig. 1 is a flowchart of a human body motion recognition method based on a convolutional neural network provided by the present invention.
Fig. 2 is a schematic diagram of a convolutional neural network structure.
Detailed Description
The present invention will be described in further detail with reference to examples.
As shown in fig. 1, the method for recognizing human body actions based on convolutional neural network provided by the present invention comprises the following steps performed in sequence:
(1) selecting partial depth images in the data set as training samples, using the rest depth images as test samples, then adopting a space structure dynamic depth image technology to map four-dimensional information of the depth images in the data set to a two-dimensional space to obtain two-dimensional images for subsequent classification, and converting a human body action recognition problem into an image classification problem;
Each depth image can be converted into 6 different two-dimensional images using the SSDDI technique; the 6 two-dimensional images are divided into 3 groups, corresponding to the trunk, the four limbs and the joints respectively, and each group is composed of two two-dimensional images, a DDIF and a DDIB, as shown in fig. 1.
In particular, the SSDDI technique relies on rank pooling (RP), which is briefly described below. Denote a depth sequence of k frames by X = <x1, x2, ..., xt, ..., xk>, and let vt denote the feature vector extracted from frame xt. Let Vt denote the average of the feature vectors of the first t frames, Vt = (1/t)·Σ(τ=1..t) vτ. Each time step t is assigned the score rt = wT·Vt; a later time step should receive a larger score, so the score function rt must satisfy the ranking constraint: i > j implies ri > rj. The purpose of rank pooling is to find the parameter w* that satisfies objective (1):
w* = argmin_w (1/2)·||w||² + C·Σ(i>j) εij
s.t. wT·(Vi − Vj) ≥ 1 − εij, εij ≥ 0 (1)
where εij are small non-negative slack variables. The learned parameter w* ranks Vt ahead of Vt+1 and therefore encodes the temporal order of the image sequence; reshaped to the same scale as the input frames, it can serve as a feature descriptor of the depth sequence. Because this two-dimensional image contains the spatio-temporal changes of the whole action in the forward direction, it is called the forward dynamic depth image (DDIF); performing the RP operation after reversing the temporal order of the depth sequence yields the backward dynamic depth image (DDIB).
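As an illustrative sketch only, the RP step can be approximated in NumPy by fitting w so that the score w·Vt grows linearly with t; a faithful implementation would solve the ranking-SVM problem of objective (1), and all array shapes below are assumptions:

```python
import numpy as np

def rank_pooling(frames, reverse=False):
    """Least-squares surrogate for the rank-pooling objective (sketch only).

    frames: (k, d) array, one feature vector v_t per frame.
    A real implementation would solve the ranking-SVM problem; here w is
    simply fitted so that the score w . V_t increases linearly with t.
    """
    X = frames[::-1] if reverse else frames
    k = X.shape[0]
    # V_t: average of the feature vectors of the first t frames
    V = np.cumsum(X, axis=0) / np.arange(1, k + 1)[:, None]
    t = np.arange(1, k + 1, dtype=float)
    w, *_ = np.linalg.lstsq(V, t, rcond=None)
    return w

# One synthetic depth sequence of 20 frames, flattened to 64-dim features
seq = np.random.default_rng(0).normal(size=(20, 64))
ddif = rank_pooling(seq)                # forward dynamic depth image (DDIF)
ddib = rank_pooling(seq, reverse=True)  # backward dynamic depth image (DDIB)
```

Reshaping the returned w back to the spatial scale of the input frames would give the dynamic depth image described above.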
The SSDDI performs RP operations on local regions of the depth image from 3 levels of the trunk, limbs and joints of the depth image, and finally recombines the dynamic depth maps of the levels.
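The three-level decomposition can be sketched as simple crops of a depth frame, with RP then applied to each region; the box coordinates below are hypothetical examples, since the text does not fix the region geometry:

```python
import numpy as np

def split_levels(frame, limb_boxes, joint_boxes):
    """Crop one depth frame into the trunk, limb and joint levels of SSDDI.

    Boxes are (top, bottom, left, right) tuples; their layout here is a
    hypothetical example, not taken from the source text.
    """
    trunk = frame  # trunk level: the whole body region
    limbs = [frame[t:b, l:r] for (t, b, l, r) in limb_boxes]
    joints = [frame[t:b, l:r] for (t, b, l, r) in joint_boxes]
    return trunk, limbs, joints

frame = np.zeros((240, 320))
trunk, limbs, joints = split_levels(
    frame,
    limb_boxes=[(0, 120, 0, 160)],  # e.g. one quadrant as a limb region
    joint_boxes=[(0, 40, 0, 40)],   # e.g. a small patch around one joint
)
```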
(2) Constructing a convolutional neural network;
as shown in fig. 2, the convolutional neural network has 12 layers, and these 12 layers are convolutional layer conv1, pooling layer pool1, convolutional layer conv2, pooling layer pool2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, pooling layer pool5, full connection layer fc6, full connection layer fc7, full connection layer fc8 and classification layer in sequence.
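For reference, the 12-layer sequence can be written down as a configuration sketch; only the layer names and types come from the description above (the naming mirrors an AlexNet-style layout), and no kernel, stride or channel sizes are assumed because the text does not give them:

```python
# The 12-layer sequence of the network as named in the description;
# filter sizes and channel counts are not specified, so none appear here.
LAYERS = [
    ("conv1", "conv"), ("pool1", "pool"),
    ("conv2", "conv"), ("pool2", "pool"),
    ("conv3", "conv"), ("conv4", "conv"), ("conv5", "conv"),
    ("pool5", "pool"),
    ("fc6", "fc"), ("fc7", "fc"), ("fc8", "fc"),
    ("classification", "loss"),  # cross-entropy + center loss at training time
]
```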
In order to guide the convolutional neural network to learn more discriminative features and make the subsequent classification more accurate, the classification layer adopts the combination of a cross entropy loss function and a center loss function as the joint cost function, which increases a distance constraint between the feature space of the action samples and the class centers during network training, taking into account both intra-class aggregation and inter-class separation of the action features. The cross entropy loss function is expressed as:
L_S = −Σ(i=1..m) log( exp(W_{yi}T·xi + b_{yi}) / Σ(j=1..n) exp(WjT·xi + bj) ) (2)
where xi ∈ Rd denotes the i-th depth feature, which belongs to class yi; d is the dimension of the depth feature; W ∈ Rd×n is the weight matrix of the last fully connected layer fc8 and b ∈ Rn is the bias term; Wj ∈ Rd denotes the j-th column of W; m and n are the number of training samples per batch and the corresponding number of classes, respectively. The center loss function is expressed as:
L_C = (1/2)·Σ(i=1..m) ||xi − c_{yi}||² (3)
where c_{yi} in formula (3) denotes the class center of the depth features xi of class yi; it is updated as the depth features xi change and can be calculated by formulas (4) and (5):
Δcj = Σ(i=1..m) δ(yi = j)·(cj − xi) / (1 + Σ(i=1..m) δ(yi = j)) (4)
cj(t+1) = cj(t) − α·Δcj(t) (5)
The parameter α lies in the range [0, 1]; the neural network can be further tuned by adjusting it, so it is a hyper-parameter.
The joint cost function is then expressed as:
L = L_S + λ·L_C (6)
where λ is a hyper-parameter introduced to control the ratio of the two loss functions; when λ = 0, the joint cost function degenerates into the cross entropy loss function.
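The joint cost described above (cross entropy plus a weighted center loss with a running center update) can be sketched in NumPy; this is a minimal illustration, with the cross-entropy averaged per batch and made-up example values:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Softmax cross-entropy as in formula (2), averaged over the batch
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    # Formula (3): half the mean squared distance of x_i to its class center
    diff = features - centers[labels]
    return 0.5 * (diff ** 2).sum(axis=1).mean()

def update_centers(centers, features, labels, alpha=0.5):
    # Formulas (4)-(5): move each class center toward its samples, rate alpha
    new = centers.copy()
    for c in np.unique(labels):
        mask = labels == c
        delta = (centers[c] - features[mask]).sum(axis=0) / (1 + mask.sum())
        new[c] = centers[c] - alpha * delta
    return new

def joint_loss(logits, features, labels, centers, lam=0.01):
    # Joint cost: L = L_S + lambda * L_C; lam = 0 recovers plain cross-entropy
    return cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)

logits = np.array([[4.0, 1.0, 0.0], [0.0, 3.0, 1.0]])  # fc8 outputs (example)
features = np.ones((2, 4))                              # fc7 features (example)
labels = np.array([0, 1])
centers = np.zeros((3, 4))
total = joint_loss(logits, features, labels, centers)
centers = update_centers(centers, features, labels)
```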
(3) Training the convolutional neural network by using the two-dimensional image in the training sample;
and respectively inputting 3 groups of two-dimensional images obtained after the conversion of the depth images in the training samples into 3 convolutional neural networks, and respectively training the 3 convolutional neural networks by using the two-dimensional images.
(4) And inputting the two-dimensional image in the test sample into the trained convolutional neural network to obtain three groups of output vectors, then performing intra-group fusion, performing inter-group fusion, and finally completing human body action recognition.
At test time the center loss in the classification layer is disabled, i.e. the hyper-parameter λ is set to 0. The 3 groups of 6 two-dimensional images obtained by converting each depth image in the test sample are input into the corresponding 3 trained convolutional neural networks, so that each group yields 2 corresponding vectors; the corresponding dimensions of the 2 vectors of each group are averaged, which constitutes the intra-group fusion. Specifically, after the two corresponding two-dimensional images DDIF and DDIB of the trunk group of a test sample are input into the convolutional neural network, 2 corresponding results in the form of multidimensional vectors are output; these two vectors are mean-fused to serve as the output vector of the group, i.e. intra-group fusion, and the limb and joint groups are treated in the same way. The corresponding dimensions of the 3 group vectors are then averaged, i.e. inter-group fusion, to obtain the final vector; the dimension index of the maximum value in the final vector is the human action category to be recognized.
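The two-stage fusion at test time can be sketched as follows; the 3-class output vectors are made-up example values:

```python
import numpy as np

def fuse_and_classify(group_outputs):
    """Intra-group then inter-group mean fusion of the network outputs.

    group_outputs maps each group name to its (DDIF, DDIB) output vectors.
    """
    # Intra-group fusion: average the DDIF and DDIB vectors of each group
    group_vecs = [np.mean(pair, axis=0) for pair in group_outputs.values()]
    # Inter-group fusion: average the three group vectors dimension-wise
    final = np.mean(group_vecs, axis=0)
    # The index of the maximum entry is the recognised action category
    return int(np.argmax(final)), final

# Illustrative 3-class output vectors for the three groups
outputs = {
    "trunk":  (np.array([0.1, 0.7, 0.2]), np.array([0.2, 0.6, 0.2])),
    "limbs":  (np.array([0.3, 0.5, 0.2]), np.array([0.1, 0.8, 0.1])),
    "joints": (np.array([0.2, 0.6, 0.2]), np.array([0.3, 0.4, 0.3])),
}
label, scores = fuse_and_classify(outputs)
```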
Claims (4)
1. A human body action recognition method based on a convolutional neural network is characterized in that: the human body action recognition method based on the convolutional neural network comprises the following steps of sequentially carrying out:
(1) selecting partial depth images in the data set as training samples, using the rest depth images as test samples, and then mapping four-dimensional information of the depth images in the data set to a two-dimensional space by adopting a space structure dynamic depth image technology to obtain a two-dimensional image;
(2) constructing a convolutional neural network;
(3) training the convolutional neural network by using the two-dimensional image in the training sample;
(4) inputting two-dimensional images in a test sample into the trained convolutional neural network to obtain three groups of output vectors, then performing intra-group fusion, performing inter-group fusion, and finally completing human body action recognition;
in step (2), the method for constructing the convolutional neural network is as follows: the network has 12 layers, and the 12 layers are a convolutional layer conv1, a pooling layer pool1, a convolutional layer conv2, a pooling layer pool2, a convolutional layer conv3, a convolutional layer conv4, a convolutional layer conv5, a pooling layer pool5, a full connection layer fc6, a full connection layer fc7, a full connection layer fc8 and a classification layer in sequence, wherein the classification layer adopts a combination of a cross entropy loss function and a center loss function as a joint cost function to increase the distance constraint between a motion sample feature space and a class center.
2. The convolutional neural network-based human motion recognition method of claim 1, wherein: in step (1), the method for mapping the four-dimensional information of the depth image in the data set to the two-dimensional space by using the spatial structure dynamic depth image technology to obtain the two-dimensional image includes: each depth image in the data set is converted into 6 different two-dimensional images by adopting a space structure dynamic depth image technology, the 6 two-dimensional images are divided into 3 groups, the groups are respectively a trunk, four limbs and a joint, and each group consists of two-dimensional images which are respectively DDIF and DDIB.
3. The convolutional neural network-based human motion recognition method of claim 1, wherein: in step (3), the method for training the convolutional neural network by using the two-dimensional image in the training sample is as follows: and respectively inputting 3 groups of two-dimensional images obtained after the conversion of the depth images in the training samples into 3 convolutional neural networks, and respectively training the 3 convolutional neural networks by using the two-dimensional images.
4. The convolutional neural network-based human motion recognition method of claim 1, wherein: in step (4), the method of inputting the two-dimensional image in the test sample into the trained convolutional neural network to obtain three groups of output vectors, then performing intra-group fusion, then performing inter-group fusion, and finally completing human body motion recognition includes: respectively inputting 3 groups of 6 two-dimensional images obtained after conversion of each depth image in a test sample into a trained convolutional neural network corresponding to the 3 groups, obtaining 2 corresponding vectors for each group, and averaging the dimensionalities corresponding to the 2 vectors of each group to obtain intra-group fusion; and then averaging the dimensionalities corresponding to the 3 vectors in the group, namely fusing the dimensionalities between the groups to serve as a final vector, wherein the dimensionality serial number corresponding to the maximum numerical value in the final vector is the human motion category to be identified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810345479.0A CN108573232B (en) | 2018-04-17 | 2018-04-17 | Human body action recognition method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108573232A CN108573232A (en) | 2018-09-25 |
CN108573232B true CN108573232B (en) | 2021-07-23 |
Family
ID=63574962
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810345479.0A Expired - Fee Related CN108573232B (en) | 2018-04-17 | 2018-04-17 | Human body action recognition method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108573232B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109871750B (en) * | 2019-01-02 | 2023-08-18 | 东南大学 | Gait recognition method based on skeleton diagram sequence abnormal joint repair |
CN109801636A (en) * | 2019-01-29 | 2019-05-24 | 北京猎户星空科技有限公司 | Training method, device, electronic equipment and the storage medium of Application on Voiceprint Recognition model |
CN110633630B (en) * | 2019-08-05 | 2022-02-01 | 中国科学院深圳先进技术研究院 | Behavior identification method and device and terminal equipment |
CN111144348A (en) * | 2019-12-30 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103108199A (en) * | 2011-11-09 | 2013-05-15 | 宏碁股份有限公司 | Dynamic depth-of-field adjusting device and method thereof |
CN107832700A (en) * | 2017-11-03 | 2018-03-23 | 全悉科技(北京)有限公司 | A kind of face identification method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201320716A (en) * | 2011-11-01 | 2013-05-16 | Acer Inc | Dynamic depth adjusting apparatus and method thereof |
- 2018-04-17 CN CN201810345479.0A patent/CN108573232B/en not_active Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103108199A (en) * | 2011-11-09 | 2013-05-15 | 宏碁股份有限公司 | Dynamic depth-of-field adjusting device and method thereof |
CN107832700A (en) * | 2017-11-03 | 2018-03-23 | 全悉科技(北京)有限公司 | A kind of face identification method and system |
Non-Patent Citations (4)
Title |
---|
Action Recognition with Dynamic Image Networks;Hakan Bilen等;《IEEE Transactions on Pattern Analysis and Machine Intelligence》;20171102;第40卷(第12期);全文 * |
Rank Pooling for Action Recognition;Basura Fernando等;《IEEE Transactions on Pattern Analysis and Machine Intelligence》;20160425;第39卷(第4期);全文 * |
Spatially and Temporally Structured Global to Local Aggregation of Dynamic Depth Information for Action Recognition;Yonghong Hou等;《IEEE Access》;20171111;全文 * |
Structured Images for RGB-D Action Recognition;Pichao Wang等;《2017 IEEE International Conference on Computer Vision Workshops (ICCVW)》;20180123;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN108573232A (en) | 2018-09-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20210723 |