CN111860117A - Human behavior recognition method based on deep learning - Google Patents

Human behavior recognition method based on deep learning

Info

Publication number
CN111860117A
Authority
CN
China
Prior art keywords
axis
neural network
angular velocity
deep learning
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010494768.4A
Other languages
Chinese (zh)
Inventor
胡二琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Bigeng Software Co ltd
Original Assignee
Anhui Bigeng Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Bigeng Software Co ltd filed Critical Anhui Bigeng Software Co ltd
Priority to CN202010494768.4A priority Critical patent/CN111860117A/en
Publication of CN111860117A publication Critical patent/CN111860117A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a human behavior recognition method based on deep learning and relates to the technical field of computer vision. The method comprises the following steps: acquiring a large number of videos and uniformly sampling them to obtain videos with a fixed number of frames, which serve as a training set; installing inertial sensors at the joints of the human body and collecting the human behavior data from each inertial sensor as a test set; training a convolutional neural network with the training set and the test set; and establishing a background-service process that provides a recognition entry point and prediction feedback. The invention trains the convolutional neural network on massive videos as a training set, obtains the joint angular velocity and acceleration of human behaviors in real time with the inertial sensors to build an accurate test set for further training, and thereby improves the accuracy of the convolutional neural network's output.

Description

Human behavior recognition method based on deep learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a human behavior recognition method based on deep learning.
Background
With the rapid growth of the internet in recent years, the network has become a primary means for people to entertain themselves and obtain information, and in the process the internet has accumulated a large amount of video data. Statistics show that up to 35 hours of video are uploaded to YouTube every minute. How to process this flood of video data is now a challenge. Computer vision has therefore come to the fore, and human behavior recognition in particular has attracted extensive attention in academia and industry.
Human action recognition in video is a classic topic in the fields of computer vision and image processing because of its wide application in video surveillance, human-computer interface devices, and other areas. However, many challenges remain in action recognition, such as efficient multi-range spatiotemporal feature extraction. Recently proposed spatiotemporal feature extractors fall roughly into two categories: long-term and short-term features.
Trajectory-based video feature extraction relies mainly on short-term features. The technique is robust and simple because its extraction function is local and repetitive over short time spans. Long-term features have more discriminative power than short-term features because they exhibit the opposite properties: they are global and discriminative, although they remain sensitive to within-class variability.
More specifically, when a framework captures only short-term spatiotemporal information, it is difficult to distinguish between front crawl and breaststroke. Conversely, extracting short-term spatiotemporal features is more effective for recognizing the action of walking a dog. A powerful action recognition system should therefore be able to distinguish different classes of actions across multiple contexts, so capturing information over multiple spatiotemporal ranges is very important and beneficial.
Deep-learning-based methods such as TSN and I3D have achieved good results in computer vision and, in particular, have markedly improved overall accuracy on large-scale datasets with complex behaviors. However, further improving the recognition rate on complex video datasets by capturing information over multiple spatiotemporal ranges remains a challenge.
Disclosure of Invention
The invention aims to provide a human behavior recognition method based on deep learning, which trains a convolutional neural network on massive videos as a training set, acquires the joint angular velocity and acceleration of human behaviors in real time with inertial sensors to build an accurate test set for training the convolutional neural network, and thereby addresses the low recognition rate and insufficient accuracy of existing video datasets.
In order to solve the technical problems, the invention is realized by the following technical scheme:
the invention relates to a human behavior recognition method based on deep learning, which comprises the following steps:
step S1: acquiring a large number of videos and uniformly sampling them to obtain videos with a fixed number of frames, which serve as a training set;
step S2: installing inertial sensors at the joints of the human body and collecting the human behavior data from each inertial sensor as a test set;
step S3: training the convolutional neural network with the training set and the test set;
step S4: establishing a background-service process that provides a recognition entry point and prediction feedback.
Preferably, in step S1, the specific steps of obtaining a video with a fixed frame number are as follows:
step S11: inputting the whole video file into a Gaussian mixture model for motion detection;
step S12: detecting a person in a motion region using a gradient histogram;
step S13: classifying the persons detected in the motion region with a softmax classifier.
Preferably, in step S11, the time series $\{x_1, x_2, \ldots, x_\gamma\}$ of each pixel in the image is modeled by a Gaussian mixture model, and the probability of the pixel value at the current observation point is:

$$P(x_t) = \sum_{i=1}^{k} \omega_{i,t} \cdot \eta\left(x_t;\, \mu_{i,t},\, \Sigma_{i,t}\right)$$

where $k$ is the number of Gaussian components, $\omega_{i,t}$ is the weight of the $i$-th Gaussian component at time $t$, $\mu_{i,t}$ and $\Sigma_{i,t}$ are the mean and variance of the $i$-th Gaussian component at time $t$, and $\eta$ is the Gaussian probability density function.
Preferably, in step S13, the person image detected in the motion region by the gradient histogram is mapped onto a corresponding label through a softmax classifier. A classification result is obtained while the gradient histogram is processed and compared with the corresponding label data to compute a relative error; the weights of the convolution windows in the convolutional neural network are then trained over a number of iterations so that the relative error keeps decreasing until the network converges, after which the final gradient-histogram result is input into the softmax classifier network for test classification.
Preferably, in step S2, sliding-window segmentation is performed on the human behavior data from each inertial sensor to obtain the tri-axial acceleration and angular velocity of each observation window.
Preferably, the inertial sensor performs feature extraction on the tri-axial acceleration and angular velocity to obtain a feature vector for each sensor node. Using time-domain analysis and time-frequency analysis methods from signal theory, the feature extraction computes, for the acceleration and angular velocity data on the x, y, and z axes: the mean on each axis; the variance on each axis; the kurtosis on each axis; the covariance between the x, y, and z axes; and the energy feature set of the intrinsic mode functions obtained by ensemble empirical mode decomposition on each axis.
Preferably, in step S3, the convolutional neural network comprises an embedding layer, an LSTM, a fully connected layer, and a softmax layer.
Preferably, in the training or predicting process of the convolutional neural network, the transmission process of the signal is as follows:
inputting the (x, y, z) signals of a training sample into the embedding layer, which converts x, y, and z into corresponding m-dimensional vectors and splices them into one 3m-dimensional vector; inputting the 3m-dimensional vectors into the LSTM neural network in time order, which outputs the 3m×L-dimensional representation vector of the track to the fully connected layer; and outputting, through the softmax layer, the judgment of whether the track is a human behavior.
The invention has the following beneficial effects:
(1) the method trains the convolutional neural network on massive videos as a training set, obtains the joint angular velocity and acceleration of human behaviors in real time with the inertial sensors to build an accurate test set for further training, and thereby improves the accuracy of the convolutional neural network's output;
(2) according to the invention, a softmax classifier is placed at the bottom layer of the convolutional neural network, so that the image of the person is mapped onto the corresponding label through the softmax classifier; images are preliminarily classified into sitting, standing, walking, and running postures before prediction, monitoring is performed directly according to the labels during prediction, and detection efficiency is improved.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a step diagram of a human behavior recognition method based on deep learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention is a method for recognizing human behavior based on deep learning, including the following steps:
Step S1: acquiring mass videos, uniformly sampling to obtain videos with fixed frame numbers, and using the videos as a training set;
step S2: installing inertial sensors at human joints, and collecting human behavior data of each inertial sensor to serve as a test set;
step S3: training the convolutional neural network by utilizing a training set and a test set;
step S4: establishing a process for providing a background interface, providing an identification entry and prediction feedback.
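As an illustration of the uniform sampling in step S1, the following Python sketch (an assumption for illustration, not part of the patent; the frame count of 16 and the use of OpenCV are arbitrary choices) picks evenly spaced frames so that every clip in the training set has the same number of frames:

```python
import cv2
import numpy as np

def sample_fixed_frames(video_path, num_frames=16):
    """Uniformly sample num_frames frames from a video (hypothetical helper)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return np.stack(frames)  # shape: (num_frames, H, W, 3)
```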
In step S1, the specific steps of obtaining a video with a fixed frame number are as follows:
step S11: inputting the whole video file into a Gaussian mixture model for motion detection;
the Gaussian Mixture Model (GMM) is a classic adaptive background modeling method, supposing that unit pixels accord with normal distribution in a time domain, setting a threshold range to judge that the pixels are background and serving as a basis for updating the model, describing the background into a plurality of Gaussian distributions by the Gaussian mixture model, and taking the pixels which accord with one of the distribution models as background pixels.
Step S12: detecting a person in a motion region using a gradient histogram;
gradient Histogram (HOG) is an operator, which is described based on shape edge features, and is generally used for object detection, and the basic idea is to calculate pixel value gradients to express edge information of an object and extract features of local appearance and shape of an image by local gradient values.
Step S13: classifying the persons detected in the motion region with a softmax classifier. In actual computation, factors such as the choice of gradient directions and the parameter templates at different scales influence the final result; finally, the pedestrian target is detected with the softmax classifier.
In step S11, the time series $\{x_1, x_2, \ldots, x_\gamma\}$ of each pixel in the image is modeled by a Gaussian mixture model, and the probability of the pixel value at the current observation point is:

$$P(x_t) = \sum_{i=1}^{k} \omega_{i,t} \cdot \eta\left(x_t;\, \mu_{i,t},\, \Sigma_{i,t}\right)$$

where $k$ is the number of Gaussian components, $\omega_{i,t}$ is the weight of the $i$-th Gaussian component at time $t$, $\mu_{i,t}$ and $\Sigma_{i,t}$ are the mean and variance of the $i$-th Gaussian component at time $t$, and $\eta$ is the Gaussian probability density function.
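To make the formula concrete, the following sketch (with purely illustrative component values and k = 3) evaluates the mixture probability of a scalar pixel value, using SciPy's normal density for η:

```python
import numpy as np
from scipy.stats import norm

k = 3
weights = np.array([0.6, 0.3, 0.1])       # omega_{i,t}, sum to 1
means   = np.array([90.0, 130.0, 200.0])  # mu_{i,t}
stds    = np.array([8.0, 12.0, 20.0])     # sqrt of Sigma_{i,t}

x_t = 95.0  # current pixel value
# P(x_t) = sum_i omega_{i,t} * eta(x_t; mu_{i,t}, Sigma_{i,t})
p = np.sum(weights * norm.pdf(x_t, loc=means, scale=stds))
print(p)
```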
In step S13, the person image detected in the motion region by the gradient histogram is mapped onto a corresponding label through a softmax classifier. A classification result is obtained while the gradient histogram is processed and compared with the corresponding label data to compute a relative error; the weights of the convolution windows in the convolutional neural network are then trained over a number of iterations so that the relative error keeps decreasing until the network converges, after which the final gradient-histogram result is input into the softmax classifier network for test classification.
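A minimal sketch of the softmax classification stage, assuming the HOG feature vector has already been computed; the feature length (3780, a typical HOG size), the learning rate, and the use of a cross-entropy loss in place of the patent's loosely specified relative error are all assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

classes = ["sitting", "standing", "walking", "running"]
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(len(classes), 3780))
b = np.zeros(len(classes))

def train_step(hog_feature, label_idx, lr=0.01):
    """One gradient-descent step on the cross-entropy loss (illustrative)."""
    global W, b
    probs = softmax(W @ hog_feature + b)
    grad = probs.copy()
    grad[label_idx] -= 1.0                 # d(loss)/d(logits) for cross-entropy
    W -= lr * np.outer(grad, hog_feature)
    b -= lr * grad
    return -np.log(probs[label_idx])       # loss; shrinks as training converges

loss = train_step(rng.normal(size=3780), label_idx=2)
```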
In step S2, sliding-window segmentation is performed on the human behavior data from each inertial sensor to obtain the tri-axial acceleration and angular velocity of each observation window.
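A minimal sketch of the sliding-window segmentation, assuming each sensor streams rows of tri-axial acceleration followed by tri-axial angular velocity; the window length of 128 samples and the 50% overlap are assumptions, not specified in the patent:

```python
import numpy as np

def sliding_windows(samples, window=128, overlap=0.5):
    """Split an (N, 6) array of tri-axial acceleration + angular velocity
    into overlapping observation windows of shape (window, 6)."""
    step = int(window * (1.0 - overlap))
    return np.array([samples[s:s + window]
                     for s in range(0, len(samples) - window + 1, step)])

data = np.random.randn(1000, 6)      # stand-in for one sensor's recording
windows = sliding_windows(data)      # shape: (num_windows, 128, 6)
accel, gyro = windows[..., :3], windows[..., 3:]
```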
The inertial sensor performs feature extraction on the tri-axial acceleration and angular velocity to obtain a feature vector for each sensor node. Using time-domain analysis and time-frequency analysis methods from signal theory, the feature extraction computes, for the acceleration and angular velocity data on the x, y, and z axes: the mean on each axis; the variance on each axis; the kurtosis on each axis; the covariance between the x, y, and z axes; and the energy feature set of the intrinsic mode functions obtained by ensemble empirical mode decomposition (EEMD) on each axis.
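A sketch of the per-window feature vector, assuming SciPy for the kurtosis and the third-party PyEMD package for the ensemble empirical mode decomposition (both are assumptions; any EEMD implementation would serve, and max_imf is capped only to keep the feature length roughly constant):

```python
import numpy as np
from scipy.stats import kurtosis
from PyEMD import EEMD  # assumed third-party EEMD implementation

def window_features(window):
    """window: (T, 6) array -> 1-D feature vector for one observation window."""
    feats = [window.mean(axis=0),        # mean per axis (6 values)
             window.var(axis=0),         # variance per axis
             kurtosis(window, axis=0)]   # kurtosis per axis
    # Covariances between the x, y, z axes of acceleration and of angular velocity.
    feats.append(np.cov(window[:, :3], rowvar=False)[np.triu_indices(3, k=1)])
    feats.append(np.cov(window[:, 3:], rowvar=False)[np.triu_indices(3, k=1)])
    eemd = EEMD()
    for axis in range(6):                # EEMD energy of each IMF, per axis
        imfs = eemd.eemd(window[:, axis], max_imf=4)
        feats.append((imfs ** 2).sum(axis=1))
    return np.concatenate(feats)
```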
In step S3, the convolutional neural network comprises an embedding layer, an LSTM, a fully connected layer, and a softmax layer.
In the training or predicting process of the convolutional neural network, the transmission process of signals is as follows:
inputting the (x, y, z) signals of a training sample into the embedding layer, which converts x, y, and z into corresponding m-dimensional vectors and splices them into one 3m-dimensional vector; inputting the 3m-dimensional vectors into the LSTM neural network in time order, which outputs the 3m×L-dimensional representation vector of the track to the fully connected layer; and outputting, through the softmax layer, the judgment of whether the track is a human behavior.
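A minimal PyTorch sketch of the described embedding → LSTM → fully connected → softmax pipeline. The patent calls the network convolutional although the layers listed are recurrent; m, the 3m concatenation, and the binary human-behavior output come from the text, while the linear embedding, hidden size, and sequence length are assumptions:

```python
import torch
import torch.nn as nn

class TrackClassifier(nn.Module):
    def __init__(self, m=32, hidden=64, num_classes=2):
        super().__init__()
        # One linear "embedding" per axis: scalar -> m-dimensional vector.
        self.embed_x = nn.Linear(1, m)
        self.embed_y = nn.Linear(1, m)
        self.embed_z = nn.Linear(1, m)
        self.lstm = nn.LSTM(3 * m, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, xyz):                   # xyz: (batch, T, 3)
        x, y, z = xyz[..., :1], xyz[..., 1:2], xyz[..., 2:3]
        # Concatenate the three m-dim embeddings into a 3m-dim vector per step.
        e = torch.cat([self.embed_x(x), self.embed_y(y), self.embed_z(z)], dim=-1)
        out, _ = self.lstm(e)                 # (batch, T, hidden)
        logits = self.fc(out[:, -1])          # last time step -> fully connected
        return torch.softmax(logits, dim=-1)  # human behavior vs. not

model = TrackClassifier()
probs = model(torch.randn(4, 128, 3))         # 4 tracks, 128 time steps each
```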
It should be noted that, in the above system embodiment, each included unit is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
In addition, it is understood by those skilled in the art that all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (8)

1. A human behavior recognition method based on deep learning is characterized by comprising the following steps:
step S1: acquiring a large number of videos and uniformly sampling them to obtain videos with a fixed number of frames, which serve as a training set;
step S2: installing inertial sensors at the joints of the human body and collecting the human behavior data from each inertial sensor as a test set;
step S3: training the convolutional neural network with the training set and the test set;
step S4: establishing a background-service process that provides a recognition entry point and prediction feedback.
2. The method for recognizing human body behaviors based on deep learning of claim 1, wherein in the step S1, the specific steps of obtaining the video with fixed frame number are as follows:
step S11: inputting the whole video file into a Gaussian mixture model for motion detection;
step S12: detecting a person in a motion region using a gradient histogram;
step S13: classifying the persons detected in the motion region with a softmax classifier.
3. The method for human behavior recognition based on deep learning of claim 2, wherein in step S11, the time series $\{x_1, x_2, \ldots, x_\gamma\}$ of each pixel in the image is modeled by a Gaussian mixture model, and the probability of the pixel value at the current observation point is:

$$P(x_t) = \sum_{i=1}^{k} \omega_{i,t} \cdot \eta\left(x_t;\, \mu_{i,t},\, \Sigma_{i,t}\right)$$

where $k$ is the number of Gaussian components, $\omega_{i,t}$ is the weight of the $i$-th Gaussian component at time $t$, $\mu_{i,t}$ and $\Sigma_{i,t}$ are the mean and variance of the $i$-th Gaussian component at time $t$, and $\eta$ is the Gaussian probability density function.
4. The method for human behavior recognition based on deep learning of claim 2, wherein in step S13, the person image detected in the motion region by the gradient histogram is mapped onto a corresponding label through a softmax classifier; a classification result is obtained while the gradient histogram is processed and compared with the corresponding label data to compute a relative error; the weights of the convolution windows in the convolutional neural network are trained over a number of iterations so that the relative error keeps decreasing until the network converges, after which the final gradient-histogram result is input into the softmax classifier network for test classification.
5. The method for recognizing human body behavior based on deep learning of claim 1, wherein in step S2, sliding-window segmentation is performed on the human body behavior data from each inertial sensor to obtain the tri-axial acceleration and angular velocity of each observation window.
6. The human behavior recognition method based on deep learning of claim 1 or 5, wherein the inertial sensor performs feature extraction on the tri-axial acceleration and angular velocity to obtain a feature vector for each sensor node; wherein, using time-domain analysis and time-frequency analysis methods from signal theory, the feature extraction computes, for the acceleration and angular velocity data on the x, y, and z axes: the mean on each axis; the variance on each axis; the kurtosis on each axis; the covariance between the x, y, and z axes; and the energy feature set of the intrinsic mode functions obtained by ensemble empirical mode decomposition on each axis.
7. The deep-learning-based human behavior recognition method of claim 1, wherein in step S3, the convolutional neural network comprises an embedding layer, an LSTM, a fully connected layer, and a softmax layer.
8. The human behavior recognition method based on deep learning as claimed in claim 1 or 7, wherein in the training or prediction process of the convolutional neural network, the transmission process of signals is as follows:
inputting the (x, y, z) signals of a training sample into the embedding layer, which converts x, y, and z into corresponding m-dimensional vectors and splices them into one 3m-dimensional vector; inputting the 3m-dimensional vectors into the LSTM neural network in time order, which outputs the 3m×L-dimensional representation vector of the track to the fully connected layer; and outputting, through the softmax layer, the judgment of whether the track is a human behavior.
CN202010494768.4A 2020-06-03 2020-06-03 Human behavior recognition method based on deep learning Withdrawn CN111860117A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010494768.4A CN111860117A (en) 2020-06-03 2020-06-03 Human behavior recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010494768.4A CN111860117A (en) 2020-06-03 2020-06-03 Human behavior recognition method based on deep learning

Publications (1)

Publication Number Publication Date
CN111860117A true CN111860117A (en) 2020-10-30

Family

ID=72985499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010494768.4A Withdrawn CN111860117A (en) 2020-06-03 2020-06-03 Human behavior recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN111860117A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112596024A (en) * 2020-12-04 2021-04-02 华中科技大学 Motion identification method based on environment background wireless radio frequency signal
CN112766420A (en) * 2021-03-12 2021-05-07 合肥共达职业技术学院 Human behavior identification method based on time-frequency domain information
CN112766420B (en) * 2021-03-12 2022-10-21 合肥共达职业技术学院 Human behavior identification method based on time-frequency domain information
CN116758479A (en) * 2023-06-27 2023-09-15 汇鲲化鹏(海南)科技有限公司 Coding deep learning-based intelligent agent activity recognition method and system
CN116758479B (en) * 2023-06-27 2024-02-02 汇鲲化鹏(海南)科技有限公司 Coding deep learning-based intelligent agent activity recognition method and system

Similar Documents

Publication Publication Date Title
CN109522793B (en) Method for detecting and identifying abnormal behaviors of multiple persons based on machine vision
CN106897670B (en) Express violence sorting identification method based on computer vision
WO2021184619A1 (en) Human body motion attitude identification and evaluation method and system therefor
CN108256433B (en) Motion attitude assessment method and system
Jalal et al. Shape and motion features approach for activity tracking and recognition from kinect video camera
CN110287844B (en) Traffic police gesture recognition method based on convolution gesture machine and long-and-short-term memory network
CN109241829B (en) Behavior identification method and device based on space-time attention convolutional neural network
CN106778796B (en) Human body action recognition method and system based on hybrid cooperative training
CN108647644B (en) Coal mine blasting unsafe action identification and judgment method based on GMM representation
CN111161315B (en) Multi-target tracking method and system based on graph neural network
CN110070029B (en) Gait recognition method and device
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN111860117A (en) Human behavior recognition method based on deep learning
CN102509085A (en) Pig walking posture identification system and method based on outline invariant moment features
CN111738218B (en) Human body abnormal behavior recognition system and method
CN106648078A (en) Multimode interaction method and system applied to intelligent robot
CN110599463A (en) Tongue image detection and positioning algorithm based on lightweight cascade neural network
CN110458022A (en) It is a kind of based on domain adapt to can autonomous learning object detection method
CN114332911A (en) Head posture detection method and device and computer equipment
CN115761537A (en) Power transmission line foreign matter intrusion identification method oriented to dynamic characteristic supplement mechanism
CN113705445B (en) Method and equipment for recognizing human body posture based on event camera
CN105160285A (en) Method and system for recognizing human body tumble automatically based on stereoscopic vision
CN112926522B (en) Behavior recognition method based on skeleton gesture and space-time diagram convolution network
CN110163142B (en) Real-time gesture recognition method and system
CN111694980A (en) Robust family child learning state visual supervision method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20201030)