CN111523361B - Human behavior recognition method

Human behavior recognition method

Info

Publication number
CN111523361B
Authority
CN
China
Prior art keywords
information
representing
image
sparse
matrix
Prior art date
Legal status
Active
Application number
CN201911366634.8A
Other languages
Chinese (zh)
Other versions
CN111523361A (en)
Inventor
张信明
郑辉
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China (USTC)
Priority to CN201911366634.8A
Publication of CN111523361A
Application granted
Publication of CN111523361B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a human behavior recognition method. The method first extracts information of two modalities from the video data to be recognized, a first image representing static information and a second image representing dynamic information. The information of the first image and the second image is implicitly aligned by a convolutional neural network with an attention mechanism, the implicitly aligned features are then mapped into a common subspace for explicit alignment, a sparse contractive deep autoencoder performs deep fusion on the aligned features of the different modalities, and finally the highly robust and strongly discriminative fused features are used to train a deep belief network, realizing high-accuracy human behavior recognition. The invention can fully mine and fuse the information of the two different modalities, temporal and spatial, learn high-level semantic features that represent the essential information of the video, and ultimately recognize human behaviors accurately.

Description

Human behavior recognition method
Technical Field
The application relates to the technical field of image analysis, and in particular to a human behavior recognition method based on cross-modal learning.
Background
In recent years, with the popularization of consumer electronic devices and the improvement of network performance, the volume of video generated by various electronic devices has been growing rapidly.
Against the background of big data and smart city (Smart City) construction, a better understanding of video, and of human-centered visual understanding tasks in particular, has a profound influence on fields such as public security, intelligent healthcare, and autonomous driving. Human behavior recognition therefore has important application value.
At present, mainstream human behavior recognition methods are mainly divided into model-driven and data-driven methods, but in practical applications the recognition accuracy of both is relatively low.
Disclosure of Invention
In order to solve the technical problem, the application provides a human behavior recognition method so as to achieve the purpose of improving the accuracy of human behavior recognition.
In order to achieve the technical purpose, the embodiment of the application provides the following technical scheme:
a human behavior recognition method comprises the following steps:
extracting a first image and a second image from video data to be identified; the first image comprises static information of the video data to be identified, and the second image comprises dynamic information of the video data to be identified;
carrying out implicit alignment and feature learning on information of two different modes by utilizing a convolutional neural network with an attention mechanism;
mapping the implicitly aligned features to a common subspace for explicit alignment processing;
carrying out deep fusion on the aligned different modal information by using a sparse shrinkage depth automatic encoder;
and training a classifier deep belief network by using the fused features so as to realize accurate recognition of human behaviors.
Optionally, the mapping of the implicitly aligned features into a common subspace for explicit alignment processing includes:
inputting the first modality information and the second modality information into the convolutional neural network with the attention mechanism for implicit alignment;
and performing explicit alignment on the implicitly aligned first modality information and second modality information, mapping the different modality information into a common subspace by performing subspace learning on it.
Optionally, the explicitly aligning the implicitly aligned first modality information and the second modality information includes:
substituting the first modality information and the second modality information into a first formula;
the first formula includes:
min_{W_x, W_y, V_x, V_y} || W_x^T X V_x - W_y^T Y V_y ||_F^2
s.t. X V_x 1 = Y V_y 1 = 0,
W_x^T X V_x V_x^T X^T W_x = W_y^T Y V_y V_y^T Y^T W_y = I,
W_x^T X V_x V_y^T Y^T W_y = Δ,
wherein X represents the first modality information and Y represents the second modality information; X is a d_x × T_1 matrix and Y is a d_y × T_2 matrix; W_x is the mapping matrix of the first modality information and has dimension d_x × d, W_y is the mapping matrix of the second modality information and has dimension d_y × d; V_x and V_y represent binary selection (warping) matrices; Δ represents a diagonal matrix; 1 represents a vector whose entries are all 1; and I denotes the identity matrix.
Optionally, the method for obtaining the sparse contractive deep autoencoder network includes:
replacing the L2 norm of the weight matrix in the loss function of a conventional autoencoder network with the Frobenius norm of the Jacobian matrix;
introducing a sparsity term into the loss function;
and determining the number of hidden nodes and the sparsity parameter by applying a particle swarm optimization algorithm, thereby improving the conventional autoencoder network into the sparse contractive deep autoencoder network.
Optionally, the loss function of the sparse contractive deep autoencoder network is:
J_SCAE = Σ_{x∈D} [ L(x, y) + λ ||J(x)||_F^2 ] + β Σ_{j=1}^{s_2} KL(ρ || ρ̂_j),
wherein J_SCAE represents the loss function of the sparse contractive deep autoencoder network, D represents the training data set, L(x, y) represents the cross-entropy loss function, λ represents the coefficient controlling the contraction (attenuation) term, s_2 represents the number of neurons of the hidden layer, β represents the sparsity term coefficient, j represents the j-th hidden-layer node with 1 ≤ j ≤ s_2, J(x) represents the Jacobian matrix, ρ represents the sparsity parameter, ρ̂_j represents the average activation of hidden neuron j, and KL(·||·) represents the relative entropy.
It can be seen from the foregoing technical solutions that the human behavior recognition method provided in the embodiments of the present application extracts two different kinds of modality information from the video data to be recognized, namely a first image containing static information and a second image containing dynamic information, performs implicit alignment and feature learning on the first image and the second image with a convolutional neural network with an attention mechanism, maps the implicitly aligned features of the different modalities into a common subspace for explicit alignment, and then fuses the aligned features of the different modalities with a sparse contractive deep autoencoder, so that feature learning is carried out simultaneously with the fusion process; finally, the highly robust and strongly discriminative fused features are used to train a deep belief network, realizing high-accuracy human behavior recognition. As described above, the human behavior recognition method provided by the embodiments of the present application can fully mine and fuse the information of the static and dynamic modalities, fully learn high-level semantic features that represent the essential information of the video, and ultimately recognize human behaviors accurately, thereby improving the accuracy of human behavior recognition.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flowchart of a human behavior recognition method according to an embodiment of the present application;
Fig. 2 is a diagram illustrating several frames of images of video data to be recognized according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a first image extracted from video data to be recognized according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a second image extracted from video data to be recognized according to an embodiment of the present application;
Fig. 5 is a flowchart illustrating a method for obtaining a sparse contractive deep autoencoder network according to an embodiment of the present application.
Detailed Description
As described in the background art, the mainstream human behavior recognition methods in the prior art have low recognition accuracy. The model-driven and data-driven methods are analyzed in detail below.
(1) The model-driven method first obtains manually designed features such as HOG (Histogram of Oriented Gradients), HOF (Histogram of Oriented Optical Flow), and MBH (Motion Boundary Histogram) from a video sequence, and then inputs them into a common classifier such as a Bayes classifier, a support vector machine, or a decision tree for classification and recognition. On the one hand, manually extracting features is time-consuming and labor-intensive; on the other hand, the extracted features are based on prior knowledge and often cannot sufficiently reflect the most essential information of the data.
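For illustration only (this is not part of the patent, and the feature settings and stand-in data are arbitrary), such a model-driven baseline, hand-crafted HOG descriptors followed by a support vector machine, can be sketched as follows:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def clip_descriptor(frames):
    """Hand-crafted HOG descriptor per grayscale frame, averaged over the clip."""
    descs = [hog(f, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2)) for f in frames]
    return np.mean(descs, axis=0)

# Illustrative stand-in data: 20 clips of 10 grayscale 64x64 frames, 2 classes.
rng = np.random.default_rng(0)
clips = rng.random((20, 10, 64, 64))
labels = rng.integers(0, 2, size=20)

X = np.stack([clip_descriptor(clip) for clip in clips])
clf = SVC(kernel="rbf").fit(X, labels)   # the "common classifier" stage
print(clf.predict(X[:3]))
```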
(2) The data-driven method has become popular with the deep learning wave of the big data era; instead of relying on traditional prior knowledge and physical models, it learns features from the raw data with a deep neural network that simulates the human brain. The high-level abstract features learned by the deep neural network reflect the essential information of the data, are strongly discriminative and robust, and in recent years have gradually replaced traditional feature engineering.
However, on the one hand, some methods ignore the complementary information of the different modalities contained in a video; on the other hand, even methods that do exploit dynamic and static modality information assume by default that the modalities are aligned on the spatio-temporal scale and thus ignore the differences between the modalities at that scale, which affects the understanding of the high-level semantic features of the video.
In view of this, an embodiment of the present application provides a human behavior recognition method, comprising:
extracting a first image and a second image from the video data to be recognized, wherein the first image contains static information of the video data to be recognized and the second image contains dynamic information of the video data to be recognized;
performing implicit alignment and feature learning on the information of the two different modalities by using a convolutional neural network with an attention mechanism;
mapping the implicitly aligned features into a common subspace for explicit alignment processing;
performing deep fusion on the aligned information of the different modalities by using a sparse contractive deep autoencoder;
and training a deep belief network classifier with the fused features, so as to accurately recognize human behaviors.
In this embodiment, the human behavior recognition method first extracts two different kinds of modality information from the video data to be recognized, namely a first image containing static information and a second image containing dynamic information, performs implicit alignment and feature learning on these two kinds of modality information with a convolutional neural network with an attention mechanism, maps the implicitly aligned feature information into a common subspace for explicit alignment, and then fuses the aligned features of the different modalities with a sparse contractive deep autoencoder network, so that feature learning is carried out simultaneously with the fusion process; finally, the highly robust and strongly discriminative fused features are used to train a deep belief network, realizing high-accuracy human behavior recognition. As described above, the human behavior recognition method provided by the embodiments of the present application can fully mine and fuse the information of the static and dynamic modalities, fully learn high-level semantic features that represent the essential information of the video, and ultimately recognize human behaviors accurately, thereby improving the accuracy of human behavior recognition.
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the present application provides a human behavior recognition method, as shown in fig. 1, comprising:
S101: extracting a first image and a second image from the video data to be recognized; the first image contains static information of the video data to be recognized, and the second image contains dynamic information of the video data to be recognized.
Optionally, the first image may be an image capable of representing static information, such as an RGB image, and the second image may be an image capable of representing dynamic information, such as an optical flow image.
The optical flow (Optical Flow) method is an important method for motion image analysis; optical flow refers to the velocity of pattern motion in a time-varying image, because when an object moves, the brightness pattern of its corresponding points on the image moves as well. The apparent motion (Apparent Motion) of this image brightness pattern is the optical flow. Optical flow expresses the change of the image and, since it contains information on the motion of objects, it can be used by an observer to determine their movement.
Referring to fig. 2 to 4, fig. 2 shows several frames of original images in the video data to be recognized, fig. 3 shows an RGB image extracted from the video data to be recognized as a first image, and fig. 4 shows an optical flow image calculated from the video data to be recognized as the second image.
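As an illustration only (not part of the patent text), the two modalities could be extracted roughly as follows; the Farneback optical flow estimator and the frame-sampling step are assumptions, and any dense optical flow method could be substituted:

```python
import cv2
import numpy as np

def extract_modalities(video_path, step=5):
    """Sample RGB frames (static modality) and dense optical flow fields
    (dynamic modality) from a video file."""
    cap = cv2.VideoCapture(video_path)
    rgb_frames, flow_frames = [], []
    ok, prev = cap.read()
    if not ok:
        return rgb_frames, flow_frames
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if idx % step == 0:
            # First modality: the RGB frame itself (static appearance).
            rgb_frames.append(frame)
            # Second modality: dense optical flow between consecutive frames
            # (Farneback is used purely as an example estimator).
            flow = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            flow_frames.append(flow)
        prev_gray = gray
        idx += 1
    cap.release()
    return rgb_frames, flow_frames
```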
S102: performing implicit alignment and feature learning on the information of the two different modalities by using a convolutional neural network with an attention mechanism.
The attention (Attention) mechanism (CAM) is a mechanism embedded in a convolutional neural network that mimics human visual behavior; here, an attention mechanism is added to a ResNet-50 convolutional neural network. In general, the first image and the second image are two-dimensional image signals, and after size cropping and transformation they can be input directly into the convolutional neural network model with the attention mechanism for feature learning.
The ResNet-50 network model is a powerful 50-layer convolutional neural network trained on over a million images from the ImageNet database to classify images into 1000 object categories. Applying this pre-trained network, after fine-tuning, to the video human behavior recognition task exploits its strength in feature learning and effectively mines the essential information of each modality.
In addition, adding an attention mechanism to the convolutional neural network makes it possible to further capture the more discriminative and salient regions of the two-dimensional image while ignoring the interference of unimportant regions.
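The patent does not spell out the exact form of the attention module, so the following PyTorch sketch, a pretrained ResNet-50 backbone with a simple 1x1-convolution spatial attention gate inserted before global pooling, is only one possible illustrative realisation (the weights argument assumes a recent torchvision version):

```python
import torch
import torch.nn as nn
from torchvision import models

class SpatialAttention(nn.Module):
    """Simple spatial attention gate: a 1x1 convolution scores each location."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        attn = torch.sigmoid(self.score(x))   # (N, 1, H, W) attention map
        return x * attn                        # re-weight the feature map

class AttentiveResNet50(nn.Module):
    """Pretrained ResNet-50 backbone with an attention gate before pooling."""
    def __init__(self, num_classes):
        super().__init__()
        backbone = models.resnet50(weights="IMAGENET1K_V1")
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.attention = SpatialAttention(2048)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, x):
        f = self.attention(self.features(x))
        return self.fc(self.pool(f).flatten(1))

# Example: one modality stream classifying 224x224 crops into 101 actions.
model = AttentiveResNet50(num_classes=101)
logits = model(torch.randn(2, 3, 224, 224))
```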
S103: mapping the implicitly aligned features into a common subspace for explicit alignment processing.
S104: performing deep fusion on the aligned information of the different modalities by using a sparse contractive deep autoencoder.
The fused feature information obtained by the deep fusion can represent the essential spatio-temporal semantic information of the video data to be processed and is highly robust.
S105: training a deep belief network classifier with the fused features, so as to accurately recognize human behaviors.
The deep belief network (DBN) adopts a layer-by-layer training scheme to solve the optimization problem of a deep neural network; the layer-by-layer training gives the whole network good initial weights, so that only fine-tuning is needed for the network to reach a good solution.
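The deep belief network itself is a standard component; as a rough illustration of the layer-by-layer idea (not the patented training procedure), each layer can be pre-trained as a Bernoulli restricted Boltzmann machine with one-step contrastive divergence, and the resulting weights then initialise the network before supervised fine-tuning. The layer sizes and learning rate below are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def fit(self, data, epochs=10):
        for _ in range(epochs):
            h0 = self.hidden_probs(data)
            h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
            v1 = sigmoid(h0_sample @ self.W.T + self.b_v)   # reconstruction
            h1 = self.hidden_probs(v1)
            # CD-1 gradient estimates.
            self.W += self.lr * (data.T @ h0 - v1.T @ h1) / len(data)
            self.b_v += self.lr * (data - v1).mean(axis=0)
            self.b_h += self.lr * (h0 - h1).mean(axis=0)
        return self

def pretrain_dbn(data, layer_sizes=(512, 256, 128)):
    """Greedy layer-by-layer pre-training: each RBM is trained on the hidden
    activities of the previous one; the stacked weights initialise the DBN."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden).fit(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)
    return rbms
```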
In this embodiment, the human behavior recognition method first extracts two different kinds of modality information from the video data to be recognized, namely a first image containing static information and a second image containing dynamic information, performs implicit alignment and feature learning on the first image and the second image with a convolutional neural network with an attention mechanism, maps the implicitly aligned features of the different modalities into a common subspace for explicit alignment, and then fuses the aligned features of the different modalities with a sparse contractive deep autoencoder, so that feature learning is carried out simultaneously with the fusion process; finally, the highly robust and strongly discriminative fused features are used to train a deep belief network, realizing high-accuracy human behavior recognition. As described above, the human behavior recognition method provided by the embodiments of the present application can fully mine and fuse the information of the static and dynamic modalities, fully learn high-level semantic features that represent the essential information of the video, and ultimately recognize human behaviors accurately, thereby improving the accuracy of human behavior recognition.
On the basis of the foregoing embodiment, in an embodiment of the present application, the mapping of the implicitly aligned features into a common subspace for explicit alignment processing includes:
inputting the first modality information and the second modality information into the convolutional neural network with the attention mechanism for implicit alignment;
and performing explicit alignment on the implicitly aligned first modality information and second modality information, mapping the different modality information into a common subspace by performing subspace learning on it (i.e., mapping the first modality information and the second modality information into a common low-dimensional subspace), and aligning the first modality information and the second modality information in that common subspace.
Specifically, the explicitly aligning the implicitly aligned first modality information and the second modality information includes:
substituting the first modality information and the second modality information into a first formula;
the first formula includes:
min_{W_x, W_y, V_x, V_y} || W_x^T X V_x - W_y^T Y V_y ||_F^2
s.t. X V_x 1 = Y V_y 1 = 0,
W_x^T X V_x V_x^T X^T W_x = W_y^T Y V_y V_y^T Y^T W_y = I,
W_x^T X V_x V_y^T Y^T W_y = Δ,
wherein X represents the first modality information and Y represents the second modality information; X is a d_x × T_1 matrix and Y is a d_y × T_2 matrix; W_x is the mapping matrix of the first modality information and has dimension d_x × d, W_y is the mapping matrix of the second modality information and has dimension d_y × d (d_x × d and d_y × d denote the dimensions of the two mapping matrices); V_x and V_y represent binary selection (warping) matrices; Δ represents a diagonal matrix; 1 represents a vector of appropriate dimension whose entries are all 1; and I denotes the identity matrix.
As mentioned above, adding the attention mechanism to the convolutional neural network makes it possible to capture the more discriminative and salient regions of the two-dimensional image while ignoring the interference of unimportant regions. For a given two-dimensional image, with the attention mechanism introduced into the convolutional neural network, the input to the softmax layer for class c is
S_c = Σ_k w_k^c Σ_{x,y} f_k(x, y),
where k denotes the k-th unit of the last convolutional layer, f_k(x, y) refers to the activation of unit k at spatial location (x, y), w_k^c is the weight of unit k for class c, and the corresponding output is:
P_c = exp(S_c) / Σ_{c'} exp(S_{c'}).
the output result of the implicit alignment network needs to be further input into the explicit alignment network, that is, the alignment on the time-space scale is represented by mapping the information of different modalities into a common subspace through an explicit alignment method.
On the basis of the above embodiments, in an embodiment of the present application, with reference to fig. 5, the method for obtaining the sparse contractive deep autoencoder network includes:
S201: replacing the L2 norm of the weight matrix in the loss function of a conventional autoencoder network with the Frobenius norm of the Jacobian matrix;
S202: introducing a sparsity term into the loss function, so as to impose a sparsity constraint on the hidden layer of the conventional autoencoder network;
S203: determining the number of hidden nodes and the sparsity parameter by applying a particle swarm optimization algorithm, thereby improving the conventional autoencoder network into the sparse contractive deep autoencoder network.
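For step S203, a plain particle swarm search over the two hyperparameters could look like the sketch below; the fitness function (here a placeholder) and the search ranges are assumptions, and in practice the fitness would train a small autoencoder and return its validation loss:

```python
import numpy as np

def pso_search(fitness, bounds, n_particles=15, iters=30, seed=0):
    """Minimise `fitness` over a continuous box `bounds` with plain PSO."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    pos = lo + rng.random((n_particles, len(bounds))) * (hi - lo)
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5                 # inertia / acceleration terms
    for _ in range(iters):
        r1, r2 = rng.random((2, *pos.shape))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Particle = (number of hidden nodes, sparsity parameter rho).
def fitness(particle):
    n_hidden, rho = int(round(particle[0])), particle[1]
    # Placeholder objective; a real run would train an SCAE with these
    # hyperparameters and return its validation reconstruction loss.
    return abs(n_hidden - 256) / 256 + abs(rho - 0.05)

best, best_val = pso_search(fitness, bounds=[(64, 1024), (0.01, 0.2)])
print(best, best_val)
```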
Specifically, the loss function of the sparse contractive deep autoencoder network is:
J_SCAE = Σ_{x∈D} [ L(x, y) + λ ||J(x)||_F^2 ] + β Σ_{j=1}^{s_2} KL(ρ || ρ̂_j),
wherein J_SCAE represents the loss function of the sparse contractive deep autoencoder network, D represents the training data set, L(x, y) represents the cross-entropy loss function, λ represents the coefficient controlling the contraction (attenuation) term, s_2 represents the number of neurons of the hidden layer, β represents the sparsity term coefficient, j represents the j-th hidden-layer node with 1 ≤ j ≤ s_2, J(x) represents the Jacobian matrix, ρ represents the sparsity parameter, ρ̂_j represents the average activation of hidden neuron j, and KL(·||·) represents the relative entropy.
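For illustration, one layer of this loss can be written down directly in PyTorch; the sketch below assumes a sigmoid encoder (for which ||J(x)||_F^2 has the closed form used in the comment) and inputs scaled to [0, 1], and the coefficients lam, beta and rho are placeholder values rather than the patent's settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseContractiveAE(nn.Module):
    """One layer of a sparse contractive autoencoder (illustrative sizes)."""
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.enc = nn.Linear(n_in, n_hidden)
        self.dec = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        x_hat = torch.sigmoid(self.dec(h))
        return h, x_hat

def scae_loss(model, x, lam=1e-4, beta=0.1, rho=0.05):
    h, x_hat = model(x)
    # Reconstruction term L(x, y): cross-entropy, inputs assumed in [0, 1].
    recon = F.binary_cross_entropy(x_hat, x, reduction="mean")
    # Contraction term: for a sigmoid encoder,
    # ||J(x)||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2.
    w_sq = (model.enc.weight ** 2).sum(dim=1)            # per hidden unit
    contractive = (((h * (1 - h)) ** 2) * w_sq).sum(dim=1).mean()
    # Sparsity term: sum_j KL(rho || rho_hat_j) over the hidden units.
    rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)
    kl = (rho * torch.log(rho / rho_hat)
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()
    return recon + lam * contractive + beta * kl

# Example: fusing two 32-d aligned modality features into one 64-d input.
model = SparseContractiveAE(n_in=64, n_hidden=256)
x = torch.rand(128, 64)                  # batch of aligned, concatenated features
loss = scae_loss(model, x)
loss.backward()
```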
An autoencoder (Autoencoder) network is an unsupervised neural network model; the construction and specific structure of a conventional autoencoder network in the prior art are well known to those skilled in the art and are not described here in detail.
To sum up, the embodiments of the present application provide a human behavior recognition method which first extracts two different kinds of modality information from the video data to be recognized, namely a first image containing static information and a second image containing dynamic information, performs implicit alignment and feature learning on the first image and the second image with a convolutional neural network with an attention mechanism, maps the implicitly aligned features of the different modalities into a common subspace for explicit alignment, and then fuses the aligned features of the different modalities with a sparse contractive deep autoencoder, so that feature learning is carried out simultaneously with the fusion process; finally, the highly robust and strongly discriminative fused features are used to train a deep belief network, realizing high-accuracy human behavior recognition. As described above, the human behavior recognition method provided by the embodiments of the present application can fully mine and fuse the information of the static and dynamic modalities, fully learn high-level semantic features that represent the essential information of the video, and ultimately recognize human behaviors accurately, thereby improving the accuracy of human behavior recognition.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A human behavior recognition method, characterized by comprising the following steps:
extracting a first image and a second image from the video data to be recognized, wherein the first image contains static information of the video data to be recognized and the second image contains dynamic information of the video data to be recognized;
performing implicit alignment and feature learning on the information of the two different modalities by using a convolutional neural network with an attention mechanism;
mapping the implicitly aligned features into a common subspace for explicit alignment processing;
performing deep fusion on the aligned information of the different modalities by using a sparse contractive deep autoencoder;
and training a deep belief network classifier with the fused features, so as to accurately recognize human behaviors;
the mapping the implicitly aligned features to a common subspace for explicit alignment processing includes:
inputting the first mode information and the second mode information into a convolutional neural network with an attention mechanism for implicit alignment;
performing explicit alignment on the implicitly aligned first modality information and second modality information, and mapping different modality information into a common subspace by performing subspace learning on the different modality information;
the performing explicit alignment on the implicitly aligned first modality information and second modality information includes:
substituting the first modality information and the second modality information into a first formula;
the first formula includes:
min_{W_x, W_y, V_x, V_y} || W_x^T X V_x - W_y^T Y V_y ||_F^2
s.t. X V_x 1 = Y V_y 1 = 0,
W_x^T X V_x V_x^T X^T W_x = W_y^T Y V_y V_y^T Y^T W_y = I,
W_x^T X V_x V_y^T Y^T W_y = Δ,
wherein X represents the first modality information and Y represents the second modality information; X is a d_x × T_1 matrix and Y is a d_y × T_2 matrix; W_x is the mapping matrix of the first modality information and has dimension d_x × d, W_y is the mapping matrix of the second modality information and has dimension d_y × d; V_x and V_y represent binary selection (warping) matrices; Δ represents a diagonal matrix; 1 represents a vector in which all values are 1; and I denotes the identity matrix.
2. The human behavior recognition method according to claim 1, wherein the method for obtaining the sparse contractive deep autoencoder network comprises:
replacing the L2 norm of the weight matrix in the loss function of a conventional autoencoder network with the Frobenius norm of the Jacobian matrix;
introducing a sparsity term into the loss function;
and determining the number of hidden nodes and the sparsity parameter by applying a particle swarm optimization algorithm, thereby improving the conventional autoencoder network into the sparse contractive deep autoencoder network.
3. The human behavior recognition method according to claim 2, wherein the loss function of the sparse contractive deep autoencoder network is:
J_SCAE = Σ_{x∈D} [ L(x, y) + λ ||J(x)||_F^2 ] + β Σ_{j=1}^{s_2} KL(ρ || ρ̂_j),
wherein J_SCAE represents the loss function of the sparse contractive deep autoencoder network, D represents the training data set, L(x, y) represents the cross-entropy loss function, λ represents the coefficient controlling the contraction (attenuation) term, s_2 represents the number of neurons of the hidden layer, β represents the sparsity term coefficient, j represents the j-th hidden-layer node with 1 ≤ j ≤ s_2, J(x) represents the Jacobian matrix, ρ represents the sparsity parameter, ρ̂_j represents the average activation of hidden neuron j, and KL(·||·) represents the relative entropy.
CN201911366634.8A 2019-12-26 2019-12-26 Human behavior recognition method Active CN111523361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911366634.8A CN111523361B (en) 2019-12-26 2019-12-26 Human behavior recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911366634.8A CN111523361B (en) 2019-12-26 2019-12-26 Human behavior recognition method

Publications (2)

Publication Number Publication Date
CN111523361A CN111523361A (en) 2020-08-11
CN111523361B true CN111523361B (en) 2022-09-06

Family

ID=71900387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911366634.8A Active CN111523361B (en) 2019-12-26 2019-12-26 Human behavior recognition method

Country Status (1)

Country Link
CN (1) CN111523361B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001437B (en) * 2020-08-19 2022-06-14 四川大学 Modal non-complete alignment-oriented data clustering method
CN112487937B (en) * 2020-11-26 2022-12-06 北京有竹居网络技术有限公司 Video identification method and device, storage medium and electronic equipment
CN115116124B (en) * 2022-05-13 2024-07-19 大连海事大学 Action representation and recognition method based on vision and wireless bimodal joint perception

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104523266A (en) * 2015-01-07 2015-04-22 河北大学 Automatic classification method for electrocardiogram signals
CN105678216A (en) * 2015-12-21 2016-06-15 中国石油大学(华东) Spatio-temporal data stream video behavior recognition method based on deep learning
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN109389055A (en) * 2018-09-21 2019-02-26 西安电子科技大学 Video classification methods based on mixing convolution sum attention mechanism
CN109508686A (en) * 2018-11-26 2019-03-22 南京邮电大学 A kind of Human bodys' response method based on the study of stratification proper subspace
CN109615019A (en) * 2018-12-25 2019-04-12 吉林大学 Anomaly detection method based on space-time autocoder
KR20190055632A (en) * 2017-11-15 2019-05-23 전자부품연구원 Object reconstruction apparatus using motion information and object reconstruction method using thereof
CN110135345A (en) * 2019-05-15 2019-08-16 武汉纵横智慧城市股份有限公司 Activity recognition method, apparatus, equipment and storage medium based on deep learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104523266A (en) * 2015-01-07 2015-04-22 河北大学 Automatic classification method for electrocardiogram signals
CN105678216A (en) * 2015-12-21 2016-06-15 中国石油大学(华东) Spatio-temporal data stream video behavior recognition method based on deep learning
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
KR20190055632A (en) * 2017-11-15 2019-05-23 전자부품연구원 Object reconstruction apparatus using motion information and object reconstruction method using thereof
CN109389055A (en) * 2018-09-21 2019-02-26 西安电子科技大学 Video classification methods based on mixing convolution sum attention mechanism
CN109508686A (en) * 2018-11-26 2019-03-22 南京邮电大学 A kind of Human bodys' response method based on the study of stratification proper subspace
CN109615019A (en) * 2018-12-25 2019-04-12 吉林大学 Anomaly detection method based on space-time autocoder
CN110135345A (en) * 2019-05-15 2019-08-16 武汉纵横智慧城市股份有限公司 Activity recognition method, apparatus, equipment and storage medium based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A sparse auto-encoder-based deep neural network approach for induction motor faults classification; Wenjun Sun et al.; Measurement; 2016-12-31; pp. 171-178 *
Two-stream Flow-guided Convolutional Attention Networks for Action Recognition; An Tran, Loong-Fah Cheong; 2017 IEEE International Conference on Computer Vision Workshops; 2017-12-31; pp. 3110-3119 *
Human behavior recognition based on attention mechanism and multi-modal feature fusion; 吴汉卿; China Excellent Master's Theses Full-text Database, Information Science and Technology series; 2019-07-15; I138-1150 *

Also Published As

Publication number Publication date
CN111523361A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN107506712B (en) Human behavior identification method based on 3D deep convolutional network
CN111523361B (en) Human behavior recognition method
CN108537136B (en) Pedestrian re-identification method based on attitude normalization image generation
Zhang et al. Deep hierarchical guidance and regularization learning for end-to-end depth estimation
CN109409222A (en) A kind of multi-angle of view facial expression recognizing method based on mobile terminal
Cherabier et al. Learning priors for semantic 3d reconstruction
Jaswanth et al. A novel based 3D facial expression detection using recurrent neural network
CN107862275A (en) Human bodys' response model and its construction method and Human bodys' response method
CN111191583A (en) Space target identification system and method based on convolutional neural network
CN105787458A (en) Infrared behavior identification method based on adaptive fusion of artificial design feature and depth learning feature
CN106709482A (en) Method for identifying genetic relationship of figures based on self-encoder
CN113343974B (en) Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement
CN112418032B (en) Human behavior recognition method and device, electronic equipment and storage medium
CN112507943B (en) Visual positioning navigation method, system and medium based on multitasking neural network
US11223782B2 (en) Video processing using a spectral decomposition layer
CN114692732B (en) Method, system, device and storage medium for updating online label
WO2021243947A1 (en) Object re-identification method and apparatus, and terminal and storage medium
CN112464844A (en) Human behavior and action recognition method based on deep learning and moving target detection
Pavel et al. Object class segmentation of RGB-D video using recurrent convolutional neural networks
US20220301311A1 (en) Efficient self-attention for video processing
Rybchak et al. Analysis of computer vision and image analysis technics
CN113255602A (en) Dynamic gesture recognition method based on multi-modal data
CN110111365A (en) Training method and device and method for tracking target and device based on deep learning
CN114399661A (en) Instance awareness backbone network training method
AU2020102476A4 (en) A method of Clothing Attribute Prediction with Auto-Encoding Transformations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant