CN106909938B - Visual angle independence behavior identification method based on deep learning network
- Publication number
- CN106909938B (Application CN201710082263.5A)
- Authority
- CN
- China
- Prior art keywords
- visual angle
- deep learning
- space
- behavior
- learning network
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a visual angle independence behavior identification method based on a deep learning network, which comprises the following steps: recording video frame images under a certain visual angle, and extracting and processing the bottom layer features by means of deep learning; modeling the obtained bottom layer features and assembling them into a cube model in time order; converting the cube models of all visual angles into a cylindrical feature space mapping that is invariant to the visual angle, and inputting the mapping into a classifier for training to obtain the visual angle independence classifier of the video behavior. In this technical scheme, a deep learning network is used to analyze human behaviors under multiple visual angles, which improves the robustness of the classification model; the method is particularly suited to training and learning on big data, where the advantages of deep networks can be fully exploited.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a visual angle independence behavior identification method based on a deep learning network.
Background
With the rapid development of information technology and the emergence of concepts such as VR, AR and artificial intelligence, computer vision has entered its best period of development, and video behavior analysis, one of the most important topics in computer vision, attracts more and more attention from scholars at home and abroad. Video behavior analysis plays a large role in fields such as video surveillance, human-computer interaction, medical care and video retrieval; in the now-popular driverless car projects, for example, it is a very challenging component. Because human actions are complex and diverse, and because of factors such as self-occlusion, multiple scales, and rotation and translation of the viewing angle under multiple visual angles, video behavior recognition is very difficult. How to accurately recognize and analyze human behaviors from multiple angles in real life is therefore a very important research topic, and the social demand for behavior analysis keeps increasing.
The traditional research methods include the following:
Methods based on spatio-temporal feature points: spatio-temporal feature points are extracted from the video frame images, modeled and analyzed, and finally classified.
Methods based on the human skeleton: human skeleton information is extracted by an algorithm or a depth camera, and video behaviors are then classified by describing and modeling the skeleton information.
Behavior analysis methods based on spatio-temporal feature points and skeleton information have achieved remarkable results under the traditional single visual angle or single modality. However, in areas with heavy pedestrian flow such as streets, airports and stations, or under a series of complex conditions such as human body occlusion, illumination change and visual angle change, simply applying these two kinds of methods in real life cannot meet practical requirements, and the robustness of the algorithms is sometimes poor.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a visual angle independence behavior identification method based on a deep learning network, which uses a deep learning network to analyze human behaviors under multiple visual angles and improves the robustness of the classification model; deep learning networks are especially suitable for training and learning on big data, where their advantages can be fully exploited.
The technical scheme of the invention is realized as follows:
a visual angle independence behavior recognition method based on a deep learning network comprises a training process of obtaining a classifier by utilizing a training sample set and a recognition process of recognizing a test sample by utilizing the classifier;
the training process comprises the steps of:
s1) inputting the video frame images Image 1 to Image i in chronological order at a certain angle;
s2) performing bottom layer feature extraction on the images input in step S1) by means of a CNN (Convolutional Neural Network) and pooling the bottom layer features, and strengthening the pooled bottom layer features with a Spatial Transformer Network (STN);
s3) pooling the feature images (Feature Maps) strengthened in step S2) and inputting them into an RNN (Recurrent Neural Network) for time modeling to obtain a time-sequence-related cube model;
s4) repeating steps S1) to S3) to obtain space cube models of the same behavior under a plurality of visual angles, converting the space cube models of the visual angles into a cylinder feature space mapping that is invariant to the visual angle, and inputting the cylinder feature space mapping into a classifier for training as a training sample of the behavior;
s5) repeating steps S1) to S4) for the various other behaviors to obtain the visual angle independence classifiers of the various behaviors;
the identification process comprises the steps of:
s6) recording a video frame image under a certain visual angle, and performing bottom layer feature extraction and modeling by adopting the steps S1) to S3) to obtain a space cube model under the visual angle;
s7) converting the space cube model obtained in step S6) into a cylinder feature space mapping that is invariant to the visual angle, and inputting the cylinder feature space mapping into the classifier for recognition to obtain the video behavior category.
In the above technical solution, step S2) preferably adopts a three-layer convolution operation to extract the bottom layer features; step S2) and step S3) preferably use the max pooling method to perform the dimensionality-reduction operation on the feature images.
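As an illustration of this preference, the following is a minimal sketch of a three-layer convolutional extractor followed by max pooling, written in PyTorch (the detailed description below names Caffe as the framework actually used); the channel counts, kernel sizes and the 64x64 input resolution are illustrative assumptions, not values fixed by the invention.

```python
# Minimal sketch of the preferred step S2) front end: three convolution layers
# followed by max pooling. All layer sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class BottomFeatureExtractor(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),  # convolution layer 1
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),           # convolution layer 2
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),          # convolution layer 3
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                 # max-pooling dimensionality reduction
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, channels, height, width) video frames of one visual angle
        return self.features(frames)

if __name__ == "__main__":
    clip = torch.randn(8, 3, 64, 64)                 # 8 frames, assumed 64x64 resolution
    print(BottomFeatureExtractor()(clip).shape)      # torch.Size([8, 128, 32, 32])
```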
In the above technical solution, the spatial cube model of the same behavior at a certain viewing angle is obtained in step S3), and the steps S1) to S3) are repeatedly performed to obtain spatial cube models of the same behavior at a plurality of viewing angles.
In the technical scheme of the invention, an LSTM (Long Short-Term Memory) network is preferably adopted for the time modeling, because the backpropagation of a deep learning network uses stochastic gradient descent, and the special gate operations in the LSTM prevent the vanishing-gradient problem in the individual layers.
In the above technical solution, step S4) specifically includes:
s41) repeating the operation steps S1) to S3), obtaining a space cube model of the same behavior at each visual angle, and integrating the space cube model into a cylinder space with x, y and z as coordinate axes, wherein the cylinder space represents the track description of the motion characteristics at each visual angle;
s42) applying the polar coordinate transformation formula to the model obtained in step S41) to obtain the cylinder space mapping with a constant angle.
The above technical solution further comprises a step S0) of constructing a data set; the invention preferably employs the IXMAS dataset.
Compared with the prior art, the technical scheme of the invention differs in the following respects:
1. Feature extraction of the bottom layer features is performed with a CNN (Convolutional Neural Network) method to obtain global features, instead of the key points obtained by traditional methods.
2. The obtained global features are strengthened with the STN method, instead of modeling the obtained features directly.
3. The global features, after the strengthening and dimensionality-reduction operations, are temporally modeled with an LSTM network, adding the important time information so that the global features become temporally correlated.
4. The space cube models of the individual visual angles of the same behavior are transformed by polar coordinate transformation into an angle-invariant cylinder space mapping, and training, classification and recognition are then completed with the CNN.
The invention has the following advantages: global high-level features are obtained with a CNN method, and the STN feature strengthening gives good robustness for videos from real life; time information is then established with an RNN network; finally, the different features of the multiple visual angles are fused through polar coordinate transformation, and the obtained angle-invariant descriptors are trained and classified with the CNN. The traditional skeleton and key-point extraction operations are not used, so the features obtained from the global features are more comprehensive; the RNN captures the inter-frame time information, so the behavior description is more complete and more widely applicable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of the training process of the present invention;
FIG. 2 is a flow chart illustrating the identification process of the present invention;
FIG. 3 is a schematic diagram of a general human behavior recognition process;
FIG. 4 is a simplified flow chart of the extraction and modeling of underlying features;
FIG. 5 is a flowchart of a general CNN process;
FIG. 6 is a simplified block diagram of a generic RNN;
FIG. 7 is a LSTM block diagram;
FIG. 8 is a flow chart of fusion classification for various views;
FIG. 9 is a schematic diagram of the Motion History Volume of FIG. 8 after the polar coordinate transformation.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 and fig. 2, the method for identifying view-independent behaviors based on a deep learning network of the present invention includes a training process for obtaining a classifier by using a training sample set and an identification process for identifying a test sample by using the classifier;
the training process is shown in fig. 1 and comprises the following steps:
s1) inputting the video frame images Image 1 to Image i in chronological order at a certain angle;
s2) performing bottom layer feature extraction on the images input in step S1) by means of a CNN and pooling them, and strengthening the pooled bottom layer features with an STN;
s3) pooling the feature images strengthened in step S2) and inputting them into an RNN for time modeling to obtain a time-sequence-related cube model;
s4) repeating steps S1) to S3) to obtain space cube models of the same behavior under a plurality of visual angles, converting the space cube models of the visual angles into a cylinder feature space mapping that is invariant to the visual angle, and inputting the cylinder feature space mapping into a classifier for training as a training sample of the behavior;
s5) repeating the above steps for the various other behaviors to obtain the visual angle independence classifiers of the various behaviors.
The identification process is shown in fig. 2 and comprises the following steps:
s6) recording a video frame image under a certain visual angle, and performing bottom layer feature extraction and modeling by adopting the steps S1) to S3) to obtain a space cube model under the visual angle;
s7) converting the space cube model obtained in step S6) into a cylinder feature space mapping that is invariant to the visual angle, and inputting the cylinder feature space mapping into the classifier for recognition to obtain the video behavior category.
In the above technical solution, step S2) preferably adopts a three-layer convolution operation to extract the bottom layer features; step S2) and step S3) preferably use the max pooling method to perform the dimensionality-reduction operation on the feature images.
In the above technical solution, the spatial cube model of the same behavior at a certain viewing angle is obtained in step S3), and the steps S1) to S3) are repeatedly performed to obtain spatial cube models of the same behavior at a plurality of viewing angles.
In the technical scheme of the invention, an LSTM (Long Short-Term Memory) network is preferably adopted for the time modeling, because the backpropagation of a deep learning network uses stochastic gradient descent, and the special gate operations in the LSTM prevent the vanishing-gradient problem in the individual layers.
In the above technical solution, step S4) specifically includes:
s41) repeating the operation steps S1) to S3), obtaining a space cube model of the same behavior at each visual angle, and integrating the space cube model into a cylinder space with x, y and z as coordinate axes, wherein the cylinder space represents the track description of the motion characteristics at each visual angle;
s42) applying the polar coordinate transformation formula to the model obtained in step S41) to obtain the cylinder space mapping with a constant angle.
In the above technical solution, further comprising: s0) constructing a data set.
The present invention preferably employs the IXMAS dataset, which contains five different visual angles and 12 persons, with 14 actions per person, each action repeated three times. Eleven of the twelve persons are used as the training data set and the remaining one as the test data set.
Specifically, to recognize the behavior of "running", for example, running videos of the 12 persons are first collected from the five visual angles, the running videos of 11 persons being used as the training data set and the remaining person as the test data set. The running video frame images of one person at one visual angle are processed according to steps S1) to S3), finally yielding a time-sequence-related cube model of the running behavior at that visual angle, i.e. the space cube model of the running behavior at that visual angle; steps S1) to S3) are then repeated to obtain, in turn, the space cube models of the running behavior at the other four visual angles; the space cube models of the running behavior at the five visual angles are converted into a cylinder feature space mapping that is invariant to the visual angle, which is taken as a training sample of that person's running behavior and input into the classifier for training; after the training samples of the different persons have been trained repeatedly, the visual angle independence classifier of the "running" behavior is obtained. View-independent classifiers for the other video behaviors can be constructed in the same way.
During recognition, steps S6) and S7) are carried out: the video frame images of a person in the test sample at a certain visual angle are first processed according to steps S1) to S3) to obtain the space cube model of the behavior at that visual angle; the space cube model is then transformed into a cylinder feature space mapping through polar coordinate transformation and input into the classifier, which recognizes the behavior class. The recognition process for the other visual angles is the same.
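For concreteness, the sketch below enumerates the leave-one-person-out split described above for the IXMAS data (12 persons, 14 actions, 3 repetitions, 5 visual angles); the sample-key layout and the helper name are illustrative assumptions rather than part of the actual data set tooling.

```python
# Hedged sketch of the leave-one-person-out split used above: 11 of the 12
# IXMAS subjects for training, 1 for testing. Key layout is an assumption.
from itertools import product

SUBJECTS = [f"person{i:02d}" for i in range(1, 13)]   # 12 persons
N_ACTIONS, N_REPETITIONS, N_VIEWS = 14, 3, 5

def split_ixmas(test_subject: str):
    """Return (train, test) lists of (subject, action, repetition, view) keys."""
    train, test = [], []
    for subject, action, rep, view in product(
            SUBJECTS, range(N_ACTIONS), range(N_REPETITIONS), range(N_VIEWS)):
        (test if subject == test_subject else train).append((subject, action, rep, view))
    return train, test

train_set, test_set = split_ixmas("person12")
print(len(train_set), len(test_set))   # 2310 training clips, 210 test clips
```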
For better understanding and illustrating the technical solutions of the present invention, the related technologies related to the above technical solutions will be explained and analyzed in detail below.
The method model comprises two main stages: first, the extraction and modeling of the bottom layer features; second, the fusion and classification of all the visual angles. The main innovative work of each stage is described below.
The general flow of human behavior recognition is shown in fig. 3, in the figure, the feature extraction and feature representation stage is the key point of behavior recognition, and the result of the stage will ultimately affect the recognition accuracy and the algorithm robustness.
Fig. 4 shows a simplified flow chart of the underlying feature extraction and modeling.
In the technical scheme of the invention, the deep learning framework adopted is Caffe, and the video frames Image 1 to Image i under a certain visual angle in fig. 4 are input into the network in chronological order. First, the CNN performs feature extraction on the input images; then the STN strengthens the features so that they have a certain robustness to translation, scale change and angle change; a pooling operation (max pooling) is then applied to the feature images (Feature Maps); the pooled feature images are input into the RNN layer for time modeling; and finally a feature image sequence (Feature Maps Sequences) with inter-frame temporal correlation is obtained.
More precisely, the technical scheme of the invention uses a three-layer convolution operation to extract the bottom layer features and then reduces their dimensionality with max pooling. The pooled feature images are input into the STN layer for the strengthening operation; the purpose of the STN network is to make the obtained features robust to translation, rotation and scale change. The feature images output by the STN are then max-pooled again for a second dimensionality reduction and input into the RNN so that time information is embedded in them, and the resulting Feature Maps are finally stacked into a space cube in time order. The RNN used in the invention is an LSTM network, because the backpropagation of a deep learning network uses stochastic gradient descent, and the special gate operations in the LSTM prevent the vanishing-gradient problem in the individual layers.
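The sketch below puts the whole per-frame pipeline of fig. 4 together in PyTorch, whereas the invention itself uses Caffe: three-layer CNN, STN strengthening, a second max pooling, and LSTM time modeling. The localisation head of the STN, all layer sizes and the 64x64 input resolution are assumptions for illustration; the output sequence is what would later be stacked into the space cube.

```python
# Hedged PyTorch sketch of the fig. 4 pipeline: CNN -> STN -> max pooling -> LSTM.
# Layer sizes and the STN localisation head are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Predicts one affine transform per feature map and resamples it (STN)."""
    def __init__(self, channels: int):
        super().__init__()
        self.loc_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(channels * 16, 32), nn.ReLU(inplace=True))
        self.loc_fc = nn.Linear(32, 6)
        self.loc_fc.weight.data.zero_()                 # start from the identity transform
        self.loc_fc.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        theta = self.loc_fc(self.loc_head(feat)).view(-1, 2, 3)
        grid = F.affine_grid(theta, feat.size(), align_corners=False)
        return F.grid_sample(feat, grid, align_corners=False)

class FramePipeline(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(                       # three-layer convolution (step S2)
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))                            # first max pooling
        self.stn = SpatialTransformer(128)              # feature strengthening
        self.pool = nn.MaxPool2d(2)                     # second dimensionality reduction
        self.lstm = nn.LSTM(input_size=128 * 16 * 16,   # assumes 64x64 input frames
                            hidden_size=hidden, batch_first=True)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, 64, 64) frames of one visual angle, in time order
        b, t = clip.shape[:2]
        x = clip.flatten(0, 1)                          # process frames as one batch
        x = self.pool(self.stn(self.cnn(x)))            # (b*t, 128, 16, 16)
        out, _ = self.lstm(x.view(b, t, -1))            # time-correlated feature sequence
        return out                                      # later stacked into the space cube
```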
Among the above techniques, the CNN is an efficient recognition method that has developed and attracted attention in recent years. In the 1960s, while studying neurons responsible for local sensitivity and orientation selection in the cat cerebral cortex, Hubel and Wiesel discovered a unique network structure that could effectively reduce the complexity of feedback neural networks, and the CNN was subsequently proposed on this basis. At present, the CNN has become one of the research hotspots in many scientific fields, especially in the field of pattern classification; because the network avoids complex preprocessing of the image and can take the original image directly as input, it has found wide application.
In general, the basic structure of a CNN includes two kinds of layers. One is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer and extracts the local feature; once the local feature has been extracted, its positional relation to the other features is also determined. The other is the feature mapping layer: each computational layer of the network is composed of a plurality of feature maps, each feature map is a plane, and the weights of all neurons on that plane are equal (shared).
The technical scheme of the invention is to use the feature mapping layer to extract the global bottom layer features in the video frame image and then carry out deeper processing on the bottom layer features.
The general processing flow of CNN is shown in fig. 5.
The layers used in the technical scheme of the invention are those producing the Feature Maps obtained after convolution; the subsequent pooling and fully connected layers are not used. The CNN obtains feature information for a single image, whereas video information has to be processed here, so time information needs to be introduced; using the CNN alone cannot meet the requirements of processing video behaviors.
In the above technical solution, RNNs, or Recurrent Neural Networks, were developed on the basis of Feed-Forward Neural Networks (FNNs). Unlike conventional FNNs, RNNs introduce a directed cycle that can deal with the problem of contextual relationships between the inputs. An RNN includes input units, whose input set is labeled {x_0, x_1, ..., x_{t-1}, x_t, x_{t+1}, ...}, and output units, whose output set is labeled {o_0, o_1, ..., o_{t-1}, o_t, o_{t+1}, ...}. The RNN also contains hidden units, whose output set is labeled {s_0, s_1, ..., s_{t-1}, s_t, s_{t+1}, ...}; these hidden units do the most important work.
As shown in fig. 6, which is a simplified RNN structure, one unidirectional flow of information goes from the input units to the hidden units, while another unidirectional flow goes from the hidden units to the output units. In some cases the RNN breaks the latter constraint and leads information from the output units back to the hidden units (so-called "back projections"), and the input of the hidden layer also includes the state of the previous hidden layer, i.e. the nodes within the hidden layer can be self-connected or interconnected. The time information is thus linked inside the hidden layer, and there is no need to consider the time information separately; this is a big advantage of the RNN for processing video behavior features. For this reason, processing that involves timing information is generally handed to the RNN in deep learning.
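In standard notation (U, W and V are the input-to-hidden, hidden-to-hidden and hidden-to-output weight matrices, and f and g are nonlinearities such as tanh and softmax; these names are conventional and not taken from the patent), the recurrence described above can be written as:

```latex
% Elman-style RNN recurrence implied by the description above (conventional notation)
\begin{aligned}
s_t &= f\left(U x_t + W s_{t-1}\right), \\
o_t &= g\left(V s_t\right).
\end{aligned}
```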
A new model for processing time information was developed on the basis of the RNN: Long Short-Term Memory (LSTM). Because the backpropagation of a deep learning network uses stochastic gradient descent, the RNN suffers from the vanishing-gradient problem, i.e. the sensitivity of a node at a later time step to nodes at earlier time steps decreases. The LSTM introduces one core element, the cell. A general block diagram of the LSTM is shown in fig. 7.
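For reference, the standard LSTM cell wraps the cell state in input, forget and output gates; the equations below are the widely used textbook form, not formulas given in the patent text:

```latex
% Standard LSTM cell equations: \sigma is the logistic sigmoid, \odot is
% element-wise multiplication, c_t the cell state and h_t the hidden state.
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right), \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right), \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right), \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh\left(c_t\right).
\end{aligned}
```

Because the cell state c_t is carried forward additively under the control of the forget gate, gradients can flow across many time steps without vanishing, which is the property the paragraph above relies on.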
Fig. 8 is a flowchart illustrating fusion classification of views.
Space cube models of the same action under the multiple visual angles are obtained by the method of fig. 4; the space cube models of the individual visual angles are then integrated into a cylinder space with x, y and z as coordinate axes, this cylinder space representing the trajectory description of the motion features under each visual angle; a polar coordinate transformation is then applied mathematically, converting the representation into a space with r, θ and z as coordinate axes, wherein the formula is as follows:
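The formula itself is not reproduced in this text version of the document; what follows is the standard Cartesian-to-cylindrical (polar) transform consistent with the (x, y, z) to (r, θ, z) conversion described above, given here as an assumed reconstruction rather than a quotation of the original figure:

```latex
% Assumed reconstruction of the omitted polar coordinate transformation:
% each point (x, y, z) of the cube model is mapped to cylindrical coordinates.
\begin{aligned}
r      &= \sqrt{x^{2} + y^{2}}, \\
\theta &= \arctan\left(\frac{y}{x}\right), \\
z      &= z .
\end{aligned}
```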
An angle-invariant cylinder space map (Invariant Cylinder Space Map) is thereby obtained, and finally the obtained cylinder space map is input into a classifier to obtain the behavior class. Classification is performed with a CNN rather than with an SVM classifier, since the CNN was originally designed for classification. The Motion History Volume of fig. 8 and the model after the polar coordinate transformation are shown in fig. 9.
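A minimal numerical sketch of this resampling step is given below, assuming the fused motion features are stored as a dense (X, Y, Z) volume with the cylinder axis at its centre; the grid resolutions and nearest-neighbour sampling are simplifications for illustration. In such a map a rotation of the viewpoint about the vertical axis appears only as a circular shift along the θ axis, which is what makes an angle-insensitive descriptor possible.

```python
# Hedged sketch: resampling the space cube model (x, y, z axes) onto a cylinder
# space map with (r, theta, z) axes. Resolutions and nearest-neighbour lookup
# are illustrative simplifications, not the patented procedure itself.
import numpy as np

def cube_to_cylinder_map(cube: np.ndarray, n_r: int = 32, n_theta: int = 64) -> np.ndarray:
    """cube: (X, Y, Z) volume of fused motion features; returns an (n_r, n_theta, Z) map."""
    X, Y, Z = cube.shape
    cx, cy = (X - 1) / 2.0, (Y - 1) / 2.0                 # cylinder axis at the volume centre
    r = np.linspace(0.0, min(cx, cy), n_r)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    xs = np.clip(np.rint(cx + rr * np.cos(tt)).astype(int), 0, X - 1)
    ys = np.clip(np.rint(cy + rr * np.sin(tt)).astype(int), 0, Y - 1)
    return cube[xs, ys, :]                                 # (n_r, n_theta, Z) cylinder space map
```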
The technical scheme of the invention uses the deep learning method to extract bottom layer information that is of a higher level than the spatio-temporal feature points and skeleton information of the traditional methods and is more robust.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (6)
1. A visual angle independence behavior recognition method based on a deep learning network comprises a training process of obtaining a classifier by utilizing a training sample set and a recognition process of recognizing a test sample by utilizing the classifier; the method is characterized in that:
the training process comprises the steps of:
s1) inputting the video frame images Image 1 to Image i in chronological order at a certain angle;
s2) performing bottom layer feature extraction on the images input in step S1) by means of a CNN and pooling them, and strengthening the pooled bottom layer features with an STN;
s3) pooling the feature images strengthened in the step S2) and inputting RNN for time modeling to obtain a time sequence related cube model;
s4) repeating steps S1) to S3) to obtain space cube models of the same behavior under a plurality of visual angles, converting the space cube models of the visual angles into a cylinder feature space mapping that is invariant to the visual angle, and inputting the cylinder feature space mapping into a classifier for training as a training sample of the behavior;
s5) repeatedly executing the steps S1) -S4) on various other behaviors of different classes to obtain view-independent classifiers corresponding to the various behaviors;
the identification process comprises the steps of:
s6) recording a video frame image under a certain visual angle, and performing bottom layer feature extraction and modeling by adopting the steps S1) to S3) to obtain a space cube model under the visual angle;
s7) converting the space cube model obtained in the step S6) into a cylinder feature space mapping, and inputting the cylinder feature space mapping into a classifier for recognition to obtain a video behavior category.
2. The deep learning network-based perspective-independent behavior recognition method according to claim 1, wherein:
step S2) employs a triple layer convolution operation to extract the underlying features.
3. The deep learning network-based perspective-independent behavior recognition method according to claim 2, wherein:
step S2) and step S3) adopt a maximum pooling method to perform dimension reduction operation on the feature images.
4. The deep learning network-based perspective-independent behavior recognition method according to claim 1, wherein:
step S3) performs temporal modeling using the LSTM network.
5. The method for identifying perspective-independent behaviors based on a deep learning network according to claim 1, wherein the step S4) specifically includes:
s41) repeating the operation steps S1) to S3), obtaining a space cube model of the same behavior at each visual angle, and integrating the space cube model into a cylinder space with x, y and z as coordinate axes, wherein the cylinder space represents the track description of the motion characteristics at each visual angle;
s42) applying the polar coordinate transformation formula to the model obtained in step S41) to obtain the cylinder space mapping with a constant angle.
6. The deep learning network-based perspective-independent behavior recognition method according to claim 1, further comprising:
s0) constructing a data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710082263.5A CN106909938B (en) | 2017-02-16 | 2017-02-16 | Visual angle independence behavior identification method based on deep learning network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106909938A CN106909938A (en) | 2017-06-30 |
CN106909938B (en) | 2020-02-21
Family
ID=59208388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710082263.5A Active CN106909938B (en) | 2017-02-16 | 2017-02-16 | Visual angle independence behavior identification method based on deep learning network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106909938B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463878A (en) * | 2017-07-05 | 2017-12-12 | 成都数联铭品科技有限公司 | Human bodys' response system based on deep learning |
CN107609541B (en) * | 2017-10-17 | 2020-11-10 | 哈尔滨理工大学 | Human body posture estimation method based on deformable convolution neural network |
CN107679522B (en) * | 2017-10-31 | 2020-10-13 | 内江师范学院 | Multi-stream LSTM-based action identification method |
CN108121961A (en) * | 2017-12-21 | 2018-06-05 | 华自科技股份有限公司 | Inspection Activity recognition method, apparatus, computer equipment and storage medium |
CN108764050B (en) * | 2018-04-28 | 2021-02-26 | 中国科学院自动化研究所 | Method, system and equipment for recognizing skeleton behavior based on angle independence |
CN112287754A (en) * | 2020-09-23 | 2021-01-29 | 济南浪潮高新科技投资发展有限公司 | Violence detection method, device, equipment and medium based on neural network |
CN112686111B (en) * | 2020-12-23 | 2021-07-27 | 中国矿业大学(北京) | Attention mechanism-based multi-view adaptive network traffic police gesture recognition method |
CN113111721B (en) * | 2021-03-17 | 2022-07-05 | 同济大学 | Human behavior intelligent identification method based on multi-unmanned aerial vehicle visual angle image data driving |
CN113239819B (en) * | 2021-05-18 | 2022-05-03 | 西安电子科技大学广州研究院 | Visual angle normalization-based skeleton behavior identification method, device and equipment |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1218936A (en) * | 1997-09-26 | 1999-06-09 | 松下电器产业株式会社 | Hand gesture identifying device |
CN101216896A (en) * | 2008-01-14 | 2008-07-09 | 浙江大学 | An identification method for movement by human bodies irrelevant with the viewpoint based on stencil matching |
CN103310233A (en) * | 2013-06-28 | 2013-09-18 | 青岛科技大学 | Similarity mining method of similar behaviors between multiple views and behavior recognition method |
CN105956560A (en) * | 2016-05-06 | 2016-09-21 | 电子科技大学 | Vehicle model identification method based on pooling multi-scale depth convolution characteristics |
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Based on Three dimensional convolution deep neural network and the action identification method of deep video |
Non-Patent Citations (3)
Title |
---|
Long-term Recurrent Convolutional Networks for Visual Recognition and Description; Jeff Donahue; IEEE; 2016-09-01; full text *
View-independent human action recognition based on multi-view action images and discriminant learning; Alexandros; IVMSP 2013; 2013-12-31; full text *
View-independent human action recognition with Volume Motion Template; Myung-Cheol Roh; Pattern Recognition Letters; 2010-12-31; full text *
Also Published As
Publication number | Publication date |
---|---|
CN106909938A (en) | 2017-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106909938B (en) | Visual angle independence behavior identification method based on deep learning network | |
Liao et al. | Dynamic sign language recognition based on video sequence with BLSTM-3D residual networks | |
CN109919031B (en) | Human behavior recognition method based on deep neural network | |
Zhang et al. | Unsupervised discovery of object landmarks as structural representations | |
CN106709461B (en) | Activity recognition method and device based on video | |
CN107273800B (en) | Attention mechanism-based motion recognition method for convolutional recurrent neural network | |
CN110532900B (en) | Facial expression recognition method based on U-Net and LS-CNN | |
WO2020108362A1 (en) | Body posture detection method, apparatus and device, and storage medium | |
CN111709311B (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
CN112307995B (en) | Semi-supervised pedestrian re-identification method based on feature decoupling learning | |
CN113673510B (en) | Target detection method combining feature point and anchor frame joint prediction and regression | |
CN105139004A (en) | Face expression identification method based on video sequences | |
Yu et al. | Human action recognition using deep learning methods | |
CN111259735B (en) | Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network | |
CN115966010A (en) | Expression recognition method based on attention and multi-scale feature fusion | |
CN116246338B (en) | Behavior recognition method based on graph convolution and transducer composite neural network | |
CN111738074B (en) | Pedestrian attribute identification method, system and device based on weak supervision learning | |
Fan | Research and realization of video target detection system based on deep learning | |
CN112906520A (en) | Gesture coding-based action recognition method and device | |
CN117854155A (en) | Human skeleton action recognition method and system | |
CN114038011A (en) | Method for detecting abnormal behaviors of human body in indoor scene | |
CN113408721A (en) | Neural network structure searching method, apparatus, computer device and storage medium | |
Zhao et al. | Research on human behavior recognition in video based on 3DCCA | |
CN110659576A (en) | Pedestrian searching method and device based on joint judgment and generation learning | |
Yan et al. | [Retracted] Dance Action Recognition Model Using Deep Learning Network in Streaming Media Environment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
2022-01-14 | TR01 | Transfer of patent right | Address after: 266000 room 403-2, building A2, Qingdao National University Science Park, No. 127, huizhiqiao Road, high tech Zone, Qingdao, Shandong; Patentee after: Qingdao shengruida Technology Co.,Ltd.; Address before: 266000 Laoshan campus, Songling Road, Laoshan District, Qingdao, Shandong, China, 99; Patentee before: QINGDAO UNIVERSITY OF SCIENCE AND TECHNOLOGY