CN113139467A - Hierarchical structure-based fine-grained video action identification method

Info

Publication number
CN113139467A
CN113139467A (application CN202110444382.7A)
Authority
CN
China
Prior art keywords
video
grained
fine
time sequence
flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110444382.7A
Other languages
Chinese (zh)
Other versions
CN113139467B (en)
Inventor
杨旸 (Yang Yang)
杨文涛 (Yang Wentao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiaotong University
Priority to CN202110444382.7A
Publication of CN113139467A
Application granted
Publication of CN113139467B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The hierarchical-structure-based fine-grained video action recognition method performs fine-grained action recognition in video through a two-stage process: the first stage identifies the coarse category of the action in a long-duration video, and on that basis the second stage identifies the fine-grained action. The specific steps are: step one, hierarchical data processing and feature extraction; step two, video representation feature extraction; step three, inter-segment fusion, two-stream fusion, and prediction; step four, fine-grained action feature extraction; and step five, fine-grained action prediction and classification. Applied to fine-grained action classification, the method can effectively recognize and classify fine-grained video actions.

Description

Hierarchical structure-based fine-grained video action identification method
Technical Field
The invention relates to the field of behavior recognition, and in particular to a hierarchical-structure-based fine-grained video action recognition method.
Background
Behavior recognition is a fundamental research problem in computer vision. Its main task is to analyze human behavior in video, generally by classifying the human actions in a given video. Behavior recognition is applied in many areas of life, such as social surveillance, public safety, human-computer interaction, and smart homes. Many behavior recognition algorithms have been proposed, but obtaining better video representations and more detailed fine-grained action recognition remains a challenging task.
Before deep learning entered the field, the best-performing algorithms were the Dense Trajectories method DT (Dense Trajectories) [1] and the improved Dense Trajectories method iDT (improved Dense Trajectories) [2]. The landmark work applying deep learning to behavior recognition is the two-stream network [3], which processes a video into a spatial stream (characterizing the target) and a temporal stream (characterizing the motion) and finally fuses the two streams to obtain the classification result. The TSN (Temporal Segment Networks) [4] network is also based on spatial-temporal two-stream fusion, but it runs multiple networks in parallel and finally performs inter-segment fusion and two-stream fusion. Beyond the two-stream idea, 3D networks have also been applied to behavior recognition. For example, the C3D (Convolutional 3D) network [5] proposed 3D ConvNets trained on large-scale video datasets to learn the spatio-temporal features of video, finding 3 × 3 × 3 to be the optimal convolution kernel size; C3D can model both appearance and motion information. There are also skeleton-based behavior recognition methods, such as recognition with a spatio-temporal graph convolutional network [6]: the algorithm models dynamic skeletons from the time series of human joint positions and extends graph convolution into a spatio-temporal graph convolutional network to capture such spatio-temporal relationships. Because fine-grained actions are highly similar in scene, clothing, and posture, these algorithms transfer poorly to them, and relatively few algorithms target fine-grained action classification.
[1] Heng Wang, Alexander Kläser, Cordelia Schmid, et al. Action Recognition by Dense Trajectories. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, Colorado Springs, United States, pp. 3169-3176.
[2] Wang H, Schmid C. Action Recognition with Improved Trajectories. Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), 2013.
[3] Simonyan K, Zisserman A. Two-Stream Convolutional Networks for Action Recognition in Videos. Advances in Neural Information Processing Systems, 2014.
[4] Wang L, Xiong Y, Wang Z, et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. European Conference on Computer Vision (ECCV), 2016.
[5] Tran D, Bourdev L, Fergus R, et al. Learning Spatiotemporal Features with 3D Convolutional Networks. 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
[6] Yan S, Xiong Y, Lin D. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. AAAI Conference on Artificial Intelligence, 2018.
Disclosure of Invention
In order to solve the problems of the prior art in fine-grained behavior recognition, the invention provides a hierarchical-structure-based fine-grained video action recognition method.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the method comprises the steps of firstly, carrying out hierarchical data processing on a long-time-sequence video, extracting a frame of RGB image and extracting optical flow information near the frame for each section after the long-time-sequence video is segmented; secondly, sending a plurality of video frames and optical flow characteristics of the long time sequence video into a plurality of parallel double-flow networks for characteristic extraction, wherein each double-flow network consists of a space flow and a time sequence flow; thirdly, fusing the segments by a plurality of parallel networks, then fusing the spatial stream and the time sequence stream, giving higher weight to the spatial stream during the fusion, and outputting the major category of the video action by the fusion information through a prediction function; fourthly, after the large-class motion recognition is completed, the fine-grained motion obtained by the hierarchical data processing is recognized, and on the basis of the known large-class motion to which the fine-grained motion belongs, one frame of image and interframe optical flow information are extracted from each section of fine-grained motion and input into the double-flow network; fifthly, performing double-flow fusion on double-flow network output, giving higher weight to the time sequence flow during fusion, and performing video fine-grained action identification through a prediction function; the first stage of the two stages includes the first to third steps, and the second stage includes the fourth to fifth steps.
The hierarchical data processing of the long-duration video is implemented as follows: the processing of the original input video is hierarchical. The long-duration video of a complete action is sampled into multi-frame information as the representation of the video, comprising multiple frames and the inter-frame optical flow. The long-duration action video is then divided into several fine-grained action segments, each containing one fine-grained action, and each fine-grained action segment samples one frame of information as the representation of that segment.
Feeding the sampled video frames and optical flow features of the long-duration video into multiple parallel two-stream networks for feature extraction is implemented as follows: the video feature processing structure is a hierarchical two-stage structure, in which the first stage processes the multi-frame RGB images and inter-frame optical flow obtained by sampling the long-duration video, with multiple two-stream networks extracting features in parallel.
Extracting one frame of image and the inter-frame optical flow from each fine-grained action and feeding them into the two-stream network is implemented as follows: the video feature processing structure is a hierarchical two-stage structure, in which the second stage processes the single frame and optical flow sampled from each fine-grained action video, with a single network extracting features.
The inter-segment fusion over the multiple parallel networks, followed by fusion of the spatial and temporal streams with a higher weight given to the spatial stream, is implemented as follows: the two stages use different weights for spatial-temporal fusion. In the first-stage coarse-category recognition, after the multiple parallel networks are fused across segments, spatial features dominate over temporal features for distinguishing coarse categories, so in the weighted fusion of the two streams the spatial stream receives a higher weight than the temporal stream.
The two-stream fusion of the network outputs with a higher weight given to the temporal stream is implemented as follows: the two stages use different weights for spatial-temporal fusion. In the second-stage fine-grained action recognition, because the spatial information of fine-grained actions is very similar, temporal features dominate over spatial features, so in the weighted fusion of the two streams the temporal stream receives a higher weight than the spatial stream.
Compared with the prior art, the invention has the following innovations:
Because fine-grained human actions often share highly similar scenes, clothing, postures, and even motion trajectories, traditional video behavior recognition algorithms perform poorly on fine-grained action classification. The invention provides a novel hierarchical two-stage fine-grained behavior recognition method: on top of hierarchical data processing and a two-stage feature processing structure, the first stage recognizes the coarse category of a fine-grained action from the extracted video features, giving a higher weight to the spatial stream; the second stage completes fine-grained action recognition given the known coarse category, giving a higher weight to the temporal stream. Compared with traditional behavior recognition algorithms applied to fine-grained action recognition, the method achieves better recognition results.
Drawings
Fig. 1 is a flowchart of the two-stream-network fine-grained video action recognition method of the invention.
Fig. 2(a) shows an RGB frame extracted from a video, fig. 2(b) the horizontal component of the current frame's optical flow, and fig. 2(c) the vertical component.
Fig. 3 is a structural diagram of the two-stage hierarchical fine-grained action recognition method.
Fig. 4 is the basic flow of the first-stage coarse-category recognition.
Fig. 5 is the basic flow of the second-stage fine-grained action recognition.
Detailed Description
The invention is described in further detail below with reference to the following figures and embodiments:
As shown in fig. 1, the hierarchical-structure-based fine-grained video action recognition method is implemented as a two-stage process: the first stage identifies the coarse category to which the action in a long-duration video belongs, for example archery; on that basis, the second stage identifies the fine-grained action, for example the bow-drawing phase within the archery action. The specific steps are: step one, performing hierarchical data processing on the long-duration video: after segmenting the video, extracting one RGB frame and the optical flow information near that frame for each segment; step two, feeding the sampled video frames and optical flow features into multiple parallel two-stream networks for feature extraction, each two-stream network consisting of a spatial stream and a temporal stream; step three, fusing across segments over the multiple parallel networks, then fusing the spatial and temporal streams with a higher weight given to the spatial stream, and outputting the coarse category of the video action through a prediction function; step four, after coarse-category recognition is completed, recognizing the fine-grained actions obtained by the hierarchical data processing: given the known coarse category, extracting one frame and the inter-frame optical flow from each fine-grained action segment and feeding them into the two-stream network; and step five, performing two-stream fusion of the network outputs with a higher weight given to the temporal stream, and recognizing the fine-grained video action through a prediction function.
The overall two-stage process is shown in fig. 3. The first stage comprises steps one to three, with the framework shown in fig. 4; the second stage comprises steps four and five, with the framework shown in fig. 5. Each step is described in detail below.
The first step: hierarchical data processing and feature extraction
First, the video data is processed hierarchically: the long-duration video is divided at a fine granularity to obtain the fine-grained action segments that compose it. When sampling feature frames to model the long-duration video, too low a sampling rate prevents the extracted features from covering the information needed for behavior recognition, while too high a sampling rate yields redundant features and raises computational complexity. Therefore, a sparse sampling method is adopted: the video is divided equally by duration into several independent segments. Specifically, a video is divided into K segments, denoted {S1, S2, …, SK}. One RGB frame is randomly sampled from each segment to represent the spatial information of the video, and the optical flow of the sampled frame and its neighboring frames is computed to represent the motion information of the video. Processing all K segments yields the representation of each video segment, denoted {T1, T2, …, TK}, where each element contains the spatial features and temporal motion features of the video. Figs. 2(a), 2(b) and 2(c) show an extracted video representation: fig. 2(a) is the extracted RGB frame, fig. 2(b) the horizontal component of the optical flow, and fig. 2(c) the vertical component.
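A minimal sketch of this sparse sampling scheme, assuming the video is already a frame array; the flow window here only gathers the neighboring frames from which optical flow (e.g. TV-L1) would actually be computed, and all names are illustrative:

```python
import numpy as np

def sparse_sample(video, K, flow_span=5, rng=None):
    # Split a (T, H, W, 3) frame array into K equal-duration segments,
    # draw one RGB frame at random from each, and keep the neighboring
    # frames from which the optical flow would be computed.
    rng = rng or np.random.default_rng()
    T = video.shape[0]
    bounds = np.linspace(0, T, K + 1, dtype=int)
    reps = []
    for k in range(K):
        idx = int(rng.integers(bounds[k], bounds[k + 1]))
        lo, hi = max(0, idx - flow_span), min(T, idx + flow_span + 1)
        reps.append((video[idx], video[lo:hi]))  # (RGB frame, flow window)
    return reps

# toy usage: a 120-frame clip divided into K = 3 segments
video = np.zeros((120, 224, 224, 3), dtype=np.uint8)
reps = sparse_sample(video, K=3)   # the {T1, ..., TK} representations
```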
The second step: video representation feature extraction
The video representations extracted in step 1 are input to multiple parallel two-stream networks, each composed of a temporal-stream branch and a spatial-stream branch. The spatial feature of the video, i.e., the RGB frame, is input to the spatial stream for feature extraction; the temporal feature of the video, i.e., the optical flow, is input to the temporal stream for feature extraction. Concretely, applying a network with parameters w to segment Tk yields the network score, denoted F(Tk, w).
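One such parallel branch can be sketched as a pair of small CNNs. The tiny convolutional stems below are placeholders chosen only to keep the sketch short; the patent's backbone is BN-Inception:

```python
import torch
import torch.nn as nn

class TwoStream(nn.Module):
    # One dual-stream branch: a spatial CNN over a single RGB frame and a
    # temporal CNN over stacked optical-flow fields (two channels, horizontal
    # and vertical, per flow frame).
    def __init__(self, num_classes, flow_len=5):
        super().__init__()
        def stem(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(16, num_classes))
        self.spatial = stem(3)               # RGB frame
        self.temporal = stem(2 * flow_len)   # stacked (u, v) flow fields

    def forward(self, rgb, flow):
        # per-stream class scores F(Tk, w)
        return self.spatial(rgb), self.temporal(flow)

net = TwoStream(num_classes=10)
s_score, t_score = net(torch.randn(1, 3, 224, 224), torch.randn(1, 10, 224, 224))
```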
The third step: inter-segment fusion, two-stream fusion and prediction
After the features of the multiple video segments are obtained in step 2, an aggregation function is used to fuse the network prediction scores across segments, specifically:
G = G(F(T1, w), F(T2, w), …, F(TK, w))    (1)
where G is the aggregation function across the video segments; its concrete form is an average pooling function, which takes the mean of the network output scores for the same class as the final network score of that class. Meanwhile, the network adopts a variant cross-entropy loss function, defined as:
L(y, G) = - Σ_{i=1}^{C} y_i (G_i - log Σ_{j=1}^{C} exp G_j)    (2)
where y is the ground-truth label, G is the aggregated score vector over the video segments, C is the number of categories, and the subscripts i and j are category indices.
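Equations (1) and (2) translate directly into code. A sketch, with toy random tensors standing in for the per-segment scores F(Tk, w):

```python
import torch

def aggregate(segment_scores):
    # G of eq. (1): average pooling of the per-segment scores F(Tk, w)
    return torch.stack(segment_scores).mean(dim=0)

def segment_ce_loss(G, y):
    # eq. (2): L(y, G) = -sum_i y_i * (G_i - log sum_j exp(G_j))
    log_probs = G - torch.logsumexp(G, dim=-1, keepdim=True)
    return -(y * log_probs).sum(dim=-1).mean()

# toy usage: K = 3 segments, batch of 4 videos, C = 10 classes
scores = [torch.randn(4, 10) for _ in range(3)]
y = torch.nn.functional.one_hot(torch.tensor([1, 3, 5, 7]), num_classes=10).float()
loss = segment_ce_loss(aggregate(scores), y)
```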
After the class prediction scores of the spatial stream and the temporal stream are obtained, two-stream fusion is performed as a weighted average to obtain the prediction score of each coarse category. Since this step performs coarse-category recognition, where distinguishing different coarse categories depends mainly on appearance, the spatial stream is given the higher weight, specifically spatial : temporal = 2 : 1. On the basis of the class prediction scores, a prediction function H performs probability prediction for each category; H takes the form of the standard softmax function.
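The weighted fusion and the prediction function H can be sketched as follows (function and variable names are illustrative):

```python
import torch

def fuse_and_predict(spatial_scores, temporal_scores, w_spatial, w_temporal):
    # weighted-average two-stream fusion followed by the softmax prediction
    # function H; stage 1 uses spatial:temporal = 2:1
    fused = (w_spatial * spatial_scores + w_temporal * temporal_scores) \
            / (w_spatial + w_temporal)
    return torch.softmax(fused, dim=-1)

# toy usage with C = 10 coarse categories
probs = fuse_and_predict(torch.randn(1, 10), torch.randn(1, 10), 2.0, 1.0)
coarse_class = probs.argmax(dim=-1)
```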
The fourth step: fine-grained action feature extraction
Given the coarse category obtained in step 3 to which the fine-grained action belongs, a similar operation is applied to the fine-grained actions obtained by the hierarchical data processing: a single video frame and the inter-frame optical flow are extracted from each fine-grained action segment to represent the spatial target information and the temporal motion information, and are input to the two-stream network for feature extraction. The basic feature extraction network is built from BN-Inception blocks, which accelerates convergence; to suppress overfitting, a dropout operation is introduced. To address the relatively small amount of fine-grained action data, data augmentation is adopted, including random cropping, horizontal flipping, corner cropping, and multi-scale cropping.
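Random cropping and horizontal flipping are standard torchvision transforms; corner cropping and multi-scale cropping can be sketched as below, a TSN-style approximation assuming PIL images, with the scale values chosen only for illustration:

```python
import random
from PIL import Image
import torchvision.transforms as T

def multi_scale_corner_crop(img, scales=(256, 224, 192, 168), out=224):
    # choose a crop size from several scales and a position from the four
    # corners or the center, then resize to the network input size
    size = random.choice(scales)
    w, h = img.size
    positions = [(0, 0), (w - size, 0), (0, h - size),
                 (w - size, h - size), ((w - size) // 2, (h - size) // 2)]
    x, y = random.choice(positions)
    return img.crop((x, y, x + size, y + size)).resize((out, out))

# corner/multi-scale cropping followed by random horizontal flipping;
# plain random cropping is available as T.RandomCrop
augment = T.Compose([T.Lambda(multi_scale_corner_crop), T.RandomHorizontalFlip()])
augmented = augment(Image.new("RGB", (340, 256)))
```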
The fifth step: fine-grained action prediction and classification
After the two-stream network outputs for the fine-grained action are obtained in step 4, the two streams are fused. Considering that, with similar spatial backgrounds and target appearances across fine-grained actions, the motion information in the temporal stream is the key to distinguishing them, the weighted average gives the higher weight to the temporal stream, specifically spatial : temporal = 1 : 2. After the two-stream fusion, the prediction function softmax performs probability prediction and the final fine-grained action category is output.
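In code this mirrors the stage-1 fusion with the weights inverted; the snippet below assumes the fuse_and_predict sketch from the third step and per-stream score tensors for a fine-grained clip:

```python
# temporal-heavy fusion for fine-grained recognition: spatial:temporal = 1:2
fine_probs = fuse_and_predict(spatial_scores, temporal_scores,
                              w_spatial=1.0, w_temporal=2.0)
fine_action = fine_probs.argmax(dim=-1)
```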

Claims (6)

1. A fine-grained video action recognition method based on a hierarchical structure, characterized in that: the fine-grained action recognition consists of two stages, wherein the first stage recognizes the coarse category and the second stage recognizes the fine-grained action on that basis; the method specifically comprises: step one, performing hierarchical data processing on a long-duration video: after segmenting the video, extracting one RGB frame and the optical flow information near that frame for each segment; step two, feeding the sampled video frames and optical flow features of the long-duration video into multiple parallel two-stream networks for feature extraction, each two-stream network consisting of a spatial stream and a temporal stream; step three, fusing across segments over the multiple parallel networks, then fusing the spatial and temporal streams with a higher weight given to the spatial stream, and outputting the coarse category of the video action from the fused information through a prediction function; step four, after coarse-category recognition is completed, recognizing the fine-grained actions obtained by the hierarchical data processing: given the known coarse category, extracting one frame of image and the inter-frame optical flow from each fine-grained action segment and feeding them into the two-stream network; step five, performing two-stream fusion of the network outputs with a higher weight given to the temporal stream, and recognizing the fine-grained video action through a prediction function; of the two stages, the first stage comprises steps one to three and the second stage comprises steps four and five.
2. The fine-grained video action recognition method based on a hierarchical structure according to claim 1, characterized in that: in step one, the hierarchical processing of the long-duration video is specifically: the processing of the original input video is hierarchical; the long-duration video of a complete action is sampled into multi-frame information as the representation of the video, comprising multiple frames and the inter-frame optical flow; the long-duration action video is then divided into several fine-grained action segments, each containing one fine-grained action, and each fine-grained action segment samples one frame of information as the representation of that segment.
3. The fine-grained video action recognition method based on a hierarchical structure according to claim 1, characterized in that: in step two, feeding the sampled video frames and optical flow features of the long-duration video into multiple parallel two-stream networks for feature extraction is specifically: the video feature processing structure is a hierarchical two-stage structure, in which the first stage processes the multi-frame RGB images and inter-frame optical flow obtained by sampling the long-duration video, with multiple two-stream networks extracting features in parallel.
4. The fine-grained video action recognition method based on a hierarchical structure according to claim 1, characterized in that: in step four, extracting one frame of image and the inter-frame optical flow from each fine-grained action and feeding them into the two-stream network is specifically: the video feature processing structure is a hierarchical two-stage structure, in which the second stage processes the single frame and optical flow sampled from each fine-grained action video, with a single network extracting features.
5. The fine-grained video action recognition method based on a hierarchical structure according to claim 1, characterized in that: in step three, the inter-segment fusion over the multiple parallel networks, followed by fusion of the spatial and temporal streams with a higher weight given to the spatial stream, is specifically: the two stages use different weights for spatial-temporal fusion; in the first-stage coarse-category recognition, after the multiple parallel networks are fused across segments, spatial features dominate over temporal features for coarse-category recognition, so in the weighted fusion of the two streams the spatial stream receives a higher weight than the temporal stream.
6. The fine-grained video action recognition method based on a hierarchical structure according to claim 1, characterized in that: in step five, the two-stream fusion of the network outputs with a higher weight given to the temporal stream is specifically: the two stages use different weights for spatial-temporal fusion; in the second-stage fine-grained action recognition, because the spatial information of fine-grained actions is very similar, temporal features dominate over spatial features, so in the weighted fusion of the two streams the temporal stream receives a higher weight than the spatial stream.
CN202110444382.7A 2021-04-23 2021-04-23 Fine granularity video action recognition method based on hierarchical structure Active CN113139467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110444382.7A CN113139467B (en) 2021-04-23 2021-04-23 Fine granularity video action recognition method based on hierarchical structure


Publications (2)

Publication Number Publication Date
CN113139467A true CN113139467A (en) 2021-07-20
CN113139467B CN113139467B (en) 2023-04-25

Family

ID=76811831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110444382.7A Active CN113139467B (en) 2021-04-23 2021-04-23 Fine granularity video action recognition method based on hierarchical structure

Country Status (1)

Country Link
CN (1) CN113139467B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599789A (en) * 2016-07-29 2017-04-26 北京市商汤科技开发有限公司 Video class identification method and device, data processing device and electronic device
CN107862376A (en) * 2017-10-30 2018-03-30 中山大学 A kind of human body image action identification method based on double-current neutral net
CN108280443A (en) * 2018-02-23 2018-07-13 深圳市唯特视科技有限公司 A kind of action identification method based on deep feature extraction asynchronous fusion network
CN110163127A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 A kind of video object Activity recognition method from thick to thin
CN110110686A (en) * 2019-05-14 2019-08-09 中国石油大学(华东) Based on the human motion recognition methods for losing double-current convolutional neural networks more
CN112131908A (en) * 2019-06-24 2020-12-25 北京眼神智能科技有限公司 Action identification method and device based on double-flow network, storage medium and equipment
US20210081673A1 (en) * 2019-09-12 2021-03-18 Nec Laboratories America, Inc Action recognition with high-order interaction through spatial-temporal object tracking
CN110598654A (en) * 2019-09-18 2019-12-20 合肥工业大学 Multi-granularity cross modal feature fusion pedestrian re-identification method and re-identification system
CN111627052A (en) * 2020-04-30 2020-09-04 沈阳工程学院 Action identification method based on double-flow space-time attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOFAN G. et al.: "FINE-GRAINED ACTION RECOGNITION ON A NOVEL BASKETBALL DATASET", ICASSP 2020 *
WANG Qian et al.: "Temporal action localization based on two-stream convolutional neural networks" (基于双流卷积神经网络的时序动作定位), Software Guide (《软件导刊》) *

Also Published As

Publication number Publication date
CN113139467B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN109101896B (en) Video behavior identification method based on space-time fusion characteristics and attention mechanism
CN108875624B (en) Face detection method based on multi-scale cascade dense connection neural network
EP3777207B1 (en) Content-specific neural network distribution
CN109190479A (en) A kind of video sequence expression recognition method based on interacting depth study
CN110688927B (en) Video action detection method based on time sequence convolution modeling
CN110378208B (en) Behavior identification method based on deep residual error network
CN108764148B (en) Multi-region real-time action detection method based on monitoring video
CN113239801B (en) Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment
CN112001308B (en) Lightweight behavior identification method adopting video compression technology and skeleton features
Chen et al. Action recognition with temporal scale-invariant deep learning framework
CN109583334B (en) Action recognition method and system based on space-time correlation neural network
Wang et al. Intermediate fused network with multiple timescales for anomaly detection
CN112200096B (en) Method, device and storage medium for realizing real-time abnormal behavior identification based on compressed video
CN112036379A (en) Skeleton action identification method based on attention time pooling graph convolution
Xu et al. Prediction-cgan: Human action prediction with conditional generative adversarial networks
Wu et al. Dss-net: Dynamic self-supervised network for video anomaly detection
Komagal et al. Real time background subtraction techniques for detection of moving objects in video surveillance system
CN105956604B (en) Action identification method based on two-layer space-time neighborhood characteristics
CN113139467B (en) Fine granularity video action recognition method based on hierarchical structure
Ouyang et al. The comparison and analysis of extracting video key frame
Majhi et al. Temporal pooling in inflated 3dcnn for weakly-supervised video anomaly detection
CN112487926A (en) Scenic spot feeding behavior identification method based on space-time diagram convolutional network
CN109583335B (en) Video human behavior recognition method based on temporal-spatial information fusion
CN114120076B (en) Cross-view video gait recognition method based on gait motion estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant