CN113673559A - Video person spatio-temporal feature extraction method based on a residual network - Google Patents

Video person spatio-temporal feature extraction method based on a residual network

Info

Publication number
CN113673559A
Authority
CN
China
Prior art keywords: residual, convolution, network, video
Prior art date
Legal status: Granted
Application number
CN202110793379.6A
Other languages
Chinese (zh)
Other versions
CN113673559B (en)
Inventor
陈志 (Chen Zhi)
江婧 (Jiang Jing)
岳文静 (Yue Wenjing)
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202110793379.6A
Publication of CN113673559A
Application granted
Publication of CN113673559B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for extracting spatio-temporal features of people in video based on a residual network, which addresses the high computational cost and large memory requirements of extracting spatio-temporal features from video. The invention first decomposes the 3D filter into a spatial form and a temporal form, then designs three different residual blocks for the decomposed (2D+1D) convolution kernels on the basis of a residual network, and places each residual block at a different position of the overall ResNet structure. Finally, the residual blocks are combined with a designed hourglass structure, in which depth-wise convolutions are added at the ends of the residual path, to form a new 3D residual network that extracts spatio-temporal features of the people in the video. The invention enhances the structural diversity of the network, so that the whole network can be applied to a variety of video analysis tasks with improved performance and time efficiency.

Description

Video person spatio-temporal feature extraction method based on a residual network
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method for extracting spatio-temporal features of people in video based on a residual network.
Background
At present, feature extraction is a major focus of image recognition and of computer vision more broadly. The quality of the extracted features strongly affects generalization ability; the aim of this work is to build, from an initial set of data, features that carry useful image information without redundancy, so as to support subsequent detection or classification tasks.
Most feature extraction methods operate on still images; the main ones are HOG (histogram of oriented gradients), SIFT (scale-invariant feature transform) and Haar features. Current methods that extract features directly from video include TSN (Temporal Segment Network) and C3D.
A TSN consists of a temporal-stream convolutional network and a spatial-stream convolutional network. It randomly samples several segments from a given video, each selected segment makes a preliminary class prediction from its own information, and the segment-level predictions are then fused into the final video-level prediction. TSN models long-range temporal structure and uses a sparse sampling strategy together with video-level supervision, which makes learning over a given video both efficient and effective.
C3D, by contrast, builds its network from 3D convolutions, which are better suited to extracting spatio-temporal features than 2D convolutions: a 2D convolution discards temporal information after each operation, whereas 3D convolution and pooling operations model temporal information directly. C3D is an effective spatio-temporal feature learner, and its most effective convolution kernel size is 3 × 3 × 3.
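By way of illustration only (this sketch is not part of the patent), the difference in cost between a 2D and a 3D convolution can be seen directly in code; PyTorch is assumed, and the shapes and channel counts are illustrative rather than those of C3D:

```python
# Minimal sketch (PyTorch assumed): comparing a 2D and a 3D convolution on a video clip
# whose shape follows the c x n x h x w convention used later in the description.
import torch
import torch.nn as nn

clip = torch.randn(1, 3, 16, 112, 112)   # batch, channels c, frames n, height h, width w

conv2d = nn.Conv2d(3, 64, kernel_size=3, padding=1)
conv3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)

# The 2D filter sees one frame at a time and ignores temporal context.
per_frame = torch.stack([conv2d(clip[:, :, t]) for t in range(clip.size(2))], dim=2)

# The 3D filter spans 3 neighbouring frames as well as a 3 x 3 spatial window.
spatio_temporal = conv3d(clip)

print(per_frame.shape, spatio_temporal.shape)        # both torch.Size([1, 64, 16, 112, 112])
print(sum(p.numel() for p in conv2d.parameters()),   # 1792 parameters
      sum(p.numel() for p in conv3d.parameters()))   # 5248 parameters, about 3x for this layer
```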
With the increasing intelligence of devices and the rapid growth of multimedia on the Internet, video is gradually becoming a new mode of communication between users, which both encourages and tests the development of cutting-edge technology. A video is composed of many frames in temporal sequence, is more complex than a single image, and often contains frequent shot changes, all of which makes it harder to train a general, powerful classifier for extracting spatio-temporal features. Spatio-temporal information can be extracted from video by training a new 3D convolutional neural network, which gives access to the temporal information between each frame and its neighbouring frames, but training a 3D CNN from scratch is computationally expensive and roughly doubles the model size compared with a 2D CNN. These are problems for which solutions are urgently sought.
Disclosure of Invention
The technical problem is as follows: the invention aims to solve the problems of high computational cost and large memory requirements when extracting spatio-temporal features from video, and provides a method for extracting spatio-temporal features of people in video based on a residual network.
The technical scheme is as follows: to achieve the above purpose, the invention adopts the following technical scheme.
a video character space-time feature extraction method based on a residual error network comprises the following steps:
step 1) inputting a video V, wherein the video V is a multi-person video comprising two or more persons, the video size is c multiplied by n multiplied by h multiplied by w, wherein c is the number of channels, n is the number of frames in a single video, and h and w are the height and width of each frame;
step 2) decomposing the 3D convolution filter with size 3 x 3 into spatial and temporal (2D +1D) forms, i.e. a spatial 2D convolution filter and a temporal 1D convolution filter, using the 1 x 3 convolution filter and the 3 x 1 convolution filter instead of the 3 x 3 convolution filter;
step 3) combining the decoupled spatial 2-dimensional convolution filter and temporal 1-dimensional convolution filter with a residual error network, and designing 3 different 3D residual error blocks: the device comprises a 3D serial residual block, a 3D parallel residual block and a 3D serial-parallel residual block;
and 4) respectively combining the 3 kinds of residual blocks in the step 3) with the hourglass structure, positioning shortcuts to be connected with high-dimensional representation, and obtaining 3 kinds of hourglass residual structures: the sandglass residual error serial structure HRS-I, the sandglass residual error parallel structure HRS-II and the sandglass residual error serial and parallel structure HRS-III;
step 5) respectively fusing the 3 hourglass residual error structures in the step 4) into a residual error network to form three new residual error networks; combining the 3 hourglass residual error structures in the step 4) and then fusing the combined hourglass residual error structures into a residual error network to form another new residual error network; comparing the four obtained residual error networks to obtain a residual error network with the best performance;
step 6) training the residual error network with the best performance obtained in the step 5) on a gpu of 1080ti by using a data set, wherein 70% of the data set is used as a training set, 10% of the data set is used as a verification set, and 20% of the data set is used as a test set;
and 7) carrying out space-time feature extraction on the video V by using the trained new residual error network.
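By way of illustration only (not part of the claimed method), the factorization referred to in step 2) can be sketched in a few lines; PyTorch is assumed, and the intermediate channel width is an assumed choice rather than a value taken from the patent:

```python
# Minimal sketch (PyTorch assumed, intermediate width = output width by default): replace a
# 3 x 3 x 3 convolution with a spatial 1 x 3 x 3 convolution followed by a temporal
# 3 x 1 x 1 convolution, as described in step 2).
import torch
import torch.nn as nn

class Factorized2Plus1D(nn.Module):
    def __init__(self, in_ch, out_ch, mid_ch=None):
        super().__init__()
        mid_ch = mid_ch or out_ch
        # spatial 2D filter: no temporal extent
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # temporal 1D filter: no spatial extent
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):          # x has shape (batch, c, n, h, w)
        return self.temporal(self.spatial(x))

clip = torch.randn(1, 3, 16, 112, 112)
print(Factorized2Plus1D(3, 64)(clip).shape)   # torch.Size([1, 64, 16, 112, 112])
```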
Further, the step 3) specifically comprises the following steps (an illustrative code sketch of the three block forms follows these steps):
step 31) letting the residual function be F(x_l) and H(x_l) = F(x_l) + x_l, where H(x_l) is the mapping learned by the residual network and x_{l+1} is the output of the l-th residual unit;
step 32) when F(x_l) = 0, H(x_l) = x_l; the output of the l-th residual unit can then be written as x_{l+1} = x_l + F'·x_l, where F'·x_l denotes the result of applying the residual function F to x_l;
step 33) designing the serial residual block, which connects the temporal 1D convolution filter and the spatial 2D convolution filter in series; letting the residual function be T(S(x_l)), the output is expressed as x_{l+1} = x_l·(1 + T'S'), where T denotes the temporal 1D filter, S denotes the spatial 2D filter, and T', S' are the results of applying the residual functions T and S respectively;
step 34) designing the parallel residual block, which places the two convolution filters on separate parallel paths so that they influence each other only indirectly and their outputs are summed into the final output; letting the residual function be T(x_l) + S(x_l), the output is expressed as x_{l+1} = x_l·(1 + T' + S');
step 35) designing the serial-parallel residual block, which gives both the spatial filter and the serially connected temporal filter a direct connection to the final output, i.e. adds a shortcut from the spatial branch of the serial block; letting the residual function be S(x_l) + T(S(x_l)), the output is expressed as x_{l+1} = x_l·(1 + T'S' + S').
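By way of illustration only, the three residual block forms of steps 33)–35) can be sketched as follows; PyTorch is assumed, normalization and activation layers are omitted and the channel widths are illustrative, so this only mirrors the residual functions above rather than the patented network itself:

```python
# Minimal sketch (PyTorch assumed) of the serial, parallel and serial-parallel residual blocks.
# S is the spatial 1 x 3 x 3 filter, T is the temporal 3 x 1 x 1 filter.
import torch
import torch.nn as nn

def spatial_conv(ch):
    return nn.Conv3d(ch, ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))

def temporal_conv(ch):
    return nn.Conv3d(ch, ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))

class SerialBlock(nn.Module):             # x_{l+1} = x_l + T(S(x_l))
    def __init__(self, ch):
        super().__init__()
        self.S, self.T = spatial_conv(ch), temporal_conv(ch)
    def forward(self, x):
        return x + self.T(self.S(x))

class ParallelBlock(nn.Module):           # x_{l+1} = x_l + T(x_l) + S(x_l)
    def __init__(self, ch):
        super().__init__()
        self.S, self.T = spatial_conv(ch), temporal_conv(ch)
    def forward(self, x):
        return x + self.T(x) + self.S(x)

class SerialParallelBlock(nn.Module):     # x_{l+1} = x_l + S(x_l) + T(S(x_l))
    def __init__(self, ch):
        super().__init__()
        self.S, self.T = spatial_conv(ch), temporal_conv(ch)
    def forward(self, x):
        s = self.S(x)
        return x + s + self.T(s)

x = torch.randn(1, 64, 16, 56, 56)
for Block in (SerialBlock, ParallelBlock, SerialParallelBlock):
    print(Block(64)(x).shape)             # (1, 64, 16, 56, 56) in each case
```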
Further, the step 4) specifically includes the following steps (an illustrative code sketch of the resulting hourglass residual block follows these steps):
step 41) in order to ensure that the shortcut connects high-dimensional representations, reversing the order of the two point-wise convolutions, a point-wise convolution being a 1 × 1 convolution that extracts features at a single position and yields a feature map;
step 42) letting F ∈ R^(D_f × D_f × M) be the input tensor and G the output tensor of the residual structure, where D_f × D_f × M is the size of the feature map obtained in step 41); ignoring the depth-wise convolution layers and the activation layers, the hourglass structure is expressed as
G = φ_e(φ_r(F)) + F,
where φ_e is the channel-expanding point-wise convolution and φ_r is the channel-reducing point-wise convolution;
step 43) adding depth-wise convolutions at the two ends of the residual path, with the point-wise convolutions placed between the depth-wise convolutions; the hourglass structure can then be expressed as
G = φ_{2,d}(φ_{2,p}(φ_{1,p}(φ_{1,d}(F)))) + F,
where φ_{1,p} is the 1st point-wise convolution, φ_{1,d} is the 1st depth-wise convolution, φ_{2,p} is the 2nd point-wise convolution and φ_{2,d} is the 2nd depth-wise convolution.
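By way of illustration only, the hourglass residual block of steps 41)–43) can be sketched as follows; PyTorch is assumed, the depth-wise kernel size and the channel-reduction ratio are assumed values, and normalization and activation layers are omitted:

```python
# Minimal sketch (PyTorch assumed; 3 x 3 x 3 depth-wise kernels and reduction ratio r = 4 are
# assumptions): depth-wise convolutions at the two ends of the residual path with the two
# point-wise convolutions (reduce, then expand) between them, so the shortcut connects the
# high-dimensional representation.
import torch
import torch.nn as nn

class HourglassResidual(nn.Module):
    def __init__(self, ch, r=4):
        super().__init__()
        mid = ch // r
        self.dw1 = nn.Conv3d(ch, ch, kernel_size=3, padding=1, groups=ch)  # 1st depth-wise convolution
        self.pw1 = nn.Conv3d(ch, mid, kernel_size=1)                       # 1st point-wise convolution (channel reduction)
        self.pw2 = nn.Conv3d(mid, ch, kernel_size=1)                       # 2nd point-wise convolution (channel expansion)
        self.dw2 = nn.Conv3d(ch, ch, kernel_size=3, padding=1, groups=ch)  # 2nd depth-wise convolution

    def forward(self, x):
        # G = dw2(pw2(pw1(dw1(F)))) + F
        return x + self.dw2(self.pw2(self.pw1(self.dw1(x))))

x = torch.randn(1, 64, 16, 56, 56)
print(HourglassResidual(64)(x).shape)   # torch.Size([1, 64, 16, 56, 56])
```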
Further, the step 5) specifically comprises the following steps (an illustrative sketch of the replacement scheme follows these steps):
step 51) calling the three residual blocks, after combination with the hourglass structure, the serial hourglass residual structure HRS-I, the parallel hourglass residual structure HRS-II and the serial-parallel hourglass residual structure HRS-III respectively, and replacing all residual units in ResNet-50 with HRS-I, HRS-II and HRS-III respectively to form three new residual networks;
step 52) forming a new hourglass residual structure chain from HRS-I, HRS-II and HRS-III in sequence and replacing all residual units in ResNet-50 with this chain to obtain a fourth new residual network;
step 53) comparing the three new residual networks formed in step 51) with the residual network obtained in step 52) to obtain the residual network with the best performance.
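By way of illustration only, the replacement scheme of steps 51)–52) can be sketched as follows; PyTorch is assumed, the HRS classes here are simple stand-ins rather than the hourglass residual structures of step 4), and only the ResNet-50 stage layout of 3+4+6+3 residual units is taken from the description:

```python
# Minimal sketch (PyTorch assumed, stand-in blocks): fill the ResNet-50 stage layout either
# with a single HRS variant everywhere (step 51) or with a repeating I -> II -> III chain (step 52).
import torch
import torch.nn as nn
from itertools import cycle

class HRS_I(nn.Module):
    """Stand-in for the serial hourglass residual structure (HRS-I)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Conv3d(ch, ch, kernel_size=3, padding=1)
    def forward(self, x):
        return x + self.body(x)

class HRS_II(HRS_I):
    """Stand-in for the parallel hourglass residual structure (HRS-II)."""

class HRS_III(HRS_I):
    """Stand-in for the serial-parallel hourglass residual structure (HRS-III)."""

def build_network(block_types, stage_depths=(3, 4, 6, 3), ch=64):
    """Fill the ResNet-50 stage layout (3+4+6+3 = 16 residual units) with blocks
    drawn cyclically from block_types: one type everywhere, or the I -> II -> III chain."""
    pick = cycle(block_types)
    blocks = [next(pick)(ch) for depth in stage_depths for _ in range(depth)]
    return nn.Sequential(*blocks)

candidates = {
    "HRS-I only":       build_network([HRS_I]),                        # step 51), first variant
    "HRS-II only":      build_network([HRS_II]),
    "HRS-III only":     build_network([HRS_III]),
    "I->II->III chain": build_network([HRS_I, HRS_II, HRS_III]),       # step 52)
}
x = torch.randn(1, 64, 8, 28, 28)
for name, net in candidates.items():
    print(name, net(x).shape)
```

In step 53) the four candidates would then be compared on the validation split and the best-performing one retained.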
Further, in step 6), the best-performing residual network obtained in step 5) is trained efficiently by randomly selecting 5 short clips of 5 seconds each from every video.
Further, in step 6), when training the new residual network, the dropout rate is empirically set to 0.1.
Further, in step 6), when training the new residual network, the learning rate is empirically initialized to 0.001.
Beneficial effects: compared with the prior art, the invention has the following beneficial effects:
The method decomposes the 3D filter into spatial and temporal forms, designs three forms of residual block for the decomposed (2D+1D) convolution kernels, and then combines these residual blocks with the designed hourglass structure, in which depth-wise convolutions are added at the ends of the residual path, to form a new 3D residual network for spatio-temporal feature extraction. This enhances the structural diversity of the network, so that the whole network can be applied to a variety of video analysis tasks with improved performance and time efficiency.
Drawings
FIG. 1 is a flow chart of the method for extracting spatio-temporal features of people in video based on a residual network.
FIG. 2 shows the decomposed (2D+1D) form combined with the residual network.
FIG. 3 shows the combination of a residual block with the hourglass structure.
Detailed Description
The technical scheme of the invention is further explained in detail below with reference to the accompanying drawings.
a video character space-time feature extraction method based on a residual error network comprises the following steps:
FIG. 1 is a flow chart of the method. First, a clipped video of size c × n × h × w is input, where c is the number of channels, n is the number of frames in the video, and h and w are the height and width of each frame. The videos are taken from the large Sports-1M data set. A 3D convolution filter of size 3 × 3 × 3 is then decomposed into a spatial plus temporal (2D+1D) form, replacing the 3 × 3 × 3 convolution filter with a 1 × 3 × 3 convolution filter and a 3 × 1 × 1 convolution filter. The decoupled (2D+1D) form is then combined with the residual network; the combined network is shown in FIG. 2. Three different 3D residual blocks are designed: a 3D serial residual block, a 3D parallel residual block and a 3D serial-parallel residual block.
A residual network with an hourglass structure, similar to the classic bottleneck structure, is then designed; unlike the bottleneck structure, the hourglass residual structure adds depth-wise convolutions at the ends of the residual path. The decomposed structure shown in FIG. 2 is combined with the hourglass structure, and the shortcut is placed so that it connects high-dimensional representations. To guarantee this, the order of the two point-wise convolutions is reversed, a point-wise convolution being a 1 × 1 convolution that extracts features at a single position; depth-wise convolutions are then added at the ends of the residual path, with the point-wise convolutions placed between them. Because the two depth-wise convolutions operate in a high-dimensional space, the loss incurred during feature extraction in the module is reduced and richer feature representations are extracted.
The hourglass residual structure is then fused into the residual network to form a new 3D residual network, and the new network is trained on the Sports-1M data set, with 5 short clips of 5 seconds each selected at random from every video. During training, the mini-batch size is set to 128 clips and the dropout rate to 0.1. The learning rate is initialized to 0.001 and divided by 10 every 60K iterations.
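By way of illustration only, the training settings described above can be collected into a short configuration sketch; PyTorch is assumed, and the frame rate, the stand-in model and the tiny batch are assumptions rather than values from the patent, apart from the dropout rate, the learning-rate schedule and the 5 clips of 5 seconds per video:

```python
# Minimal sketch (PyTorch assumed): 5 random 5-second clips per video, dropout 0.1,
# initial learning rate 0.001 divided by 10 every 60K iterations. FPS, the model and the
# two-clip batch are placeholders; the patent uses mini-batches of 128 clips.
import random
import torch
import torch.nn as nn
import torch.optim as optim

CLIPS_PER_VIDEO, CLIP_SECONDS, FPS = 5, 5, 25      # FPS = 25 is an assumed frame rate

def sample_clip_starts(num_frames_in_video):
    """Return start frames for 5 randomly placed 5-second clips of one video."""
    clip_len = CLIP_SECONDS * FPS
    return [random.randint(0, max(0, num_frames_in_video - clip_len))
            for _ in range(CLIPS_PER_VIDEO)]

model = nn.Sequential(                             # placeholder standing in for the 3D residual network
    nn.Conv3d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout(p=0.1),                             # dropout rate 0.1
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(64, 487),                            # Sports-1M has 487 classes
)
optimizer = optim.SGD(model.parameters(), lr=0.001)                              # initial lr 0.001
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=60_000, gamma=0.1)    # lr / 10 every 60K iterations

clips = torch.randn(2, 3, 32, 56, 56)              # tiny stand-in batch
labels = torch.randint(0, 487, (2,))
loss = nn.CrossEntropyLoss()(model(clips), labels)
loss.backward()
optimizer.step()
scheduler.step()
print(sample_clip_starts(3000), float(loss))
```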
Finally, spatio-temporal features are extracted from the video with the trained network.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (7)

1. A method for extracting spatio-temporal features of people in video based on a residual network, characterized by comprising the following steps:
step 1) inputting a video V, wherein V is a multi-person video containing two or more people, of size c × n × h × w, where c is the number of channels, n is the number of frames in the video, and h and w are the height and width of each frame;
step 2) decomposing the 3D convolution filter of size 3 × 3 × 3 into a spatial plus temporal (2D+1D) form, i.e. a spatial 2D convolution filter and a temporal 1D convolution filter, replacing the 3 × 3 × 3 filter with a 1 × 3 × 3 filter and a 3 × 1 × 1 filter;
step 3) combining the decoupled spatial 2D convolution filter and temporal 1D convolution filter with a residual network and designing 3 different 3D residual blocks: a 3D serial residual block, a 3D parallel residual block and a 3D serial-parallel residual block;
step 4) combining each of the 3 residual blocks of step 3) with the hourglass structure and placing the shortcut so that it connects high-dimensional representations, which yields 3 hourglass residual structures: the hourglass residual serial structure HRS-I, the hourglass residual parallel structure HRS-II and the hourglass residual serial-parallel structure HRS-III;
step 5) fusing each of the 3 hourglass residual structures of step 4) into a residual network to form three new residual networks, and also combining the 3 hourglass residual structures of step 4) into a chain and fusing it into a residual network to form a fourth new residual network; comparing the four resulting residual networks and keeping the one with the best performance;
step 6) training the best-performing residual network obtained in step 5) on a 1080 Ti GPU using a data set, with 70% of the data set used as the training set, 10% as the validation set and 20% as the test set;
step 7) extracting spatio-temporal features from the video V with the trained residual network.
2. The method as claimed in claim 1, wherein the step 3) comprises the following steps:
step 31) letting the residual function be F(x_l) and H(x_l) = F(x_l) + x_l, where H(x_l) is the mapping learned by the residual network and x_{l+1} is the output of the l-th residual unit;
step 32) when F(x_l) = 0, H(x_l) = x_l; the output of the l-th residual unit can then be written as x_{l+1} = x_l + F'·x_l, where F'·x_l denotes the result of applying the residual function F to x_l;
step 33) designing the serial residual block, which connects the temporal 1D convolution filter and the spatial 2D convolution filter in series; letting the residual function be T(S(x_l)), the output is expressed as x_{l+1} = x_l·(1 + T'S'), where T denotes the temporal 1D filter, S denotes the spatial 2D filter, and T', S' are the results of applying the residual functions T and S respectively;
step 34) designing the parallel residual block, which places the two convolution filters on separate parallel paths so that they influence each other only indirectly and their outputs are summed into the final output; letting the residual function be T(x_l) + S(x_l), the output is expressed as x_{l+1} = x_l·(1 + T' + S');
step 35) designing the serial-parallel residual block, which gives both the spatial filter and the serially connected temporal filter a direct connection to the final output, i.e. adds a shortcut from the spatial branch of the serial block; letting the residual function be S(x_l) + T(S(x_l)), the output is expressed as x_{l+1} = x_l·(1 + T'S' + S').
3. The method for extracting spatio-temporal features of people in video based on a residual network as claimed in claim 1, wherein said step 4) comprises the following steps:
step 41) in order to ensure that the shortcut connects high-dimensional representations, reversing the order of the two point-wise convolutions, a point-wise convolution being a 1 × 1 convolution that extracts features at a single position and yields a feature map;
step 42) letting F ∈ R^(D_f × D_f × M) be the input tensor and G the output tensor of the residual structure, where D_f × D_f × M is the size of the feature map obtained in step 41); ignoring the depth-wise convolution layers and the activation layers, the hourglass structure is expressed as
G = φ_e(φ_r(F)) + F,
where φ_e is the channel-expanding point-wise convolution and φ_r is the channel-reducing point-wise convolution;
step 43) adding depth-wise convolutions at the two ends of the residual path, with the point-wise convolutions placed between the depth-wise convolutions; the hourglass structure can then be expressed as
G = φ_{2,d}(φ_{2,p}(φ_{1,p}(φ_{1,d}(F)))) + F,
where φ_{1,p} is the 1st point-wise convolution, φ_{1,d} is the 1st depth-wise convolution, φ_{2,p} is the 2nd point-wise convolution and φ_{2,d} is the 2nd depth-wise convolution.
4. The method as claimed in claim 1, wherein the step 5) comprises the following steps:
step 51) calling the three residual blocks, after combination with the hourglass structure, the serial hourglass residual structure HRS-I, the parallel hourglass residual structure HRS-II and the serial-parallel hourglass residual structure HRS-III respectively, and replacing all residual units in ResNet-50 with HRS-I, HRS-II and HRS-III respectively to form three new residual networks;
step 52) forming a new hourglass residual structure chain from HRS-I, HRS-II and HRS-III in sequence and replacing all residual units in ResNet-50 with this chain to obtain a fourth new residual network;
step 53) comparing the three new residual networks formed in step 51) with the residual network obtained in step 52) to obtain the residual network with the best performance.
5. The method as claimed in claim 1, wherein in step 6) the best-performing residual network obtained in step 5) is trained efficiently by randomly selecting 5 short clips of 5 seconds each from every video.
6. The method as claimed in claim 1, wherein in step 6), when training the new residual network, the dropout rate is empirically set to 0.1.
7. The method as claimed in claim 1, wherein in step 6), when training the new residual network, the learning rate is empirically initialized to 0.001.
CN202110793379.6A 2021-07-14 2021-07-14 Video character space-time characteristic extraction method based on residual error network Active CN113673559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110793379.6A CN113673559B (en) 2021-07-14 2021-07-14 Video character space-time characteristic extraction method based on residual error network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110793379.6A CN113673559B (en) 2021-07-14 2021-07-14 Video character space-time characteristic extraction method based on residual error network

Publications (2)

Publication Number Publication Date
CN113673559A 2021-11-19
CN113673559B (en) 2023-08-25

Family

ID=78539265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110793379.6A Active CN113673559B (en) 2021-07-14 2021-07-14 Video character space-time characteristic extraction method based on residual error network

Country Status (1)

Country Link
CN (1) CN113673559B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137388A1 (en) * 2016-11-14 2018-05-17 Samsung Electronics Co., Ltd. Method and apparatus for analyzing facial image
CN112149504A (en) * 2020-08-21 2020-12-29 浙江理工大学 Motion video identification method combining residual error network and attention of mixed convolution
CN112348766A (en) * 2020-11-06 2021-02-09 天津大学 Progressive feature stream depth fusion network for surveillance video enhancement
CN112883929A (en) * 2021-03-26 2021-06-01 全球能源互联网研究院有限公司 Online video abnormal behavior detection model training and abnormal detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
谈咏东; 王永雄; 陈姝意; 缪银龙: "A (2+1)D multi-spatio-temporal information fusion model and its application in action recognition" ((2+1)D多时空信息融合模型及在行为识别的应用), 信息与控制 (Information and Control), no. 06
郭明祥; 宋全军; 徐湛楠; 董俊; 谢成军: "Human action recognition algorithm based on a 3D residual dense network" (基于三维残差稠密网络的人体行为识别算法), 计算机应用 (Journal of Computer Applications), no. 12

Also Published As

Publication number Publication date
CN113673559B (en) 2023-08-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant