CN109389035A - Low-latency video action detection method based on multiple features and frame confidence score - Google Patents

Low-latency video action detection method based on multiple features and frame confidence score

Info

Publication number
CN109389035A
Authority
CN
China
Prior art keywords
frame
picture
optical flow
confidence score
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810998778.4A
Other languages
Chinese (zh)
Inventor
宋砚 (Song Yan)
李泽超 (Li Zechao)
孙莉 (Sun Li)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN201810998778.4A
Publication of CN109389035A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The present invention provides a low-latency video action detection method based on multiple features and frame confidence scores, comprising: step 1, preprocessing the data set to obtain RGB pictures and optical flow pictures; step 2, constructing a three-dimensional convolution-de-convolution (CDC) neural network model; step 3, feeding the RGB picture and optical flow picture training sets obtained in step 1 separately into the network model of step 2 for training, obtaining trained models; step 4, feeding the test sets of RGB pictures and optical flow pictures into the two trained models of step 3 respectively, fusing the outputs of the two models to obtain the confidence score of each frame, and generating action segments; step 5, using the action segments obtained in step 4, selecting different percentages of frames in temporal order and comparing them with the ground truth to obtain the low-latency action detection result.

Description

Low-latency video action detection method based on multiple features and frame confidence score
Technical field
The present invention relates to video-based human action detection techniques in computer vision, and in particular to a low-latency video action detection method based on multiple features and frame confidence scores.
Background art
With the progress of science and the advance of computer technology, people place higher-level demands on the acquisition and analysis of information, increasingly expecting computers to perceive the world through vision as humans do, i.e., computer vision. Human action recognition has become a research hotspot in the field of computer vision, and its techniques and methods have matured considerably. Action detection developed out of action recognition; its purpose is to localize the positions of actions in a long untrimmed video while also assigning the correct label to each action in the video.
Researchers have proposed the concept of low-latency detection (Low-latency Detection). Latency is originally a key metric of interactive experience systems: the time difference between a user performing an action and receiving the system's feedback. Extended to the recognition field, it can be understood as the time difference between observing the data and obtaining a correct recognition result. It can be regarded as a generalization of early, real-time, continuous and online recognition. Simply put, low-latency action detection means that, for a long untrimmed video, the action content observed so far is recognized during playback and the start and end of each action are localized. The difficulty of low-latency recognition and detection stems mainly from two aspects: 1) incomplete observability of the data, i.e., the type of a behavior must be identified, and the start and end of each action located, while only part of the behavioral data has been observed; 2) the timeliness requirement on the algorithm, i.e., the algorithm must detect and identify the type of behavior as soon as possible while the video is being acquired. These two difficulties prevent many traditional algorithms from being applied directly to such problems.
The automatic detection of human activity in video has many potential applications, such as video understanding and detection, automatic video surveillance, and human-computer interaction. Furthermore, many applications require activity to be detected as early as possible. Low-latency human motion analysis is of growing importance in diverse human-computer interaction systems. For an interactive system, minimizing the system's reaction delay is a very important consideration. Excessive delay not only severely degrades the user experience of an interactive system, but can also make certain specific interactive systems, such as gesture-controlled or augmented-perception video games, lose their appeal and become difficult to popularize. In particular, low-latency detection is vital in robotics: for example, before a robot is deployed to help a patient stand up, it must first detect what movement the patient intends to make. Likewise, a robot that communicates emotionally with humans must accurately and rapidly discover a person's emotional state from facial expressions so that it can respond appropriately and in time. In addition, low-latency detection allows a system to forecast in advance: for example, if an early warning can be given before a dangerous behavior has fully occurred, some dangerous events may be prevented. In summary, research on video-based low-latency human action detection has become a very important research direction, with great commercial value and practical significance.
Summary of the invention
The purpose of the present invention is to provide a low-latency video action detection method based on multiple features and frame confidence scores, which can detect actions before the data are completely observed and computes in real time.
The technical solution for achieving the object of the present invention is as follows: a low-latency video action detection method based on multiple features and frame confidence scores, comprising the following steps:
Step 1: preprocess the data set to obtain RGB pictures and optical flow pictures;
Step 2: construct a three-dimensional convolution-de-convolution (CDC) neural network model;
Step 3: feed the RGB picture and optical flow picture training sets obtained in step 1 separately into the network model of step 2 for training, obtaining trained models;
Step 4: feed the test sets of RGB pictures and optical flow pictures into the two trained models of step 3 respectively, fuse the outputs of the two models to obtain the confidence score of each frame, and generate action segments;
Step 5: using the action segments obtained in step 4, select different percentages of frames in temporal order and compare them with the ground truth to obtain the low-latency action detection result.
Compared with the prior art, the present invention has the following advantages. Unlike traditional complete action detection, which first extracts action segments and then feeds them into a network for classification, the present invention only needs to feed the frame sequence into the network in chronological order to obtain the action class of every frame, making it a frame-based action detection method. Meanwhile, the invention introduces a rank loss function that constrains the detection score output by the model for the correct label to be monotonically non-decreasing, so that the start of an action can be detected as early as possible, realizing low-latency action detection. Moreover, the invention uses two kinds of data to train the networks: RGB pictures, which fully exploit spatial features, and optical flow pictures, which fully exploit temporal features; finally, the spatio-temporal training data are combined to extract action information and raise the confidence score of frame classification, thereby improving the precision of action detection.
The invention is further described below with reference to the accompanying drawings of the specification.
Brief description of the drawings
Fig. 1 is a schematic diagram of the basic framework of the video low-latency action detection technique.
Fig. 2 shows an optical flow picture.
Fig. 3 shows the CDC network structure.
Fig. 4 shows frame confidence scores.
Specific embodiment
The present invention is described in more detail below with reference to the accompanying drawings:
The present invention proposes a low-latency video action detection method based on multiple features and frame confidence scores, comprising processes such as building a multi-layer three-dimensional convolutional network, extracting RGB and optical flow pictures, extracting frame confidence scores, and low-latency detection. A series of computations is performed on an untrimmed long video, so that the occurrence of an action can be detected, and its class judged, while the video is playing. The basic framework of the video low-latency action detection technique is shown in Fig. 1; the present invention proceeds according to this basic framework.
Step 1: the untrimmed long videos, including training set and test set, are read as pictures in png format at a frame rate of 25 FPS.
Step 2: as shown in Fig. 2, optical flow pictures are obtained with the TVL1 optical flow algorithm from the consecutive RGB pictures read from the untrimmed long videos. Every two RGB frames are turned by the algorithm into a pair of single-channel optical flow pictures in the x and y directions. The optical flow algorithm proceeds as follows:
Assume that the gray value of a point m(x, y) on the image at time t is I(x, y, t). After a time interval dt, the point moves to a new position m'(x+dx, y+dy), whose gray value is I(x+dx, y+dy, t+dt). Assuming that the gray value at the position reached after the point moves equals the gray value before it moved, we have:
I(x, y, t) = I(x+dx, y+dy, t+dt)
Expanding the right-hand side by Taylor's formula gives:
I(x+dx, y+dy, t+dt) = I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt + ε
where ε denotes the second-order infinitesimal terms. Since dt → 0, ε can be ignored, yielding
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0
Let u and v be the velocity components of the optical flow along the X and Y axes respectively, with u = dx/dt and v = dy/dt, and let I_x = ∂I/∂x, I_y = ∂I/∂y, I_t = ∂I/∂t; this yields the basic optical flow constraint equation:
I_x u + I_y v + I_t = 0
To obtain a unique solution (u, v) of this equation, additional constraints must be added. The TVL1 algorithm relies on the smoothness assumption, namely that the motion of each pixel follows that of the points in its neighborhood, and adds a smoothness term to build the optical flow model:
E(u, v) = ∫ (|∇u| + |∇v|) dx + λ ∫ |I_x u + I_y v + I_t| dx
where E is the energy function of optical flow estimation, λ is the weight constant of the data term, and ∇u and ∇v are two-dimensional gradients; u and v are obtained by minimizing the energy function E.
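For concreteness, this extraction step can be prototyped with the TVL1 implementation in OpenCV, as in the minimal sketch below (it assumes the opencv-contrib-python package; the ±20 clipping bound used to rescale the flow into png-storable pictures is an assumption, not part of the method):

    import cv2
    import numpy as np

    # TVL1 solver from opencv-contrib; it minimizes the TV-L1 energy E(u, v)
    # described above to obtain the per-pixel flow (u, v).
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()

    def flow_pair(prev_bgr, curr_bgr):
        """Two consecutive RGB frames -> two single-channel flow pictures
        (x- and y-direction), rescaled to uint8 for storage as png."""
        prev = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
        curr = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
        flow = tvl1.calc(prev, curr, None)           # (H, W, 2) float32: u, v
        flow = np.clip(flow, -20, 20)                # assumed clipping bound
        flow = ((flow + 20) / 40.0 * 255).astype(np.uint8)
        return flow[..., 0], flow[..., 1]            # x-direction, y-direction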
Step 3: construct the CDC network; its structure is shown in Fig. 3. The CDC network uses conv1a-conv5b of the C3D network structure as the first part of the CDC, with the pooling of the fifth layer changed to 1 × 2 × 2. The fully connected layers after the three-dimensional convolutional network of C3D are then replaced with CDC filters. Layer CDC6 downsamples the post-convolution output (512, L/8, 4, 4) spatially and upsamples it in time to (4096, L/4, 1, 1); layer CDC7 upsamples the output of CDC6 in time to (4096, L/2, 1, 1); layer CDC8 then continues to upsample the previous layer's output in time to (K+1, L, 1, 1); finally, a softmax layer produces the classification results of the L frames.
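The shape bookkeeping of CDC6-CDC8 can be illustrated with a small PyTorch sketch. This is an approximation only: each CDC filter is emulated here by a spatial 3D convolution and/or a temporal transposed convolution, and the layer decomposition and kernel sizes are assumptions rather than the CDC filter itself:

    import torch
    import torch.nn as nn

    class CDCHead(nn.Module):
        """Sketch of CDC6-CDC8 on top of the C3D conv1a-conv5b features."""
        def __init__(self, k):                      # k action classes (+1 background)
            super().__init__()
            # CDC6: (512, L/8, 4, 4) -> (4096, L/4, 1, 1)
            self.cdc6 = nn.Sequential(
                nn.Conv3d(512, 4096, kernel_size=(1, 4, 4)),   # 4x4 space -> 1x1
                nn.ConvTranspose3d(4096, 4096, kernel_size=(4, 1, 1),
                                   stride=(2, 1, 1), padding=(1, 0, 0)),  # L/8 -> L/4
                nn.ReLU(inplace=True))
            # CDC7: -> (4096, L/2, 1, 1)
            self.cdc7 = nn.Sequential(
                nn.ConvTranspose3d(4096, 4096, kernel_size=(4, 1, 1),
                                   stride=(2, 1, 1), padding=(1, 0, 0)),
                nn.ReLU(inplace=True))
            # CDC8: -> (K+1, L, 1, 1)
            self.cdc8 = nn.ConvTranspose3d(4096, k + 1, kernel_size=(4, 1, 1),
                                           stride=(2, 1, 1), padding=(1, 0, 0))

        def forward(self, x):                       # x: (N, 512, L/8, 4, 4)
            x = self.cdc8(self.cdc7(self.cdc6(x)))  # (N, K+1, L, 1, 1)
            return x.flatten(2).softmax(dim=1)      # per-frame class scores (N, K+1, L)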
Step 4: the whole network has two loss functions: a classification loss function based on cross entropy, and a loss function based on rank loss. The overall loss function is computed as:
L_t = L_c + λ_r · L_r
where L_c is the classification loss function, L_r is the rank loss function, and λ_r is a constant, set to 6 here. The classification loss function is computed with cross entropy, as follows:
L_c = -log p_t^{y_t}
where y_t is the true label of frame t in the training sequence, and p_t^{y_t} is the detection score of frame t for the correct class y_t, i.e., the softmax output of the network model.
On the basis of the classification loss function, the present invention also proposes a rank loss function. As shown in Fig. 4, during low-latency detection of a video, the more frames of an action that have been seen, the higher the detection score for the correct class and the greater the confidence; conversely, the detection score of the action for an incorrect class will be lower, with smaller confidence. Therefore, while an action is occurring, its detection score should form a monotonically non-decreasing curve. According to this property, if no action change occurs up to frame t, the detection score of frame t must be no less than the detection score of the previous frame. A rank loss function is constructed accordingly. If no action change occurs at frame t, the loss function is computed as:
L_r^t = max(0, p_{t-1}^{y_t} - p_t^{y_t})
If the action changes at frame t, i.e., frame t and frame t-1 do not belong to the same class, the loss function is computed as:
L_r^t = max(0, p_t^{y_{t-1}} - p_{t-1}^{y_{t-1}})
that is, once the action has changed, the detection score of the previous class must not increase.
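Both terms can be prototyped per training sequence as in the following PyTorch sketch (probs is the (L, K+1) per-frame softmax output and labels the (L,) ground-truth classes; the form of the action-change branch is an assumption reconstructed from the description above):

    import torch
    import torch.nn.functional as F

    def rank_loss(probs, labels):
        """Monotonicity constraint: within an action, the correct-class score
        must not drop from frame t-1 to frame t; on an action change, the
        previous class's score must not rise (assumed form)."""
        loss = probs.new_zeros(())
        for t in range(1, labels.shape[0]):
            if labels[t] == labels[t - 1]:   # no action change, formula (7)
                loss = loss + F.relu(probs[t - 1, labels[t]] - probs[t, labels[t]])
            else:                            # action change, formula (8), assumed
                loss = loss + F.relu(probs[t, labels[t - 1]] - probs[t - 1, labels[t - 1]])
        return loss / (labels.shape[0] - 1)

    def total_loss(probs, labels, lambda_r=6.0):
        """L_t = L_c + lambda_r * L_r, with the cross entropy taken on the
        softmax outputs of the network."""
        l_c = F.nll_loss(torch.log(probs + 1e-8), labels)
        return l_c + lambda_r * rank_loss(probs, labels)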
Step 5: the input to the first layer of the CDC network is 32 video frames; the video is fed into the network in slices of 32 frames each, i.e., frames (1-32), (33-64), ..., with non-overlapping slice windows. The RGB pictures and optical flow pictures are used as training sets, fed separately into the constructed CDC network in the above manner, and training begins; the objective function is optimized with stochastic gradient descent (SGD), the initial learning rate is set to 1e-6, the batch size is set to 4, and after 25 epochs of iteration two trained models are finally obtained.
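Under these settings the training configuration might look as follows (a sketch only: the CDC model, the data loader and the per-frame labels are placeholders assumed to be defined as in the preceding steps, and total_loss is the function sketched above):

    import torch

    def make_slices(num_frames, win=32):
        """Non-overlapping 32-frame slice windows: (1-32), (33-64), ..."""
        return [(s, s + win) for s in range(0, num_frames - win + 1, win)]

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)  # initial learning rate 1e-6
    for epoch in range(25):                                   # 25 epochs
        for clips, labels in loader:         # clips: (4, 3, 32, 112, 112), batch size 4
            probs = model(clips)             # (4, K+1, 32) per-frame class scores
            loss = sum(total_loss(p.t(), y) for p, y in zip(probs, labels)) / len(probs)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()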
Step 6: classify the RGB and optical flow test set pictures with the two trained models from step 5 respectively, and extract the output of the second-to-last network layer to obtain, for every frame, the confidence score of each class; the maximum confidence score is taken as the action detection score of that class; finally, the output scores of the RGB pictures and the optical flow pictures are fused as the final frame confidence score.
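The fusion and per-frame decision reduce to an elementwise average, e.g. (rgb_scores and flow_scores are assumed to be the (L, K+1) per-frame score matrices produced by the two models on the same test video):

    import numpy as np

    fused = 0.5 * (rgb_scores + flow_scores)  # average the two streams per frame
    frame_cls = fused.argmax(axis=1)          # class of each frame
    frame_conf = fused.max(axis=1)            # final frame confidence score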
Step 7: the class of each frame is obtained from the frame confidence scores of step 6. Within a sequence of consecutive video frames, if two adjacent frames belong to the same class, such frames are successively merged into small fragments.
Step 8: if two small fragments from step 7 are close in the time series, i.e., the number of frames between the two small fragments is less than 20, they are further merged into one large fragment, which becomes a final action segment.
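Steps 7 and 8 amount to a run-length grouping followed by a gap-based merge, as in this sketch (restricting the gap merge to fragments of the same class is an assumption):

    def frames_to_fragments(frame_cls):
        """Step 7: merge runs of consecutive frames with the same class
        into (start, end, class) fragments."""
        fragments, start = [], 0
        for t in range(1, len(frame_cls) + 1):
            if t == len(frame_cls) or frame_cls[t] != frame_cls[start]:
                fragments.append((start, t - 1, frame_cls[start]))
                start = t
        return fragments

    def merge_fragments(fragments, gap=20):
        """Step 8: fuse fragments separated by fewer than `gap` frames
        (same-class fusion assumed)."""
        merged = fragments[:1]
        for s, e, c in fragments[1:]:
            ps, pe, pc = merged[-1]
            if c == pc and s - pe < gap:
                merged[-1] = (ps, e, c)
            else:
                merged.append((s, e, c))
        return merged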
Step 9: using the action segments obtained in step 8, select different percentages of frames in temporal order; for example, for an action segment of 50 frames, take the first 3/10 of its frames, i.e., the first 15 frames of the segment, for low-latency action detection. Intersect these first 15 frames with the first 3/10 of the frames of the true action segment to obtain the overlap of the two, then calculate the average precision (AP) under different IoU thresholds, and finally average over classes to obtain the mean average precision (mAP). The low-latency action detection effect is evaluated by mAP: the higher the mAP, the better the low-latency detection effect; that is, mAP serves as the result of the low-latency detection.
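The overlap computation behind this evaluation reduces to a temporal IoU over truncated frame intervals, e.g. (a sketch; gt_segment is an assumed ground-truth (start, end) interval):

    def truncate(seg, p):
        """Keep the first fraction p of an inclusive (start, end) interval."""
        start, end = seg
        keep = max(1, int((end - start + 1) * p))
        return (start, start + keep - 1)

    def temporal_iou(a, b):
        """Temporal IoU of two inclusive (start, end) frame intervals."""
        inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
        union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
        return inter / union

    # e.g. a 50-frame predicted segment: its first 3/10 is frames 0-14
    iou = temporal_iou(truncate((0, 49), 0.3), truncate(gt_segment, 0.3))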

Claims (8)

1. A low-latency video action detection method based on multiple features and frame confidence scores, characterized by comprising the following steps:
Step 1: preprocess the data set to obtain RGB pictures and optical flow pictures;
Step 2: construct a three-dimensional convolution-de-convolution (CDC) neural network model;
Step 3: feed the RGB picture and optical flow picture training sets obtained in step 1 separately into the network model of step 2 for training, obtaining trained models;
Step 4: feed the test sets of RGB pictures and optical flow pictures into the two trained models of step 3 respectively, fuse the outputs of the two models to obtain the confidence score of each frame, and generate action segments;
Step 5: using the action segments obtained in step 4, select different percentages of frames in temporal order and compare them with the ground truth to obtain the low-latency action detection result.
2. The method according to claim 1, characterized in that step 1 specifically comprises:
Step 1.1: read the untrimmed long videos, including training set and test set, as pictures in png format at a frame rate of 25 FPS, yielding the RGB pictures;
Step 1.2: obtain optical flow pictures from the consecutive RGB pictures read from the untrimmed long videos using the TVL1 optical flow algorithm.
3. The method according to claim 2, characterized in that the optical flow algorithm in step 1.2 is:
Step 1.2.1: assume that at time t the gray value of a point m(x, y) on the image is I(x, y, t); after a time interval dt, the point moves to a new position m'(x+dx, y+dy), whose gray value is I(x+dx, y+dy, t+dt);
assuming that the gray value at the position reached after the point moves equals the gray value before it moved, i.e.
I(x, y, t) = I(x+dx, y+dy, t+dt) (1)
Step 1.2.2: expand the right-hand side of formula (1) by Taylor's formula, i.e.
I(x+dx, y+dy, t+dt) = I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt + ε (2)
where ε denotes the second-order infinitesimal terms; since dt → 0, ε is ignored, giving
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0 (3)
Step 1.2.3: let u and v be the velocity components of the optical flow along the X and Y axes respectively, with u = dx/dt and v = dy/dt; let I_x = ∂I/∂x, I_y = ∂I/∂y and I_t = ∂I/∂t; then obtain the basic optical flow constraint equation:
I_x u + I_y v + I_t = 0 (4)
Step 1.2.4: (u, v) forms the optical flow pictures.
4. The method according to claim 1, characterized in that step 2 specifically comprises the following steps:
Step 2.1: the CDC network uses conv1a-conv5b of the C3D network structure as the first part of the CDC, with the pooling of the fifth layer changed to 1 × 2 × 2; the fully connected layers after the three-dimensional convolutional network of C3D are replaced with CDC filters; layer CDC6 downsamples the post-convolution output (512, L/8, 4, 4) spatially and upsamples it in time to (4096, L/4, 1, 1); layer CDC7 upsamples the output of CDC6 in time to (4096, L/2, 1, 1); layer CDC8 continues to upsample the previous layer's output in time to (K+1, L, 1, 1); finally, a softmax layer produces the classification results of the L frames;
Step 2.2: design the overall loss function L_t:
L_t = L_c + λ_r · L_r (5)
where L_c is the classification loss function, L_r is the rank loss function, and λ_r is a constant;
the classification loss function is
L_c = -log p_t^{y_t} (6)
where y_t is the true label of frame t in the training sequence and p_t^{y_t} is the detection score of frame t for the correct class y_t;
the rank loss function is computed in two cases: if no action change occurs at frame t, the loss function is computed as formula (7):
L_r^t = max(0, p_{t-1}^{y_t} - p_t^{y_t}) (7)
if the action changes at frame t, i.e., frame t and frame t-1 do not belong to the same class, the loss function is computed as formula (8):
L_r^t = max(0, p_t^{y_{t-1}} - p_{t-1}^{y_{t-1}}) (8)
5. The method according to claim 4, characterized in that step 3 specifically comprises the following steps:
the input to the first layer of the CDC network is 32 video frames; the video is fed into the network in slices of 32 frames each, i.e., frames (1-32), (33-64), ..., with non-overlapping slice windows; the RGB pictures and optical flow pictures obtained in step 1 are used as training sets and fed separately into the constructed CDC network in the above manner for training, obtaining two trained models.
6. The method according to claim 5, characterized in that step 4 specifically comprises the following steps:
Step 4.1: classify the RGB and optical flow test set pictures with the two trained models from step 3 respectively, and extract the output of the second-to-last network layer to obtain, for every frame, the confidence score of each class; take the maximum confidence score as the action detection score of that class; finally, average the output scores of the RGB pictures and the optical flow pictures as the final frame confidence score;
Step 4.2: obtain the class of each frame from the frame confidence scores of step 4.1; within a sequence of consecutive video frames, if two adjacent frames belong to the same class, successively merge such frames into small fragments;
Step 4.3: if the number of frames between two small fragments from step 4.2 is less than 20, further merge them into one large fragment, which becomes a final action segment.
7. The method according to claim 6, characterized in that in step 4.2 each frame yields a prediction score for each class, and the class with the highest prediction score is the class of that frame.
8. The method according to claim 6, characterized in that step 5 specifically comprises the following steps:
Step 5.1: using the action segments obtained in step 4, select different percentages of frames in temporal order for low-latency action detection;
Step 5.2: intersect the frames extracted in step 5.1 with the same number of frames of the true action segment to obtain the overlap of the two, then calculate the average precision under different IoU thresholds, and finally average over classes to obtain the mean average precision, yielding the low-latency detection result.
CN201810998778.4A 2018-08-30 2018-08-30 Low-latency video action detection method based on multiple features and frame confidence score Pending CN109389035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810998778.4A CN109389035A (en) 2018-08-30 2018-08-30 Low-latency video action detection method based on multiple features and frame confidence score

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810998778.4A CN109389035A (en) 2018-08-30 2018-08-30 Low-latency video action detection method based on multiple features and frame confidence score

Publications (1)

Publication Number Publication Date
CN109389035A (en) 2019-02-26

Family

ID=65418545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810998778.4A Pending CN109389035A (en) Low-latency video action detection method based on multiple features and frame confidence score

Country Status (1)

Country Link
CN (1) CN109389035A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170336398A1 (en) * 2016-04-26 2017-11-23 Washington State University Compositions and methods for antigen detection incorporating inorganic nanostructures to amplify detection signals
CN107292912A (en) * 2017-05-26 2017-10-24 浙江大学 A kind of light stream method of estimation practised based on multiple dimensioned counter structure chemistry

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
VAN-MINH KHONG等: "Improving human action recognition with two-stream 3D convolutional neural network", 《2018 1ST INTERNATIONAL CONFERENCE ON MULTIMEDIA ANALYSIS AND PATTERN RECOGNITION (MAPR)》 *
ZHENG SHOU等: "CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION(CVPR)》 *
赵谦 (Zhao Qian) et al.: 《智能视频图像处理技术与应用》 (Intelligent Video Image Processing Technology and Applications), Xidian University Press, 30 September 2016 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886225A (en) * 2019-02-27 2019-06-14 浙江理工大学 A kind of image gesture motion on-line checking and recognition methods based on deep learning
CN109886225B (en) * 2019-02-27 2020-09-15 浙江理工大学 Image gesture action online detection and recognition method based on deep learning
CN110007675A (en) * 2019-04-12 2019-07-12 北京航空航天大学 A kind of Vehicular automatic driving decision system based on driving situation map and the training set preparation method based on unmanned plane
CN113678137A (en) * 2019-08-18 2021-11-19 聚好看科技股份有限公司 Display device
CN113678137B (en) * 2019-08-18 2024-03-12 聚好看科技股份有限公司 Display apparatus
US20210158483A1 (en) * 2019-11-26 2021-05-27 Samsung Electronics Co., Ltd. Jointly learning visual motion and confidence from local patches in event cameras
US11694304B2 (en) * 2019-11-26 2023-07-04 Samsung Electronics Co., Ltd. Jointly learning visual motion and confidence from local patches in event cameras
CN111027472A (en) * 2019-12-09 2020-04-17 北京邮电大学 Video identification method based on fusion of video optical flow and image space feature weight
CN113221633A (en) * 2021-03-24 2021-08-06 西安电子科技大学 Weak supervision time sequence behavior positioning method based on hierarchical category model
CN113221633B (en) * 2021-03-24 2023-09-19 西安电子科技大学 Weak supervision time sequence behavior positioning method based on hierarchical category model
CN116453010A (en) * 2023-03-13 2023-07-18 彩虹鱼科技(广东)有限公司 Ocean biological target detection method and system based on optical flow RGB double-path characteristics

Similar Documents

Publication Publication Date Title
CN109389035A (en) Low-latency video action detection method based on multiple features and frame confidence score
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN109919031B (en) Human behavior recognition method based on deep neural network
Hu et al. 3D separable convolutional neural network for dynamic hand gesture recognition
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN109886358B (en) Human behavior recognition method based on multi-time-space information fusion convolutional neural network
CN110188239B (en) Double-current video classification method and device based on cross-mode attention mechanism
CN110084228A (en) A kind of hazardous act automatic identifying method based on double-current convolutional neural networks
CN112784798A (en) Multi-modal emotion recognition method based on feature-time attention mechanism
CN112766172B (en) Facial continuous expression recognition method based on time sequence attention mechanism
CN108764019A (en) A kind of Video Events detection method based on multi-source deep learning
CN113158861B (en) Motion analysis method based on prototype comparison learning
CN114049381A (en) Twin cross target tracking method fusing multilayer semantic information
CN104537684A (en) Real-time moving object extraction method in static scene
Iosifidis et al. Neural representation and learning for multi-view human action recognition
CN114529984A (en) Bone action recognition method based on learnable PL-GCN and ECLSTM
CN113313123A (en) Semantic inference based glance path prediction method
CN112906520A (en) Gesture coding-based action recognition method and device
CN114360067A (en) Dynamic gesture recognition method based on deep learning
CN115966010A (en) Expression recognition method based on attention and multi-scale feature fusion
CN116363738A (en) Face recognition method, system and storage medium based on multiple moving targets
Nale et al. Suspicious human activity detection using pose estimation and lstm
CN105956604B (en) Action identification method based on two-layer space-time neighborhood characteristics
Wei et al. Learning facial expression and body gesture visual information for video emotion recognition
CN113850182A (en) Action identification method based on DAMR-3 DNet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190226