CN108647591A - Method and system for behavior recognition in video based on visual-semantic features - Google Patents
Method and system for behavior recognition in video based on visual-semantic features
- Publication number
- CN108647591A CN108647591A CN201810379626.6A CN201810379626A CN108647591A CN 108647591 A CN108647591 A CN 108647591A CN 201810379626 A CN201810379626 A CN 201810379626A CN 108647591 A CN108647591 A CN 108647591A
- Authority
- CN
- China
- Prior art keywords
- image sequence
- feature vector
- image
- gru
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for behavior recognition in video based on visual-semantic features. Short-term spatio-temporal visual features are first extracted with a three-dimensional convolutional neural network, avoiding the high computational complexity of optical-flow or dense-trajectory methods. An object detector based on convolutional neural networks then extracts the semantics and spatial positions of people and objects; a person-object spatial position feature is constructed and fused with the spatio-temporal visual feature, and this additional semantic information improves the recognition accuracy of interactive behaviors in video. Finally, on the basis of the extracted general-purpose short-term spatio-temporal visual features, specific long-term behavior features are extracted with a recurrent neural network to further improve recognition accuracy. The invention solves the technical problems of existing video behavior recognition methods: high computational complexity, low recognition accuracy, and the inability to extract long-term behavior features spanning the entire temporal dimension of a video.
Description
Technical field
The present invention belongs to the technical field of computer vision, and more particularly relates to a method and system for behavior recognition in video based on visual-semantic features.
Background technology
Behavior recognition for video data has become a popular research area in the field of computer vision. At present there are three main approaches to behavior recognition in video: optical-flow methods, recurrent neural network methods, and three-dimensional convolutional neural networks.
Optical-flow methods achieve high recognition accuracy, but the computational complexity of optical flow is so high that real-time operation is impossible. Recurrent neural networks take two main kinds of input data: first, features of single frames extracted by a convolutional neural network, which lack temporal correlation information and therefore yield low recognition accuracy; second, optical flow or dense-trajectory information, which, as with optical-flow methods, incurs high computational complexity. Three-dimensional convolutional neural networks take fixed-length image sequence segments as input, so they can only extract general-purpose short-term spatio-temporal visual features and cannot extract long-term behavior features spanning the entire temporal dimension of a video.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides a method and system for behavior recognition in video based on visual-semantic features, with the aim of solving the technical problems of existing video behavior recognition methods: high computational complexity, low recognition accuracy, and the inability to extract long-term behavior features spanning the entire temporal dimension of a video.
To achieve the above object, according to one aspect of the present invention, a method for behavior recognition in video based on visual-semantic features is provided, comprising the following steps:
(1) Obtain an image sequence from a dataset and downsample it to obtain a downsampled image sequence V = {v_t}, t ∈ {0, 1, …, T−1}; slice the downsampled sequence into N fixed-length image sequence segments, where T is the length of the image sequence and N is the number of segments.
(2) Scale and crop each image in the N fixed-length segments, and feed the N segments into a three-dimensional convolutional neural network to obtain N spatio-temporal visual feature vectors.
(3) Select one image from each segment obtained in step (1); scale and crop it, feed the result into an object detector to obtain object-class confidences and position offsets, and construct a person-object spatial position feature vector from them.
(4) Fuse the spatio-temporal visual feature vectors obtained in step (2) with the person-object spatial position feature vectors obtained in step (3).
(5) Feed the fused feature vectors from step (4) into a recurrent neural network to obtain long-term behavior features.
(6) Classify the long-term behavior features from step (5) with a Softmax classifier to generate a class probability for each behavior type.
Preferably, the image sequence is sliced according to the following formula:
V_n = {v_t | n·T_c ≤ t ≤ n·T_c + δ − 1}
where T_c is the frame step between segments, δ is the number of frames per segment, n ∈ {0, 1, …, N−1}, and T_c = 8, δ = 16.
Preferably, the three-dimensional convolutional neural network is a C3D network, and the object detector is a single-shot multibox detector with an input resolution of 300 × 300.
Preferably, the N image sequence segments are fed into the three-dimensional convolutional neural network to obtain N spatio-temporal visual feature vectors as follows: each image sequence segment is first fed into the C3D network; the output of the fifth pooling layer of the C3D network is then used as the short-term spatio-temporal visual feature; finally this feature map is flattened into one feature vector of length 8192, the output of the fifth pooling layer having size 1 × 4 × 4 × 512.
Preferably, step (3) proceeds as follows. First, for the scaled and cropped input image, the object detector outputs multiple output vectors corresponding to multiple bounding boxes; each output vector contains the confidences P = {p_l} of L object classes and a position offset [x, y, w, h], where l ∈ {0, 1, …, L−1}, L is the number of object classes, and p_l is the confidence of the l-th object class. Then the output vectors of all bounding boxes are merged to obtain, for the detected objects, multiple spatial position feature vectors of length 5, [q, x/W_I, y/H_I, w/W_I, h/H_I], where q is the confidence of the class to which the detected object belongs, x and y are the horizontal and vertical coordinates of the detected object's bounding box, w and h are the width and height of the bounding box, and W_I and H_I are the width and height of the scaled and cropped image. Finally, for each of the L object classes, the spatial position feature vectors of its 5 highest-confidence detections are used to construct one feature vector of length 5 × L × 5.
Preferably, the recurrent neural network used in step (5) is a 3-layer GRU network consisting of one fully connected layer and 3 cascaded GRU layers. The fully connected layer has 4096 neurons; the GRU units in the first two GRU layers have 4096 neurons; the GRU units in the last layer have 256 neurons; and the output of each GRU layer is the input of the next.
Preferably, the recurrent neural network used in step (5) is a composite GRU network consisting of 3 fully connected layers and one GRU layer. The first two fully connected layers have 4096 neurons, the last fully connected layer has 512 neurons, and the GRU units in the GRU layer have 512 neurons.
According to another aspect of the present invention, a system for behavior recognition in video based on visual-semantic features is provided, comprising:
a first module for obtaining an image sequence from a dataset and downsampling it to obtain a downsampled image sequence V = {v_t}, t ∈ {0, 1, …, T−1}, and for slicing the downsampled sequence into N fixed-length image sequence segments, where T is the length of the image sequence and N is the number of segments;
a second module for scaling and cropping each image in the N fixed-length segments, and for feeding the N segments into a three-dimensional convolutional neural network to obtain N spatio-temporal visual feature vectors;
a third module for selecting one image from each segment obtained by the first module, scaling and cropping it, feeding the result into an object detector to obtain object-class confidences and position offsets, and constructing a person-object spatial position feature vector from them;
a fourth module for fusing the spatio-temporal visual feature vectors obtained by the second module with the person-object spatial position feature vectors obtained by the third module;
a fifth module for feeding the fused feature vectors from the fourth module into a recurrent neural network to obtain long-term behavior features; and
a sixth module for classifying the long-term behavior features obtained by the fifth module with a Softmax classifier to generate a class probability for each behavior type.
In general, compared with the prior art, the above technical solutions conceived by the present invention achieve the following beneficial effects:
(1) The computational complexity of the invention is low, ensuring real-time operation: since step (2) uses a three-dimensional convolutional neural network to extract short-term behavior features, the high computational complexity of optical-flow methods is avoided, and fast, efficient behavior recognition is achieved.
(2) The behavior recognition accuracy of the invention is high: since step (3) constructs a person-object spatial position feature vector, the recognition accuracy of interactive behaviors between people and objects in video is improved.
(3) Since step (5) uses improved GRU network structures to extract long-term behavior features on the basis of the short-term behavior features, recognition accuracy is further improved.
Description of the drawings
Fig. 1 is a schematic diagram of the 3-layer GRU network used in step (5) of the method of the present invention.
Fig. 2 is a schematic diagram of the composite GRU network used in step (5) of the method of the present invention.
Fig. 3 is a schematic comparison of the behavior recognition accuracy of the GRU networks shown in Fig. 1 and Fig. 2 with that of a conventional single-layer GRU network.
Fig. 4 is a flowchart of the method for behavior recognition in video based on visual-semantic features of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to illustrate the present invention, not to limit it. In addition, the technical features involved in the various embodiments described below can be combined with one another as long as they do not conflict.
The present invention proposes a long-short term spatio-temporal visual model that fuses person-object visual relationships (Long-Short Term Spatio-Temporal Visual Model with Human-Object Visual Relationship). Short-term spatio-temporal visual features are first extracted with a three-dimensional convolutional neural network, avoiding the high computational complexity of optical-flow or dense-trajectory methods. An object detector based on convolutional neural networks then extracts the semantics and spatial positions of people and objects; a person-object spatial position feature is constructed and fused with the spatio-temporal visual feature, and the additional semantic information improves the recognition accuracy of interactive behaviors in video. Finally, based on the fused short-term features, an improved recurrent neural network extracts long-term behavior features; that is, on the basis of the extracted general-purpose short-term spatio-temporal visual features, the specific long-term behavior features extracted by the recurrent neural network improve the accuracy of behavior recognition.
As shown in Fig. 4, the method for behavior recognition in video based on visual-semantic features of the present invention includes the following steps:
(1) Obtain an image sequence from a dataset and downsample it to obtain a downsampled image sequence V = {v_t}, t ∈ {0, 1, …, T−1}, where T is the length of the image sequence; slice the downsampled sequence into N fixed-length image sequence segments, where N is the number of segments, typically an integer between 5 and 10.
Specifically, the dataset used in this step is the UCF101 behavior recognition dataset collected from YouTube, and the downsampling interval is 5 frames.
The image sequence is sliced according to the following formula:
V_n = {v_t | n·T_c ≤ t ≤ n·T_c + δ − 1}
where T_c is the frame step between segments, δ is the number of frames per segment, n ∈ {0, 1, …, N−1}, and T_c = 8, δ = 16.
For example, an image sequence of length 32 (i.e. T = 32) is sliced into 3 image sequence segments, each containing 16 images, with adjacent segments overlapping by 8 images.
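The slicing rule above can be sketched in a few lines of Python. The step T_c = 8 and segment length δ = 16 are the patent's stated values, and the worked example (a 32-frame sequence yields 3 segments with 8 overlapping frames) serves as a check:

```python
def slice_sequence(T, Tc=8, delta=16):
    """Slice a T-frame sequence into fixed-length segments.

    Segment n covers frame indices [n*Tc, n*Tc + delta - 1];
    consecutive segments overlap by delta - Tc frames.
    """
    N = (T - delta) // Tc + 1  # number of full segments that fit
    return [list(range(n * Tc, n * Tc + delta)) for n in range(N)]

segments = slice_sequence(32)
print(len(segments))      # 3 segments for T = 32
print(len(segments[0]))   # each segment has 16 frames
print(len(set(segments[0]) & set(segments[1])))  # adjacent segments share 8 frames
```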
(2) Scale and crop each image in the N fixed-length segments (for example, to a resolution of 112 × 112; the resolution depends on the input size of the three-dimensional convolutional neural network), and feed the N segments into the network to obtain N spatio-temporal visual feature vectors. Each input segment has dimensions 16 × 112 × 112 × 3.
In this step, the three-dimensional convolutional neural network is a C3D network, used to extract the spatio-temporal visual features of each image sequence segment.
Specifically, each image sequence segment is first fed into the C3D network; the output of the fifth pooling (pool5) layer of the C3D network is then used as the short-term spatio-temporal visual feature (the output of the fifth pooling layer has size 1 × 4 × 4 × 512, i.e. 512 feature maps of resolution 4 × 4); finally this feature map is flattened into one feature vector of length 8192.
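The flattening of the pool5 output into the length-8192 vector is just a reshape of the 1 × 4 × 4 × 512 feature map. A minimal sketch, with plain Python lists standing in for the real C3D tensor:

```python
# Stand-in for the C3D pool5 output: a 1 x 4 x 4 x 512 feature map,
# i.e. 512 channels at 4 x 4 spatial resolution.
pool5 = [[[[0.0] * 512 for _ in range(4)] for _ in range(4)] for _ in range(1)]

# Flatten the feature map into a single feature vector.
feature = [v for t in pool5 for row in t for cell in row for v in cell]
print(len(feature))  # 1 * 4 * 4 * 512 = 8192
```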
(3) Select one image from each segment obtained in step (1); scale and crop it (for example, to a resolution of 300 × 300; the resolution depends on the input size of the subsequent object detector), feed the scaled and cropped image into the object detector to obtain object-class confidences and position offsets, and construct a person-object spatial position feature vector from them.
Specifically, the object detector used in this step is a single-shot multibox detector with an input resolution of 300 × 300 (SSD300).
This step proceeds as follows. First, for the scaled and cropped input image, the object detector outputs multiple output vectors corresponding to multiple bounding boxes; each output vector contains the confidences P = {p_l} of L object classes and a position offset [x, y, w, h], where l ∈ {0, 1, …, L−1}, L is the number of object classes, and p_l is the confidence of the l-th object class. Then the output vectors of all bounding boxes are merged (using the non-maximum suppression (NMS) algorithm) to obtain, for the detected objects, multiple spatial position feature vectors of length 5, [q, x/W_I, y/H_I, w/W_I, h/H_I], where q is the confidence of the class to which the detected object belongs, x and y are the horizontal and vertical coordinates of the detected object's bounding box, w and h are the width and height of the bounding box, and W_I and H_I are the width and height of the scaled and cropped image. Finally, for each of the L object classes, the spatial position feature vectors of its 5 highest-confidence detections are used to construct one feature vector of length 5 × L × 5. Since the SSD300 in this step can detect 201 object classes, and the feature vectors of the 5 highest-probability detections are chosen for each class, L = 201 and the resulting feature vector has length 5025.
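The assembly of the length-5025 person-object feature can be sketched as follows. The detections here are hypothetical, and the zero-padding of classes with fewer than 5 detections is an assumption (the patent does not specify how missing detections are handled):

```python
L, TOP_K, W_I, H_I = 201, 5, 300, 300

def build_spatial_feature(detections):
    """detections: list of (class_id, confidence, (x, y, w, h)) tuples.

    For each of the L object classes, keep the TOP_K highest-confidence
    detections and concatenate their length-5 spatial position vectors
    [q, x/W_I, y/H_I, w/W_I, h/H_I].  Classes with fewer than TOP_K
    detections are zero-padded (an assumption, not stated in the patent).
    """
    feature = []
    for cls in range(L):
        hits = sorted((d for d in detections if d[0] == cls),
                      key=lambda d: -d[1])[:TOP_K]
        for _, q, (x, y, w, h) in hits:
            feature += [q, x / W_I, y / H_I, w / W_I, h / H_I]
        feature += [0.0] * 5 * (TOP_K - len(hits))  # zero-pad missing slots
    return feature

# One hypothetical person detection (class 0) and one object (class 1).
dets = [(0, 0.9, (30, 40, 60, 120)), (1, 0.8, (150, 150, 50, 50))]
vec = build_spatial_feature(dets)
print(len(vec))  # 5 * 201 * 5 = 5025
```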
(4) Fuse the spatio-temporal visual feature vectors obtained in step (2) with the person-object spatial position feature vectors obtained in step (3).
Specifically, the feature fusion in this step simply concatenates the spatio-temporal visual feature of length 8192 with the person-object spatial position feature of length 5025, producing one fused feature vector of length 13217.
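A minimal sketch of the fusion step, with zero vectors standing in for the real features:

```python
visual_feature = [0.0] * 8192   # stand-in C3D spatio-temporal feature
spatial_feature = [0.0] * 5025  # stand-in person-object position feature

# Feature fusion is plain concatenation of the two vectors.
fused = visual_feature + spatial_feature
print(len(fused))  # 8192 + 5025 = 13217
```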
(5) Feed the fused feature vectors from step (4) into a recurrent neural network to obtain long-term behavior features.
The recurrent neural network used in this step is built from gated recurrent units (GRU).
The present invention proposes 2 improved GRU network structures; in Fig. 1 and Fig. 2 the outer boxes denote the input feature vectors. For the fused features, the input is the feature vector of length 13217 obtained after feature fusion. The GRU network receives the short-term spatio-temporal visual features over time and generates long-term behavior features across the full time scale.
Fig. 1 shows a 3-layer GRU network (3-Layer Stacked GRU, sGRU), consisting of one fully connected layer (FC) and 3 cascaded GRU layers. The fully connected layer has 4096 neurons; the GRU units in the first two GRU layers have 4096 neurons; the GRU units in the last layer have 256 neurons; and the output of each GRU layer is the input of the next. The purpose of this architecture is to improve the learning capacity of the network by increasing its depth.
The long-term behavior feature vector output by the sGRU network has length 256.
Fig. 2 shows a composite GRU network (Composite GRU, cGRU), consisting of 3 fully connected layers and one GRU layer. The first two fully connected layers have 4096 neurons, the last fully connected layer has 512 neurons, and the GRU units in the GRU layer have 512 neurons. The rationale of this architecture is that the first fully connected layers reduce the dimensionality of the input features, while the final GRU layer learns the long-term behavior features.
The long-term behavior feature vector output by the cGRU network has length 512.
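As a rough check on the parameter budgets of the two architectures, the counts below use the textbook GRU formula 3·((n_in + n_h + 1)·n_h) (3 gates, each with input weights, recurrent weights, and one bias vector); exact totals vary slightly by implementation, e.g. PyTorch uses two bias vectors per gate:

```python
def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def gru_params(n_in, n_h):
    """Textbook GRU layer: 3 gates, each with input weights,
    recurrent weights and one bias vector."""
    return 3 * ((n_in + n_h + 1) * n_h)

D = 13217  # length of the fused feature vector

# sGRU: FC(4096) -> GRU(4096) -> GRU(4096) -> GRU(256)
sgru = (fc_params(D, 4096) + gru_params(4096, 4096)
        + gru_params(4096, 4096) + gru_params(4096, 256))

# cGRU: FC(4096) -> FC(4096) -> FC(512) -> GRU(512)
cgru = (fc_params(D, 4096) + fc_params(4096, 4096)
        + fc_params(4096, 512) + gru_params(512, 512))

print(sgru > cgru)            # True: the stacked sGRU is far heavier
print(round(sgru / cgru, 1))  # roughly 3.5x more parameters
```

Under this approximate count, the large recurrent layers dominate the sGRU total, which is consistent with the overfitting and speed observations reported in the experiments below.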
(6) Classify the long-term behavior features from step (5) with a Softmax classifier to generate a class probability for each behavior type.
The final output of this step is a probability vector PB = {p_b}, where b ∈ {0, 1, …, B−1} and B is the number of behavior types; each element of the probability vector is the class probability of the corresponding behavior type.
Since the UCF101 dataset used in the present invention contains 101 behavior types, B = 101, and the y-th behavior type corresponding to the largest element p_y of the probability vector is the finally recognized behavior type.
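The Softmax classification step can be sketched as follows; the logit values and the winning class index are hypothetical:

```python
import math

def softmax(logits):
    """Softmax classifier head: logits -> class probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

B = 101            # UCF101 has 101 behavior types
logits = [0.0] * B
logits[42] = 5.0   # hypothetical: behavior type 42 scores highest

probs = softmax(logits)
print(abs(sum(probs) - 1.0) < 1e-9)  # probabilities sum to 1
print(probs.index(max(probs)))       # recognized behavior type: 42
```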
Experimental results
The test dataset is the video data of the UCF101 behavior recognition dataset. The videos in UCF101 are collected from YouTube and comprise 101 behavior types and 13320 video clips, with diversity not only in behavior type but also in camera motion, object pose, object size, viewpoint, background, illumination, and so on. The behavior types in UCF101 fall into 5 broad categories: person-object interaction, person-person interaction, body motion, playing musical instruments, and sports.
(1) Recognition accuracy
Recognition accuracy is the proportion of correctly recognized samples among the 3783 samples of the test set. Testing the accuracy of variants of the method with different module combinations helps analyze the influence of each module on performance. The accuracy of each method is shown in Table 1 below, where the methods in italics use improved dense trajectories or optical-flow information.
As can be seen, the method of the present invention improves accuracy by 8.2% and 10.2% over the LSTM composite model and the C3D method, respectively. Compared with the other methods that use optical flow or improved dense-trajectory information, the method of the present invention extracts features from the original image sequence using deep neural networks only, so its inference is faster. In fact, improved dense trajectories are hand-crafted features built from optical-flow tracking and histograms of image gradients, and computing optical flow consumes a great deal of computing resources and time. Among the methods using the 2 different GRU network structures, the method of the present invention achieves the best performance, exceeding by 3.4% the accuracy of the multi-skip feature stacking method, which uses improved dense-trajectory information.
Table 1. Accuracy of each method on the UCF101 dataset
(2) Influence of the GRU network on performance
This section tests the method with the sGRU network, the cGRU network, and a single-layer GRU network, where the single-layer GRU network serves as the baseline: it contains 512 neurons, the feature vector is fed directly into it, and it is a basic recurrent neural network structure.
The accuracy of each method with respect to the GRU network is compared in Fig. 3. The method of the present invention with the cGRU network improves accuracy by 3.7% over the method with the sGRU network, and by 5.5% over the method with the single-layer GRU network.
Even with the person-object spatial position features, the methods of the present invention with the single-layer GRU and sGRU networks only reach accuracies similar to those of other methods using optical flow or improved dense-trajectory information, indicating that the expressive power of the long-term behavior features extracted by the single-layer GRU and sGRU networks is poor. For overlong feature vectors, such as the fused feature vector of length 13217, the sGRU network has too many parameters, so inference and training are slow and overfitting occurs easily; the single-layer GRU network, being too shallow, learns poorly and underfits easily. The cGRU network uses fully connected layers to reduce the dimensionality of the features and then a GRU layer to learn the long-term behavior features; because it has few network parameters, inference and training are faster, overfitting is less likely, and accuracy is higher.
In summary, the cGRU network best realizes the extraction of long-term behavior features on the basis of the short-term features.
(3) Computation rate
The computation rates of the method of the present invention and of 4 other behavior recognition methods on the UCF101 dataset are shown in Table 2 below; the tests use one Tesla K40 GPU. Because the computational complexity of optical-flow algorithms is high, the GPU implementations of the optical-flow algorithms used in improved dense trajectories and in two-stream networks are 91.4 and 274.6 times slower, respectively, than the C3D method. Because the method of the present invention contains a person-object spatial feature extraction module and a long-term behavior feature extraction module, i.e. the additional SSD300 and cGRU networks, it is 2.5 times slower than the C3D network alone, but it is still far faster than the methods using improved dense trajectories and optical-flow information, reaching 125.2 frames/second and thus running faster than real time.
Table 2. Comparison of the computation rates of each method
The person-object spatial position feature extraction module uses a downsampling interval of 16: only one image per video clip requires person-object spatial feature extraction, so its computation time is amortized over all the images in the clip. By independent test, the computation rate of SSD300 is 17.8 frames/second, i.e. 56.18 ms/frame, or 3.51 ms/frame after amortization. The C3D network performs inference on a clip of 16 images at a time, at a computation rate of 313.9 frames/second, i.e. 3.19 ms/frame. Theoretically, the computation times of the person-object spatial position extraction module and the spatio-temporal visual feature extraction module therefore total 6.70 ms/frame, i.e. 149.3 frames/second. In actual tests, the computation rate of the method of the present invention is 125.2 frames/second, because preprocessing, cGRU network inference, and other steps consume additional computation time; this extra time is nevertheless much smaller than the SSD300 and C3D inference times.
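The amortization arithmetic above can be reproduced directly from the measured rates:

```python
FRAMES_PER_CLIP = 16

ssd_ms_per_image = 1000 / 17.8   # SSD300: 17.8 fps, about 56.18 ms per image
ssd_shared = ssd_ms_per_image / FRAMES_PER_CLIP  # one detection per 16-frame clip
c3d_ms_per_frame = 1000 / 313.9  # C3D: 313.9 fps, about 3.19 ms per frame

total_ms = ssd_shared + c3d_ms_per_frame
print(round(ssd_shared, 2))       # ~3.51 ms/frame after amortization
print(round(total_ms, 2))         # ~6.7 ms/frame combined
print(round(1000 / total_ms, 1))  # ~149.3 frames/second theoretical rate
```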
Those skilled in the art will readily understand that the above is only a description of preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent substitution and improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (8)
1. A method for behavior recognition in video based on visual-semantic features, characterized by comprising the following steps:
(1) obtaining an image sequence from a dataset and downsampling it to obtain a downsampled image sequence V = {v_t}, t ∈ {0, 1, …, T−1}, and slicing the downsampled sequence into N fixed-length image sequence segments, where T is the length of the image sequence and N is the number of segments;
(2) scaling and cropping each image in the N fixed-length segments, and feeding the N segments into a three-dimensional convolutional neural network to obtain N spatio-temporal visual feature vectors;
(3) selecting one image from each segment obtained in step (1), scaling and cropping it, feeding the scaled and cropped image into an object detector to obtain object-class confidences and position offsets, and constructing a person-object spatial position feature vector from the confidences and position offsets;
(4) fusing the spatio-temporal visual feature vectors obtained in step (2) with the person-object spatial position feature vectors obtained in step (3);
(5) feeding the fused feature vectors from step (4) into a recurrent neural network to obtain long-term behavior features;
(6) classifying the long-term behavior features from step (5) with a Softmax classifier to generate a class probability for each behavior type.
2. The method for recognizing activities in video according to claim 1, characterized in that the image sequence is sliced according to the following formula:
V_n = {v_t : n·T_c ≤ t < n·T_c + δ}
where T_c is the frame step between consecutive image sequence segments, δ is the number of frames per segment, n ∈ {0, 1, ..., N-1}, and T_c = 8, δ = 16.
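Read this way, the slicing of claim 2 is a sliding window of length δ = 16 advanced by T_c = 8 frames, so consecutive segments overlap by half. A minimal sketch (function and variable names are mine, not the patent's):

```python
def slice_sequence(frames, t_c=8, delta=16):
    """Sliding-window slicing: segment n covers frames [n*t_c, n*t_c + delta)."""
    segments = []
    n = 0
    while n * t_c + delta <= len(frames):
        segments.append(frames[n * t_c : n * t_c + delta])
        n += 1
    return segments

clips = slice_sequence(list(range(40)))
# 4 clips of 16 frames each; consecutive clips overlap by 8 frames
```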
3. The method for recognizing activities in video according to claim 1 or 2, characterized in that the three-dimensional convolutional neural network used is a C3D network, and the object detector used is a single-shot multibox detector with an input resolution of 300 × 300.
4. The method for recognizing activities in video according to any one of claims 1 to 3, characterized in that feeding the N image sequence segments into the three-dimensional convolutional neural network to obtain N spatio-temporal visual feature vectors comprises, for each image sequence segment: first feeding the segment into the C3D network, then taking the output of the fifth pooling layer of the C3D network as the short-term spatio-temporal visual feature, and finally flattening this feature map into a single feature vector of length 8192, where the output matrix of the fifth pooling layer has size 1 × 4 × 4 × 512.
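The flattening in claim 4 is a plain reshape of the pool5 tensor, which can be sketched with NumPy (shapes taken from the claim):

```python
import numpy as np

pool5 = np.zeros((1, 4, 4, 512))  # output of C3D's fifth pooling layer, per claim 4
feature = pool5.reshape(-1)       # flattened spatio-temporal visual feature vector
assert feature.shape == (8192,)   # 1 * 4 * 4 * 512 = 8192
```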
5. The method for recognizing activities in video according to claim 4, characterized in that step (3) is specifically: first, the object detector produces, from the scaled and cropped input image, multiple output vectors corresponding to multiple bounding boxes, each output vector containing the confidences P = {p_l} of L object classes and a position offset [x, y, w, h], where l ∈ {0, 1, ..., L-1}, L is the number of object classes, and p_l is the confidence of the l-th object class; then the output vectors of all bounding boxes are merged to obtain, for the multiple detected objects, spatial position feature vectors of length 5 of the form [q, x/W_I, y/H_I, w/W_I, h/H_I], where q is the confidence of the object class to which the detected object belongs, x and y are the coordinates of its bounding box, w and h are the width and height of its bounding box, and W_I and H_I are the width and height of the scaled and cropped image; finally, for each of the L object classes, the spatial position feature vectors of its 5 most confident detections are used to construct a feature vector of length (spatial position feature vector length) × L × 5.
6. The method for recognizing activities in video according to claim 1, characterized in that the recurrent neural network used in step (5) is a 3-layer GRU network composed of one fully connected layer followed by three stacked GRU layers; the fully connected layer has 4096 neurons, the GRU units of the first two GRU layers have 4096 neurons each, the GRU units of the last layer have 256 neurons, and the output of each GRU layer is the input of the next.
7. The method for recognizing activities in video according to claim 1, characterized in that the recurrent neural network used in step (5) is a combined GRU network composed of three fully connected layers and one GRU layer; the first two fully connected layers have 4096 neurons each, the last fully connected layer has 512 neurons, and the GRU units of the GRU layer have 512 neurons.
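For context, the GRU units referred to in claims 6 and 7 follow the standard gated-recurrent-unit equations; a minimal NumPy sketch of one time step, with toy dimensions standing in for the 4096- and 256-unit layers of claim 6 (parameter layout and names are mine):

```python
import numpy as np

def gru_step(x, h, W, U, b):
    """One GRU time step. W, U, b hold the update (z), reset (r) and
    candidate (h~) parameters as dicts keyed by 'z', 'r', 'h'."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ W["z"] + h @ U["z"] + b["z"])        # update gate
    r = sigmoid(x @ W["r"] + h @ U["r"] + b["r"])        # reset gate
    h_cand = np.tanh(x @ W["h"] + (r * h) @ U["h"] + b["h"])
    return (1 - z) * h + z * h_cand                      # new hidden state

in_dim, hid = 8, 4            # toy sizes; claim 6 stacks layers of 4096, 4096, 256 units
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((in_dim, hid)) * 0.1 for k in "zrh"}
U = {k: rng.standard_normal((hid, hid)) * 0.1 for k in "zrh"}
b = {k: np.zeros(hid) for k in "zrh"}
h = np.zeros(hid)
for t in range(5):            # hidden state carried across time steps
    h = gru_step(rng.standard_normal(in_dim), h, W, U, b)
# h is the hidden state after 5 steps, shape (4,)
```

In the claimed architectures these steps run over the N fused segment features in sequence, and the final hidden state is what the Softmax classifier of step (6) consumes.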
8. A system for recognizing activities in video based on visual-semantic features, characterized in that it comprises:
a first module for obtaining an image sequence from a data set and down-sampling it to obtain the down-sampled image sequence V = {v_t}, t ∈ {0, 1, ..., T-1}, then slicing the down-sampled sequence to obtain N image sequence segments of fixed length, where T denotes the length of the image sequence and N the number of image sequence segments;
a second module for scaling and cropping each image in the N fixed-length image sequence segments and feeding the N segments into a three-dimensional convolutional neural network to obtain N spatio-temporal visual feature vectors;
a third module for selecting one image from each image sequence segment obtained by the first module, scaling and cropping it, and feeding the scaled and cropped image into an object detector to obtain object-class confidences and position offsets, from which a human-object spatial position feature vector is constructed;
a fourth module for fusing the spatio-temporal visual feature vectors obtained by the second module with the human-object spatial position feature vectors obtained by the third module;
a fifth module for feeding the fused feature vectors from the fourth module into a recurrent neural network to obtain long-term action features;
a sixth module for classifying the long-term action features from the fifth module with a Softmax classifier to produce a class probability for each behavior type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810379626.6A CN108647591A (en) | 2018-04-25 | 2018-04-25 | Activity recognition method and system in a kind of video of view-based access control model-semantic feature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108647591A true CN108647591A (en) | 2018-10-12 |
Family
ID=63747734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810379626.6A Withdrawn CN108647591A (en) | 2018-04-25 | 2018-04-25 | Activity recognition method and system in a kind of video of view-based access control model-semantic feature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647591A (en) |
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109615358A (en) * | 2018-11-01 | 2019-04-12 | 北京伟景智能科技有限公司 | A kind of dining room automatic settlement method and system based on deep learning image recognition |
CN109784295A (en) * | 2019-01-25 | 2019-05-21 | 佳都新太科技股份有限公司 | Video stream characteristics recognition methods, device, equipment and storage medium |
CN109977773A (en) * | 2019-02-18 | 2019-07-05 | 华南理工大学 | Human bodys' response method and system based on multi-target detection 3D CNN |
CN109977872A (en) * | 2019-03-27 | 2019-07-05 | 北京迈格威科技有限公司 | Motion detection method, device, electronic equipment and computer readable storage medium |
CN110070002A (en) * | 2019-03-29 | 2019-07-30 | 上海理工大学 | A kind of Activity recognition method based on 3D convolutional neural networks |
CN110348290A (en) * | 2019-05-27 | 2019-10-18 | 天津中科智能识别产业技术研究院有限公司 | Coke tank truck safe early warning visible detection method |
CN110427831A (en) * | 2019-07-09 | 2019-11-08 | 淮阴工学院 | A kind of human action classification method based on fusion feature |
CN110490109A (en) * | 2019-08-09 | 2019-11-22 | 郑州大学 | A kind of online human body recovery action identification method based on monocular vision |
CN110503076A (en) * | 2019-08-29 | 2019-11-26 | 腾讯科技(深圳)有限公司 | Video classification methods, device, equipment and medium based on artificial intelligence |
CN110598608A (en) * | 2019-09-02 | 2019-12-20 | 中国航天员科研训练中心 | Non-contact and contact cooperative psychological and physiological state intelligent monitoring system |
CN111259838A (en) * | 2020-01-20 | 2020-06-09 | 山东大学 | Method and system for deeply understanding human body behaviors in service robot service environment |
CN111507421A (en) * | 2020-04-22 | 2020-08-07 | 上海极链网络科技有限公司 | Video-based emotion recognition method and device |
WO2020206850A1 (en) * | 2019-04-09 | 2020-10-15 | 华为技术有限公司 | Image annotation method and device employing high-dimensional image |
CN111783692A (en) * | 2020-07-06 | 2020-10-16 | 广东工业大学 | Action recognition method and device, electronic equipment and storage medium |
CN111783760A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Character recognition method and device, electronic equipment and computer readable storage medium |
CN112232283A (en) * | 2020-11-05 | 2021-01-15 | 深兰科技(上海)有限公司 | Bubble detection method and system based on optical flow and C3D network |
CN113807318A (en) * | 2021-10-11 | 2021-12-17 | 南京信息工程大学 | Action identification method based on double-current convolutional neural network and bidirectional GRU |
US11270147B1 (en) | 2020-10-05 | 2022-03-08 | International Business Machines Corporation | Action-object recognition in cluttered video scenes using text |
US11322234B2 (en) | 2019-07-25 | 2022-05-03 | International Business Machines Corporation | Automated content avoidance based on medical conditions |
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11423223B2 (en) | 2019-12-02 | 2022-08-23 | International Business Machines Corporation | Dynamic creation/expansion of cognitive model dictionaries based on analysis of natural language content |
US11423252B1 (en) | 2021-04-29 | 2022-08-23 | International Business Machines Corporation | Object dataset creation or modification using labeled action-object videos |
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US11625422B2 (en) | 2019-12-02 | 2023-04-11 | Merative Us L.P. | Context based surface form generation for cognitive system dictionaries |
US11636346B2 (en) | 2019-05-06 | 2023-04-25 | Brown University | Recurrent neural circuits |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data |
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
CN117158904A (en) * | 2023-09-08 | 2023-12-05 | 上海市第四人民医院 | Old people cognitive disorder detection system and method based on behavior analysis |
US11841434B2 (en) | 2018-07-20 | 2023-12-12 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
US11893774B2 (en) | 2018-10-11 | 2024-02-06 | Tesla, Inc. | Systems and methods for training machine models with augmented data |
CN117158904B (en) * | 2023-09-08 | 2024-05-24 | 上海市第四人民医院 | Old people cognitive disorder detection system and method based on behavior analysis |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451552A (en) * | 2017-07-25 | 2017-12-08 | 北京联合大学 | A kind of gesture identification method based on 3D CNN and convolution LSTM |
Non-Patent Citations (1)
Title |
---|
XINHUA LIU, ET AL.: "An Optimization Model for Human Activity Recognition Inspired by Information on Human-Object Interaction", IEEE 2018 10th International Conference on Measuring Technology and Mechatronics Automation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647591A (en) | Activity recognition method and system in a kind of video of view-based access control model-semantic feature | |
Chen et al. | Motion guided spatial attention for video captioning | |
Wang et al. | A self-training approach for point-supervised object detection and counting in crowds | |
Aich et al. | Bidirectional attention network for monocular depth estimation | |
Yang et al. | A part-aware multi-scale fully convolutional network for pedestrian detection | |
Varior et al. | Multi-scale attention network for crowd counting | |
CN110276253A (en) | A kind of fuzzy literal detection recognition method based on deep learning | |
CN109816689A (en) | A kind of motion target tracking method that multilayer convolution feature adaptively merges | |
Mahjourian et al. | Geometry-based next frame prediction from monocular video | |
Wang et al. | Robust object detection via instance-level temporal cycle confusion | |
CN110135446A (en) | Method for text detection and computer storage medium | |
Mo et al. | Background noise filtering and distribution dividing for crowd counting | |
CN112101344B (en) | Video text tracking method and device | |
Xu et al. | BANet: A balanced atrous net improved from SSD for autonomous driving in smart transportation | |
CN110175597A (en) | Video target detection method integrating feature propagation and aggregation | |
Zhu et al. | CACrowdGAN: Cascaded attentional generative adversarial network for crowd counting | |
Liu et al. | Density-aware and background-aware network for crowd counting via multi-task learning | |
Chu et al. | Attention guided feature pyramid network for crowd counting | |
Chen et al. | SSR-HEF: crowd counting with multiscale semantic refining and hard example focusing | |
Aliakbarian et al. | Deep action-and context-aware sequence learning for activity recognition and anticipation | |
Ju et al. | An improved YOLO V3 for small vehicles detection in aerial images | |
Li et al. | Multi-Scale correlation module for video-based facial expression recognition in the wild | |
Huang et al. | Video frame prediction with dual-stream deep network emphasizing motions and content details | |
CN112184767A (en) | Method, device, equipment and storage medium for tracking moving object track | |
de Almeida Maia et al. | Action recognition in videos using multi-stream convolutional neural networks |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20181012 |