CN108764148A - Multi-region real-time action detection method based on surveillance video - Google Patents
Multi-region real-time action detection method based on surveillance video
- Publication number
- CN108764148A (application CN201810534453.0A)
- Authority
- CN
- China
- Prior art keywords
- tube
- action
- classification
- yolo
- models
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
Abstract
The invention discloses a multi-region real-time action detection method based on surveillance video, having the following steps: a model training stage and a test phase. The model training stage is: obtain training data, namely a database of annotated specific actions; compute the dense optical flow of the video sequences in the training data, obtain the optical-flow sequences of those video sequences, and annotate the optical-flow images in the optical-flow sequences; train the target detection model YOLO v3 separately with the video sequences and the optical-flow sequences in the training data, obtaining an RGB YOLO v3 model and an optical-flow YOLO v3 model. The invention can not only detect the spatio-temporal position of specific actions in surveillance video, but can also process the surveillance feed in real time.
Description
Technical field
The invention belongs to the field of computer vision, and in particular relates to a human action detection system for surveillance-video scenes.
Background technology
As surveillance equipment becomes more and more widespread, more technologies based on surveillance are being applied. Action recognition, one of the most valuable among them, is mainly used for indoor human-machine interaction, factory environments, and the public-security field, for the detection and identification of particular dangerous actions.
Most action recognition methods for surveillance video focus on the task of recognizing and classifying the action of an entire scene. Such videos are usually manually edited clips that generally contain only one kind of action, and they differ greatly from natural video. Some researchers instead place the task on detecting where an action starts and ends on the whole time axis; in practical applications, obtaining both the start and end of an action in a video and the spatial extent in which the action occurs is very useful. In addition, although existing action detection methods achieve good detection results on existing databases and in competitions, these methods typically divide the entire video into many small blocks, or process the entire video, and only then output the spatio-temporal position of the action in that video. Real-time action detection, however, requires frame-level processing of the video, so such methods cannot be deployed in a surveillance system.
With the popularity of surveillance devices, the detection of human actions in surveillance video has gradually become a popular research field. Wang L., Qiao Y., Tang X., "Action recognition with trajectory-pooled deep convolutional descriptors" (2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)) extracts video features with a deep neural network and combines them with the features obtained by a dense-trajectory algorithm to realize action recognition over an entire video. D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks" (2015 IEEE International Conference on Computer Vision (ICCV)) proposes the C3D framework, which extracts human motion features from video with 3D convolution and 3D pooling. Simonyan K., Zisserman A., "Two-Stream Convolutional Networks for Action Recognition in Videos" (Advances in Neural Information Processing Systems, 2014) extracts an optical-flow sequence from the RGB image sequence, trains a convolutional neural network on each stream, and fuses the features obtained by the two networks to recognize actions. Although the above models achieve good results, they can only recognize an entire video and cannot localize the spatio-temporal position of the action.
" the Finding action tubes " of G.Gkioxari and J.Malik. is (in IEEE Int.Conf.on
Computer Vision and Pattern Recognition, 2015.) action of each frame is mainly detected in
Then action proposal that proposals reconnects each frame forms action sequence, J.Lu, r.Xu, and J.J.Corso
" Human action segmentation with hierarchical supervoxel consistency " are (in IEEE
Int.Conf.on Computer Vision and Pattern Recognition, June 2015) in propose a kind of layer
The MRF models of secondaryization will there is high-level human motion and apparent low-level visual segment to connect to realize in video
In segmentation to action, these methods mainly realize the segmentation that space is carried out to the action in video, and these algorithms need
The other region proposals of a large amount of frame level are wanted to need a large amount of calculate.
" the Temporal Action Localization with Pyramid of of Yuan J, Ni B, Yang X
Score Distribution Features " are (in IEEE:Computer Vision and Pattern
Recognition.2016 iDT features score distribution pyramid feature (Pyramid of a kind of to video extraction is based in)
Score Distribution Feature, PSDF), LSTM networks have been reused later, and PSDF characteristic sequences have been handled,
And it handles to obtain the prediction of behavior segment according to the behavior classification confidence of the frame-level of output.Shou Z,Wang
D, Chang S F.'s " Temporal Action Localization in Untrimmed Videos via Multi-
Stage CNNs " are (in IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) (2016)) in first use sliding window method generate sizes video clip (segment), reuse the multistage
Network (Segment-CNN) handle, finally use non-maximization and inhibit, to remove the segment of overlapping, to complete prediction.
Shou Z, Chan J, Zareian A, " CDC:Convolutional-De-Convolutional Networks for
Precise Temporal Action Localization in Untrimmed Videos " are (in 2017IEEE
Conference on Computer Vision and Pattern Recognition (CVPR) (2017)) in be based on C3D
(3D CNN networks) devises a convolution against convolutional network (CDC), inputs a bit of video, the action classification of output frame rank
Probability.The network is primarily used to be finely adjusted the trip boundary in temporal action detection so that action
Boundary is more accurate, although frame above can reach real-time effect, algorithm above be mainly realization action when
Between dimension accurately detect, and can not achieve action space-time detection.
J. C. van Gemert, M. Jain, E. Gati, and C. G. Snoek, "APT: Action localization proposals from dense trajectories" (BMVC, volume 2, page 4, 2015) uses unsupervised clustering to generate a set of bounding-box spatio-temporal action proposals. Since this method is based on dense-trajectory features, it cannot detect actions characterized by small motion. P. Weinzaepfel, Z. Harchaoui, and C. Schmid, "Learning to track for spatio-temporal action localization" (IEEE Computer Vision and Pattern Recognition, 2015) performs spatio-temporal action detection by combining frame-level EdgeBoxes region proposals with a tracking-by-detection framework. However, the temporal detection of the action is still realized by multi-scale sliding windows over each track, which makes the method inefficient for longer video sequences.
Summary of the invention
In view of the problems of existing action detection methods, the present invention proposes a multi-region real-time action detection method based on surveillance video. The technical means adopted by the invention are as follows:
A multi-region real-time action detection method based on surveillance video, characterized by the following steps:
Model training stage:
A1. Obtain training data: a database of annotated specific actions;
A2. Compute the dense optical flow of the video sequences in the training data, obtain the optical-flow sequences of the video sequences in the training data, and annotate the optical-flow images in the optical-flow sequences;
A3. Train the target detection model YOLO v3 separately with the video sequences and the optical-flow sequences in the training data, obtaining an RGB YOLO v3 model and an optical-flow YOLO v3 model;
Test phase:
B1. Extract the sparse optical-flow image sequence of the video with the pyramid Lucas-Kanade optical-flow method, then feed the RGB image sequence and the sparse optical-flow image sequence of the video into the RGB YOLO v3 model and the optical-flow YOLO v3 model respectively. From the series of detection boxes output by the RGB YOLO v3 model, extract with non-maximum suppression the top n detection boxes of every action class, b_i^RGB, i = 1…n; each detection box carries an action-class label and a probability score s_i^RGB of belonging to that action. From the series of detection boxes output by the optical-flow YOLO v3 model, likewise extract with non-maximum suppression the top n detection boxes of every action class, b_k^flow, k = 1…n; each detection box carries an action-class label and a probability score s_k^flow of belonging to that action. Traverse the detection boxes output by the two models: for each detection box b_i^RGB output by the RGB YOLO v3 model, compute its intersection-over-union with each same-class detection box output by the optical-flow YOLO v3 model, and denote the same-class optical-flow detection box with the maximum intersection-over-union by b_max^flow. If the maximum intersection-over-union exceeds a threshold K, the probability scores of the corresponding pair of boxes output by the two models are fused into s_i^fuse, taken as the confidence of the detection box b_i^RGB output by the RGB YOLO v3 model. s_i^fuse satisfies the following formula:
s_i^fuse = s_i^RGB + IoU(b_i^RGB, b_max^flow) · s_max^flow
where IoU(b_i^RGB, b_max^flow) denotes the intersection-over-union of b_i^RGB and b_max^flow, and s_max^flow is the probability score of the same-class box b_max^flow having the maximum intersection-over-union with b_i^RGB;
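The B1 fusion rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: boxes are (x1, y1, x2, y2) tuples, detections are (box, class, score) triples, and the function names are ours.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def fuse_scores(rgb_dets, flow_dets, iou_thresh=0.5):
    """For each RGB detection, find the same-class optical-flow detection
    with maximal IoU; if that IoU exceeds the threshold, boost the RGB
    score by IoU * flow_score, giving the fused confidence s_fuse."""
    fused = []
    for box_r, cls_r, s_r in rgb_dets:
        best_iou, best_s = 0.0, 0.0
        for box_f, cls_f, s_f in flow_dets:
            if cls_f != cls_r:
                continue  # only same-action-class boxes are compared
            o = iou(box_r, box_f)
            if o > best_iou:
                best_iou, best_s = o, s_f
        s = s_r + best_iou * best_s if best_iou > iou_thresh else s_r
        fused.append((box_r, cls_r, s))
    return fused
```

An RGB box that overlaps a confident same-class flow box keeps its label but gains confidence; a box with no flow support keeps its original score.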
B2. Using the fused confidence score of each action class for each detection box output by the RGB YOLO v3 model, connect the detection boxes across the RGB image sequence of the video to form tubes:
Initialize the tubes with the detection boxes of the first frame of the RGB image sequence of the video. For example, if the first frame produces n detection boxes, then n tubes are initialized, and the number of tubes of a given action class in the first frame is:
n_class(1) = n;
Perform the following operations for each action class separately:
S1. Match each tube with the detection boxes generated at frame t. First traverse the tubes belonging to the same action class; if the class has n tubes, compute for each tube the average of its per-frame confidences as the value of the tube, and sort the values of the n tubes of the class in descending order to form a list list_class. To determine the action class of each tube, define a list I = {l_{t-k+1}…l_t} that stores the action classes of the last k frames of the tube;
S2. Traverse the detection boxes b_i^fuse, i = 1…n, of frame t in list_class, and select those that satisfy the following conditions to add to a tube:
Traverse the tubes in list_class, and match each tube with the same-class detection boxes b^fuse of frame t. If the intersection-over-union of a b^fuse with the detection box in the last frame of the tube exceeds a threshold d, add this b^fuse to a queue H_list_class.
If H_list_class ≠ ∅, select the b^fuse with the highest confidence in H_list_class and add it to the tube; when the boxes b_i^fuse, i = 1…n, of frame t are traversed again, this highest-confidence b^fuse is excluded.
If H_list_class = ∅, no b^fuse is added to the tube and it remains unchanged; if no new b^fuse is added to a tube for k consecutive frames, the tube is terminated.
If frame t has a box that was not matched, denoted b_rest^fuse, traverse all tubes, compute the intersection-over-union of b_rest^fuse with the last frame of every tube, and select the tube whose intersection-over-union exceeds the threshold k and is maximal, denoted T*; b_rest^fuse is added to that tube. T* satisfies the following formula:
T* = argmax_{T_i} IoU(b_rest^fuse, T_i(t-1))
If IoU(b_rest^fuse, T*(t-1)) > k, then b_rest^fuse is added to T*; if IoU(b_rest^fuse, T*(t-1)) ≤ k, it is not. T_i is the i-th tube, and T_i(t-1) is the frame t-1 of the i-th tube.
If frame t still has detection boxes that were not matched, each such detection box becomes the starting point of a new tube, and the detection box initializes the tube as its first frame;
S3. After all tubes have been matched, update the action-class list I = {l_{t-k+1}…l_t} of the last k frames of each tube, where l_t is the action class of frame t of the tube, and update the action class L of each tube: count the action classes in I = {l_{t-k+1}…l_t} of the last k frames of each tube, and take the most frequent action class as the class L of the tube, satisfying the following formula:
L = argmax_c Σ_{i=t-k+1}^{t} g(l_i, c)
If l_i = c, then g(l_i, c) = 1; if l_i ≠ c, then g(l_i, c) = 0, where c is an action class; that is, counting the action classes in I = {l_{t-k+1}…l_t}, the most numerous action class is the action class of the tube.
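The majority vote in S3 reduces to one line with `collections.Counter` — a minimal sketch; `tube_class` is an illustrative name, not from the patent:

```python
from collections import Counter


def tube_class(last_k_labels):
    """Most frequent action class among the labels of the last k frames
    of a tube (the class L chosen in step S3)."""
    return Counter(last_k_labels).most_common(1)[0][0]
```

For I = {walk, walk, run}, for instance, the tube's class is walk.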
In said step A1, the database of annotated specific actions is the Action Detection dataset of UCF101.
In said step A2, the dense optical flow of the video sequences in the training data is computed with the calcOpticalFlowFarneback function of the OpenCV library.
Compared with the prior art, the present invention can not only detect the spatio-temporal position of specific actions in surveillance video, but can also process the surveillance feed in real time. For the above reasons, the present invention can be widely applied in fields such as computer vision.
Description of the drawings
To explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative labor.
Fig. 1 is a schematic diagram of the intersection-over-union computation in an embodiment of the invention.
Fig. 2 is an overall schematic diagram of the multi-region real-time action detection method based on surveillance video in an embodiment of the invention.
Fig. 3 is a program flow chart of the multi-region real-time action detection method based on surveillance video in an embodiment of the invention.
Fig. 4 is a schematic diagram of the processing of a single frame in an embodiment of the invention.
Fig. 5 is a schematic diagram of the processing of a consecutive image sequence in an embodiment of the invention.
Specific embodiments
To make the purpose, technical solution, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work shall fall within the protection scope of the invention.
As shown in Fig. 1 to Fig. 5, a multi-region real-time action detection method based on surveillance video has the following steps:
Model training stage:
A1. Obtain training data: a database of annotated specific actions;
A2. Compute the dense optical flow of the video sequences in the training data, obtain the optical-flow sequences of the video sequences in the training data, and annotate the optical-flow images in the optical-flow sequences;
A3. Train the target detection model YOLO v3 separately with the video sequences and the optical-flow sequences in the training data, obtaining an RGB YOLO v3 model and an optical-flow YOLO v3 model;
Test phase:
B1. Extract the sparse optical-flow image sequence of the video with the pyramid Lucas-Kanade optical-flow method, then feed the RGB image sequence and the sparse optical-flow image sequence of the video into the RGB YOLO v3 model and the optical-flow YOLO v3 model respectively. From the series of detection boxes output by the RGB YOLO v3 model, extract with non-maximum suppression the top n detection boxes of every action class, b_i^RGB, i = 1…n; each detection box carries an action-class label and a probability score s_i^RGB of belonging to that action. From the series of detection boxes output by the optical-flow YOLO v3 model, likewise extract with non-maximum suppression the top n detection boxes of every action class, b_k^flow, k = 1…n; each detection box carries an action-class label and a probability score s_k^flow of belonging to that action. Traverse the detection boxes output by the two models: for each detection box b_i^RGB output by the RGB YOLO v3 model, compute its intersection-over-union with each same-class detection box output by the optical-flow YOLO v3 model, and denote the same-class optical-flow detection box with the maximum intersection-over-union by b_max^flow. If the maximum intersection-over-union exceeds a threshold K, the probability scores of the corresponding pair of boxes output by the two models are fused into s_i^fuse, taken as the confidence of the detection box b_i^RGB output by the RGB YOLO v3 model. s_i^fuse satisfies the following formula:
s_i^fuse = s_i^RGB + IoU(b_i^RGB, b_max^flow) · s_max^flow
where IoU(b_i^RGB, b_max^flow) denotes the intersection-over-union of b_i^RGB and b_max^flow, and s_max^flow is the probability score of the same-class box b_max^flow having the maximum intersection-over-union with b_i^RGB. The intersection-over-union IoU(A, B) of two boxes A and B, illustrated in Fig. 1, is
IoU(A, B) = area(A ∩ B) / area(A ∪ B)
where area(A) denotes the area of box A and area(A ∩ B) the area of the intersection of the two boxes.
B2. Using the fused confidence score of each action class for each detection box output by the RGB YOLO v3 model, connect the detection boxes across the RGB image sequence of the video to form tubes:
Initialize the tubes with the detection boxes of the first frame of the RGB image sequence of the video. For example, if the first frame produces n detection boxes, then n tubes are initialized, and the number of tubes of a given action class in the first frame is:
n_class(1) = n;
Perform the following operations for each action class separately:
S1. Match each tube with the detection boxes generated at frame t. First traverse the tubes belonging to the same action class; if the class has n tubes, compute for each tube the average of its per-frame confidences as the value of the tube, and sort the values of the n tubes of the class in descending order to form a list list_class. To determine the action class of each tube, define a list I = {l_{t-k+1}…l_t} that stores the action classes of the last k frames of the tube;
S2. Traverse the detection boxes b_i^fuse, i = 1…n, of frame t in list_class, and select those that satisfy the following conditions to add to a tube:
Traverse the tubes in list_class, and match each tube with the same-class detection boxes b^fuse of frame t. If the intersection-over-union of a b^fuse with the detection box in the last frame of the tube exceeds a threshold d, add this b^fuse to a queue H_list_class.
If H_list_class ≠ ∅, select the b^fuse with the highest confidence in H_list_class and add it to the tube; when the boxes b_i^fuse, i = 1…n, of frame t are traversed again, this highest-confidence b^fuse is excluded.
If H_list_class = ∅, no b^fuse is added to the tube and it remains unchanged; if no new b^fuse is added to a tube for k consecutive frames, the tube is terminated.
If frame t has a box that was not matched, denoted b_rest^fuse, traverse all tubes, compute the intersection-over-union of b_rest^fuse with the last frame of every tube, and select the tube whose intersection-over-union exceeds the threshold k and is maximal, denoted T*; b_rest^fuse is added to that tube. T* satisfies the following formula:
T* = argmax_{T_i} IoU(b_rest^fuse, T_i(t-1))
If IoU(b_rest^fuse, T*(t-1)) > k, then b_rest^fuse is added to T*; if IoU(b_rest^fuse, T*(t-1)) ≤ k, it is not. T_i is the i-th tube, and T_i(t-1) is the frame t-1 of the i-th tube.
If frame t still has detection boxes that were not matched, each such detection box becomes the starting point of a new tube, and the detection box initializes the tube as its first frame;
S3. After all tubes have been matched, update the action-class list I = {l_{t-k+1}…l_t} of the last k frames of each tube, where l_t is the action class of frame t of the tube, and update the action class L of each tube: count the action classes in I = {l_{t-k+1}…l_t} of the last k frames of each tube, and take the most frequent action class as the class L of the tube, satisfying the following formula:
L = argmax_c Σ_{i=t-k+1}^{t} g(l_i, c)
If l_i = c, then g(l_i, c) = 1; if l_i ≠ c, then g(l_i, c) = 0, where c is an action class; that is, counting the action classes in I = {l_{t-k+1}…l_t}, the most numerous action class is the action class of the tube.
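The per-class matching of S1/S2 — extend each tube with the highest-confidence unmatched frame-t box whose IoU with the tube's last box exceeds d, and let leftover boxes seed new tubes — can be sketched as below. This is a simplified single-class sketch under stated assumptions: the tubes are already sorted by mean confidence, boxes are (x1, y1, x2, y2, score) tuples, and the names are ours.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def extend_tubes(tubes, frame_boxes, d=0.3):
    """One matching step for a single action class.
    tubes: list of tubes, each a list of (x1, y1, x2, y2, score) boxes,
    assumed sorted by mean confidence; frame_boxes: frame-t detections.
    Each tube greedily takes the highest-scoring unmatched box whose IoU
    with the tube's last box exceeds d; leftover boxes start new tubes."""
    remaining = list(frame_boxes)
    for tube in tubes:
        last = tube[-1]
        candidates = [b for b in remaining if iou(last, b) > d]
        if candidates:
            best = max(candidates, key=lambda b: b[4])
            tube.append(best)
            remaining.remove(best)  # exclude from later tubes
    for box in remaining:           # unmatched boxes seed new tubes
        tubes.append([box])
    return tubes
```

Tube termination after k empty frames and the k-frame majority-vote class update would sit on top of this step in a full implementation.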
In Fig. 2, (a) is the RGB image sequence of the video; (b) is the optical-flow algorithm: in the test phase the pyramid Lucas-Kanade optical-flow method in OpenCV extracts sparse optical-flow images, while in the training phase dense optical-flow images are extracted; (c) is the resulting optical-flow image; (d) are the action detection models: one RGB YOLO v3 model trained with the RGB image sequence of the video, and one optical-flow YOLO v3 model trained with the optical-flow sequence; (e) is the detection result output by the RGB YOLO v3 model; (f) is the detection result of the optical-flow YOLO v3 model; (g) is the fused output of the two models, which yields features with better robustness; (h) shows the detection boxes, using the fused features, connected across the RGB image sequence of the video into tubes.
In Fig. 4, (a) is an image from the RGB image sequence of the video; (b) is the optical-flow image corresponding to that image; (c) is the detection result output after the image is processed by the RGB YOLO v3 model; (d) is the detection result output after the optical-flow image is processed by the optical-flow YOLO v3 model.
Fig. 5 shows a consecutive image sequence of the video: (a) images taken at equal intervals from the RGB image sequence of the video; (b) the optical-flow sequence corresponding to those images; (c) the detection results output after the images are processed by the RGB YOLO v3 model; (d) the detection results output after the optical-flow sequence is processed by the optical-flow YOLO v3 model; (e) the tubes obtained by fusing the detection results of (c) and (d).
In said step A1, the database of annotated specific actions is the Action Detection dataset of UCF101.
In said step A2, the dense optical flow of the video sequences in the training data is computed with the calcOpticalFlowFarneback function of the OpenCV library.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the above embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (3)
1. A multi-region real-time action detection method based on surveillance video, characterized by the following steps:
Model training stage:
A1. Obtain training data: a database of annotated specific actions;
A2. Compute the dense optical flow of the video sequences in the training data, obtain the optical-flow sequences of the video sequences in the training data, and annotate the optical-flow images in the optical-flow sequences;
A3. Train the target detection model YOLO v3 separately with the video sequences and the optical-flow sequences in the training data, obtaining an RGB YOLO v3 model and an optical-flow YOLO v3 model;
Test phase:
B1. Extract the sparse optical-flow image sequence of the video with the pyramid Lucas-Kanade optical-flow method, then feed the RGB image sequence and the sparse optical-flow image sequence of the video into the RGB YOLO v3 model and the optical-flow YOLO v3 model respectively. From the series of detection boxes output by the RGB YOLO v3 model, extract with non-maximum suppression the top n detection boxes of every action class, b_i^RGB, i = 1…n; each detection box carries an action-class label and a probability score s_i^RGB of belonging to that action. From the series of detection boxes output by the optical-flow YOLO v3 model, likewise extract with non-maximum suppression the top n detection boxes of every action class, b_k^flow, k = 1…n; each detection box carries an action-class label and a probability score s_k^flow of belonging to that action. Traverse the detection boxes output by the two models: for each detection box b_i^RGB output by the RGB YOLO v3 model, compute its intersection-over-union with each same-class detection box output by the optical-flow YOLO v3 model, and denote the same-class optical-flow detection box with the maximum intersection-over-union by b_max^flow. If the maximum intersection-over-union exceeds a threshold K, the probability scores of the corresponding pair of boxes output by the two models are fused into s_i^fuse, taken as the confidence of the detection box b_i^RGB output by the RGB YOLO v3 model. s_i^fuse satisfies the following formula:
s_i^fuse = s_i^RGB + IoU(b_i^RGB, b_max^flow) · s_max^flow
where IoU(b_i^RGB, b_max^flow) denotes the intersection-over-union of b_i^RGB and b_max^flow, and s_max^flow is the probability score of the same-class box b_max^flow having the maximum intersection-over-union with b_i^RGB;
B2. Using the fused confidence score of each action class for each detection box output by the RGB YOLO v3 model, connect the detection boxes across the RGB image sequence of the video to form tubes:
Initialize the tubes with the detection boxes of the first frame of the RGB image sequence of the video;
Perform the following operations for each action class separately:
S1. Match each tube with the detection boxes generated at frame t. First traverse the tubes belonging to the same action class; if the class has n tubes, compute for each tube the average of its per-frame confidences as the value of the tube, and sort the values of the n tubes of the class in descending order to form a list list_class. To determine the action class of each tube, define a list I = {l_{t-k+1}…l_t} that stores the action classes of the last k frames of the tube;
S2. Traverse the detection boxes b_i^fuse, i = 1…n, of frame t in list_class, and select those that satisfy the following conditions to add to a tube:
Traverse the tubes in list_class, and match each tube with the same-class detection boxes b^fuse of frame t. If the intersection-over-union of a b^fuse with the detection box in the last frame of the tube exceeds a threshold d, add this b^fuse to a queue H_list_class.
If H_list_class ≠ ∅, select the b^fuse with the highest confidence in H_list_class and add it to the tube; when the boxes b_i^fuse, i = 1…n, of frame t are traversed again, this highest-confidence b^fuse is excluded.
If H_list_class = ∅, no b^fuse is added to the tube and it remains unchanged; if no new b^fuse is added to a tube for k consecutive frames, the tube is terminated.
If frame t has a box that was not matched, denoted b_rest^fuse, traverse all tubes, compute the intersection-over-union of b_rest^fuse with the last frame of every tube, and select the tube whose intersection-over-union exceeds the threshold k and is maximal, denoted T*; b_rest^fuse is added to that tube. T* satisfies the following formula:
T* = argmax_{T_i} IoU(b_rest^fuse, T_i(t-1))
If IoU(b_rest^fuse, T*(t-1)) > k, then b_rest^fuse is added to T*; if IoU(b_rest^fuse, T*(t-1)) ≤ k, it is not. T_i is the i-th tube, and T_i(t-1) is the frame t-1 of the i-th tube.
If frame t still has detection boxes that were not matched, each such detection box becomes the starting point of a new tube, and the detection box initializes the tube as its first frame;
S3. After all tubes have been matched, update the action-class list I = {l_{t-k+1}…l_t} of the last k frames of each tube, where l_t is the action class of frame t of the tube, and update the action class L of each tube: count the action classes in I = {l_{t-k+1}…l_t} of the last k frames of each tube, and take the most frequent action class as the class L of the tube, satisfying the following formula:
L = argmax_c Σ_{i=t-k+1}^{t} g(l_i, c)
If l_i = c, then g(l_i, c) = 1; if l_i ≠ c, then g(l_i, c) = 0, where c is an action class; that is, counting the action classes in I = {l_{t-k+1}…l_t}, the most numerous action class is the action class of the tube.
2. The multi-region real-time action detection method based on surveillance video according to claim 1, characterized in that: in said step A1, the database of annotated specific actions is the Action Detection dataset of UCF101.
3. The multi-region real-time action detection method based on monitoring video according to claim 1, characterized in that: in step A2, the dense optical flow of the video sequences in the training data is computed with the calcOpticalFlowFarneback function of the OpenCV library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810534453.0A CN108764148B (en) | 2018-05-30 | 2018-05-30 | Multi-region real-time action detection method based on monitoring video |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108764148A true CN108764148A (en) | 2018-11-06 |
CN108764148B CN108764148B (en) | 2020-03-10 |
Family
ID=64003645
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810534453.0A Active CN108764148B (en) | 2018-05-30 | 2018-05-30 | Multi-region real-time action detection method based on monitoring video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764148B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140254882A1 (en) * | 2013-03-11 | 2014-09-11 | Adobe Systems Incorporated | Optical Flow with Nearest Neighbor Field Fusion |
CN105512618A (en) * | 2015-11-27 | 2016-04-20 | 北京航空航天大学 | Video tracking method |
CN106709461A (en) * | 2016-12-28 | 2017-05-24 | 中国科学院深圳先进技术研究院 | Video based behavior recognition method and device |
CN107316007A (en) * | 2017-06-07 | 2017-11-03 | 浙江捷尚视觉科技股份有限公司 | A multi-class object detection and recognition method for surveillance images based on deep learning |
Non-Patent Citations (4)
Title |
---|
ALAAELDIN EL-NOUBY et al.: "Real-Time End-to-End Action Detection with Two-Stream Networks", arXiv *
CHRISTOPH FEICHTENHOFER et al.: "Detect to Track and Track to Detect", arXiv *
PHILIPPE WEINZAEPFEL et al.: "Learning to track for spatio-temporal action localization", 2015 IEEE International Conference on Computer Vision *
HUANG Tiejun et al.: "Multimedia technology research: 2013 - visual perception and processing for intelligent video surveillance", Journal of Image and Graphics *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447014A (en) * | 2018-11-07 | 2019-03-08 | 东南大学-无锡集成电路技术研究所 | An online video behavior detection method based on dual-channel convolutional neural networks |
WO2020114120A1 (en) * | 2018-12-07 | 2020-06-11 | 深圳光启空间技术有限公司 | Method for identifying vehicle information, system, memory device, and processor |
CN111291779A (en) * | 2018-12-07 | 2020-06-16 | 深圳光启空间技术有限公司 | Vehicle information identification method and system, memory and processor |
CN109740454A (en) * | 2018-12-19 | 2019-05-10 | 贵州大学 | A human posture recognition method based on YOLO-V3 |
CN109711344A (en) * | 2018-12-27 | 2019-05-03 | 东北大学 | A front-end intelligent detection method for specific abnormal behaviors |
CN109886165A (en) * | 2019-01-23 | 2019-06-14 | 中国科学院重庆绿色智能技术研究院 | An action video extraction and classification method based on moving object detection |
CN111126153A (en) * | 2019-11-25 | 2020-05-08 | 北京锐安科技有限公司 | Safety monitoring method, system, server and storage medium based on deep learning |
CN111353452A (en) * | 2020-03-06 | 2020-06-30 | 国网湖南省电力有限公司 | Behavior recognition method, behavior recognition device, behavior recognition medium and behavior recognition equipment based on RGB (red, green and blue) images |
CN114049396A (en) * | 2021-11-05 | 2022-02-15 | 北京百度网讯科技有限公司 | Method and device for marking training image and tracking target, electronic equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN108764148B (en) | 2020-03-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764148A (en) | Multizone real-time action detection method based on monitor video | |
CN109472232B (en) | Video semantic representation method, system and medium based on multi-mode fusion mechanism | |
CN110472554B (en) | Table tennis action recognition method and system based on attitude segmentation and key point features | |
CN107679491B (en) | 3D convolutional neural network sign language recognition method fusing multimodal data | |
Joshi et al. | Robust sports image classification using InceptionV3 and neural networks | |
Xu et al. | Two-stream region convolutional 3D network for temporal activity detection | |
CN109815826A (en) | Method and device for generating a face attribute model |
Amirgholipour et al. | A-CCNN: adaptive CCNN for density estimation and crowd counting | |
CN108399435B (en) | Video classification method based on dynamic and static characteristics | |
Rangasamy et al. | Deep learning in sport video analysis: a review | |
CN110110648B (en) | Action nomination method based on visual perception and artificial intelligence | |
CN107633226A (en) | A human action tracking and recognition method and system |
CN110969078A (en) | Abnormal behavior identification method based on human body key points |
CN109214285A (en) | Fall detection method based on deep convolutional neural networks and long short-term memory networks |
CN108537181A (en) | A gait recognition method based on large-margin deep metric learning |
CN111563404B (en) | Global local time representation method for video-based person re-identification | |
CN113963032A (en) | Twin network structure target tracking method fusing target re-identification | |
Flórez et al. | Hand gesture recognition following the dynamics of a topology-preserving network | |
Chalasani et al. | Egocentric gesture recognition for head-mounted ar devices | |
CN112597980A (en) | Brain-like gesture sequence recognition method for dynamic vision sensor | |
CN111368770B (en) | Gesture recognition method based on skeleton point detection and tracking | |
Chaudhary et al. | Tsnet: deep network for human action recognition in hazy videos | |
He et al. | What catches the eye? Visualizing and understanding deep saliency models | |
CN115410119A (en) | Violent movement detection method and system based on adaptive generation of training samples | |
CN109858351B (en) | Gait recognition method based on hierarchy real-time memory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||