CN108764148A - Multizone real-time action detection method based on monitor video - Google Patents

Multizone real-time action detection method based on monitor video

Info

Publication number
CN108764148A
CN108764148A (application CN201810534453.0A)
Authority
CN
China
Prior art keywords
tube
action
classification
yolo
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810534453.0A
Other languages
Chinese (zh)
Other versions
CN108764148B (en)
Inventor
陈东岳
任方博
王森
贾同
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201810534453.0A priority Critical patent/CN108764148B/en
Publication of CN108764148A publication Critical patent/CN108764148A/en
Application granted granted Critical
Publication of CN108764148B publication Critical patent/CN108764148B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/269: Analysis of motion using gradient-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-region real-time action detection method based on surveillance video, which has the following steps: a model training stage and a test stage. The model training stage obtains training data, namely a database of labelled specific actions; computes the dense optical flow of each video sequence in the training data to obtain its optical-flow sequence, and annotates the optical-flow images in the optical-flow sequence; and trains the target detection model YOLOv3 separately with the video sequences and the optical-flow sequences in the training data, obtaining an RGB YOLOv3 model and an optical-flow YOLOv3 model. The invention not only realizes spatio-temporal detection of specific actions in surveillance video, but also realizes real-time processing of the monitoring stream.

Description

Multizone real-time action detection method based on monitor video
Technical field
The invention belongs to the field of computer vision, and in particular relates to a human action detection system for surveillance-video scenes.
Background technology
With the increasingly widespread deployment of monitoring and control facilities, more and more technologies based on surveillance are being applied. Action recognition, one of the most valuable among them, is mainly used for human-machine interaction in indoor and factory environments and in the security field in public environments, where it serves the detection and identification of particular risky actions.
Most action recognition methods for surveillance video focus on recognizing and classifying the action of an entire scene. The videos used in such tasks are usually manually prepared clips that generally contain only one kind of action, and they differ greatly from natural video. Some researchers instead set the task as detecting where an action starts and ends on the whole time axis; in practical applications, however, it is useful to obtain both the start and end of an action in the video and the spatial extent in which it occurs. In addition, although existing action detection methods achieve good results on existing databases and in competitions, these methods typically divide the entire video into many small blocks or process the entire video as a whole before outputting the spatio-temporal position of the action, whereas real-time action detection requires frame-level processing of the video; such methods therefore cannot be deployed in a monitoring system.
With the popularization of monitoring devices, the detection of human actions in surveillance video has increasingly become a popular research field. Wang L., Qiao Y., Tang X., "Action recognition with trajectory-pooled deep convolutional descriptors" (2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)) integrates features extracted by a deep neural network with features obtained by a dense trajectory algorithm to realize action recognition over an entire video. D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3d convolutional networks" (2015 IEEE International Conference on Computer Vision (ICCV)) proposes extracting human motion features from video with 3D convolution and 3D pooling, forming the C3D framework. Simonyan K., Zisserman A., "Two-Stream Convolutional Networks for Action Recognition in Videos" (Advances in Neural Information Processing Systems, 2014) extracts an optical-flow sequence from the RGB image sequence, trains a convolutional neural network on each stream, and fuses the features obtained by the two networks to realize action recognition. Although the above models achieve good results, they can only recognize the action of a whole video and cannot localize the action in space and time.
G. Gkioxari and J. Malik, "Finding action tubes" (IEEE Int. Conf. on Computer Vision and Pattern Recognition, 2015) mainly detects action proposals in each frame and then links the proposals of successive frames into action sequences. J. Lu, R. Xu, and J. J. Corso, "Human action segmentation with hierarchical supervoxel consistency" (IEEE Int. Conf. on Computer Vision and Pattern Recognition, June 2015) proposes a hierarchical MRF model that connects high-level human motion with apparent low-level visual segments to realize segmentation of the actions in a video. These methods mainly realize spatial segmentation of the actions in a video, and the algorithms need a large number of frame-level region proposals and therefore a large amount of computation.
Yuan J., Ni B., Yang X., "Temporal Action Localization with Pyramid of Score Distribution Features" (IEEE Computer Vision and Pattern Recognition, 2016) extracts from the video a pyramid of score distribution features (Pyramid of Score Distribution Feature, PSDF) based on iDT features, then processes the PSDF feature sequences with an LSTM network and obtains behavior-segment predictions from the frame-level behavior-class confidences that the network outputs. Shou Z., Wang D., Chang S. F., "Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs" (IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016) first uses a sliding-window method to generate video segments of various sizes, then processes them with a multi-stage network (Segment-CNN), and finally applies non-maximum suppression to remove overlapping segments and complete the prediction. Shou Z., Chan J., Zareian A., "CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos" (2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)) designs a convolutional-de-convolutional network (CDC) based on C3D (a 3D CNN) that takes a short piece of video as input and outputs frame-level action class probabilities; the network is primarily used to fine-tune segment boundaries in temporal action detection so that the action boundaries become more accurate. Although the above frameworks can reach real-time speed, these algorithms mainly realize accurate detection of actions in the time dimension and cannot achieve spatio-temporal detection of actions.
J. C. van Gemert, M. Jain, E. Gati, and C. G. Snoek, "APT: Action localization proposals from dense trajectories" (BMVC, volume 2, page 4, 2015) proposes unsupervised clustering to generate a set of spatio-temporal action proposals with bounding boxes. Since this method is based on dense trajectory features, it cannot detect actions characterized by small movements. P. Weinzaepfel, Z. Harchaoui, and C. Schmid, "Learning to track for spatio-temporal action localization" (IEEE Computer Vision and Pattern Recognition, 2015) performs spatio-temporal action detection by combining frame-level EdgeBoxes region proposals with a tracking-by-detection framework. However, the temporal detection of the action is still realized through multi-scale sliding windows over each track, which makes this method inefficient for longer video sequences.
Invention content
In view of the problems existing in current action detection methods, the present invention proposes a multi-region real-time action detection method based on surveillance video. The technical means adopted by the present invention are as follows:
A multi-region real-time action detection method based on surveillance video, characterized by having the following steps:
Model training stage:
A1. Obtain training data: a database of labelled specific actions;
A2. Compute the dense optical flow of each video sequence in the training data to obtain the optical-flow sequence of that video sequence, and annotate the optical-flow images in the optical-flow sequence;
A3. Train the target detection model YOLOv3 separately with the video sequences and the optical-flow sequences in the training data, obtaining an RGB YOLOv3 model and an optical-flow YOLOv3 model;
Test phase:
B1. Extract the sparse optical-flow image sequence of the video with the pyramid Lucas-Kanade optical flow method, then feed the RGB image sequence of the video and the sparse optical-flow image sequence into the RGB YOLOv3 model and the optical-flow YOLOv3 model respectively. From the series of detection boxes output by the RGB YOLOv3 model, extract with non-maximum suppression the first n detection boxes of every action class, b_i^rgb, i = 1...n; each detection box carries an action-class label and a probability score s_i^rgb of belonging to that action. From the series of detection boxes output by the optical-flow YOLOv3 model, extract with non-maximum suppression the first n detection boxes of every action class, b_k^flow, k = 1...n; each detection box likewise carries an action-class label and a probability score s_k^flow. Traverse the detection boxes output by the two models: compute the intersection-over-union (IoU) of each box b_i^rgb output by the RGB YOLOv3 model with each box of the same action class output by the optical-flow YOLOv3 model, and denote by b* the same-class optical-flow box with the maximum IoU. If this maximum IoU exceeds a threshold k, the probability scores of the two corresponding boxes are fused into a score s_i^fuse, which serves as the confidence of the RGB detection box b_i^rgb, where IoU(b_i^rgb, b*) denotes the intersection-over-union of b_i^rgb and b*, and s* is the probability score of the same-class optical-flow box b* with the maximum IoU;
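For illustration, the fusion step of B1 can be sketched in Python as below. The helper names, the box format, and the additive IoU-weighted fusion rule are assumptions of the kind used in comparable two-stream detectors, not the patent's exact formula; any monotone combination of the two scores could be substituted.

```python
# Minimal sketch of the B1 fusion step. The fusion rule shown here
# (RGB score plus IoU-weighted flow score) is an illustrative assumption.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def fuse_detections(rgb_dets, flow_dets, k=0.5):
    """rgb_dets / flow_dets: lists of dicts {'box', 'cls', 'score'}.
    Returns the RGB detections with a fused confidence attached."""
    fused = []
    for d in rgb_dets:
        # Find b*: the same-class optical-flow box with maximum IoU.
        same_cls = [f for f in flow_dets if f['cls'] == d['cls']]
        best, best_iou = None, 0.0
        for f in same_cls:
            o = iou(d['box'], f['box'])
            if o > best_iou:
                best, best_iou = f, o
        conf = d['score']
        if best is not None and best_iou > k:      # fuse only above the IoU threshold k
            conf = d['score'] + best_iou * best['score']   # assumed fusion rule
        fused.append({**d, 'conf': conf})
    return fused
```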
B2. According to the fused confidence score of each action class obtained for each detection box output by the RGB YOLOv3 model, connect the detection boxes across the RGB image sequence of the video to form tubes:
Initialize the tubes: the detection boxes of the first frame image in the RGB image sequence of the video initialize the tubes. For example, if the first frame image of the RGB image sequence of the video produces n detection boxes, then n tubes are initialized, and the number of tubes of a given action class at the first frame image is n_class(1) = n;
The following operations are carried out for each action class separately:
S1. Match each tube with the detection boxes generated at frame t. First traverse the tubes belonging to the same action class; if this action class has n tubes, compute for each tube the average of its per-frame confidences as the value of that tube, and sort the n tubes of the class by this value in descending order to form the list list_class. To determine the action class of each tube, a list I = {l_{t-k+1}, ..., l_t} is defined, which stores the action classes of the last k frames of the tube;
S2. Traverse the frame-t boxes b_i^rgb, i = 1...n, in list_class and select from them the boxes meeting the following conditions to add to tubes:
Traverse the tubes in list_class and select the frame-t boxes b_i^rgb of the same action class as the tube to match against it; if the IoU of b_i^rgb with the detection box in the last frame image of the tube exceeds a threshold d, then this b_i^rgb is added to the queue H_list_class;
If H_list_class is not empty, select the box b_i^rgb with the highest confidence in H_list_class and add it to the tube, and when traversing the frame-t boxes b_i^rgb, i = 1...n, again, remove this highest-confidence box;
If H_list_class is empty, no box b_i^rgb is added to the tube and it remains unchanged; if no new box has been added to a tube for k consecutive frames, the tube is terminated;
If frame t still has an unmatched box, denoted b_u, then traverse all tubes, compute the IoU of b_u with the last frame of each tube, and choose the tube whose IoU exceeds the threshold k and is maximal; this tube is denoted T*, and b_u is added to it. T* satisfies T* = argmax_{T_i} IoU(b_u, T_i(t-1)), where T_i is the i-th tube and T_i(t-1) is frame t-1 of the i-th tube: if IoU(b_u, T*(t-1)) > k, then b_u is added to T*, otherwise b_u remains unmatched;
If frame t still has a detection box that was not matched, a new tube is generated with that detection box as its starting point, and the detection box initializes the tube as the first frame image of that tube;
S3. After all tubes have been matched with boxes b_i^rgb, update the action-class list I = {l_{t-k+1}, ..., l_t} of the last k frames of each tube, where l_t is the action class of frame t of the tube, and update the action class L of each tube: count the action classes I = {l_{t-k+1}, ..., l_t} of the last k frames of the tube, and the most numerous action class becomes the action class L of the tube, satisfying L = argmax_c Σ_{i=t-k+1..t} g(l_i, c), where g(l_i, c) = 1 if l_i = c and g(l_i, c) = 0 if l_i ≠ c; that is, among the action classes counted in I = {l_{t-k+1}, ..., l_t}, the class with the largest count is the action class of the tube.
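As a concrete illustration of the majority vote in S3, the following short Python fragment picks the most frequent class among the last k frame labels of a tube; the function name and data layout are assumptions for illustration:

```python
from collections import Counter

def tube_action_class(labels, k):
    """Return L = argmax_c sum_i g(l_i, c) over the last k frame labels,
    i.e. the most frequent class in I = {l_{t-k+1}, ..., l_t}."""
    window = labels[-k:]                 # I = {l_{t-k+1}, ..., l_t}
    return Counter(window).most_common(1)[0][0]

# e.g. tube_action_class(['walk', 'walk', 'fall', 'fall', 'fall'], k=5) -> 'fall'
```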
In step A1, the database of labelled specific actions is the Action Detection dataset of UCF101.
In step A2, the dense optical flow of the video sequences in the training data is computed with the calcOpticalFlowFarneback function of the OpenCV library.
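The step A2 computation can be sketched with OpenCV as follows; the file paths, the HSV flow encoding, and the Farneback parameters shown are illustrative choices rather than values fixed by the patent:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture('train_video.avi')    # illustrative path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense optical flow between consecutive frames (step A2).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Encode the flow field as an HSV image so it can be annotated
    # and used to train the optical-flow YOLOv3 model.
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros_like(frame)
    hsv[..., 0] = ang * 180 / np.pi / 2                               # direction -> hue
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)   # magnitude -> value
    cv2.imwrite(f'flow_{idx:05d}.png', cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    prev_gray = gray
    idx += 1
cap.release()
```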
Compared with the prior art, the present invention not only realizes spatio-temporal detection of specific actions in surveillance video, but also realizes real-time processing of the monitoring stream.
For the above reasons, the present invention can be widely popularized in fields such as computer vision.
Description of the drawings
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
Fig. 1 is a schematic diagram of the intersection-over-union computation in an embodiment of the invention.
Fig. 2 is an overall schematic diagram of the multi-region real-time action detection method based on surveillance video in an embodiment of the invention.
Fig. 3 is a program flow chart of the multi-region real-time action detection method based on surveillance video in an embodiment of the invention.
Fig. 4 is a schematic diagram of the processing of a single frame image in an embodiment of the invention.
Fig. 5 is a schematic diagram of the processing of a consecutive image sequence in an embodiment of the invention.
Specific implementation mode
In order to make the objectives, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative labor fall within the protection scope of the invention.
As shown in Fig. 1 to Fig. 5, a multi-region real-time action detection method based on surveillance video has the following steps:
Model training stage:
A1. Obtain training data: a database of labelled specific actions;
A2. Compute the dense optical flow of each video sequence in the training data to obtain the optical-flow sequence of that video sequence, and annotate the optical-flow images in the optical-flow sequence;
A3. Train the target detection model YOLOv3 separately with the video sequences and the optical-flow sequences in the training data, obtaining an RGB YOLOv3 model and an optical-flow YOLOv3 model;
Test phase:
B1. Extract the sparse optical-flow image sequence of the video with the pyramid Lucas-Kanade optical flow method, then feed the RGB image sequence of the video and the sparse optical-flow image sequence into the RGB YOLOv3 model and the optical-flow YOLOv3 model respectively. From the series of detection boxes output by the RGB YOLOv3 model, extract with non-maximum suppression the first n detection boxes of every action class, b_i^rgb, i = 1...n; each detection box carries an action-class label and a probability score s_i^rgb of belonging to that action. From the series of detection boxes output by the optical-flow YOLOv3 model, extract with non-maximum suppression the first n detection boxes of every action class, b_k^flow, k = 1...n; each detection box likewise carries an action-class label and a probability score s_k^flow. Traverse the detection boxes output by the two models: compute the intersection-over-union (IoU) of each box b_i^rgb output by the RGB YOLOv3 model with each box of the same action class output by the optical-flow YOLOv3 model, and denote by b* the same-class optical-flow box with the maximum IoU. If this maximum IoU exceeds a threshold k, the probability scores of the two corresponding boxes are fused into a score s_i^fuse, which serves as the confidence of the RGB detection box b_i^rgb.
Here IoU(b_i^rgb, b*) denotes the intersection-over-union of b_i^rgb and b*, s* is the probability score of the same-class optical-flow box b* with the maximum IoU, and s_i^fuse denotes the fused probability score. The intersection-over-union of two boxes A and B, IoU(A, B), illustrated in Fig. 1, is computed as IoU(A, B) = area(A ∩ B) / area(A ∪ B), where area(A) denotes the area of box A and area(A ∩ B) the area of the intersection of the two boxes.
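Besides the IoU computation (shown inline in the fusion sketch earlier), step B1 relies on pyramid Lucas-Kanade sparse flow at test time. A minimal OpenCV sketch of that extraction follows; the feature-tracking parameters, the line-drawn sparse-flow image, and the re-detection policy are illustrative assumptions:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture('monitor.avi')            # illustrative source
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=7)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if p0 is None or len(p0) < 10:               # re-detect when tracking thins out
        p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                     qualityLevel=0.01, minDistance=7)
    if p0 is None:                               # nothing trackable in this frame
        prev_gray = gray
        continue
    # Pyramid Lucas-Kanade: track the sparse points into the current frame.
    p1, st, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                           winSize=(15, 15), maxLevel=2)
    sparse = np.zeros_like(gray)                 # sparse-flow image for the flow model
    for new, old, good in zip(p1, p0, st):
        if good[0] == 1:
            x0, y0 = old.ravel()
            x1, y1 = new.ravel()
            cv2.line(sparse, (int(x0), int(y0)), (int(x1), int(y1)), 255, 2)
    # `sparse` would be fed to the optical-flow YOLOv3 model here.
    prev_gray = gray
    p0 = p1[st.flatten() == 1].reshape(-1, 1, 2)
cap.release()
```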
B2. According to the fused confidence score of each action class obtained for each detection box output by the RGB YOLOv3 model, connect the detection boxes across the RGB image sequence of the video to form tubes:
Initialize the tubes: the detection boxes of the first frame image in the RGB image sequence of the video initialize the tubes. For example, if the first frame image of the RGB image sequence of the video produces n detection boxes, then n tubes are initialized, and the number of tubes of a given action class at the first frame image is n_class(1) = n;
The following operations are carried out for each action class separately:
S1. Match each tube with the detection boxes generated at frame t. First traverse the tubes belonging to the same action class; if this action class has n tubes, compute for each tube the average of its per-frame confidences as the value of that tube, and sort the n tubes of the class by this value in descending order to form the list list_class. To determine the action class of each tube, a list I = {l_{t-k+1}, ..., l_t} is defined, which stores the action classes of the last k frames of the tube;
S2. Traverse the frame-t boxes b_i^rgb, i = 1...n, in list_class and select from them the boxes meeting the following conditions to add to tubes:
Traverse the tubes in list_class and select the frame-t boxes b_i^rgb of the same action class as the tube to match against it; if the IoU of b_i^rgb with the detection box in the last frame image of the tube exceeds a threshold d, then this b_i^rgb is added to the queue H_list_class;
If H_list_class is not empty, select the box b_i^rgb with the highest confidence in H_list_class and add it to the tube, and when traversing the frame-t boxes b_i^rgb, i = 1...n, again, remove this highest-confidence box;
If H_list_class is empty, no box b_i^rgb is added to the tube and it remains unchanged; if no new box has been added to a tube for k consecutive frames, the tube is terminated;
If frame t still has an unmatched box, denoted b_u, then traverse all tubes, compute the IoU of b_u with the last frame of each tube, and choose the tube whose IoU exceeds the threshold k and is maximal; this tube is denoted T*, and b_u is added to it. T* satisfies T* = argmax_{T_i} IoU(b_u, T_i(t-1)), where T_i is the i-th tube and T_i(t-1) is frame t-1 of the i-th tube: if IoU(b_u, T*(t-1)) > k, then b_u is added to T*, otherwise b_u remains unmatched;
If frame t still has a detection box that was not matched, a new tube is generated with that detection box as its starting point, and the detection box initializes the tube as the first frame image of that tube;
S3. After all tubes have been matched with boxes b_i^rgb, update the action-class list I = {l_{t-k+1}, ..., l_t} of the last k frames of each tube, where l_t is the action class of frame t of the tube, and update the action class L of each tube: count the action classes I = {l_{t-k+1}, ..., l_t} of the last k frames of the tube, and the most numerous action class becomes the action class L of the tube, satisfying L = argmax_c Σ_{i=t-k+1..t} g(l_i, c), where g(l_i, c) = 1 if l_i = c and g(l_i, c) = 0 if l_i ≠ c; that is, among the action classes counted in I = {l_{t-k+1}, ..., l_t}, the class with the largest count is the action class of the tube.
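A simplified Python sketch of the S1-S3 linking loop follows; the Tube class, the single IoU threshold, and the greedy per-frame matching are illustrative simplifications of the procedure above:

```python
from collections import Counter

def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

class Tube:
    def __init__(self, det, k=10):
        self.boxes = [det]            # per-frame detections {'box', 'cls', 'conf'}
        self.labels = [det['cls']]    # per-frame class labels l_1..l_t
        self.missed = 0               # consecutive frames with no new box
        self.k = k

    def value(self):
        """S1: average per-frame confidence, used to rank tubes."""
        return sum(b['conf'] for b in self.boxes) / len(self.boxes)

    def label(self):
        """S3: majority class of the last k frame labels."""
        return Counter(self.labels[-self.k:]).most_common(1)[0][0]

def link_frame(tubes, dets, d=0.3, max_miss=10):
    """Greedy S2 matching of the frame-t detections into the tubes."""
    tubes.sort(key=lambda t: t.value(), reverse=True)   # list_class, descending
    unmatched = list(dets)
    for tube in tubes:
        last = tube.boxes[-1]
        cands = [b for b in unmatched                   # H_list: same class, IoU > d
                 if b['cls'] == last['cls'] and iou(b['box'], last['box']) > d]
        if cands:
            best = max(cands, key=lambda b: b['conf'])  # highest-confidence candidate
            tube.boxes.append(best)
            tube.labels.append(best['cls'])
            tube.missed = 0
            unmatched.remove(best)
        else:
            tube.missed += 1                            # tube unchanged this frame
    tubes[:] = [t for t in tubes if t.missed < max_miss]  # terminate stale tubes
    for b in unmatched:                                 # leftovers start new tubes
        tubes.append(Tube(b))
```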
In Fig. 2, (a) denotes the RGB image sequence of the video; (b) denotes the optical flow algorithm, which at the test stage extracts sparse optical-flow images with the pyramid Lucas-Kanade optical flow method in OpenCV and at the training stage extracts dense optical-flow images; (c) is the obtained sparse optical-flow image; (d) is the action detection model, one part being the RGB YOLOv3 model trained with the RGB image sequence of the video and the other the optical-flow YOLOv3 model trained with the optical-flow sequence; (e) denotes the detection results output by the RGB YOLOv3 model; (f) denotes the detection results of the optical-flow YOLOv3 model; (g) denotes the result of fusing the outputs of the two models, which has better robustness; (h) denotes that, using the fused result, the detection boxes across the RGB image sequence of the video are connected into tubes.
In Fig. 4, (a) is an image in the RGB image sequence of the video; (b) denotes the optical-flow image corresponding to that image; (c) denotes the detection result output after the image in the RGB image sequence is processed by the RGB YOLOv3 model; (d) denotes the detection result output after the optical-flow image is processed by the optical-flow YOLOv3 model.
Fig. 5 shows a consecutive image sequence of the video: (a) denotes equally spaced images taken from the RGB image sequence of the video; (b) denotes the optical-flow sequence corresponding to those images; (c) denotes the detection results output after the images in the RGB image sequence are processed by the RGB YOLOv3 model; (d) denotes the detection results output after the optical-flow sequence is processed by the optical-flow YOLOv3 model; (e) denotes the tube obtained by fusing the detection results of (c) and (d).
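Putting the stages of Fig. 2 together, the per-frame test loop can be summarized as below, reusing the fuse_detections and link_frame helpers sketched above; sparse_flow and the model .detect() calls are placeholders, since the patent does not fix an implementation API:

```python
def detect_actions(video_frames, rgb_model, flow_model, k=0.5):
    """Per-frame two-stream pipeline of Fig. 2 (placeholder model API)."""
    tubes = []
    prev = None
    for frame in video_frames:
        # (b)-(c): sparse optical flow from the previous frame (placeholder helper)
        flow_img = sparse_flow(prev, frame) if prev is not None else None
        rgb_dets = rgb_model.detect(frame)                    # (e) RGB YOLOv3 boxes
        flow_dets = flow_model.detect(flow_img) if flow_img is not None else []  # (f)
        fused = fuse_detections(rgb_dets, flow_dets, k)       # (g) IoU-gated score fusion
        link_frame(tubes, fused)                              # (h) connect boxes into tubes
        prev = frame
    return tubes
```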
In step A1, the database of labelled specific actions is the Action Detection dataset of UCF101.
In step A2, the dense optical flow of the video sequences in the training data is computed with the calcOpticalFlowFarneback function of the OpenCV library.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (3)

1. A multi-region real-time action detection method based on surveillance video, characterized by having the following steps:
Model training stage:
A1. Obtain training data: a database of labelled specific actions;
A2. Compute the dense optical flow of each video sequence in the training data to obtain the optical-flow sequence of that video sequence, and annotate the optical-flow images in the optical-flow sequence;
A3. Train the target detection model YOLOv3 separately with the video sequences and the optical-flow sequences in the training data, obtaining an RGB YOLOv3 model and an optical-flow YOLOv3 model;
Test phase:
B1. Extract the sparse optical-flow image sequence of the video with the pyramid Lucas-Kanade optical flow method, then feed the RGB image sequence of the video and the sparse optical-flow image sequence into the RGB YOLOv3 model and the optical-flow YOLOv3 model respectively; from the series of detection boxes output by the RGB YOLOv3 model, extract with non-maximum suppression the first n detection boxes of every action class, b_i^rgb, i = 1...n, each detection box carrying an action-class label and a probability score s_i^rgb of belonging to that action; from the series of detection boxes output by the optical-flow YOLOv3 model, extract with non-maximum suppression the first n detection boxes of every action class, b_k^flow, k = 1...n, each detection box likewise carrying an action-class label and a probability score s_k^flow; traverse the detection boxes output by the two models: compute the intersection-over-union (IoU) of each box b_i^rgb output by the RGB YOLOv3 model with each box of the same action class output by the optical-flow YOLOv3 model, and denote by b* the same-class optical-flow box with the maximum IoU; if this maximum IoU exceeds a threshold k, the probability scores of the two corresponding boxes are fused into a score s_i^fuse, which serves as the confidence of the RGB detection box b_i^rgb, where IoU(b_i^rgb, b*) denotes the intersection-over-union of b_i^rgb and b*, and s* is the probability score of the same-class optical-flow box b* with the maximum IoU;
B2. According to the fused confidence score of each action class obtained for each detection box output by the RGB YOLOv3 model, connect the detection boxes across the RGB image sequence of the video to form tubes:
Initialize the tubes: the detection boxes of the first frame image in the RGB image sequence of the video initialize the tubes;
The following operations are carried out for each action class separately:
S1. Match each tube with the detection boxes generated at frame t: first traverse the tubes belonging to the same action class; if this action class has n tubes, compute for each tube the average of its per-frame confidences as the value of that tube, and sort the n tubes of the class by this value in descending order to form the list list_class; to determine the action class of each tube, a list I = {l_{t-k+1}, ..., l_t} is defined, which stores the action classes of the last k frames of the tube;
S2. Traverse the frame-t boxes b_i^rgb, i = 1...n, in list_class and select from them the boxes meeting the following conditions to add to tubes:
traverse the tubes in list_class and select the frame-t boxes b_i^rgb of the same action class as the tube to match against it; if the IoU of b_i^rgb with the detection box in the last frame image of the tube exceeds a threshold d, then this b_i^rgb is added to the queue H_list_class;
if H_list_class is not empty, select the box b_i^rgb with the highest confidence in H_list_class and add it to the tube, and when traversing the frame-t boxes b_i^rgb, i = 1...n, again, remove this highest-confidence box;
if H_list_class is empty, no box b_i^rgb is added to the tube and it remains unchanged; if no new box has been added to a tube for k consecutive frames, the tube is terminated;
if frame t still has an unmatched box, denoted b_u, then traverse all tubes, compute the IoU of b_u with the last frame of each tube, and choose the tube whose IoU exceeds the threshold k and is maximal, denoted T*; b_u is added to that tube; T* satisfies T* = argmax_{T_i} IoU(b_u, T_i(t-1)), where T_i is the i-th tube and T_i(t-1) is frame t-1 of the i-th tube: if IoU(b_u, T*(t-1)) > k, then b_u is added to T*, otherwise b_u remains unmatched;
if frame t still has a detection box that was not matched, a new tube is generated with that detection box as its starting point, and the detection box initializes the tube as the first frame image of that tube;
S3. After all tubes have been matched with boxes b_i^rgb, update the action-class list I = {l_{t-k+1}, ..., l_t} of the last k frames of each tube, where l_t is the action class of frame t of the tube, and update the action class L of each tube: count the action classes I = {l_{t-k+1}, ..., l_t} of the last k frames of the tube, and the most numerous action class becomes the action class L of the tube, satisfying L = argmax_c Σ_{i=t-k+1..t} g(l_i, c), where g(l_i, c) = 1 if l_i = c and g(l_i, c) = 0 if l_i ≠ c; that is, among the action classes counted in I = {l_{t-k+1}, ..., l_t}, the class with the largest count is the action class of the tube.
2. The multi-region real-time action detection method based on surveillance video according to claim 1, characterized in that: in step A1, the database of labelled specific actions is the Action Detection dataset of UCF101.
3. The multi-region real-time action detection method based on surveillance video according to claim 1, characterized in that: in step A2, the dense optical flow of the video sequences in the training data is computed with the calcOpticalFlowFarneback function of the OpenCV library.
CN201810534453.0A 2018-05-30 2018-05-30 Multi-region real-time action detection method based on monitoring video Active CN108764148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810534453.0A CN108764148B (en) 2018-05-30 2018-05-30 Multi-region real-time action detection method based on monitoring video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810534453.0A CN108764148B (en) 2018-05-30 2018-05-30 Multi-region real-time action detection method based on monitoring video

Publications (2)

Publication Number Publication Date
CN108764148A true CN108764148A (en) 2018-11-06
CN108764148B CN108764148B (en) 2020-03-10

Family

ID=64003645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810534453.0A Active CN108764148B (en) 2018-05-30 2018-05-30 Multi-region real-time action detection method based on monitoring video

Country Status (1)

Country Link
CN (1) CN108764148B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447014A (en) * 2018-11-07 2019-03-08 东南大学-无锡集成电路技术研究所 Online video behavior detection method based on a two-channel convolutional neural network
CN109711344A (en) * 2018-12-27 2019-05-03 东北大学 Front-end intelligent detection method for specific abnormal behaviors
CN109740454A (en) * 2018-12-19 2019-05-10 贵州大学 Human posture recognition method based on YOLO-V3
CN109886165A (en) * 2019-01-23 2019-06-14 中国科学院重庆绿色智能技术研究院 Action video extraction and classification method based on moving object detection
CN111126153A (en) * 2019-11-25 2020-05-08 北京锐安科技有限公司 Safety monitoring method, system, server and storage medium based on deep learning
WO2020114120A1 (en) * 2018-12-07 2020-06-11 深圳光启空间技术有限公司 Method for identifying vehicle information, system, memory device, and processor
CN111353452A (en) * 2020-03-06 2020-06-30 国网湖南省电力有限公司 Behavior recognition method, device, medium and equipment based on RGB images
CN114049396A (en) * 2021-11-05 2022-02-15 北京百度网讯科技有限公司 Method and device for marking training image and tracking target, electronic equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140254882A1 (en) * 2013-03-11 2014-09-11 Adobe Systems Incorporated Optical Flow with Nearest Neighbor Field Fusion
CN105512618A (en) * 2015-11-27 2016-04-20 北京航空航天大学 Video tracking method
CN106709461A (en) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video-based behavior recognition method and device
CN107316007A (en) * 2017-06-07 2017-11-03 浙江捷尚视觉科技股份有限公司 Multi-class object detection and recognition method for surveillance images based on deep learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140254882A1 (en) * 2013-03-11 2014-09-11 Adobe Systems Incorporated Optical Flow with Nearest Neighbor Field Fusion
CN105512618A (en) * 2015-11-27 2016-04-20 北京航空航天大学 Video tracking method
CN106709461A (en) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video-based behavior recognition method and device
CN107316007A (en) * 2017-06-07 2017-11-03 浙江捷尚视觉科技股份有限公司 Multi-class object detection and recognition method for surveillance images based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALAAELDIN EL-NOUBY 等: "Real-Time End-to-End Action Detection with Two-Stream Networks", 《ARXIV》 *
CHRISTOPH FEICHTENHOFER 等: "Detect to Track and Track to Detect", 《ARXIV》 *
PHILIPPE WEINZAEPFEL 等: "Learning to track for spatio-temporal action localization", 《2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION》 *
黄铁军 等: "Multimedia technology research: 2013 - Visual perception and processing for intelligent video surveillance", 《中国图象图形学报》 (Journal of Image and Graphics) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447014A (en) * 2018-11-07 2019-03-08 东南大学-无锡集成电路技术研究所 Online video behavior detection method based on a two-channel convolutional neural network
WO2020114120A1 (en) * 2018-12-07 2020-06-11 深圳光启空间技术有限公司 Method for identifying vehicle information, system, memory device, and processor
CN111291779A (en) * 2018-12-07 2020-06-16 深圳光启空间技术有限公司 Vehicle information identification method and system, memory and processor
CN109740454A (en) * 2018-12-19 2019-05-10 贵州大学 Human posture recognition method based on YOLO-V3
CN109711344A (en) * 2018-12-27 2019-05-03 东北大学 Front-end intelligent detection method for specific abnormal behaviors
CN109886165A (en) * 2019-01-23 2019-06-14 中国科学院重庆绿色智能技术研究院 Action video extraction and classification method based on moving object detection
CN111126153A (en) * 2019-11-25 2020-05-08 北京锐安科技有限公司 Safety monitoring method, system, server and storage medium based on deep learning
CN111353452A (en) * 2020-03-06 2020-06-30 国网湖南省电力有限公司 Behavior recognition method, device, medium and equipment based on RGB images
CN114049396A (en) * 2021-11-05 2022-02-15 北京百度网讯科技有限公司 Method and device for marking training image and tracking target, electronic equipment and medium

Also Published As

Publication number Publication date
CN108764148B (en) 2020-03-10

Similar Documents

Publication Publication Date Title
CN108764148A (en) Multizone real-time action detection method based on monitor video
CN109472232B (en) Video semantic representation method, system and medium based on multi-mode fusion mechanism
CN110472554B Table tennis action recognition method and system based on pose segmentation and key-point features
CN107679491B (en) 3D convolutional neural network sign language recognition method fusing multimodal data
Joshi et al. Robust sports image classification using InceptionV3 and neural networks
Xu et al. Two-stream region convolutional 3D network for temporal activity detection
CN109815826A Method and device for generating face attribute models
Amirgholipour et al. A-CCNN: adaptive CCNN for density estimation and crowd counting
CN108399435B (en) Video classification method based on dynamic and static characteristics
Rangasamy et al. Deep learning in sport video analysis: a review
CN110110648B Action proposal method based on visual perception and artificial intelligence
CN107633226A Human action tracking and recognition method and system
CN110969078A (en) Abnormal behavior identification method based on human body key points
CN109214285A Fall detection method based on a deep convolutional neural network and a long short-term memory network
CN108537181A Gait recognition method based on large-margin deep metric learning
CN111563404B Global-local temporal representation method for video-based person re-identification
CN113963032A Siamese-network target tracking method fusing target re-identification
Flórez et al. Hand gesture recognition following the dynamics of a topology-preserving network
Chalasani et al. Egocentric gesture recognition for head-mounted ar devices
CN112597980A (en) Brain-like gesture sequence recognition method for dynamic vision sensor
CN111368770B (en) Gesture recognition method based on skeleton point detection and tracking
Chaudhary et al. Tsnet: deep network for human action recognition in hazy videos
He et al. What catches the eye? Visualizing and understanding deep saliency models
CN115410119A (en) Violent movement detection method and system based on adaptive generation of training samples
CN109858351B Gait recognition method based on hierarchical temporal memory

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant