CN108764148A - Multizone real-time action detection method based on monitor video - Google Patents

Multizone real-time action detection method based on monitor video

Info

Publication number
CN108764148A
Authority
CN
China
Prior art keywords
tube
action
frame
detection
yolo
Prior art date
Legal status
Granted
Application number
CN201810534453.0A
Other languages
Chinese (zh)
Other versions
CN108764148B (en)
Inventor
陈东岳
任方博
王森
贾同
Current Assignee
Northeastern University China
Original Assignee
Northeastern University China
Priority date
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201810534453.0A
Publication of CN108764148A
Application granted
Publication of CN108764148B
Expired - Fee Related


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-region real-time action detection method based on surveillance video, comprising a model training stage and a testing stage. In the model training stage, training data are obtained in the form of a database of labeled specific actions; the dense optical flow of each video sequence in the training data is calculated to obtain the corresponding optical flow sequence, and the optical flow images in the optical flow sequence are labeled; the video sequences and the optical flow sequences in the training data are then used to train the target detection model yolo v3 separately, yielding an RGB yolo v3 model and an optical flow yolo v3 model. The invention can not only detect the spatio-temporal position of specific actions in surveillance video, but can also process the surveillance stream in real time.

Description

Multi-region real-time action detection method based on monitoring video
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a human body action detection system for surveillance video scenes.
Background
As surveillance equipment becomes more and more widespread, an increasing number of surveillance-based technologies are being applied. Action recognition is one of the most valuable of these technologies; it is mainly applied to human-machine interaction in indoor and factory environments, and to the detection and recognition of specific dangerous actions in the field of public safety.
Most action recognition methods based on surveillance video focus on recognizing and classifying the action of an entire scene. The videos used are generally manually trimmed clips that contain only one type of action, which makes them very different from natural, untrimmed video. Some researchers have instead placed the research task on detecting where along the time axis an action starts and ends. In real-world applications, however, it is important to obtain both the start and end of an action in the video and the spatial region in which the action occurs. Although existing action detection methods achieve good results on existing databases and competitions, they generally divide the whole video into many small blocks, or process the whole video at once, and only then output the spatio-temporal position of the action; real-time action detection, by contrast, requires frame-level processing, so these methods cannot be deployed in a surveillance system.
With the popularization of surveillance devices, the detection of human actions in surveillance video is becoming a popular research field. The method of Wang L, Qiao Y, Tang X, "Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors" (in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)) integrates features extracted by a deep neural network with features obtained by a dense trajectory algorithm to recognize the action of a whole video. The method of D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning Spatiotemporal Features with 3D Convolutional Networks" (in 2015 IEEE International Conference on Computer Vision (ICCV)) proposes the C3D framework, which uses 3D convolution and 3D pooling to extract human action features from video. The method of Simonyan K and Zisserman A, "Two-Stream Convolutional Networks for Action Recognition in Videos" (in Advances in Neural Information Processing Systems, 2014) extracts an RGB image sequence and an optical flow sequence, trains a convolutional neural network on each, and fuses the features obtained by the two networks to recognize actions. Although the above models achieve good results, they can only recognize the action of a whole video and cannot localize the spatio-temporal position of the action.
G. Gkioxari and J. Malik, "Finding action tubes" (in IEEE Int. Conf. on Computer Vision and Pattern Recognition, 2015) mainly detect action proposals in each frame and then link the per-frame proposals into action sequences. J. Lu, R. Xu, and J. J. Corso, "Human action segmentation with hierarchical supervoxel consistency" (in IEEE Int. Conf. on Computer Vision and Pattern Recognition, June 2015) propose a hierarchical MRF model that links low-level video segments with higher-level cues on human motion and appearance to segment the actions in a video. These methods mainly achieve spatial localization of actions in video and require a large amount of frame-level computation.
"Temporal Action Localization with Pyramid of Score Distribution Features" by Yuan J, Ni B, Yang X (in IEEE Computer Vision and Pattern Recognition, 2016) extracts Pyramid of Score Distribution Features (PSDF) from the video based on iDT features, processes the PSDF feature sequence with an LSTM network, and obtains predictions of behavior segments from the output frame-level behavior-category confidence scores. In "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs" (in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016) by Shou Z, Wang D, Chang S F, video segments of various sizes are first generated with a sliding window, then processed by a multi-stage network (Segment-CNN), and finally overlapping segments are removed by non-maximum suppression to complete the prediction. "CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos" by Shou Z, Chan J, Zareian A (2017) designs a Convolutional-De-Convolutional network (CDC) based on C3D (a 3D CNN); it takes a short video segment as input, outputs frame-level action category probabilities, and is mainly used to finely adjust action boundaries in temporal action detection so that the boundaries become more accurate. The above frameworks can achieve real-time performance, but these algorithms mainly realize accurate detection of actions in the temporal dimension and cannot achieve spatio-temporal detection of actions.
An unsupervised clustering method is used in "APT: Action localization proposals from dense trajectories" by J. C. van Gemert, M. Jain, E. Gati, and C. G. Snoek (in BMVC, volume 2, page 4, 2015) to generate a set of bounding-box spatio-temporal action proposals. Since the method is based on dense trajectory features, it cannot detect actions characterized by small motion. "Learning to track for spatio-temporal action localization" by P. Weinzaepfel, Z. Harchaoui, and C. Schmid (in IEEE International Conference on Computer Vision, 2015) performs spatio-temporal detection of actions by combining frame-level EdgeBoxes region proposals with a tracking-by-detection framework. However, the temporal extent of an action is still detected by a multi-scale sliding window over each track, which makes the method inefficient for longer video sequences.
Disclosure of Invention
In view of the above problems in existing action detection methods, the invention provides a multi-region real-time action detection method based on surveillance video. The technical solution adopted by the invention is as follows:
a multi-region real-time action detection method based on a surveillance video is characterized by comprising the following steps:
a model training stage:
a1, acquiring training data: a database of labeled specific actions;
a2, calculating dense optical flows of video sequences in training data, acquiring optical flow sequences of the video sequences in the training data, and labeling optical flow images in the optical flow sequences;
a3, respectively training a target detection model yolo v3 by utilizing a video sequence and an optical flow sequence in training data to respectively obtain an RGB yolo v3 model and an optical flow yolo v3 model;
and (3) a testing stage:
b1, extracting a sparse optical flow image sequence of the video by the pyramid Lucas-Kanade optical flow method, and then feeding the RGB image sequence and the sparse optical flow image sequence of the video into the RGB yolo v3 model and the optical flow yolo v3 model respectively. From the series of detection boxes output by the RGB yolo v3 model, the first n detection boxes of all action categories, $b_i^{rgb}, i = 1 \dots n$, are extracted by non-maximum suppression; each detection box carries an action-category label and a probability score $s_i^{rgb}$ of belonging to that action. From the series of detection boxes output by the optical flow yolo v3 model, the first n detection boxes of all action categories, $b_k^{flow}, k = 1 \dots n$, are likewise extracted by non-maximum suppression; each detection box carries an action-category label and a probability score $s_k^{flow}$ of belonging to that action. The detection boxes output by the RGB yolo v3 model and the optical flow yolo v3 model are then traversed: each detection box $b_i^{rgb}$ output by the RGB yolo v3 model is compared, by intersection-over-union, with the detection boxes of the same action category output by the optical flow yolo v3 model, and the same-category detection box of the optical flow yolo v3 model with the largest intersection-over-union is denoted $b^{*}$. If this maximum intersection-over-union is larger than the threshold K, the probability scores of the corresponding pair of detection boxes from the RGB yolo v3 model and the optical flow yolo v3 model are fused into $s_{fuse}$, which is taken as the confidence of the detection box $b_i^{rgb}$ output by the RGB yolo v3 model; $s_{fuse}$ is determined by $s_i^{rgb}$, $IoU(b_i^{rgb}, b^{*})$ and $s^{*}$,
where $IoU(b_i^{rgb}, b^{*})$ denotes the intersection-over-union of $b_i^{rgb}$ and $b^{*}$, and $s^{*}$ is the probability score of the same-category optical flow detection box having the largest intersection-over-union with $b_i^{rgb}$;
b2, according to the fused confidence score of each action category of the detection boxes output by the RGB yolo v3 model, connecting the detection boxes across the RGB image sequence of the video into tubes:
initializing tubes: the tubes are initialized with the detection boxes of the first frame image in the RGB image sequence of the video; for example, if n detection boxes are generated for the first frame image in the RGB image sequence of the video, the number of tubes of a given action category for the first frame is:
$n_{class}(1) = n$;
The following operations are performed for all action categories, respectively:
s1, matching each tube with the detection boxes generated at frame t: first traverse the tubes belonging to the same action category; if there are n tubes of this action category, compute for each tube the average confidence over its frames as the value of the tube, and arrange the values of the n tubes in descending order to form $list_{class}$. To determine the action category of each tube, a list $I = \{l_{t-k+1} \dots l_t\}$ is defined, which stores the action categories of the last k frames of the tube;
s2, traversing $list_{class}$ and the detection boxes $b_i^t, i = 1 \dots n$, of frame t, and selecting from them the detection boxes satisfying the following conditions to add to the tubes:
traverse $list_{class}$ and, for each tube, select the detection boxes $b_i^t$ of frame t that belong to the same action category as the tube for matching; if the intersection-over-union of $b_i^t$ with the detection box in the last frame image of the tube is larger than the threshold d, add $b_i^t$ to the queue $H\_list_{class}$;
if $H\_list_{class}$ is not empty, pick the detection box with the highest confidence in $H\_list_{class}$, add it to the tube, and remove that highest-confidence detection box from the detection boxes $b_i^t, i = 1 \dots n$, of frame t;
if $H\_list_{class}$ is empty, the tube does not add any detection box and remains unchanged; if no new detection box is added to the tube for k consecutive frames, the tube is terminated;
if a detection box of frame t has not been matched, denote it $\bar{b}$; then traverse all tubes, compute the intersection-over-union of $\bar{b}$ with the last frame of each tube, and among the tubes whose intersection-over-union with $\bar{b}$ is larger than the threshold k, select the tube with the maximum intersection-over-union and denote it $T^{*}$; add $\bar{b}$ to $T^{*}$, where $T^{*}$ satisfies the following formula:
$T^{*} = \arg\max_{T_i:\, IoU(\bar{b},\, T_i(t-1)) > k} IoU(\bar{b},\, T_i(t-1))$
where $T_i$ is the i-th tube and $T_i(t-1)$ is the detection box of the (t-1)-th frame of the i-th tube;
if the t-th frame still has detection boxes that have not been matched, a new tube is generated with each such detection box as a starting point, and the tube is initialized with that detection box as its first frame image;
s3, after all tubes have been matched, the action category list $I = \{l_{t-k+1} \dots l_t\}$ of the last k frames of each tube is updated, where $l_t$ is the action category of the t-th frame of the tube; the action category L of each tube is then updated by counting the action categories $I = \{l_{t-k+1} \dots l_t\}$ of the last k frames of the tube, and the most frequent action category is taken as the action category L of the tube, which satisfies the following formula:
$L = \arg\max_{c} \sum_{i=t-k+1}^{t} g(l_i, c)$
where $g(l_i, c) = 1$ if $l_i = c$ and $g(l_i, c) = 0$ if $l_i \neq c$, c being an action category; that is, the action category that appears most often in $I = \{l_{t-k+1} \dots l_t\}$ is the action category of the tube.
In step a1, the database of the labeled specific Action is the Action Detection data set of UCF 101.
In step a2, the dense optical flow of the video sequences in the training data is calculated by using the calcOpticalFlowFarneback function in the OpenCV library.
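As an illustration of step a2, the following is a minimal sketch, assuming OpenCV's Python bindings; the input file name, the output naming scheme, and the HSV color encoding of the flow field are illustrative assumptions, not details fixed by the invention. It computes the dense Farneback optical flow of a training video and writes each flow field as a color image that can then be labeled like an ordinary RGB frame:

```python
import cv2
import numpy as np

# Sketch of step a2: dense (Farneback) optical flow for one training video,
# rendered as HSV-encoded color images for labeling. File names are hypothetical.
cap = cv2.VideoCapture("training_video.avi")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
hsv = np.zeros_like(prev)
hsv[..., 1] = 255  # full saturation; hue encodes direction, value encodes magnitude

idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang * 180 / np.pi / 2                               # direction -> hue
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)   # magnitude -> value
    cv2.imwrite("flow_%05d.png" % idx, cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    prev_gray = gray
    idx += 1
cap.release()
```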
Compared with the prior art, the invention not only can realize the detection of the space-time position of the specific action in the monitoring video, but also can realize the real-time processing of the monitoring.
Based on the reasons, the invention can be widely popularized in the fields of computer vision and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of cross-over ratio calculation in an embodiment of the present invention.
Fig. 2 is a general schematic diagram of a multi-region real-time motion detection method based on surveillance video according to an embodiment of the present invention.
Fig. 3 is a flowchart of a multi-region real-time motion detection method based on surveillance video according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a processing procedure of a frame image according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a process for processing a sequence of consecutive images according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 to 5, a multi-region real-time motion detection method based on surveillance video includes the following steps:
a model training stage:
a1, acquiring training data: a database of labeled specific actions;
a2, calculating dense optical flows of video sequences in training data, acquiring optical flow sequences of the video sequences in the training data, and labeling optical flow images in the optical flow sequences;
a3, respectively training a target detection model yolo v3 by utilizing a video sequence and an optical flow sequence in training data to respectively obtain an RGB yolo v3 model and an optical flow yolo v3 model;
and (3) a testing stage:
b1, extracting a sparse optical flow image sequence of the video by the pyramid Lucas-Kanade optical flow method, and then feeding the RGB image sequence and the sparse optical flow image sequence of the video into the RGB yolo v3 model and the optical flow yolo v3 model respectively. From the series of detection boxes output by the RGB yolo v3 model, the first n detection boxes of all action categories, $b_i^{rgb}, i = 1 \dots n$, are extracted by non-maximum suppression; each detection box carries an action-category label and a probability score $s_i^{rgb}$ of belonging to that action. From the series of detection boxes output by the optical flow yolo v3 model, the first n detection boxes of all action categories, $b_k^{flow}, k = 1 \dots n$, are likewise extracted by non-maximum suppression; each detection box carries an action-category label and a probability score $s_k^{flow}$ of belonging to that action. The detection boxes output by the RGB yolo v3 model and the optical flow yolo v3 model are then traversed: each detection box $b_i^{rgb}$ output by the RGB yolo v3 model is compared, by intersection-over-union, with the detection boxes of the same action category output by the optical flow yolo v3 model, and the same-category detection box of the optical flow yolo v3 model with the largest intersection-over-union is denoted $b^{*}$. If this maximum intersection-over-union is larger than the threshold K, the probability scores of the corresponding pair of detection boxes from the RGB yolo v3 model and the optical flow yolo v3 model are fused into $s_{fuse}$, which is taken as the confidence of the detection box $b_i^{rgb}$ output by the RGB yolo v3 model; $s_{fuse}$ is determined by $s_i^{rgb}$, $IoU(b_i^{rgb}, b^{*})$ and $s^{*}$,
where $IoU(b_i^{rgb}, b^{*})$ denotes the intersection-over-union of $b_i^{rgb}$ and $b^{*}$, and $s^{*}$ is the probability score of the same-category optical flow detection box having the largest intersection-over-union with $b_i^{rgb}$. For example, the intersection-over-union of two image regions A and B, $IoU(A, B)$, may be computed as shown in Figure 1:
$IoU(A, B) = \dfrac{area(A) \cap area(B)}{area(A) \cup area(B)}$
where $area(A)$ is the area of region A and $area(A) \cap area(B)$ is the area in which the two regions intersect.
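As an illustration of the fusion in step b1, the sketch below computes the intersection-over-union of two boxes given in (x1, y1, x2, y2) form and boosts the score of each RGB-stream detection with the best-matching same-category flow-stream detection. The additive boosting rule s_rgb + IoU * s_flow and the default threshold value are assumptions made for illustration only; the description above specifies merely that the fused confidence is obtained from these quantities when the maximum IoU exceeds the threshold K.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_scores(rgb_dets, flow_dets, iou_thresh=0.5):
    """For each RGB detection (box, cls, score), find the same-category flow
    detection with the largest IoU; if that IoU exceeds the threshold, boost
    the RGB score (additive rule assumed here for illustration)."""
    fused = []
    for box_r, cls_r, s_r in rgb_dets:
        best_iou, best_s = 0.0, 0.0
        for box_f, cls_f, s_f in flow_dets:
            if cls_f != cls_r:
                continue
            overlap = iou(box_r, box_f)
            if overlap > best_iou:
                best_iou, best_s = overlap, s_f
        conf = s_r + best_iou * best_s if best_iou > iou_thresh else s_r
        fused.append((box_r, cls_r, conf))
    return fused
```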
b2, according to the fused confidence score of each action category of the detection boxes output by the RGB yolo v3 model, connecting the detection boxes across the RGB image sequence of the video into tubes:
initializing tubes: the tubes are initialized with the detection boxes of the first frame image in the RGB image sequence of the video; for example, if n detection boxes are generated for the first frame image in the RGB image sequence of the video, the number of tubes of a given action category for the first frame is:
$n_{class}(1) = n$;
The following operations are performed for all action categories, respectively:
s1, matching each tube with the detection boxes generated at frame t: first traverse the tubes belonging to the same action category; if there are n tubes of this action category, compute for each tube the average confidence over its frames as the value of the tube, and arrange the values of the n tubes in descending order to form $list_{class}$. To determine the action category of each tube, a list $I = \{l_{t-k+1} \dots l_t\}$ is defined, which stores the action categories of the last k frames of the tube;
s2, traversing $list_{class}$ and the detection boxes $b_i^t, i = 1 \dots n$, of frame t, and selecting from them the detection boxes satisfying the following conditions to add to the tubes:
traverse $list_{class}$ and, for each tube, select the detection boxes $b_i^t$ of frame t that belong to the same action category as the tube for matching; if the intersection-over-union of $b_i^t$ with the detection box in the last frame image of the tube is larger than the threshold d, add $b_i^t$ to the queue $H\_list_{class}$;
if $H\_list_{class}$ is not empty, pick the detection box with the highest confidence in $H\_list_{class}$, add it to the tube, and remove that highest-confidence detection box from the detection boxes $b_i^t, i = 1 \dots n$, of frame t;
if $H\_list_{class}$ is empty, the tube does not add any detection box and remains unchanged; if no new detection box is added to the tube for k consecutive frames, the tube is terminated;
if a detection box of frame t has not been matched, denote it $\bar{b}$; then traverse all tubes, compute the intersection-over-union of $\bar{b}$ with the last frame of each tube, and among the tubes whose intersection-over-union with $\bar{b}$ is larger than the threshold k, select the tube with the maximum intersection-over-union and denote it $T^{*}$; add $\bar{b}$ to $T^{*}$, where $T^{*}$ satisfies the following formula:
$T^{*} = \arg\max_{T_i:\, IoU(\bar{b},\, T_i(t-1)) > k} IoU(\bar{b},\, T_i(t-1))$
where $T_i$ is the i-th tube and $T_i(t-1)$ is the detection box of the (t-1)-th frame of the i-th tube;
if the t-th frame still has detection boxes that have not been matched, a new tube is generated with each such detection box as a starting point, and the tube is initialized with that detection box as its first frame image;
s3, after all tubes have been matched, the action category list $I = \{l_{t-k+1} \dots l_t\}$ of the last k frames of each tube is updated, where $l_t$ is the action category of the t-th frame of the tube; the action category L of each tube is then updated by counting the action categories $I = \{l_{t-k+1} \dots l_t\}$ of the last k frames of the tube, and the most frequent action category is taken as the action category L of the tube, which satisfies the following formula:
$L = \arg\max_{c} \sum_{i=t-k+1}^{t} g(l_i, c)$
where $g(l_i, c) = 1$ if $l_i = c$ and $g(l_i, c) = 0$ if $l_i \neq c$, c being an action category; that is, the action category that appears most often in $I = \{l_{t-k+1} \dots l_t\}$ is the action category of the tube.
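The following condensed sketch illustrates the tube-linking logic of step b2 (s1-s3): tubes are ranked by their mean confidence, each tube greedily takes the highest-confidence same-category detection whose IoU with its last box exceeds d, tubes that go unmatched for k consecutive frames are terminated, leftover detections start new tubes, and the tube label is refreshed by a majority vote over the last k per-frame labels. The Tube class, the default thresholds, and the omission of the secondary re-matching of leftover boxes against all tubes are simplifications assumed for illustration, not an exact reproduction of the patented procedure.

```python
from collections import Counter

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

class Tube:
    """A linked sequence of per-frame detections of one action category."""
    def __init__(self, box, cls, conf):
        self.boxes, self.confs, self.labels = [box], [conf], [cls]
        self.cls, self.missed = cls, 0

    def value(self):
        return sum(self.confs) / len(self.confs)       # s1: mean confidence

    def add(self, box, cls, conf, k):
        self.boxes.append(box)
        self.confs.append(conf)
        self.labels = (self.labels + [cls])[-k:]       # keep last k frame labels
        self.cls = Counter(self.labels).most_common(1)[0][0]  # s3: majority vote
        self.missed = 0

def link_frame(tubes, detections, d=0.3, k=5):
    """s2: extend each tube with the detections of frame t, terminate stale
    tubes, and start new tubes from unmatched detections.
    `detections` is a list of (box, cls, conf) tuples."""
    unmatched = list(detections)
    for tube in sorted(tubes, key=Tube.value, reverse=True):
        candidates = [det for det in unmatched
                      if det[1] == tube.cls and box_iou(det[0], tube.boxes[-1]) > d]
        if candidates:
            best = max(candidates, key=lambda det: det[2])  # highest confidence
            tube.add(best[0], best[1], best[2], k)
            unmatched.remove(best)
        else:
            tube.missed += 1
    tubes = [t for t in tubes if t.missed < k]              # terminate after k misses
    tubes += [Tube(box, cls, conf) for box, cls, conf in unmatched]
    return tubes
```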
Fig. 2: (a) shows the RGB image sequence of a video; (b) shows the optical flow algorithm: in the testing stage, the pyramid Lucas-Kanade optical flow method in OpenCV is used to extract sparse optical flow images, while in the training stage dense optical flow images are extracted; (c) shows the resulting sparse optical flow images; (d) shows the two detection models, an RGB yolo v3 model trained on the RGB image sequences of the videos and an optical flow yolo v3 model trained on the optical flow sequences; (e) shows the detection results output by the RGB yolo v3 model; (f) shows the detection results of the optical flow yolo v3 model; (g) shows the fusion of the results output by the two models, which yields more robust features; (h) shows that the detection boxes across the RGB image sequence of the video are connected into tubes using the fused results.
Fig. 4: (a) shows an image from the RGB image sequence of a video; (b) shows the optical flow image corresponding to that image; (c) shows the detection result output after the image in the RGB image sequence is processed by the RGB yolo v3 model; (d) shows the detection result output after the optical flow image is processed by the optical flow yolo v3 model.
Fig. 5 shows the processing of a sequence of consecutive images in the video: (a) shows images sampled at equal intervals from the RGB image sequence of the video; (b) shows the optical flow sequence corresponding to the images in the RGB image sequence; (c) shows the detection results output after the images in the RGB image sequence are processed by the RGB yolo v3 model; (d) shows the detection results output after the optical flow sequence is processed by the optical flow yolo v3 model; (e) shows the tubes obtained by fusing the detection results of (c) and (d).
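To illustrate the test-time sparse optical flow extraction shown in Fig. 2(b)-(c), the sketch below tracks corner points between consecutive frames with OpenCV's pyramid Lucas-Kanade implementation and draws the motion vectors on a blank canvas. Rendering the sparse flow as line segments, and the corner-detection and window parameters, are assumptions of this sketch rather than details fixed by the patent:

```python
import cv2
import numpy as np

def sparse_flow_image(prev_bgr, cur_bgr, max_corners=500):
    """Pyramid Lucas-Kanade sparse optical flow between two frames, rendered
    as motion vectors on a blank canvas (one plausible way to obtain the
    'sparse optical flow image' fed to the optical flow yolo v3 model)."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_bgr, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    canvas = np.zeros_like(prev_bgr)
    if pts is None:
        return canvas
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None,
                                              winSize=(15, 15), maxLevel=3)
    for p0, p1, ok in zip(pts.reshape(-1, 2), nxt.reshape(-1, 2), status.reshape(-1)):
        if ok:
            cv2.line(canvas, (int(p0[0]), int(p0[1])), (int(p1[0]), int(p1[1])),
                     (0, 255, 0), 2)  # draw the tracked motion vector
    return canvas
```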
in step a1, the database of the labeled specific Action is the Action Detection data set of UCF 101.
In step a2, the dense optical flow of the video sequences in the training data is calculated by using the calcOpticalFlowFarneback function in the OpenCV library.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (3)

1. A multi-region real-time action detection method based on a surveillance video is characterized by comprising the following steps:
a model training stage:
a1, acquiring training data: a database of labeled specific actions;
a2, calculating dense optical flows of video sequences in training data, acquiring optical flow sequences of the video sequences in the training data, and labeling optical flow images in the optical flow sequences;
a3, respectively training a target detection model yolo v3 by utilizing a video sequence and an optical flow sequence in training data to respectively obtain an RGB yolo v3 model and an optical flow yolo v3 model;
and (3) a testing stage:
b1, extracting a sparse optical flow image sequence of the video by the pyramid Lucas-Kanade optical flow method, and then feeding the RGB image sequence and the sparse optical flow image sequence of the video into the RGB yolo v3 model and the optical flow yolo v3 model respectively. From the series of detection boxes output by the RGB yolo v3 model, the first n detection boxes of all action categories, $b_i^{rgb}, i = 1 \dots n$, are extracted by non-maximum suppression; each detection box carries an action-category label and a probability score $s_i^{rgb}$ of belonging to that action. From the series of detection boxes output by the optical flow yolo v3 model, the first n detection boxes of all action categories, $b_k^{flow}, k = 1 \dots n$, are likewise extracted by non-maximum suppression; each detection box carries an action-category label and a probability score $s_k^{flow}$ of belonging to that action. The detection boxes output by the RGB yolo v3 model and the optical flow yolo v3 model are then traversed: each detection box $b_i^{rgb}$ output by the RGB yolo v3 model is compared, by intersection-over-union, with the detection boxes of the same action category output by the optical flow yolo v3 model, and the same-category detection box of the optical flow yolo v3 model with the largest intersection-over-union is denoted $b^{*}$. If this maximum intersection-over-union is larger than the threshold K, the probability scores of the corresponding pair of detection boxes from the RGB yolo v3 model and the optical flow yolo v3 model are fused into $s_{fuse}$, which is taken as the confidence of the detection box $b_i^{rgb}$ output by the RGB yolo v3 model; $s_{fuse}$ is determined by $s_i^{rgb}$, $IoU(b_i^{rgb}, b^{*})$ and $s^{*}$,
where $IoU(b_i^{rgb}, b^{*})$ denotes the intersection-over-union of $b_i^{rgb}$ and $b^{*}$, and $s^{*}$ is the probability score of the same-category optical flow detection box having the largest intersection-over-union with $b_i^{rgb}$;
b2, according to the fused confidence score of each action category of the detection boxes output by the RGB yolo v3 model, connecting the detection boxes across the RGB image sequence of the video into tubes:
initializing tubes: the tubes are initialized with the detection boxes of the first frame image in the RGB image sequence of the video;
the following operations are performed for all action categories, respectively:
s1, matching each tube with the detection boxes generated at frame t: first traverse the tubes belonging to the same action category; if there are n tubes of this action category, compute for each tube the average confidence over its frames as the value of the tube, and arrange the values of the n tubes in descending order to form $list_{class}$. To determine the action category of each tube, a list $I = \{l_{t-k+1} \dots l_t\}$ is defined, which stores the action categories of the last k frames of the tube;
s2, traversing $list_{class}$ and the detection boxes $b_i^t, i = 1 \dots n$, of frame t, and selecting from them the detection boxes satisfying the following conditions to add to the tubes:
traverse $list_{class}$ and, for each tube, select the detection boxes $b_i^t$ of frame t that belong to the same action category as the tube for matching; if the intersection-over-union of $b_i^t$ with the detection box in the last frame image of the tube is larger than the threshold d, add $b_i^t$ to the queue $H\_list_{class}$;
if $H\_list_{class}$ is not empty, pick the detection box with the highest confidence in $H\_list_{class}$, add it to the tube, and remove that highest-confidence detection box from the detection boxes $b_i^t, i = 1 \dots n$, of frame t;
if $H\_list_{class}$ is empty, the tube does not add any detection box and remains unchanged; if no new detection box is added to the tube for k consecutive frames, the tube is terminated;
if a detection box of frame t has not been matched, denote it $\bar{b}$; then traverse all tubes, compute the intersection-over-union of $\bar{b}$ with the last frame of each tube, and among the tubes whose intersection-over-union with $\bar{b}$ is larger than the threshold k, select the tube with the maximum intersection-over-union and denote it $T^{*}$; add $\bar{b}$ to $T^{*}$, where $T^{*}$ satisfies the following formula:
$T^{*} = \arg\max_{T_i:\, IoU(\bar{b},\, T_i(t-1)) > k} IoU(\bar{b},\, T_i(t-1))$
where $T_i$ is the i-th tube and $T_i(t-1)$ is the detection box of the (t-1)-th frame of the i-th tube;
if the t-th frame still has detection boxes that have not been matched, a new tube is generated with each such detection box as a starting point, and the tube is initialized with that detection box as its first frame image;
s3, after all tubes have been matched, the action category list $I = \{l_{t-k+1} \dots l_t\}$ of the last k frames of each tube is updated, where $l_t$ is the action category of the t-th frame of the tube; the action category L of each tube is then updated by counting the action categories $I = \{l_{t-k+1} \dots l_t\}$ of the last k frames of the tube, and the most frequent action category is taken as the action category L of the tube, which satisfies the following formula:
$L = \arg\max_{c} \sum_{i=t-k+1}^{t} g(l_i, c)$
where $g(l_i, c) = 1$ if $l_i = c$ and $g(l_i, c) = 0$ if $l_i \neq c$, c being an action category; that is, the action category that appears most often in $I = \{l_{t-k+1} \dots l_t\}$ is the action category of the tube.
2. The multi-region real-time action detection method based on surveillance video according to claim 1, characterized in that: in step a1, the database of the labeled specific Action is the Action Detection data set of UCF 101.
3. The multi-region real-time action detection method based on surveillance video according to claim 1, characterized in that: in step a2, the dense optical flow of the video sequences in the training data is calculated by using the calcOpticalFlowFarneback function in the OpenCV library.
CN201810534453.0A 2018-05-30 2018-05-30 Multi-region real-time action detection method based on monitoring video Expired - Fee Related CN108764148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810534453.0A CN108764148B (en) 2018-05-30 2018-05-30 Multi-region real-time action detection method based on monitoring video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810534453.0A CN108764148B (en) 2018-05-30 2018-05-30 Multi-region real-time action detection method based on monitoring video

Publications (2)

Publication Number Publication Date
CN108764148A true CN108764148A (en) 2018-11-06
CN108764148B CN108764148B (en) 2020-03-10

Family

ID=64003645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810534453.0A Expired - Fee Related CN108764148B (en) 2018-05-30 2018-05-30 Multi-region real-time action detection method based on monitoring video

Country Status (1)

Country Link
CN (1) CN108764148B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140254882A1 (en) * 2013-03-11 2014-09-11 Adobe Systems Incorporated Optical Flow with Nearest Neighbor Field Fusion
CN105512618A (en) * 2015-11-27 2016-04-20 北京航空航天大学 Video tracking method
CN106709461A (en) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video based behavior recognition method and device
CN107316007A (en) * 2017-06-07 2017-11-03 浙江捷尚视觉科技股份有限公司 A kind of monitoring image multiclass object detection and recognition methods based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALAAELDIN EL-NOUBY et al.: "Real-Time End-to-End Action Detection with Two-Stream Networks", arXiv *
CHRISTOPH FEICHTENHOFER et al.: "Detect to Track and Track to Detect", arXiv *
PHILIPPE WEINZAEPFEL et al.: "Learning to track for spatio-temporal action localization", 2015 IEEE International Conference on Computer Vision *
HUANG Tiejun et al.: "Multimedia technology research 2013: visual perception and processing for intelligent video surveillance", Journal of Image and Graphics (中国图象图形学报) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447014A (en) * 2018-11-07 2019-03-08 东南大学-无锡集成电路技术研究所 A kind of online behavioral value method of video based on binary channels convolutional neural networks
WO2020114120A1 (en) * 2018-12-07 2020-06-11 深圳光启空间技术有限公司 Method for identifying vehicle information, system, memory device, and processor
CN111291779A (en) * 2018-12-07 2020-06-16 深圳光启空间技术有限公司 Vehicle information identification method and system, memory and processor
CN111291779B (en) * 2018-12-07 2024-09-13 深圳光启空间技术有限公司 Vehicle information identification method, system, memory and processor
CN109740454A (en) * 2018-12-19 2019-05-10 贵州大学 A kind of human body posture recognition methods based on YOLO-V3
CN109711344A (en) * 2018-12-27 2019-05-03 东北大学 A kind of intelligentized specific exceptions behavioral value method in front end
CN109886165A (en) * 2019-01-23 2019-06-14 中国科学院重庆绿色智能技术研究院 A kind of action video extraction and classification method based on moving object detection
CN111126153A (en) * 2019-11-25 2020-05-08 北京锐安科技有限公司 Safety monitoring method, system, server and storage medium based on deep learning
CN111353452A (en) * 2020-03-06 2020-06-30 国网湖南省电力有限公司 Behavior recognition method, behavior recognition device, behavior recognition medium and behavior recognition equipment based on RGB (red, green and blue) images
CN114049396A (en) * 2021-11-05 2022-02-15 北京百度网讯科技有限公司 Method and device for marking training image and tracking target, electronic equipment and medium

Also Published As

Publication number Publication date
CN108764148B (en) 2020-03-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200310