CN111291699B - Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection - Google Patents
- Publication number
- CN111291699B (application CN202010103140.7A)
- Authority
- CN
- China
- Prior art keywords
- video
- behavior
- abnormal
- time sequence
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
A substation personnel behavior identification method based on surveillance-video temporal action localization and anomaly detection. Prior knowledge is used to autonomously acquire, process, and construct a surveillance-video dataset of abnormal substation personnel behavior, and a new video dataset for substation abnormal-behavior detection is introduced. A transfer-learning-based video action detection model acquires temporal information and accurately localizes temporal actions in surveillance video, finding the start and end times of a worker's actions in an untrimmed video and classifying those actions. For the person-specific behavior clips produced by video action detection, a video anomaly detection technique trained with multiple-instance learning under weak supervision yields a model that judges whether a clip contains abnormal behavior, so that abnormal behaviors and their temporal positions are detected accurately, improving both the utilization value of substation video surveillance and the accuracy of anomaly detection.
Description
Technical Field
The invention discloses a substation personnel behavior identification method based on surveillance-video temporal action localization and anomaly detection, belonging to the technical field of intelligent power-grid management.
Background
In modern power systems, the operation and maintenance of power transmission and transformation equipment is particularly important, bearing directly on the normal operation of the power system and on society's production and domestic electricity use. During operation, the failure of certain devices can bring the power system down, and errors by the personnel involved can likewise cause system faults. In many substation scenes, safety accidents caused by irregular operations of workers occur frequently; such accidents endanger the operators and seriously disrupt the order of social production and daily life. Security management and monitoring of the substation working environment are therefore receiving increasing attention.
Video surveillance enables real-time monitoring and centralized management, and is an important means of safeguarding the lives of substation workers and the normal operation of transmission and transformation equipment. It offers high coverage and good stability, and can record the substation working scene around the clock and from all directions. Although video surveillance technology is well developed and widely deployed in substations, shortcomings remain. Typically, surveillance merely records video of the substation working scene: it only shoots and stores footage, and subsequent judgment and handling require dedicated staff watching uninterrupted for 24 hours, wasting human resources. Moreover, substation rooms are monitored all day, so the data volume is large and mostly uninformative, and recognition that relies solely on shift workers watching with the naked eye is extremely inefficient. Research on video-based detection of abnormal personnel behavior in substation scenes is therefore highly necessary.
Chinese patent document CN110084151A discloses a video abnormal-behavior discrimination method based on non-local deep networks, belonging to the fields of computer vision, intelligence, and multimedia signal processing. The method uses the idea of multiple-instance learning to construct a training set, defining and labeling positive and negative bags and instances of the video data. A non-local network extracts features from video samples: an I3D network with residual structure serves as the convolution filter for extracting spatio-temporal information, and non-local network blocks fuse long-range dependency information to meet the temporal and spatial requirements of video feature extraction. Once the features are obtained, a regression task is established and a model is trained by weakly supervised learning. That invention can discriminate unlabeled classes and is suited to anomaly detection tasks where samples of the anomaly class are scarce and intra-class diversity is high.
Patent document CN110084151A uses non-local deep networks to judge abnormal behavior in video. By contrast, the present invention uses an improved C3D feature extraction network to extract features from substation surveillance frame sequences; constructs a temporal candidate region extraction network to extract candidate temporal segments that may contain abnormal substation personnel behavior from long surveillance videos; constructs a behavior classification network to classify the extracted substation personnel behavior video segments; and constructs an abnormal behavior detection network to detect abnormal behavior in the candidate temporal segments produced by the temporal behavior classification network.
The method of CN110084151A cannot be applied to long surveillance videos in substation scenes, whereas the present invention builds a multi-network fusion model on a 3D feature extraction network and determines abnormal behavior in long surveillance videos through temporal candidate region extraction, temporal behavior classification of video segments, and abnormal behavior detection.
CN110084151A obtains the instances needed for multiple-instance learning by evenly cutting a video into 8 segments, whereas the present invention designs a frame clustering algorithm based on spatio-temporal continuity to segment each training video into 32 segments, each containing a single complete action; these segments serve as the instances in MIL, achieving accurate video segment cutting and content evaluation.
In summary, the prior art still has many technical deficiencies and is difficult to apply in substation scenes to identify specific behaviors of workers.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention discloses a substation personnel behavior identification method based on surveillance-video temporal action localization and anomaly detection.
The technical problems to be solved by the invention are as follows:
(1) Since no open-source substation surveillance video data are available on the current network, such data must be collected autonomously, and a new substation abnormal-behavior detection video dataset must be constructed by temporally classifying personnel actions and labeling abnormal behaviors. Given that abnormal behaviors occur rarely in substations and that data collection and labeling are difficult, how to construct a suitable training dataset for detecting abnormal substation personnel behavior is an important problem to be solved.
(2) Because substation abnormal events occur infrequently, surveillance video of abnormal substation personnel behavior is hard to collect, and the independently constructed behavior detection dataset has a limited number of samples. Under such data scarcity, how to construct an efficient 3D convolutional feature extraction network, so that the model can fully mine the features of video frame sequences from a small amount of substation surveillance video, is a key research problem.
(3) Temporal action detection must combine frame images with temporal information, yet the actions of personnel in substation surveillance scenes span long time ranges and have fuzzy temporal boundaries. How to achieve high-quality temporal segment cutting and accurate action classification is another important issue.
(4) In abnormal behavior detection, accurately marking the temporal position of every abnormal behavior in a video is time-consuming; the rarity of abnormal events means positive samples are far fewer than negative samples during training; and in substation surveillance scenes both normal and abnormal events are complex and diverse, with high intra-class diversity. Solving these problems to achieve accurate detection of abnormal substation personnel behavior is the key problem of the invention.
Summary of the invention:
the invention aims to realize automatic identification of abnormal substation personnel behavior from substation surveillance video, using 3D-convolution-based video temporal action localization and anomaly detection techniques.
Targeting the characteristics of abnormal personnel behavior under substation video surveillance, the invention uses prior knowledge to autonomously acquire, process, and construct a surveillance-video dataset of abnormal substation personnel behavior, introducing a new video dataset for substation abnormal-behavior detection. A transfer-learning-based video action detection model extracts temporal information from the surveillance footage and accurately localizes temporal actions, finding the start and end times of a worker's actions in an untrimmed video and classifying those actions. For the person-specific behavior clips produced by video action detection, the invention applies video anomaly detection, training with Multiple Instance Learning (MIL) under weak supervision; the resulting model judges whether a clip contains abnormal behavior, thereby accurately detecting abnormal behaviors and their temporal positions and improving both the utilization value of substation video surveillance and the accuracy of anomaly detection.
The technical scheme of the invention is as follows:
a substation personnel behavior identification method based on surveillance-video temporal action localization and anomaly detection, characterized by comprising the following steps:
s1: autonomously acquiring, processing, and constructing a surveillance-video dataset of abnormal substation personnel behavior using prior knowledge;
s2: constructing a 3D convolution feature extraction network: performing feature extraction on the input untrimmed long substation surveillance video to extract the feature information of the surveillance video sequence;
s3: constructing a temporal candidate region extraction network to extract candidate temporal segments that may contain abnormal substation personnel behavior from the long surveillance video;
s4: constructing a temporal behavior classification network to classify and regress the extracted substation personnel behavior video segments;
s5: constructing an abnormal behavior detection network to detect abnormal behavior in the candidate temporal segments obtained by the temporal behavior classification network of step S4;
s6: performing end-to-end joint training on the structure composed of S2-S4 using transfer learning, and processing long surveillance videos with the trained model to extract preliminarily classified behavior-category video clips;
s7: designing a frame clustering algorithm based on spatio-temporal continuity to segment the videos, and using a multiple-instance-learning-based abnormal behavior detection network to evaluate the content of the video segments, thereby identifying whether abnormal behavior exists in them and determining its category and precise position.
Preferably, the method for constructing the data set in step S1 includes:
s11: acquiring substation personnel behavior surveillance videos at different shooting angles and in different background environments using video monitoring equipment erected in the substation;
s12: performing temporal behavior labeling on the typical behavior categories to construct a video dataset for the temporal action detection task; the temporal action detection task covers the temporal localization and action-category classification of personnel actions in long substation surveillance videos; typical behaviors here include, but are not limited to: moving about, operating instruments, reading monitoring data, etc.;
s13: labeling the videos by marking whether each video segment contains abnormal behavior, to construct an abnormal-behavior recognition video dataset; the abnormal-behavior recognition task detects abnormal behavior in the classified video clips produced by the temporal action detection module; abnormal behaviors include, but are not limited to, illegal operations, running indoors, falling down, etc.
Preferably, the method for constructing the 3D convolution feature extraction network in step S2 includes:
s21: improving the C3D (3D Convolution) feature extraction network by replacing ordinary convolution operations with depth-separable convolutions; this improvement greatly reduces the computation and model size while preserving accuracy;
s22: extracting the features of the substation surveillance video frame sequences with the improved C3D feature extraction network:
the input surveillance-video frame sequence is processed by a feature extraction network composed of multiple depth-separable 3D convolution layers to obtain the feature map C_conv5b.
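The benefit claimed in S21 can be made concrete with a quick parameter count. The sketch below is illustrative only (the function names are ours, not the patent's): it compares a standard 3x3x3 convolution layer with its depth-separable factorization, i.e. a per-channel depthwise convolution followed by a 1x1x1 pointwise convolution.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights of a standard kxkxk 3D convolution (bias omitted)."""
    return c_in * c_out * k ** 3

def sep_conv3d_params(c_in, c_out, k=3):
    """Depth-separable 3D conv: a per-channel kxkxk depthwise pass
    (c_in * k^3 weights) plus a 1x1x1 pointwise pass (c_in * c_out)."""
    return c_in * k ** 3 + c_in * c_out

# One mid-network C3D-style layer, 256 -> 256 channels:
std = conv3d_params(256, 256)      # 1,769,472 weights
sep = sep_conv3d_params(256, 256)  # 72,448 weights (~24x smaller)
print(std, sep)
```

The same ratio applies to multiply-accumulate operations per output position, which is where the claimed reduction in computation comes from.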
Preferably, the method for constructing the candidate temporal-segment extraction network in step S3 includes:
s31: first, the feature map C_conv5b obtained in S2 is taken as input for candidate temporal-segment generation; assuming the temporal segments are uniformly distributed, K candidate temporal segments of different lengths are generated at each position in the time domain, together forming the full set of candidate temporal segments;
s32: the temporal receptive field is extended with a 3x3x3 3D convolution filter, and the feature map is then down-sampled along the spatial dimensions so that each temporal position is described by a 512-dimensional feature vector in the temporal-position feature map C_tl; this vector is used to predict the relative offsets {δc_i, δl_i} of the center position and length of each anchor {c_i, l_i}, i ∈ {1, …, K};
s33: two 1x1x1 convolutions are added on the feature map C_tl to predict the confidence score that each candidate temporal segment is background or contains behavioral activity.
Preferably, in step S4, the method for constructing the temporal behavior classification network includes:
s41: performing Non-Maximum Suppression (NMS) with a threshold of 0.6 on the candidate temporal segments obtained in step S3 to obtain higher-quality Region of Interest (RoI) candidate temporal segments;
s42: mapping the RoIs onto the C_conv5b feature map obtained in step S2 and obtaining 512x1x4x4 output features through a 3D RoI pooling operation;
s43: feeding the output features first into a large fully connected layer for feature synthesis, and then into two fully connected layers for classification and regression respectively: the classification layer classifies substation personnel behavior categories, and the regression layer adjusts the start and end times of the behavior segments.
Preferably, in step S5, the method for constructing the abnormal behavior detection network includes:
s51: fixing the frame size of the specific video clips extracted by the temporal action detection stage; the size and frame rate can be adjusted to the application scene, for example frames resized to 360x480 with the frame rate fixed at 32 fps;
s52: dividing each video clip into unit clips of fixed length 1 frame and clustering them with a K-means frame clustering algorithm based on spatio-temporal continuity, each cluster representing one complete action; the video is finally divided into 32 groups of segments, each containing a single complete action;
s53: extracting the features of the video segments with the 3D convolution feature extraction network constructed in step S2: the C_conv5b features of every 16 frames are extracted and a fully connected layer is added to obtain 4096-dimensional features;
s54: feeding the extracted features into a Multi-Layer Perceptron (MLP) composed of 3 consecutive fully connected layers to score each segment; the maximum anomaly score among a video's segments is taken as the video's anomaly score, giving the final predicted anomaly value.
According to a preferred embodiment of the present invention, in the MLP of step S54, the first fully connected layer has 512 neurons activated by a Rectified Linear Unit (ReLU); the second fully connected layer has 32 neurons and the third has 1 neuron, activated by a Sigmoid function.
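The 4096 -> 512 -> 32 -> 1 scoring stack described above can be sketched with NumPy; the random weights and the omission of biases are simplifications for illustration only:

```python
import numpy as np

def mlp_score(feat, W1, W2, W3):
    """Anomaly score of one 4096-d segment feature via three
    fully connected layers: 512 (ReLU) -> 32 -> 1 (Sigmoid)."""
    h1 = np.maximum(feat @ W1, 0.0)      # ReLU over 512 units
    h2 = h1 @ W2                         # 32 units (linear here)
    logit = float((h2 @ W3)[0])          # single output neuron
    return 1.0 / (1.0 + np.exp(-logit))  # Sigmoid -> score in (0, 1)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (4096, 512))
W2 = rng.normal(0.0, 0.01, (512, 32))
W3 = rng.normal(0.0, 0.01, (32, 1))
s = mlp_score(rng.normal(size=4096), W1, W2, W3)
print(0.0 < s < 1.0)  # True: the Sigmoid bounds the anomaly score
```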
Preferably, in step S6, the process of training and operating the temporal action detection model specifically includes:
s61: in transfer learning, the THUMOS 2014 dataset is first used to jointly train the network modules of steps S2-S4; the parameters of the first four layers of the feature extraction network of step S2 are then fixed, and the dataset constructed in step S1 is used to train the parameters of the later network structure; this design is adopted because the substation personnel behavior video dataset obtained in S1 is not large, and transfer learning improves the detection precision and generalization ability of the model; THUMOS 2014 is an open-source action recognition and temporal action detection dataset;
s62: training on the RoIs obtained in step S4 with a 1:3 ratio of positive to negative samples; specifically, RoIs whose IoU with the ground truth exceeds 0.5 are taken as positive samples, and those with IoU below 0.5 as negative samples;
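The IoU-based labeling and 1:3 sampling in S62 can be sketched as follows. This is a simplified illustration (a real data loader would sample randomly rather than take the first indices):

```python
def seg_iou(a, b):
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def label_rois(rois, gts, pos_thresh=0.5):
    """1 if a RoI's best IoU with any ground-truth segment exceeds
    pos_thresh, else 0."""
    return [1 if max(seg_iou(r, g) for g in gts) > pos_thresh else 0
            for r in rois]

def sample_1_to_3(labels, n_pos):
    """Indices of n_pos positive and 3 * n_pos negative samples."""
    pos = [i for i, y in enumerate(labels) if y == 1][:n_pos]
    neg = [i for i, y in enumerate(labels) if y == 0][:3 * n_pos]
    return pos + neg

labels = label_rois([(0, 10), (5, 15), (40, 60)], gts=[(0, 12)])
print(labels)  # [1, 0, 0]: only (0, 10) overlaps the truth by > 0.5
```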
s63: for the candidate temporal-segment extraction network of step S3 and the temporal behavior classification network of step S4, the classification task and the regression task are optimized simultaneously:
the classification task L_cls uses a softmax loss, and the regression task L_reg uses a smooth L1 loss:
Loss = (1/N_cls) Σ_i L_cls(a_i, a_i*) + λ (1/N_reg) Σ_i a_i* L_reg(t_i, t_i*)
where N_cls and N_reg denote the number of samples selected in a training run and the number of candidate temporal segments used for regression, λ is a loss-balancing parameter set to 1, i is the index of a candidate temporal segment in a batch, a_i is the predicted likelihood that candidate segment i is a human behavior, a_i* is the ground truth, t_i = {δĉ_i, δl̂_i} is the predicted relative offset of the temporal segment with respect to the candidate segment, and t_i* = {δc_i, δl_i} is the coordinate transformation between the ground truth and the candidate temporal segment:
δc_i = (c_i* − c_i) / l_i,  δl_i = log(l_i* / l_i),
where (c_i, l_i) are the center and length of candidate segment i and (c_i*, l_i*) those of its matched ground truth.
in the candidate temporal-segment extraction sub-network of step S3 (the temporal candidate region extraction network), L_cls predicts whether a candidate temporal segment contains a human behavior, regardless of the specific behavior class, and L_reg optimizes the relative displacement between the candidate temporal segment and the ground truth;
in the temporal behavior classification sub-network of step S4 (the behavior classification network), L_cls predicts the specific personnel behavior category of each RoI, and L_reg optimizes the relative displacement between the RoI and the ground truth; the four losses of the two sub-networks are optimized jointly;
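The smooth L1 regression loss and the center/length coordinate transform that produces the regression targets can be written out directly. A sketch under the usual anchor parameterization (relative center offset, log length ratio), which we believe matches the coordinate transformation referred to above:

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    x = abs(x)
    return 0.5 * x * x if x < 1.0 else x - 0.5

def encode(c_anchor, l_anchor, c_gt, l_gt):
    """Regression target for a candidate segment (center c, length l):
    relative center offset and log length ratio w.r.t. ground truth."""
    return (c_gt - c_anchor) / l_anchor, math.log(l_gt / l_anchor)

dc, dl = encode(100.0, 32.0, 112.0, 48.0)
print(round(dc, 3), round(dl, 3))      # 0.375 0.405
print(smooth_l1(0.5), smooth_l1(2.0))  # 0.125 1.5
```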
s64: based on the time sequence action detection network module constructed in steps S2-S4, the substation monitoring long video is processed using the model parameters obtained from the training of S61-S63, so as to intercept coarsely classified behavior category video clips.
Preferably, the training and operating process of the abnormal behavior detection model in step S7 includes:
s71: a K-means frame clustering algorithm based on spatio-temporal continuity is adopted to divide each video into 32 groups of video segments, each containing a single complete action: first, the video clip is split into a dataset of unit frames and 32 video frames are randomly selected from it as centroids; the Euclidean similarity distance between each frame in the dataset and the centroids immediately before and after it in temporal order is computed, and the frame is assigned to the set of the nearer centroid; after all frames have been grouped, the centroid of each group is recomputed, iterating until the temporal distance between each newly computed centroid and the previous one is less than 8 frames;
s72: using the video segmentation algorithm of step S71, each training video is divided into 32 segments, each containing a single complete action; the segments are the instances in MIL, and each video is a bag in MIL: during training, 10 positive bags (abnormal behavior videos) and 10 negative bags (normal behavior videos) are randomly selected as a mini-batch;
s73: extracting the spatio-temporal feature of each example segment by using the 3D convolution feature extraction network constructed in the step S2, and performing a full connection operation to obtain a 4096-dimensional feature which is used as a feature map required by subsequent multi-example learning;
s74: each segment is scored with the MLP constructed in step S54; the segment with the largest abnormal score in the positive bag is then selected as the potential abnormal sample, and the segment with the largest abnormal score in the negative bag as the non-abnormal sample, and the MLP model parameters are trained on these two samples; the objective is:

max_{i∈β_a} f(v_a^i) > max_{i∈β_n} f(v_n^i)

wherein β_a denotes a positive bag and v_a an abnormal sample; β_n denotes a negative bag and v_n a non-abnormal sample; f is the model prediction function;
a Hinge-loss function is adopted to enlarge the score gap between positive and negative examples, so that after training the model outputs high scores for abnormal samples and low scores for non-abnormal samples; the Hinge-loss function is:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i))
in a real substation scene, abnormal behaviors usually occupy only a very short period of time, i.e., the proportion of positive samples (abnormal behaviors) within a positive bag is very low, so the scores within a positive bag should be sparse, and a sparsity constraint is added; meanwhile, considering the temporal structure of the video, since video segments are continuous, the abnormal scores of adjacent segments should also be relatively smooth, so a temporal smoothness constraint is added, and the loss function becomes:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i)) + λ1 Σ_{i=1}^{n−1} (f(v_a^i) − f(v_a^{i+1}))² + λ2 Σ_{i=1}^{n} f(v_a^i)

to prevent model overfitting, an l2 regularization term is finally added, giving the final loss function:

L(w) = l(β_a, β_n) + ||w||_F

wherein w represents the model weights;
s75: based on the abnormal behavior detection network constructed in step S5, the behavior category video clips obtained in step S64 are detected using the model parameters obtained from the training of S72-S74, so as to identify whether a specific behavior clip contains abnormal behavior, as well as the category and precise temporal position of the abnormal behavior.
The invention has the beneficial effects that:
Aiming at the characteristics of abnormal personnel behavior under substation video monitoring, the method uses prior knowledge to autonomously collect, process, and construct a substation personnel abnormal behavior monitoring video dataset, filling the gap in video data for substation abnormal behavior detection. The 3D convolution feature extraction network based on transfer learning lets the model fully mine the features of the video sequence frames even though the amount of substation personnel behavior monitoring video data is limited, improving algorithm efficiency and enhancing the precision and generalization capability of the model. The invention trains the temporal action detection module in an end-to-end joint training manner, and achieves high-quality temporal segment cutting and accurate action classification by fully fusing different network structures such as feature extraction, temporal detection, and action classification. Meanwhile, the anomaly detection network is trained with weakly supervised multi-instance learning; the resulting model can accurately judge whether a video clip contains abnormal behavior while precisely detecting the abnormal behavior category and the temporal position at which it occurs, thereby improving the utilization value of substation video monitoring and realizing efficient, high-quality detection of abnormal behavior of substation personnel.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of the abnormal behavior recognition result of the present invention;
fig. 3 is a schematic diagram of the normal behavior recognition result of the present invention.
Detailed Description
The invention is described in detail below with reference to the following examples and the accompanying drawings of the specification, but is not limited thereto.
Example:
As shown in fig. 1.
A transformer substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection comprises the following steps:
s1: the method comprises the steps of automatically acquiring, processing and constructing a monitoring video data set for abnormal behavior of personnel in the transformer substation by using priori knowledge;
s2: constructing a 3D convolution feature extraction network: performing feature extraction on the input un-segmented substation monitoring long video, and extracting feature information of a monitoring video sequence;
s3: constructing a time sequence candidate area extraction network: the method comprises the steps of extracting candidate time sequence segments which may have abnormal behaviors of substation personnel;
s4: constructing a time sequence behavior classification network: classifying and regressing the extracted substation personnel behavior video segments;
s5: constructing an abnormal behavior detection network: performing abnormal behavior detection on the candidate time sequence segments obtained by the time sequence behavior classification network of the step S4;
s6: performing end-to-end joint training on the structure consisting of S2-S4 using transfer learning, and processing monitoring long videos with the trained model so as to intercept coarsely classified behavior category video clips;
s7: a frame clustering algorithm based on space-time continuity is designed to segment videos, and abnormal behavior detection networks based on multi-instance learning are adopted to evaluate the contents in video segments, so that whether abnormal behaviors exist in the video segments or not is identified, and the types and the accurate positions of the abnormal behaviors are determined.
The method for constructing the data set in step S1 includes:
s11: acquiring substation personnel behavior monitoring videos at different shooting angles and under background environments by using video monitoring equipment erected in a substation;
s12: carrying out time sequence behavior marking on the typical behavior categories to construct a video dataset for the time sequence action detection task; the time sequence action detection task performs temporal localization and action category classification of personnel actions in the substation monitoring long video; typical behaviors described herein include, but are not limited to: moving, operating instruments, reading monitoring data, etc.;
s13: labeling the videos by marking whether each video segment contains abnormal behavior, to construct an abnormal behavior identification video dataset; the abnormal behavior identification task detects abnormal behaviors in the classified video clips obtained from the segmentation by the time sequence action detection module; the abnormal behaviors include, but are not limited to, illegal operations, running indoors, falling down, etc.
The method for constructing the 3D convolution feature extraction network in step S2 includes:
s21: the C3D (3D Convolution) feature extraction network is improved by replacing the normal convolution operation with a depth-separable convolution; this improvement greatly reduces the computation and model size while preserving accuracy;
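As a rough illustration of the parameter savings from replacing a standard 3D convolution with a depth-separable one, the two weight counts can be compared directly; the channel counts and kernel size below are illustrative assumptions, not values from the patent:

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a standard 3D convolution (bias ignored)."""
    return c_in * c_out * kt * kh * kw

def separable_conv3d_params(c_in, c_out, kt, kh, kw):
    """Depthwise 3D convolution followed by a 1x1x1 pointwise convolution."""
    depthwise = c_in * kt * kh * kw   # one kt x kh x kw kernel per input channel
    pointwise = c_in * c_out          # 1x1x1 channel-mixing convolution
    return depthwise + pointwise

standard = conv3d_params(256, 256, 3, 3, 3)
separable = separable_conv3d_params(256, 256, 3, 3, 3)
print(standard, separable, round(standard / separable, 1))  # 1769472 72448 24.4
```

For a 3x3x3 kernel with 256 channels in and out, the separable variant uses roughly 24x fewer weights, which is where the "greatly reduces model size" claim comes from.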
s22: extracting the characteristics of the monitoring video sequence frames of the transformer substation by adopting an improved C3D characteristic extraction network:
The input surveillance video frame sequence is processed by a feature extraction network composed of multiple depth-separable 3D convolution layers to obtain the feature map C_conv5b.
The method for constructing the candidate timing extraction network in step S3 includes:
s31: first, the feature map C_conv5b obtained in S2 is taken as input for candidate temporal segment generation; assuming that temporal segments are uniformly distributed, K candidate segments of different lengths are generated at each position in the time domain, yielding K candidate temporal segments per temporal position in total;
s32: the temporal receptive field is extended with a 3×3×3 3D convolution filter, and the feature map is then spatially down-sampled with a 3D max-pooling filter to obtain the temporal-position feature map C_tl; the 512-dimensional feature vector at each temporal position is used to predict the relative offsets of the center position {δc_i, δl_i} and the length of each anchor {c_i, l_i}, i ∈ {1, …, K};
S33: two 1×1×1 convolution layers are added on top of the feature map C_tl to predict the confidence score that each candidate temporal segment is background or contains behavioral activity.
The method for constructing the time-series behavior classification network in step S4 includes:
s41: performing Non-maximum Suppression (NMS) operation on the candidate time sequence segment obtained in step S3 with 0.6 as a threshold to obtain a Region of interest (RoI) candidate time sequence segment with higher quality;
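A minimal sketch of this NMS step over 1D temporal segments, assuming segments are (start, end) pairs scored by the proposal sub-network; the 0.6 threshold follows step S41, while the toy segments and scores are illustrative:

```python
def temporal_iou(a, b):
    """IoU of two temporal segments given as (start, end) pairs."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(segments, scores, threshold=0.6):
    """Greedy NMS: keep the highest-scoring segment, drop any remaining
    segment whose IoU with an already-kept one exceeds the threshold."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(temporal_iou(segments[i], segments[k]) <= threshold for k in keep):
            keep.append(i)
    return keep

segs = [(0, 16), (2, 18), (40, 56)]
print(temporal_nms(segs, [0.9, 0.8, 0.7]))  # [0, 2] -- (2, 18) overlaps (0, 16) too much
```

The surviving indices are the higher-quality RoI candidate segments passed on to step S42.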
s42: the RoI is mapped into the feature map C_conv5b obtained in step S2, and output features of size 512x1x4x4 are obtained through a 3D RoI pooling operation;
s43: the output characteristics are firstly sent to a large full-connection layer for characteristic synthesis, and then the two full-connection layers are respectively classified and regressed: and carrying out substation personnel behavior category classification through a classification layer, and adjusting the starting time and the ending time of the behavior segments through a regression layer. The two full-connection layers respectively refer to a classification layer and a regression layer, the classification layer performs behavior classification, and the regression layer performs behavior time regression.
The method for constructing the abnormal behavior detection network in the step S5 includes:
s51: the picture size and frame rate of the specific video clips intercepted by the time sequence action detection part are adjusted; the size and frame rate can be tuned to the application scene, for example resized to 360x480 with the frame rate fixed at 32 fps;
s52: dividing each video clip into a group of unit video clips with fixed length of 1 frame, clustering the unit video clips by a K-means frame clustering algorithm based on space-time continuity, wherein each clustering result represents a complete action; finally, dividing the video into 32 groups of video segments containing single complete action;
s53: features of the video clips are extracted with the 3D convolution feature extraction network constructed in step S2; for the C_conv5b of every 16 frames, a fully connected layer is added to obtain a 4096-dimensional feature;
s54: the extracted features are input into a Multi-Layer Perceptron (MLP) composed of 3 continuous full-connected layers to score each segment, the score of the segment with the maximum abnormal score in the video is used as an abnormal score, and a final predicted abnormal value is obtained.
In step S54, the first fully connected layer of the MLP has 512 neurons and is activated using the Linear Rectification function (ReLU); the second fully connected layer has 32 neurons and the third has 1 neuron, activated using the Sigmoid function.
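A pure-Python sketch of the scoring MLP's forward pass with the layer sizes described above (4096 → 512 → 32 → 1); the weights and the input feature are random placeholders, so the code demonstrates only the architecture, not trained behavior:

```python
import math, random

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(v, weights, bias):
    """Fully connected layer; weights has shape out_dim x in_dim."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def make_layer(in_dim, out_dim, rng):
    """Random placeholder weights -- an untrained layer, for illustration only."""
    w = [[rng.uniform(-0.05, 0.05) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim

rng = random.Random(0)
w1, b1 = make_layer(4096, 512, rng)   # 4096-d segment feature -> 512, ReLU
w2, b2 = make_layer(512, 32, rng)     # 512 -> 32, ReLU
w3, b3 = make_layer(32, 1, rng)       # 32 -> 1, Sigmoid anomaly score

feature = [rng.uniform(-1.0, 1.0) for _ in range(4096)]
h = relu(linear(feature, w1, b1))
h = relu(linear(h, w2, b2))
score = sigmoid(linear(h, w3, b3)[0])
print(0.0 < score < 1.0)  # True: the Sigmoid keeps the score in (0, 1)
```

The final Sigmoid is what allows the output to be read directly as an anomaly probability in step S54.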
The training and running process of the time sequence motion detection model in the step S6 specifically includes:
s61: in the transfer learning, the THUMOS 2014 dataset is first used to jointly train the network modules corresponding to steps S2-S4; the parameters of the first four layers of the feature extraction network of step S2 are then fixed, and the dataset constructed in step S1 is used to train the parameters of the remaining network structure; this design is adopted because the substation personnel behavior video dataset obtained in S1 is small, so transfer learning is used to improve the detection precision and generalization capability of the model; the THUMOS 2014 dataset is an open-source action recognition and temporal action detection dataset;
s62: the RoIs obtained in step S4 are trained with a 1:3 ratio of positive to negative samples; specifically, an RoI whose IoU with the ground truth exceeds 0.5 is taken as a positive sample, and an RoI whose IoU is below 0.5 is taken as a negative sample;
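The labeling-and-sampling rule of S62 can be sketched as follows; the 0.5 IoU cut-off and 1:3 ratio follow the text, while the helper names and toy segments are illustrative assumptions:

```python
import random

def temporal_iou(seg, gt):
    """IoU between a candidate segment and a ground-truth segment."""
    inter = max(0.0, min(seg[1], gt[1]) - max(seg[0], gt[0]))
    union = (seg[1] - seg[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def sample_rois(rois, gts, neg_per_pos=3, seed=0):
    """Label RoIs against the ground truth at IoU 0.5 and sample
    negatives at (up to) a 1:3 positive-to-negative ratio."""
    pos = [r for r in rois if any(temporal_iou(r, g) > 0.5 for g in gts)]
    neg = [r for r in rois if all(temporal_iou(r, g) < 0.5 for g in gts)]
    rng = random.Random(seed)
    k = min(len(neg), neg_per_pos * len(pos))
    return pos, rng.sample(neg, k)

gts = [(10, 20)]
rois = [(9, 21), (0, 5), (30, 40), (50, 60), (11, 19), (70, 80)]
pos, neg = sample_rois(rois, gts)
print(len(pos), len(neg))  # 2 4 -- only four negatives are available here
```

Capping negatives relative to positives keeps the classification loss from being dominated by the abundant background segments.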
s63: for the candidate time series extraction network of step S3 and the time series behavior classification network of step S4, the classification task and the regression task are optimized simultaneously:
classification task L_cls uses the softmax loss, and regression task L_reg uses the smooth L1 loss:

L({α_i}, {t_i}) = (1/N_cls) Σ_i L_cls(α_i, α_i*) + λ·(1/N_reg) Σ_i α_i*·L_reg(t_i, t_i*)

wherein N_cls and N_reg represent the number of samples selected for one training run and the number of candidate temporal segments used for regression, λ is a loss-balancing parameter set to 1, i is the index of a candidate temporal segment within a batch, α_i is the predicted likelihood that candidate temporal segment i contains human behavior, α_i* is the ground truth label, t_i is the relative offset of the predicted temporal segment with respect to the candidate temporal segment, and t_i* = {δc_i, δl_i} is the coordinate transformation between the ground truth and the candidate temporal segment, computed as:

δc_i = (c_i* − c_i) / l_i,  δl_i = log(l_i* / l_i)

where (c_i, l_i) are the center and length of the candidate segment and (c_i*, l_i*) those of the ground truth segment:
in the temporal candidate extraction sub-network of step S3 (the time sequence candidate area extraction network), L_cls predicts whether a candidate temporal segment contains personnel behavior, regardless of the specific behavior class, and L_reg optimizes the relative displacement between the candidate temporal segment and the ground truth;
in the time sequence behavior classification sub-network of step S4 (the behavior classification network), L_cls predicts the specific personnel behavior category of the RoI, and L_reg optimizes the relative displacement between the RoI and the ground truth; the four losses of the two sub-networks are jointly optimized;
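The anchor-style coordinate transformation that produces the regression targets in S63 can be sketched as below; the exact parameterization (center offset normalized by anchor length, log length ratio) is the common choice for such transforms and is an assumption of this sketch:

```python
import math

def encode(gt_center, gt_len, anchor_center, anchor_len):
    """Regression target: offsets of the ground truth relative to an anchor."""
    dc = (gt_center - anchor_center) / anchor_len
    dl = math.log(gt_len / anchor_len)
    return dc, dl

def decode(anchor_center, anchor_len, dc, dl):
    """Invert the transformation to recover the predicted segment."""
    return anchor_center + dc * anchor_len, anchor_len * math.exp(dl)

dc, dl = encode(50.0, 32.0, 48.0, 16.0)   # ground truth centered at 50, length 32
center, length = decode(48.0, 16.0, dc, dl)
print(round(center, 6), round(length, 6))  # 50.0 32.0 -- round trip recovers the target
```

Predicting normalized offsets rather than raw frame positions keeps the regression targets on a similar scale across anchors of different lengths.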
s64: based on the time sequence action detection network module constructed in steps S2-S4, the substation monitoring long video is processed using the model parameters obtained from the training of S61-S63, so as to intercept coarsely classified behavior category video clips.
The process of training and operating the abnormal behavior detection model in step S7 includes:
s71: a K-means frame clustering algorithm based on spatio-temporal continuity is adopted to divide each video into 32 groups of video segments, each containing a single complete action: first, the video clip is split into a dataset of unit frames and 32 video frames are randomly selected from it as centroids; the Euclidean similarity distance between each frame in the dataset and the centroids immediately before and after it in temporal order is computed, and the frame is assigned to the set of the nearer centroid; after all frames have been grouped, the centroid of each group is recomputed, iterating until the temporal distance between each newly computed centroid and the previous one is less than 8 frames;
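A simplified sketch of the spatio-temporally continuous K-means grouping; the medoid-style update (median frame index of each group) and the use of frame indices as centroid positions are assumptions of this sketch, since the text specifies only the comparison with temporally adjacent centroids and the 8-frame convergence threshold:

```python
import random

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_frames(frames, k, max_shift=8, max_iter=100, seed=0):
    """Group a frame sequence into k temporally contiguous clusters.
    Each frame is assigned to the nearer (in feature space) of the two
    centroids adjacent to it in time; centroids are frame indices and are
    updated to the median index of their group."""
    rng = random.Random(seed)
    centroids = sorted(rng.sample(range(len(frames)), k))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for i, f in enumerate(frames):
            # candidate centroids: the ones bracketing frame i in time
            right = next((j for j, c in enumerate(centroids) if c >= i), k - 1)
            left = max(right - 1, 0)
            j = min((left, right), key=lambda j: euclid(f, frames[centroids[j]]))
            groups[j].append(i)
        new = sorted(g[len(g) // 2] if g else centroids[j] for j, g in enumerate(groups))
        if all(abs(a - b) < max_shift for a, b in zip(new, centroids)):
            break
        centroids = new
    return groups

# Two visually distinct halves should split into two contiguous groups.
frames = [[0.0]] * 5 + [[10.0]] * 5
groups = cluster_frames(frames, k=2)
print(sorted(i for g in groups for i in g))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Comparing each frame only with its temporally adjacent centroids is what keeps every cluster a contiguous run of frames, unlike ordinary K-means over appearance features alone.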
s72: using the video segmentation algorithm of step S71, each training video is divided into 32 segments, each containing a single complete action; the segments are the instances in MIL, and each video is a bag in MIL: during training, 10 positive bags (abnormal behavior videos) and 10 negative bags (normal behavior videos) are randomly selected as a mini-batch;
s73: extracting the spatiotemporal feature of each example segment by using the 3D convolution feature extraction network constructed in the step S2, and performing a full-connection operation to obtain a 4096-dimensional feature as a feature map required by subsequent multi-example learning;
s74: each segment is scored with the MLP constructed in step S54; the segment with the largest abnormal score in the positive bag is then selected as the potential abnormal sample, and the segment with the largest abnormal score in the negative bag as the non-abnormal sample, and the MLP model parameters are trained on these two samples; the objective is:

max_{i∈β_a} f(v_a^i) > max_{i∈β_n} f(v_n^i)

wherein β_a denotes a positive bag and v_a an abnormal sample; β_n denotes a negative bag and v_n a non-abnormal sample; f is the model prediction function;
a Hinge-loss function is adopted to enlarge the score gap between positive and negative examples, so that after training the model outputs high scores for abnormal samples and low scores for non-abnormal samples; the Hinge-loss function is:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i))
in a real substation scene, abnormal behaviors usually occupy only a very short period of time, i.e., the proportion of positive samples (abnormal behaviors) within a positive bag is very low, so the scores within a positive bag should be sparse, and a sparsity constraint is added; meanwhile, considering the temporal structure of the video, since video segments are continuous, the abnormal scores of adjacent segments should also be relatively smooth, so a temporal smoothness constraint is added, and the loss function becomes:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i)) + λ1 Σ_{i=1}^{n−1} (f(v_a^i) − f(v_a^{i+1}))² + λ2 Σ_{i=1}^{n} f(v_a^i)

to prevent model overfitting, an l2 regularization term is finally added, giving the final loss function:

L(w) = l(β_a, β_n) + ||w||_F

wherein w represents the model weights;
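A numeric sketch of the constrained MIL ranking loss over one positive and one negative bag; the λ weights are illustrative assumptions, and the l2 term on the MLP weights is omitted since no model is instantiated here:

```python
def mil_ranking_loss(pos_scores, neg_scores, lam1=8e-5, lam2=8e-5):
    """Hinge ranking term between bag maxima, plus temporal-smoothness
    and sparsity terms over the positive bag's segment scores."""
    hinge = max(0.0, 1.0 - max(pos_scores) + max(neg_scores))
    smooth = sum((a - b) ** 2 for a, b in zip(pos_scores, pos_scores[1:]))
    sparse = sum(pos_scores)
    return hinge + lam1 * smooth + lam2 * sparse

pos = [0.05, 0.10, 0.95, 0.10]  # positive bag: one segment scores high (sparse)
neg = [0.02, 0.03, 0.04, 0.02]  # negative bag: all segments stay low
loss = mil_ranking_loss(pos, neg)
print(loss < 0.2)  # True: well-separated bags give a small loss
```

Only the hinge term drives the score gap; the smoothness and sparsity terms act as weak regularizers that encode the priors described above (abnormality is rare and scores vary gradually across adjacent segments).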
s75: based on the abnormal behavior detection network constructed in step S5, the behavior category video clips obtained in step S64 are detected using the model parameters obtained from the training of S72-S74, so as to identify whether a specific behavior clip contains abnormal behavior, as well as the category and precise temporal position of the abnormal behavior.
Application Example 1
The method according to the embodiment is applied to identifying whether a person in a power transformation scene wears a safety helmet or not, as shown in fig. 2.
Processing the long video through time sequence action detection, acquiring action category video clips, and judging personnel actions as monitoring data; and carrying out abnormity judgment on the video clips through abnormal behavior detection, and judging whether the behaviors of the personnel are abnormal, wherein the abnormal behaviors are that the safety helmet is not worn.
Application Example 2
The method according to the embodiment is applied to identifying whether a person in a power transformation scene wears a safety helmet, as shown in fig. 3.
Processing the long video through time sequence action detection, acquiring action category video clips, and judging personnel actions as monitoring data; and carrying out abnormal judgment on the video clip through abnormal behavior detection to judge that the behavior of personnel is normal, wherein the normal behavior is to wear a safety helmet.
Claims (3)
1. A transformer substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection is characterized by comprising the following steps:
s1: the method comprises the steps of automatically acquiring, processing and constructing a monitoring video data set for abnormal behavior of personnel in the transformer substation by using priori knowledge;
s2: constructing a 3D convolution feature extraction network: performing feature extraction on the input un-segmented substation monitoring long video, and extracting feature information of a monitoring video sequence;
s3: constructing a time sequence candidate area extraction network: the method comprises the steps of extracting candidate time sequence segments which may have abnormal behaviors of substation personnel;
s4: constructing a time sequence behavior classification network: classifying and regressing the extracted substation personnel behavior video segments;
s5: constructing an abnormal behavior detection network: detecting abnormal behaviors of the behavior category video clips obtained by the time sequence behavior classification network in the step S4;
s6: performing end-to-end joint training on the structure consisting of S2-S4 using transfer learning, and processing monitoring long videos with the trained model so as to intercept coarsely classified behavior category video clips;
s7: designing a frame clustering algorithm based on space-time continuity to segment a video, and evaluating the content in a video clip by adopting an abnormal behavior detection network based on multi-instance learning, so as to identify whether abnormal behaviors exist in the video clip and determine the type and the accurate position of the abnormal behaviors;
the method for constructing the time sequence candidate area extraction network in step S3 includes:
s31: the feature map C_conv5b obtained in S2 is taken as input to generate candidate temporal segments;
s32: the temporal receptive field is extended with a 3×3×3 3D convolution filter, and the feature map is then spatially down-sampled with a 3D max-pooling filter to obtain the temporal-position feature map C_tl; the 512-dimensional feature vector at each temporal position is used to predict the relative offsets of the center position {δc_i, δl_i} and the length of each anchor {c_i, l_i}, i ∈ {1, …, K};
S33: two 1×1×1 convolution layers are added on top of the feature map C_tl to predict the confidence score that each candidate temporal segment is background or contains behavioral activity;
the method for constructing the time-series behavior classification network in step S4 includes:
s41: performing non-maximum suppression operation on the candidate time sequence segment obtained in the step S3 by using 0.6 as a threshold value to obtain a candidate time sequence segment;
s42: the RoI is mapped into the feature map C_conv5b obtained in step S2, and output features of size 512x1x4x4 are obtained through a 3D RoI pooling operation;
s43: the output characteristics are firstly sent to a large full-connection layer for characteristic synthesis, and then the two full-connection layers are respectively classified and regressed: classifying the behavior categories of the substation personnel through a classification layer, and adjusting the starting time and the ending time of the behavior segments through a regression layer;
the method for constructing the abnormal behavior detection network in the step S5 includes:
s51: adjusting the size and frame rate of a specific video segment picture intercepted by the time sequence action detection part;
s52: dividing each video clip into a group of unit video clips with fixed length of 1 frame, clustering the unit video clips by a K-means frame clustering algorithm based on space-time continuity, wherein each clustering result represents a complete action;
s53: features of the video clips are extracted with the 3D convolution feature extraction network constructed in step S2; for the C_conv5b of every 16 frames, a fully connected layer is added to obtain a 4096-dimensional feature;
s54: inputting the extracted features into a multilayer perceptron consisting of 3 continuous full-connected layers to score each segment, taking the score of the segment with the maximum abnormal score in the video as the abnormal score, and obtaining the final predicted abnormal value;
the method for constructing the data set in step S1 includes:
s11: acquiring monitoring videos of the behaviors of transformer substation personnel at different shooting angles and under background environments;
s12: carrying out time sequence behavior marking on the typical behavior category to construct a video data set for a time sequence action detection task; the time sequence action detection task is based on the time positioning and action category classification of personnel actions of the transformer substation monitoring long video;
s13: labeling a video, and constructing an abnormal behavior identification video data set by marking whether the video segment has abnormal behaviors; the abnormal behavior identification task is used for detecting abnormal behaviors according to the classification based on the video clips obtained by the segmentation of the time sequence action detection module;
the method for constructing the 3D convolution feature extraction network in step S2 includes:
s21: the C3D feature extraction network is improved: replacing the normal convolution operation with a depth separable convolution;
s22: extracting the characteristics of the monitoring video sequence frames of the transformer substation by adopting an improved C3D characteristic extraction network:
the input surveillance video frame sequence is processed by a feature extraction network composed of multiple depth-separable 3D convolution layers to obtain the feature map C_conv5b;
the training and running process of the time sequence motion detection model in the step S6 specifically includes:
s61: in the transfer learning, the THUMOS 2014 dataset is first used to jointly train the network modules corresponding to steps S2-S4; the parameters of the first four layers of the feature extraction network of step S2 are then fixed, and the dataset constructed in step S1 is used to train the parameters of the remaining network structure;
s62: training the RoI obtained in the step S4 according to the proportion of 1:3 of positive and negative samples;
s63: for the time series candidate region extraction network of step S3 and the time series behavior classification network of step S4, the classification task and the regression task are optimized at the same time:
classification task L_cls uses the softmax loss, and regression task L_reg uses the smooth L1 loss:

L({α_i}, {t_i}) = (1/N_cls) Σ_i L_cls(α_i, α_i*) + λ·(1/N_reg) Σ_i α_i*·L_reg(t_i, t_i*)

wherein N_cls and N_reg represent the number of samples selected for one training run and the number of candidate temporal segments used for regression, λ is a loss-balancing parameter, i is the index of a candidate temporal segment within a batch, α_i is the predicted likelihood that candidate temporal segment i contains human behavior, α_i* is the ground truth label, t_i is the relative offset of the predicted temporal segment with respect to the candidate temporal segment, and t_i* = {δc_i, δl_i} is the coordinate transformation between the ground truth and the candidate temporal segment, computed as:

δc_i = (c_i* − c_i) / l_i,  δl_i = log(l_i* / l_i)

where (c_i, l_i) are the center and length of the candidate segment and (c_i*, l_i*) those of the ground truth segment;
s64: based on the time sequence action detection network module constructed in steps S2-S4, the substation monitoring long video is processed using the model parameters obtained from the training of S61-S63, so as to intercept coarsely classified behavior category video clips.
2. The substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection according to claim 1, wherein in step S54, the first fully connected layer of the MLP has 512 neurons and is activated using the linear rectification function ReLU; the second fully connected layer has 32 neurons and the third has 1 neuron, activated using the Sigmoid function.
3. The substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormal detection as claimed in claim 1, wherein the process of training and operating the abnormal behavior detection network in step S7 includes:
s71: dividing the video segments into 32 groups of video segments containing single complete action by adopting a K-means frame clustering algorithm based on space-time continuity;
s72: using the video segment segmentation algorithm of step S71, each training video is segmented into 32 segments containing a single complete action, the segments are examples in MIL, and each video is a package in MIL: during training, randomly selecting 10 positive example bags and 10 negative example bags as mini-batch for training;
s73: extracting the spatiotemporal feature of each example segment by using the 3D convolution feature extraction network constructed in the step S2, and performing a full-connection operation to obtain a 4096-dimensional feature as a feature map required by subsequent multi-example learning;
s74: each segment is scored with the MLP constructed in step S54; the segment with the largest abnormal score in the positive bag is then selected as the potential abnormal sample, and the segment with the largest abnormal score in the negative bag as the non-abnormal sample, and the MLP model parameters are trained on these two samples; the objective is:

max_{i∈β_a} f(v_a^i) > max_{i∈β_n} f(v_n^i)

wherein β_a denotes a positive bag and v_a an abnormal sample; β_n denotes a negative bag and v_n a non-abnormal sample; f is the model prediction function;
a Hinge-loss function is adopted to enlarge the score gap between positive and negative examples, so that after training the model outputs high scores for abnormal samples and low scores for non-abnormal samples; the Hinge-loss function is:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i))

adding a sparsity constraint and a temporal smoothing constraint, the loss function becomes:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i)) + λ1 Σ_{i=1}^{n−1} (f(v_a^i) − f(v_a^{i+1}))² + λ2 Σ_{i=1}^{n} f(v_a^i)

to prevent model overfitting, an l2 regularization term is finally added, giving the final loss function:

L(w) = l(β_a, β_n) + ||w||_F

wherein w represents the model weights;
s75: based on the abnormal behavior detection network constructed in step S5, the behavior category video clips obtained in step S64 are detected using the model parameters obtained from the training of S72-S74, so as to identify whether a specific behavior clip contains abnormal behavior, as well as the category and precise temporal position of the abnormal behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010103140.7A CN111291699B (en) | 2020-02-19 | 2020-02-19 | Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111291699A CN111291699A (en) | 2020-06-16 |
CN111291699B true CN111291699B (en) | 2022-06-03 |
Family
ID=71024617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010103140.7A Active CN111291699B (en) | 2020-02-19 | 2020-02-19 | Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291699B (en) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111985333B (en) * | 2020-07-20 | 2023-01-17 | 中国科学院信息工程研究所 | Behavior detection method based on graph structure information interaction enhancement and electronic device |
CN111626273B (en) * | 2020-07-29 | 2020-12-22 | 成都睿沿科技有限公司 | Fall behavior recognition system and method based on atomic action time sequence characteristics |
CN111914778B (en) * | 2020-08-07 | 2023-12-26 | 重庆大学 | Video behavior positioning method based on weak supervision learning |
CN111652201B (en) * | 2020-08-10 | 2020-10-27 | 中国人民解放军国防科技大学 | Video data abnormity identification method and device based on depth video event completion |
CN111709411B (en) * | 2020-08-20 | 2020-11-10 | 深兰人工智能芯片研究院(江苏)有限公司 | Video anomaly detection method and device based on semi-supervised learning |
CN112307885A (en) * | 2020-08-21 | 2021-02-02 | 北京沃东天骏信息技术有限公司 | Model construction and training method and device, and time sequence action positioning method and device |
CN112487913A (en) * | 2020-11-24 | 2021-03-12 | 北京市地铁运营有限公司运营四分公司 | Labeling method and device based on neural network and electronic equipment |
CN112434615A (en) * | 2020-11-26 | 2021-03-02 | 天津大学 | Time sequence action detection method based on Tensorflow deep learning framework |
CN112487967A (en) * | 2020-11-30 | 2021-03-12 | 电子科技大学 | Scenic spot painting behavior identification method based on three-dimensional convolution network |
CN112737121A (en) * | 2020-12-28 | 2021-04-30 | 内蒙古电力(集团)有限责任公司包头供电局 | Intelligent video monitoring, analyzing, controlling and managing system for power grid |
CN113297972B (en) * | 2021-05-25 | 2022-03-22 | 国网湖北省电力有限公司检修公司 | Transformer substation equipment defect intelligent analysis method based on data fusion deep learning |
CN113159003A (en) * | 2021-05-27 | 2021-07-23 | 中国银行股份有限公司 | Bank branch abnormity monitoring method and device |
CN113392770A (en) * | 2021-06-16 | 2021-09-14 | 国网浙江省电力有限公司电力科学研究院 | Typical violation behavior detection method and system for transformer substation operating personnel |
CN113421236B (en) * | 2021-06-17 | 2024-02-09 | 同济大学 | Deep learning-based prediction method for apparent development condition of water leakage of building wall surface |
CN113516058B (en) * | 2021-06-18 | 2024-05-24 | 北京工业大学 | Live video group abnormal activity detection method and device, electronic equipment and medium |
CN113627386A (en) * | 2021-08-30 | 2021-11-09 | 山东新一代信息产业技术研究院有限公司 | Visual video abnormity detection method |
CN114092851A (en) * | 2021-10-12 | 2022-02-25 | 甘肃欧美亚信息科技有限公司 | Monitoring video abnormal event detection method based on time sequence action detection |
CN113992894A (en) * | 2021-10-27 | 2022-01-28 | 甘肃风尚电子科技信息有限公司 | Abnormal event identification system based on monitoring video time sequence action positioning and abnormal detection |
CN114283492B (en) * | 2021-10-28 | 2024-04-26 | 平安银行股份有限公司 | Staff behavior-based work saturation analysis method, device, equipment and medium |
CN114120180B (en) * | 2021-11-12 | 2023-07-21 | 北京百度网讯科技有限公司 | Time sequence nomination generation method, device, equipment and medium |
CN114565968A (en) * | 2021-11-29 | 2022-05-31 | 杭州好学童科技有限公司 | Learning environment action and behavior identification method based on learning table |
CN116453204B (en) * | 2022-01-05 | 2024-08-13 | 腾讯科技(深圳)有限公司 | Action recognition method and device, storage medium and electronic equipment |
CN114429676B (en) * | 2022-01-27 | 2023-07-25 | 山东纬横数据科技有限公司 | Personnel identity and behavior recognition system for disinfection supply room of medical institution |
CN114612868A (en) * | 2022-02-25 | 2022-06-10 | 广东创亿源智能科技有限公司 | Training method, training device and detection method of vehicle track detection model |
CN114676739B (en) * | 2022-05-30 | 2022-08-19 | 南京邮电大学 | Method for detecting and identifying time sequence action of wireless signal based on fast-RCNN |
CN115080748B (en) * | 2022-08-16 | 2022-11-11 | 之江实验室 | Weak supervision text classification method and device based on learning with noise label |
CN115424347A (en) * | 2022-09-02 | 2022-12-02 | 重庆邮电大学 | Intelligent identification method for worker work content of barber shop |
CN115690658B (en) * | 2022-11-04 | 2023-08-08 | 四川大学 | Priori knowledge-fused semi-supervised video abnormal behavior detection method |
CN116313018B (en) * | 2023-05-18 | 2023-09-15 | 北京大学第三医院(北京大学第三临床医学院) | Emergency system and method for skiing field and near-field hospital |
CN117710832A (en) * | 2024-01-04 | 2024-03-15 | 广州智寻科技有限公司 | Intelligent identification method for power grid satellite, unmanned aerial vehicle and video monitoring image |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10834436B2 (en) * | 2015-05-27 | 2020-11-10 | Arris Enterprises Llc | Video classification using user behavior from a network digital video recorder |
CN107506740B (en) * | 2017-09-04 | 2020-03-17 | 北京航空航天大学 | Human body behavior identification method based on three-dimensional convolutional neural network and transfer learning model |
CN108399380A (en) * | 2018-02-12 | 2018-08-14 | 北京工业大学 | A kind of video actions detection method based on Three dimensional convolution and Faster RCNN |
CN108734095B (en) * | 2018-04-10 | 2022-05-20 | 南京航空航天大学 | Motion detection method based on 3D convolutional neural network |
CN110084151B (en) * | 2019-04-10 | 2023-02-28 | 东南大学 | Video abnormal behavior discrimination method based on non-local network deep learning |
CN110263728B (en) * | 2019-06-24 | 2022-08-19 | 南京邮电大学 | Abnormal behavior detection method based on improved pseudo-three-dimensional residual error neural network |
2020-02-19: CN202010103140.7A filed in China; granted as CN111291699B (active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111291699B (en) | Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection | |
CN108216252B (en) | Subway driver vehicle-mounted driving behavior analysis method, vehicle-mounted terminal and system | |
CN108009473A (en) | Based on goal behavior attribute video structural processing method, system and storage device | |
CN104915655A (en) | Multi-path monitor video management method and device | |
CN107133569A (en) | The many granularity mask methods of monitor video based on extensive Multi-label learning | |
CN104717468B (en) | Cluster scene intelligent monitoring method and system based on the classification of cluster track | |
CN103246896A (en) | Robust real-time vehicle detection and tracking method | |
CN110222592B (en) | Construction method of time sequence behavior detection network model based on complementary time sequence behavior proposal generation | |
CN105426820A (en) | Multi-person abnormal behavior detection method based on security monitoring video data | |
CN107230267A (en) | Intelligence In Baogang Kindergarten based on face recognition algorithms is registered method | |
CN107657232A (en) | A kind of pedestrian's intelligent identification Method and its system | |
CN107944628A (en) | A kind of accumulation mode under road network environment finds method and system | |
CN117437599B (en) | Pedestrian abnormal event detection method and system for monitoring scene | |
Regazzoni et al. | A real-time vision system for crowding monitoring | |
CN113569766A (en) | Pedestrian abnormal behavior detection method for patrol of unmanned aerial vehicle | |
CN113076825A (en) | Transformer substation worker climbing safety monitoring method | |
Jiang et al. | A deep learning framework for detecting and localizing abnormal pedestrian behaviors at grade crossings | |
CN117197713A (en) | Extraction method based on digital video monitoring system | |
Wang et al. | Deep learning and multi-modal fusion for real-time multi-object tracking: Algorithms, challenges, datasets, and comparative study | |
Katariya et al. | A pov-based highway vehicle trajectory dataset and prediction architecture | |
CN113012193B (en) | Multi-pedestrian tracking method based on deep learning | |
CN117423157A (en) | Mine abnormal video action understanding method combining migration learning and regional invasion | |
CN106960183A (en) | A kind of image pedestrian's detection algorithm that decision tree is lifted based on gradient | |
CN116311082A (en) | Wearing detection method and system based on matching of key parts and images | |
Tang et al. | Multilevel traffic state detection in traffic surveillance system using a deep residual squeeze-and-excitation network and an improved triplet loss |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||