CN111291699B - Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection - Google Patents
- Publication number
- CN111291699B (application CN202010103140.7A)
- Authority
- CN
- China
- Prior art keywords
- video
- behavior
- abnormal
- time sequence
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
A substation personnel behavior identification method based on surveillance-video temporal action localization and anomaly detection. Prior knowledge is used to autonomously acquire, process, and construct a surveillance-video dataset of abnormal substation personnel behavior, and a new video dataset for substation abnormal-behavior detection is introduced. A transfer-learning-based video action detection model acquires temporal information and accurately localizes temporal actions in surveillance video, finding the start and end times of a worker's actions in an untrimmed video and classifying those actions. For the person-specific behavior clips produced by video action detection, a video anomaly detection technique trained with multiple-instance learning under weak supervision yields a model that judges whether a clip contains abnormal behavior, so that abnormal behaviors and their temporal positions are detected accurately, improving both the utilization value of substation video surveillance and the accuracy of anomaly detection.
Description
Technical Field
The invention discloses a substation personnel behavior identification method based on surveillance-video temporal action localization and anomaly detection, belonging to the technical field of intelligent power-grid management.
Background
In modern power systems, the operation and maintenance of power transmission and transformation equipment is particularly important, bearing directly on the normal operation of the power system and on society's production and domestic electricity use. During operation, the failure of certain devices can bring the power system down, and errors by the personnel involved can likewise cause system faults. In many substation scenes, safety accidents caused by irregular operations of workers occur frequently; such accidents endanger the operators and seriously disrupt the order of social production and daily life. Security management and monitoring of the substation working environment are therefore receiving increasing attention.
Video surveillance enables real-time monitoring and centralized management, and is an important means of safeguarding the lives of substation workers and the normal operation of transmission and transformation equipment. It offers high coverage and good stability, and can record the substation working scene around the clock and from all directions. Although video surveillance technology is well developed and widely deployed in substations, shortcomings remain. Typically, surveillance merely records video of the substation working scene: it only shoots and stores footage, and subsequent judgment and handling require dedicated staff watching uninterrupted for 24 hours, wasting human resources. Moreover, substation rooms are monitored all day, so the data volume is large and mostly uninformative, and recognition that relies solely on shift workers watching with the naked eye is extremely inefficient. Research on video-based detection of abnormal personnel behavior in substation scenes is therefore highly necessary.
Chinese patent document CN110084151A discloses a video abnormal-behavior discrimination method based on non-local deep networks, belonging to the fields of computer vision, intelligence, and multimedia signal processing. The method uses the idea of multiple-instance learning to construct a training set, defining and labeling positive and negative bags and instances of the video data. A non-local network extracts features from video samples: an I3D network with residual structure serves as the convolution filter for extracting spatio-temporal information, and non-local network blocks fuse long-range dependency information to meet the temporal and spatial requirements of video feature extraction. Once the features are obtained, a regression task is established and a model is trained by weakly supervised learning. That invention can discriminate unlabeled classes and is suited to anomaly detection tasks where samples of the anomaly class are scarce and intra-class diversity is high.
Patent document CN110084151A uses non-local deep networks to judge abnormal behavior in video. By contrast, the present invention uses an improved C3D feature extraction network to extract features from substation surveillance frame sequences; constructs a temporal candidate region extraction network to extract candidate temporal segments that may contain abnormal substation personnel behavior from long surveillance videos; constructs a behavior classification network to classify the extracted substation personnel behavior video segments; and constructs an abnormal behavior detection network to detect abnormal behavior in the candidate temporal segments produced by the temporal behavior classification network.
The method of CN110084151A cannot be applied to long surveillance videos in substation scenes, whereas the present invention builds a multi-network fusion model on a 3D feature extraction network and determines abnormal behavior in long surveillance videos through temporal candidate region extraction, temporal behavior classification of video segments, and abnormal behavior detection.
CN110084151A obtains the instances needed for multiple-instance learning by evenly cutting a video into 8 segments, whereas the present invention designs a frame clustering algorithm based on spatio-temporal continuity to segment each training video into 32 segments, each containing a single complete action; these segments serve as the instances in MIL, achieving accurate video segment cutting and content evaluation.
In summary, the prior art still has many technical deficiencies and is difficult to apply in substation scenes to identify specific behaviors of workers.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention discloses a substation personnel behavior identification method based on surveillance-video temporal action localization and anomaly detection.
The technical problems to be solved by the invention are as follows:
(1) Since no open-source substation surveillance video data are available on the current network, such data must be collected autonomously, and a new substation abnormal-behavior detection video dataset must be constructed by temporally classifying personnel actions and labeling abnormal behaviors. Given that abnormal behaviors occur rarely in substations and that data collection and labeling are difficult, how to construct a suitable training dataset for detecting abnormal substation personnel behavior is an important problem to be solved.
(2) Because substation abnormal events occur infrequently, surveillance video of abnormal substation personnel behavior is hard to collect, and the independently constructed behavior detection dataset has a limited number of samples. Under such data scarcity, how to construct an efficient 3D convolutional feature extraction network, so that the model can fully mine the features of video frame sequences from a small amount of substation surveillance video, is a key research problem.
(3) Temporal action detection must combine frame images with temporal information, yet the actions of personnel in substation surveillance scenes span long time ranges and have fuzzy temporal boundaries. How to achieve high-quality temporal segment cutting and accurate action classification is another important issue.
(4) In abnormal behavior detection, accurately marking the temporal position of every abnormal behavior in a video is time-consuming; the rarity of abnormal events means positive samples are far fewer than negative samples during training; and in substation surveillance scenes both normal and abnormal events are complex and diverse, with high intra-class diversity. Solving these problems to achieve accurate detection of abnormal substation personnel behavior is the key problem of the invention.
Summary of the invention:
the invention aims to realize automatic identification of abnormal substation personnel behavior from substation surveillance video, using 3D-convolution-based video temporal action localization and anomaly detection techniques.
Targeting the characteristics of abnormal personnel behavior under substation video surveillance, the invention uses prior knowledge to autonomously acquire, process, and construct a surveillance-video dataset of abnormal substation personnel behavior, introducing a new video dataset for substation abnormal-behavior detection. A transfer-learning-based video action detection model extracts temporal information from the surveillance footage and accurately localizes temporal actions, finding the start and end times of a worker's actions in an untrimmed video and classifying those actions. For the person-specific behavior clips produced by video action detection, the invention applies video anomaly detection, training with Multiple Instance Learning (MIL) under weak supervision; the resulting model judges whether a clip contains abnormal behavior, thereby accurately detecting abnormal behaviors and their temporal positions and improving both the utilization value of substation video surveillance and the accuracy of anomaly detection.
The technical scheme of the invention is as follows:
a substation personnel behavior identification method based on surveillance-video temporal action localization and anomaly detection, characterized by comprising the following steps:
s1: autonomously acquiring, processing, and constructing a surveillance-video dataset of abnormal substation personnel behavior using prior knowledge;
s2: constructing a 3D convolution feature extraction network: performing feature extraction on the input untrimmed long substation surveillance video to extract the feature information of the surveillance video sequence;
s3: constructing a temporal candidate region extraction network to extract candidate temporal segments that may contain abnormal substation personnel behavior from the long surveillance video;
s4: constructing a temporal behavior classification network to classify and regress the extracted substation personnel behavior video segments;
s5: constructing an abnormal behavior detection network to detect abnormal behavior in the candidate temporal segments obtained by the temporal behavior classification network of step S4;
s6: performing end-to-end joint training on the structure composed of S2-S4 using transfer learning, and processing long surveillance videos with the trained model to extract preliminarily classified behavior-category video clips;
s7: designing a frame clustering algorithm based on spatio-temporal continuity to segment the videos, and using a multiple-instance-learning-based abnormal behavior detection network to evaluate the content of the video segments, thereby identifying whether abnormal behavior exists in them and determining its category and precise position.
Preferably, the method for constructing the data set in step S1 includes:
s11: acquiring substation personnel behavior surveillance videos at different shooting angles and in different background environments using video monitoring equipment erected in the substation;
s12: performing temporal behavior labeling on the typical behavior categories to construct a video dataset for the temporal action detection task; the temporal action detection task covers the temporal localization and action-category classification of personnel actions in long substation surveillance videos; typical behaviors here include, but are not limited to: moving about, operating instruments, reading monitoring data, etc.;
s13: labeling the videos by marking whether each video segment contains abnormal behavior, to construct an abnormal-behavior recognition video dataset; the abnormal-behavior recognition task detects abnormal behavior in the classified video clips produced by the temporal action detection module; abnormal behaviors include, but are not limited to, illegal operations, running indoors, falling down, etc.
Preferably, the method for constructing the 3D convolution feature extraction network in step S2 includes:
s21: improving the C3D (3D Convolution) feature extraction network by replacing ordinary convolution operations with depth-separable convolutions; this improvement greatly reduces the computation and model size while preserving accuracy;
s22: extracting the features of the substation surveillance video frame sequences with the improved C3D feature extraction network:
the input surveillance-video frame sequence is processed by a feature extraction network composed of multiple depth-separable 3D convolution layers to obtain the feature map C_conv5b.
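The benefit claimed in S21 can be made concrete with a quick parameter count. The sketch below is illustrative only (the function names are ours, not the patent's): it compares a standard 3x3x3 convolution layer with its depth-separable factorization, i.e. a per-channel depthwise convolution followed by a 1x1x1 pointwise convolution.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights of a standard kxkxk 3D convolution (bias omitted)."""
    return c_in * c_out * k ** 3

def sep_conv3d_params(c_in, c_out, k=3):
    """Depth-separable 3D conv: a per-channel kxkxk depthwise pass
    (c_in * k^3 weights) plus a 1x1x1 pointwise pass (c_in * c_out)."""
    return c_in * k ** 3 + c_in * c_out

# One mid-network C3D-style layer, 256 -> 256 channels:
std = conv3d_params(256, 256)      # 1,769,472 weights
sep = sep_conv3d_params(256, 256)  # 72,448 weights (~24x smaller)
print(std, sep)
```

The same ratio applies to multiply-accumulate operations per output position, which is where the claimed reduction in computation comes from.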
Preferably, the method for constructing the candidate temporal-segment extraction network in step S3 includes:
s31: first, the feature map C_conv5b obtained in S2 is taken as input for candidate temporal-segment generation; assuming the temporal segments are uniformly distributed, K candidate temporal segments of different lengths are generated at each position in the time domain, together forming the full set of candidate temporal segments;
s32: the temporal receptive field is extended with a 3x3x3 3D convolution filter, and the feature map is then down-sampled along the spatial dimensions so that each temporal position is described by a 512-dimensional feature vector in the temporal-position feature map C_tl; this vector is used to predict the relative offsets {δc_i, δl_i} of the center position and length of each anchor {c_i, l_i}, i ∈ {1, …, K};
s33: two 1x1x1 convolutions are added on the feature map C_tl to predict the confidence score that each candidate temporal segment is background or contains behavioral activity.
Preferably, in step S4, the method for constructing the temporal behavior classification network includes:
s41: performing Non-Maximum Suppression (NMS) with a threshold of 0.6 on the candidate temporal segments obtained in step S3 to obtain higher-quality Region of Interest (RoI) candidate temporal segments;
s42: mapping the RoIs onto the C_conv5b feature map obtained in step S2 and obtaining 512x1x4x4 output features through a 3D RoI pooling operation;
s43: feeding the output features first into a large fully connected layer for feature synthesis, and then into two fully connected layers for classification and regression respectively: the classification layer classifies substation personnel behavior categories, and the regression layer adjusts the start and end times of the behavior segments.
Preferably, in step S5, the method for constructing the abnormal behavior detection network includes:
s51: fixing the frame size of the specific video clips extracted by the temporal action detection stage; the size and frame rate can be adjusted to the application scene, for example frames resized to 360x480 with the frame rate fixed at 32 fps;
s52: dividing each video clip into unit clips of fixed length 1 frame and clustering them with a K-means frame clustering algorithm based on spatio-temporal continuity, each cluster representing one complete action; the video is finally divided into 32 groups of segments, each containing a single complete action;
s53: extracting the features of the video segments with the 3D convolution feature extraction network constructed in step S2: the C_conv5b features of every 16 frames are extracted and a fully connected layer is added to obtain 4096-dimensional features;
s54: feeding the extracted features into a Multi-Layer Perceptron (MLP) composed of 3 consecutive fully connected layers to score each segment; the maximum anomaly score among a video's segments is taken as the video's anomaly score, giving the final predicted anomaly value.
According to a preferred embodiment of the present invention, in the MLP of step S54, the first fully connected layer has 512 neurons activated by a Rectified Linear Unit (ReLU); the second fully connected layer has 32 neurons and the third has 1 neuron, activated by a Sigmoid function.
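The 4096 -> 512 -> 32 -> 1 scoring stack described above can be sketched with NumPy; the random weights and the omission of biases are simplifications for illustration only:

```python
import numpy as np

def mlp_score(feat, W1, W2, W3):
    """Anomaly score of one 4096-d segment feature via three
    fully connected layers: 512 (ReLU) -> 32 -> 1 (Sigmoid)."""
    h1 = np.maximum(feat @ W1, 0.0)      # ReLU over 512 units
    h2 = h1 @ W2                         # 32 units (linear here)
    logit = float((h2 @ W3)[0])          # single output neuron
    return 1.0 / (1.0 + np.exp(-logit))  # Sigmoid -> score in (0, 1)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (4096, 512))
W2 = rng.normal(0.0, 0.01, (512, 32))
W3 = rng.normal(0.0, 0.01, (32, 1))
s = mlp_score(rng.normal(size=4096), W1, W2, W3)
print(0.0 < s < 1.0)  # True: the Sigmoid bounds the anomaly score
```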
Preferably, in step S6, the process of training and operating the temporal action detection model specifically includes:
s61: in transfer learning, the THUMOS 2014 dataset is first used to jointly train the network modules of steps S2-S4; the parameters of the first four layers of the feature extraction network of step S2 are then fixed, and the dataset constructed in step S1 is used to train the parameters of the later network structure; this design is adopted because the substation personnel behavior video dataset obtained in S1 is not large, and transfer learning improves the detection precision and generalization ability of the model; THUMOS 2014 is an open-source action recognition and temporal action detection dataset;
s62: training on the RoIs obtained in step S4 with a 1:3 ratio of positive to negative samples; specifically, RoIs whose IoU with the ground truth exceeds 0.5 are taken as positive samples, and those with IoU below 0.5 as negative samples;
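The IoU-based labeling and 1:3 sampling in S62 can be sketched as follows. This is a simplified illustration (a real data loader would sample randomly rather than take the first indices):

```python
def seg_iou(a, b):
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def label_rois(rois, gts, pos_thresh=0.5):
    """1 if a RoI's best IoU with any ground-truth segment exceeds
    pos_thresh, else 0."""
    return [1 if max(seg_iou(r, g) for g in gts) > pos_thresh else 0
            for r in rois]

def sample_1_to_3(labels, n_pos):
    """Indices of n_pos positive and 3 * n_pos negative samples."""
    pos = [i for i, y in enumerate(labels) if y == 1][:n_pos]
    neg = [i for i, y in enumerate(labels) if y == 0][:3 * n_pos]
    return pos + neg

labels = label_rois([(0, 10), (5, 15), (40, 60)], gts=[(0, 12)])
print(labels)  # [1, 0, 0]: only (0, 10) overlaps the truth by > 0.5
```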
s63: for the candidate temporal-segment extraction network of step S3 and the temporal behavior classification network of step S4, the classification task and the regression task are optimized simultaneously:
the classification task L_cls uses a softmax loss, and the regression task L_reg uses a smooth L1 loss:
Loss = (1/N_cls) Σ_i L_cls(a_i, a_i*) + λ (1/N_reg) Σ_i a_i* L_reg(t_i, t_i*)
where N_cls and N_reg denote the number of samples selected in a training run and the number of candidate temporal segments used for regression, λ is a loss-balancing parameter set to 1, i is the index of a candidate temporal segment in a batch, a_i is the predicted likelihood that candidate segment i is a human behavior, a_i* is the ground truth, t_i = {δĉ_i, δl̂_i} is the predicted relative offset of the temporal segment with respect to the candidate segment, and t_i* = {δc_i, δl_i} is the coordinate transformation between the ground truth and the candidate temporal segment:
δc_i = (c_i* − c_i) / l_i,  δl_i = log(l_i* / l_i),
where (c_i, l_i) are the center and length of candidate segment i and (c_i*, l_i*) those of its matched ground truth.
in the candidate temporal-segment extraction sub-network of step S3 (the temporal candidate region extraction network), L_cls predicts whether a candidate temporal segment contains a human behavior, regardless of the specific behavior class, and L_reg optimizes the relative displacement between the candidate temporal segment and the ground truth;
in the temporal behavior classification sub-network of step S4 (the behavior classification network), L_cls predicts the specific personnel behavior category of each RoI, and L_reg optimizes the relative displacement between the RoI and the ground truth; the four losses of the two sub-networks are optimized jointly;
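The smooth L1 regression loss and the center/length coordinate transform that produces the regression targets can be written out directly. A sketch under the usual anchor parameterization (relative center offset, log length ratio), which we believe matches the coordinate transformation referred to above:

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    x = abs(x)
    return 0.5 * x * x if x < 1.0 else x - 0.5

def encode(c_anchor, l_anchor, c_gt, l_gt):
    """Regression target for a candidate segment (center c, length l):
    relative center offset and log length ratio w.r.t. ground truth."""
    return (c_gt - c_anchor) / l_anchor, math.log(l_gt / l_anchor)

dc, dl = encode(100.0, 32.0, 112.0, 48.0)
print(round(dc, 3), round(dl, 3))      # 0.375 0.405
print(smooth_l1(0.5), smooth_l1(2.0))  # 0.125 1.5
```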
s64: based on the time sequence action detection network module constructed in steps S2-S4, the substation monitoring long video is processed using the model parameters obtained from the training of S61-S63, so as to intercept coarsely classified behavior category video clips.
Preferably, the training and operating process of the abnormal behavior detection model in step S7 includes:
s71: a K-means frame clustering algorithm based on spatio-temporal continuity is adopted to divide each video into 32 groups of video segments, each containing a single complete action: first, the video clip is split into a dataset of unit frames and 32 video frames are randomly selected from it as centroids; the Euclidean similarity distance between each frame in the dataset and the centroids immediately before and after it in temporal order is computed, and the frame is assigned to the set of the nearer centroid; after all frames have been grouped, the centroid of each group is recomputed, iterating until the temporal distance between each newly computed centroid and the previous one is less than 8 frames;
s72: using the video segmentation algorithm of step S71, each training video is divided into 32 segments, each containing a single complete action; the segments are the instances in MIL, and each video is a bag in MIL: during training, 10 positive bags (abnormal behavior videos) and 10 negative bags (normal behavior videos) are randomly selected as a mini-batch;
s73: extracting the spatio-temporal feature of each example segment by using the 3D convolution feature extraction network constructed in the step S2, and performing a full connection operation to obtain a 4096-dimensional feature which is used as a feature map required by subsequent multi-example learning;
s74: each segment is scored with the MLP constructed in step S54; the segment with the largest abnormal score in the positive bag is then selected as the potential abnormal sample, and the segment with the largest abnormal score in the negative bag as the non-abnormal sample, and the MLP model parameters are trained on these two samples; the objective is:

max_{i∈β_a} f(v_a^i) > max_{i∈β_n} f(v_n^i)

wherein β_a denotes a positive bag and v_a an abnormal sample; β_n denotes a negative bag and v_n a non-abnormal sample; f is the model prediction function;
a Hinge-loss function is adopted to enlarge the score gap between positive and negative examples, so that after training the model outputs high scores for abnormal samples and low scores for non-abnormal samples; the Hinge-loss function is:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i))
in a real substation scene, abnormal behaviors usually occupy only a very short period of time, i.e., the proportion of positive samples (abnormal behaviors) within a positive bag is very low, so the scores within a positive bag should be sparse, and a sparsity constraint is added; meanwhile, considering the temporal structure of the video, since video segments are continuous, the abnormal scores of adjacent segments should also be relatively smooth, so a temporal smoothness constraint is added, and the loss function becomes:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i)) + λ1 Σ_{i=1}^{n−1} (f(v_a^i) − f(v_a^{i+1}))² + λ2 Σ_{i=1}^{n} f(v_a^i)

to prevent model overfitting, an l2 regularization term is finally added, giving the final loss function:

L(w) = l(β_a, β_n) + ||w||_F

wherein w represents the model weights;
s75: based on the abnormal behavior detection network constructed in step S5, the behavior category video clips obtained in step S64 are detected using the model parameters obtained from the training of S72-S74, so as to identify whether a specific behavior clip contains abnormal behavior, as well as the category and precise temporal position of the abnormal behavior.
The invention has the beneficial effects that:
Aiming at the characteristics of abnormal personnel behavior under substation video monitoring, the method uses prior knowledge to autonomously collect, process, and construct a substation personnel abnormal behavior monitoring video dataset, filling the gap in video data for substation abnormal behavior detection. The 3D convolution feature extraction network based on transfer learning lets the model fully mine the features of the video sequence frames even though the amount of substation personnel behavior monitoring video data is limited, improving algorithm efficiency and enhancing the precision and generalization capability of the model. The invention trains the temporal action detection module in an end-to-end joint training manner, and achieves high-quality temporal segment cutting and accurate action classification by fully fusing different network structures such as feature extraction, temporal detection, and action classification. Meanwhile, the anomaly detection network is trained with weakly supervised multi-instance learning; the resulting model can accurately judge whether a video clip contains abnormal behavior while precisely detecting the abnormal behavior category and the temporal position at which it occurs, thereby improving the utilization value of substation video monitoring and realizing efficient, high-quality detection of abnormal behavior of substation personnel.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of the abnormal behavior recognition result of the present invention;
fig. 3 is a schematic diagram of the normal behavior recognition result of the present invention.
Detailed Description
The invention is described in detail below with reference to the following examples and the accompanying drawings of the specification, but is not limited thereto.
Example:
As shown in fig. 1.
A transformer substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection comprises the following steps:
s1: the method comprises the steps of automatically acquiring, processing and constructing a monitoring video data set for abnormal behavior of personnel in the transformer substation by using priori knowledge;
s2: constructing a 3D convolution feature extraction network: performing feature extraction on the input un-segmented substation monitoring long video, and extracting feature information of a monitoring video sequence;
s3: constructing a time sequence candidate area extraction network: the method comprises the steps of extracting candidate time sequence segments which may have abnormal behaviors of substation personnel;
s4: constructing a time sequence behavior classification network: classifying and regressing the extracted substation personnel behavior video segments;
s5: constructing an abnormal behavior detection network: performing abnormal behavior detection on the candidate time sequence segments obtained by the time sequence behavior classification network of the step S4;
s6: performing end-to-end joint training on the structure consisting of S2-S4 using transfer learning, and processing monitoring long videos with the trained model so as to intercept coarsely classified behavior category video clips;
s7: a frame clustering algorithm based on space-time continuity is designed to segment videos, and abnormal behavior detection networks based on multi-instance learning are adopted to evaluate the contents in video segments, so that whether abnormal behaviors exist in the video segments or not is identified, and the types and the accurate positions of the abnormal behaviors are determined.
The method for constructing the data set in step S1 includes:
s11: acquiring substation personnel behavior monitoring videos at different shooting angles and under background environments by using video monitoring equipment erected in a substation;
s12: carrying out time sequence behavior marking on the typical behavior categories to construct a video dataset for the time sequence action detection task; the time sequence action detection task performs temporal localization and action category classification of personnel actions in the substation monitoring long video; typical behaviors described herein include, but are not limited to: moving, operating instruments, reading monitoring data, etc.;
s13: labeling the videos by marking whether each video segment contains abnormal behavior, to construct an abnormal behavior identification video dataset; the abnormal behavior identification task detects abnormal behaviors in the classified video clips obtained from the segmentation by the time sequence action detection module; the abnormal behaviors include, but are not limited to, illegal operations, running indoors, falling down, etc.
The method for constructing the 3D convolution feature extraction network in step S2 includes:
s21: the C3D (3D Convolution) feature extraction network is improved by replacing the normal convolution operation with a depth-separable convolution; this improvement greatly reduces the computation and model size while preserving accuracy;
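As a rough illustration of the parameter savings from replacing a standard 3D convolution with a depth-separable one, the two weight counts can be compared directly; the channel counts and kernel size below are illustrative assumptions, not values from the patent:

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a standard 3D convolution (bias ignored)."""
    return c_in * c_out * kt * kh * kw

def separable_conv3d_params(c_in, c_out, kt, kh, kw):
    """Depthwise 3D convolution followed by a 1x1x1 pointwise convolution."""
    depthwise = c_in * kt * kh * kw   # one kt x kh x kw kernel per input channel
    pointwise = c_in * c_out          # 1x1x1 channel-mixing convolution
    return depthwise + pointwise

standard = conv3d_params(256, 256, 3, 3, 3)
separable = separable_conv3d_params(256, 256, 3, 3, 3)
print(standard, separable, round(standard / separable, 1))  # 1769472 72448 24.4
```

For a 3x3x3 kernel with 256 channels in and out, the separable variant uses roughly 24x fewer weights, which is where the "greatly reduces model size" claim comes from.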
s22: extracting the characteristics of the monitoring video sequence frames of the transformer substation by adopting an improved C3D characteristic extraction network:
The input surveillance video frame sequence is processed by a feature extraction network composed of multiple depth-separable 3D convolution layers to obtain the feature map C_conv5b.
The method for constructing the candidate timing extraction network in step S3 includes:
s31: first, the feature map C_conv5b obtained in S2 is taken as input for candidate temporal segment generation; assuming that temporal segments are uniformly distributed, K candidate segments of different lengths are generated at each position in the time domain, yielding K candidate temporal segments per temporal position in total;
s32: the temporal receptive field is extended with a 3×3×3 3D convolution filter, and the feature map is then spatially down-sampled with a 3D max-pooling filter to obtain the temporal-position feature map C_tl; the 512-dimensional feature vector at each temporal position is used to predict the relative offsets of the center position {δc_i, δl_i} and the length of each anchor {c_i, l_i}, i ∈ {1, …, K};
S33: two 1×1×1 convolution layers are added on top of the feature map C_tl to predict the confidence score that each candidate temporal segment is background or contains behavioral activity.
The method for constructing the time-series behavior classification network in step S4 includes:
s41: performing Non-maximum Suppression (NMS) operation on the candidate time sequence segment obtained in step S3 with 0.6 as a threshold to obtain a Region of interest (RoI) candidate time sequence segment with higher quality;
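A minimal sketch of this NMS step over 1D temporal segments, assuming segments are (start, end) pairs scored by the proposal sub-network; the 0.6 threshold follows step S41, while the toy segments and scores are illustrative:

```python
def temporal_iou(a, b):
    """IoU of two temporal segments given as (start, end) pairs."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(segments, scores, threshold=0.6):
    """Greedy NMS: keep the highest-scoring segment, drop any remaining
    segment whose IoU with an already-kept one exceeds the threshold."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(temporal_iou(segments[i], segments[k]) <= threshold for k in keep):
            keep.append(i)
    return keep

segs = [(0, 16), (2, 18), (40, 56)]
print(temporal_nms(segs, [0.9, 0.8, 0.7]))  # [0, 2] -- (2, 18) overlaps (0, 16) too much
```

The surviving indices are the higher-quality RoI candidate segments passed on to step S42.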
s42: the RoI is mapped into the feature map C_conv5b obtained in step S2, and output features of size 512x1x4x4 are obtained through a 3D RoI pooling operation;
s43: the output characteristics are firstly sent to a large full-connection layer for characteristic synthesis, and then the two full-connection layers are respectively classified and regressed: and carrying out substation personnel behavior category classification through a classification layer, and adjusting the starting time and the ending time of the behavior segments through a regression layer. The two full-connection layers respectively refer to a classification layer and a regression layer, the classification layer performs behavior classification, and the regression layer performs behavior time regression.
The method for constructing the abnormal behavior detection network in the step S5 includes:
s51: the picture size and frame rate of the specific video clips intercepted by the time sequence action detection part are adjusted; the size and frame rate can be tuned to the application scene, for example resized to 360x480 with the frame rate fixed at 32 fps;
s52: dividing each video clip into a group of unit video clips with fixed length of 1 frame, clustering the unit video clips by a K-means frame clustering algorithm based on space-time continuity, wherein each clustering result represents a complete action; finally, dividing the video into 32 groups of video segments containing single complete action;
s53: features of the video clips are extracted with the 3D convolution feature extraction network constructed in step S2; for the C_conv5b of every 16 frames, a fully connected layer is added to obtain a 4096-dimensional feature;
s54: the extracted features are input into a Multi-Layer Perceptron (MLP) composed of 3 continuous full-connected layers to score each segment, the score of the segment with the maximum abnormal score in the video is used as an abnormal score, and a final predicted abnormal value is obtained.
In step S54, the first fully connected layer of the MLP has 512 neurons and is activated using the Linear Rectification function (ReLU); the second fully connected layer has 32 neurons and the third has 1 neuron, activated using the Sigmoid function.
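A pure-Python sketch of the scoring MLP's forward pass with the layer sizes described above (4096 → 512 → 32 → 1); the weights and the input feature are random placeholders, so the code demonstrates only the architecture, not trained behavior:

```python
import math, random

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(v, weights, bias):
    """Fully connected layer; weights has shape out_dim x in_dim."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def make_layer(in_dim, out_dim, rng):
    """Random placeholder weights -- an untrained layer, for illustration only."""
    w = [[rng.uniform(-0.05, 0.05) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim

rng = random.Random(0)
w1, b1 = make_layer(4096, 512, rng)   # 4096-d segment feature -> 512, ReLU
w2, b2 = make_layer(512, 32, rng)     # 512 -> 32, ReLU
w3, b3 = make_layer(32, 1, rng)       # 32 -> 1, Sigmoid anomaly score

feature = [rng.uniform(-1.0, 1.0) for _ in range(4096)]
h = relu(linear(feature, w1, b1))
h = relu(linear(h, w2, b2))
score = sigmoid(linear(h, w3, b3)[0])
print(0.0 < score < 1.0)  # True: the Sigmoid keeps the score in (0, 1)
```

The final Sigmoid is what allows the output to be read directly as an anomaly probability in step S54.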
The training and running process of the time sequence motion detection model in the step S6 specifically includes:
s61: in the transfer learning, the THUMOS 2014 dataset is first used to jointly train the network modules corresponding to steps S2-S4; the parameters of the first four layers of the feature extraction network of step S2 are then fixed, and the dataset constructed in step S1 is used to train the parameters of the remaining network structure; this design is adopted because the substation personnel behavior video dataset obtained in S1 is small, so transfer learning is used to improve the detection precision and generalization capability of the model; the THUMOS 2014 dataset is an open-source action recognition and temporal action detection dataset;
s62: the RoIs obtained in step S4 are trained with a 1:3 ratio of positive to negative samples; specifically, an RoI whose IoU with the ground truth exceeds 0.5 is taken as a positive sample, and an RoI whose IoU is below 0.5 is taken as a negative sample;
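The labeling-and-sampling rule of S62 can be sketched as follows; the 0.5 IoU cut-off and 1:3 ratio follow the text, while the helper names and toy segments are illustrative assumptions:

```python
import random

def temporal_iou(seg, gt):
    """IoU between a candidate segment and a ground-truth segment."""
    inter = max(0.0, min(seg[1], gt[1]) - max(seg[0], gt[0]))
    union = (seg[1] - seg[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def sample_rois(rois, gts, neg_per_pos=3, seed=0):
    """Label RoIs against the ground truth at IoU 0.5 and sample
    negatives at (up to) a 1:3 positive-to-negative ratio."""
    pos = [r for r in rois if any(temporal_iou(r, g) > 0.5 for g in gts)]
    neg = [r for r in rois if all(temporal_iou(r, g) < 0.5 for g in gts)]
    rng = random.Random(seed)
    k = min(len(neg), neg_per_pos * len(pos))
    return pos, rng.sample(neg, k)

gts = [(10, 20)]
rois = [(9, 21), (0, 5), (30, 40), (50, 60), (11, 19), (70, 80)]
pos, neg = sample_rois(rois, gts)
print(len(pos), len(neg))  # 2 4 -- only four negatives are available here
```

Capping negatives relative to positives keeps the classification loss from being dominated by the abundant background segments.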
s63: for the candidate time series extraction network of step S3 and the time series behavior classification network of step S4, the classification task and the regression task are optimized simultaneously:
classification task L_cls uses the softmax loss, and regression task L_reg uses the smooth L1 loss:

L({α_i}, {t_i}) = (1/N_cls) Σ_i L_cls(α_i, α_i*) + λ·(1/N_reg) Σ_i α_i*·L_reg(t_i, t_i*)

wherein N_cls and N_reg represent the number of samples selected for one training run and the number of candidate temporal segments used for regression, λ is a loss-balancing parameter set to 1, i is the index of a candidate temporal segment within a batch, α_i is the predicted likelihood that candidate temporal segment i contains human behavior, α_i* is the ground truth label, t_i is the relative offset of the predicted temporal segment with respect to the candidate temporal segment, and t_i* = {δc_i, δl_i} is the coordinate transformation between the ground truth and the candidate temporal segment, computed as:

δc_i = (c_i* − c_i) / l_i,  δl_i = log(l_i* / l_i)

where (c_i, l_i) are the center and length of the candidate segment and (c_i*, l_i*) those of the ground truth segment:
in the temporal candidate extraction sub-network of step S3 (the time sequence candidate area extraction network), L_cls predicts whether a candidate temporal segment contains personnel behavior, regardless of the specific behavior class, and L_reg optimizes the relative displacement between the candidate temporal segment and the ground truth;
in the time sequence behavior classification sub-network of step S4 (the behavior classification network), L_cls predicts the specific personnel behavior category of the RoI, and L_reg optimizes the relative displacement between the RoI and the ground truth; the four losses of the two sub-networks are jointly optimized;
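The anchor-style coordinate transformation that produces the regression targets in S63 can be sketched as below; the exact parameterization (center offset normalized by anchor length, log length ratio) is the common choice for such transforms and is an assumption of this sketch:

```python
import math

def encode(gt_center, gt_len, anchor_center, anchor_len):
    """Regression target: offsets of the ground truth relative to an anchor."""
    dc = (gt_center - anchor_center) / anchor_len
    dl = math.log(gt_len / anchor_len)
    return dc, dl

def decode(anchor_center, anchor_len, dc, dl):
    """Invert the transformation to recover the predicted segment."""
    return anchor_center + dc * anchor_len, anchor_len * math.exp(dl)

dc, dl = encode(50.0, 32.0, 48.0, 16.0)   # ground truth centered at 50, length 32
center, length = decode(48.0, 16.0, dc, dl)
print(round(center, 6), round(length, 6))  # 50.0 32.0 -- round trip recovers the target
```

Predicting normalized offsets rather than raw frame positions keeps the regression targets on a similar scale across anchors of different lengths.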
s64: based on the time sequence action detection network module constructed in steps S2-S4, the substation monitoring long video is processed using the model parameters obtained from the training of S61-S63, so as to intercept coarsely classified behavior category video clips.
The process of training and operating the abnormal behavior detection model in step S7 includes:
s71: a K-means frame clustering algorithm based on spatio-temporal continuity is adopted to divide each video into 32 groups of video segments, each containing a single complete action: first, the video clip is split into a dataset of unit frames and 32 video frames are randomly selected from it as centroids; the Euclidean similarity distance between each frame in the dataset and the centroids immediately before and after it in temporal order is computed, and the frame is assigned to the set of the nearer centroid; after all frames have been grouped, the centroid of each group is recomputed, iterating until the temporal distance between each newly computed centroid and the previous one is less than 8 frames;
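A simplified sketch of the spatio-temporally continuous K-means grouping; the medoid-style update (median frame index of each group) and the use of frame indices as centroid positions are assumptions of this sketch, since the text specifies only the comparison with temporally adjacent centroids and the 8-frame convergence threshold:

```python
import random

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_frames(frames, k, max_shift=8, max_iter=100, seed=0):
    """Group a frame sequence into k temporally contiguous clusters.
    Each frame is assigned to the nearer (in feature space) of the two
    centroids adjacent to it in time; centroids are frame indices and are
    updated to the median index of their group."""
    rng = random.Random(seed)
    centroids = sorted(rng.sample(range(len(frames)), k))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for i, f in enumerate(frames):
            # candidate centroids: the ones bracketing frame i in time
            right = next((j for j, c in enumerate(centroids) if c >= i), k - 1)
            left = max(right - 1, 0)
            j = min((left, right), key=lambda j: euclid(f, frames[centroids[j]]))
            groups[j].append(i)
        new = sorted(g[len(g) // 2] if g else centroids[j] for j, g in enumerate(groups))
        if all(abs(a - b) < max_shift for a, b in zip(new, centroids)):
            break
        centroids = new
    return groups

# Two visually distinct halves should split into two contiguous groups.
frames = [[0.0]] * 5 + [[10.0]] * 5
groups = cluster_frames(frames, k=2)
print(sorted(i for g in groups for i in g))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Comparing each frame only with its temporally adjacent centroids is what keeps every cluster a contiguous run of frames, unlike ordinary K-means over appearance features alone.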
s72: using the video segmentation algorithm of step S71, each training video is divided into 32 segments, each containing a single complete action; the segments are the instances in MIL, and each video is a bag in MIL: during training, 10 positive bags (abnormal behavior videos) and 10 negative bags (normal behavior videos) are randomly selected as a mini-batch;
s73: extracting the spatiotemporal feature of each example segment by using the 3D convolution feature extraction network constructed in the step S2, and performing a full-connection operation to obtain a 4096-dimensional feature as a feature map required by subsequent multi-example learning;
s74: each segment is scored with the MLP constructed in step S54; the segment with the largest abnormal score in the positive bag is then selected as the potential abnormal sample, and the segment with the largest abnormal score in the negative bag as the non-abnormal sample, and the MLP model parameters are trained on these two samples; the objective is:

max_{i∈β_a} f(v_a^i) > max_{i∈β_n} f(v_n^i)

wherein β_a denotes a positive bag and v_a an abnormal sample; β_n denotes a negative bag and v_n a non-abnormal sample; f is the model prediction function;
a Hinge-loss function is adopted to enlarge the score gap between positive and negative examples, so that after training the model outputs high scores for abnormal samples and low scores for non-abnormal samples; the Hinge-loss function is:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i))
in a real substation scene, abnormal behaviors usually occupy only a very short period of time, i.e., the proportion of positive samples (abnormal behaviors) within a positive bag is very low, so the scores within a positive bag should be sparse, and a sparsity constraint is added; meanwhile, considering the temporal structure of the video, since video segments are continuous, the abnormal scores of adjacent segments should also be relatively smooth, so a temporal smoothness constraint is added, and the loss function becomes:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i)) + λ1 Σ_{i=1}^{n−1} (f(v_a^i) − f(v_a^{i+1}))² + λ2 Σ_{i=1}^{n} f(v_a^i)

to prevent model overfitting, an l2 regularization term is finally added, giving the final loss function:

L(w) = l(β_a, β_n) + ||w||_F

wherein w represents the model weights;
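A numeric sketch of the constrained MIL ranking loss over one positive and one negative bag; the λ weights are illustrative assumptions, and the l2 term on the MLP weights is omitted since no model is instantiated here:

```python
def mil_ranking_loss(pos_scores, neg_scores, lam1=8e-5, lam2=8e-5):
    """Hinge ranking term between bag maxima, plus temporal-smoothness
    and sparsity terms over the positive bag's segment scores."""
    hinge = max(0.0, 1.0 - max(pos_scores) + max(neg_scores))
    smooth = sum((a - b) ** 2 for a, b in zip(pos_scores, pos_scores[1:]))
    sparse = sum(pos_scores)
    return hinge + lam1 * smooth + lam2 * sparse

pos = [0.05, 0.10, 0.95, 0.10]  # positive bag: one segment scores high (sparse)
neg = [0.02, 0.03, 0.04, 0.02]  # negative bag: all segments stay low
loss = mil_ranking_loss(pos, neg)
print(loss < 0.2)  # True: well-separated bags give a small loss
```

Only the hinge term drives the score gap; the smoothness and sparsity terms act as weak regularizers that encode the priors described above (abnormality is rare and scores vary gradually across adjacent segments).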
s75: based on the abnormal behavior detection network constructed in step S5, the behavior category video clips obtained in step S64 are detected using the model parameters obtained from the training of S72-S74, so as to identify whether a specific behavior clip contains abnormal behavior, as well as the category and precise temporal position of the abnormal behavior.
Application Example 1
The method according to the embodiment is applied to identifying whether a person in a power transformation scene wears a safety helmet or not, as shown in fig. 2.
Processing the long video through time sequence action detection, acquiring action category video clips, and judging personnel actions as monitoring data; and carrying out abnormity judgment on the video clips through abnormal behavior detection, and judging whether the behaviors of the personnel are abnormal, wherein the abnormal behaviors are that the safety helmet is not worn.
Application Example 2
The method according to the embodiment is applied to identifying whether a person in a power transformation scene wears a safety helmet, as shown in fig. 3.
Processing the long video through time sequence action detection, acquiring action category video clips, and judging personnel actions as monitoring data; and carrying out abnormal judgment on the video clip through abnormal behavior detection to judge that the behavior of personnel is normal, wherein the normal behavior is to wear a safety helmet.
Claims (3)
1. A transformer substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection is characterized by comprising the following steps:
s1: the method comprises the steps of automatically acquiring, processing and constructing a monitoring video data set for abnormal behavior of personnel in the transformer substation by using priori knowledge;
s2: constructing a 3D convolution feature extraction network: performing feature extraction on the input un-segmented substation monitoring long video, and extracting feature information of a monitoring video sequence;
s3: constructing a time sequence candidate area extraction network: the method comprises the steps of extracting candidate time sequence segments which may have abnormal behaviors of substation personnel;
s4: constructing a time sequence behavior classification network: classifying and regressing the extracted substation personnel behavior video segments;
s5: constructing an abnormal behavior detection network: detecting abnormal behaviors of the behavior category video clips obtained by the time sequence behavior classification network in the step S4;
s6: performing end-to-end joint training on the structure consisting of S2-S4 using transfer learning, and processing monitoring long videos with the trained model so as to intercept coarsely classified behavior category video clips;
s7: designing a frame clustering algorithm based on space-time continuity to segment a video, and evaluating the content in a video clip by adopting an abnormal behavior detection network based on multi-instance learning, so as to identify whether abnormal behaviors exist in the video clip and determine the type and the accurate position of the abnormal behaviors;
the method for constructing the time sequence candidate area extraction network in step S3 includes:
s31: the feature map C_conv5b obtained in S2 is taken as input to generate candidate temporal segments;
s32: the temporal receptive field is extended with a 3×3×3 3D convolution filter, and the feature map is then spatially down-sampled with a 3D max-pooling filter to obtain the temporal-position feature map C_tl; the 512-dimensional feature vector at each temporal position is used to predict the relative offsets of the center position {δc_i, δl_i} and the length of each anchor {c_i, l_i}, i ∈ {1, …, K};
S33: two 1×1×1 convolution layers are added on top of the feature map C_tl to predict the confidence score that each candidate temporal segment is background or contains behavioral activity;
the method for constructing the time-series behavior classification network in step S4 includes:
s41: performing non-maximum suppression operation on the candidate time sequence segment obtained in the step S3 by using 0.6 as a threshold value to obtain a candidate time sequence segment;
s42: the RoI is mapped into the feature map C_conv5b obtained in step S2, and output features of size 512x1x4x4 are obtained through a 3D RoI pooling operation;
s43: the output characteristics are firstly sent to a large full-connection layer for characteristic synthesis, and then the two full-connection layers are respectively classified and regressed: classifying the behavior categories of the substation personnel through a classification layer, and adjusting the starting time and the ending time of the behavior segments through a regression layer;
the method for constructing the abnormal behavior detection network in the step S5 includes:
s51: adjusting the size and frame rate of a specific video segment picture intercepted by the time sequence action detection part;
s52: dividing each video clip into a group of unit video clips with fixed length of 1 frame, clustering the unit video clips by a K-means frame clustering algorithm based on space-time continuity, wherein each clustering result represents a complete action;
s53: features of the video clips are extracted with the 3D convolution feature extraction network constructed in step S2; for the C_conv5b of every 16 frames, a fully connected layer is added to obtain a 4096-dimensional feature;
s54: inputting the extracted features into a multilayer perceptron consisting of 3 continuous full-connected layers to score each segment, taking the score of the segment with the maximum abnormal score in the video as the abnormal score, and obtaining the final predicted abnormal value;
the method for constructing the data set in step S1 includes:
s11: acquiring monitoring videos of the behaviors of transformer substation personnel at different shooting angles and under background environments;
s12: carrying out time sequence behavior marking on the typical behavior category to construct a video data set for a time sequence action detection task; the time sequence action detection task is based on the time positioning and action category classification of personnel actions of the transformer substation monitoring long video;
s13: labeling a video, and constructing an abnormal behavior identification video data set by marking whether the video segment has abnormal behaviors; the abnormal behavior identification task is used for detecting abnormal behaviors according to the classification based on the video clips obtained by the segmentation of the time sequence action detection module;
the method for constructing the 3D convolution feature extraction network in step S2 includes:
s21: the C3D feature extraction network is improved: replacing the normal convolution operation with a depth separable convolution;
s22: extracting the characteristics of the monitoring video sequence frames of the transformer substation by adopting an improved C3D characteristic extraction network:
the input surveillance video frame sequence is processed by a feature extraction network composed of multiple depth-separable 3D convolution layers to obtain the feature map C_conv5b;
the training and running process of the time sequence motion detection model in the step S6 specifically includes:
s61: in the transfer learning, the THUMOS 2014 dataset is first used to jointly train the network modules corresponding to steps S2-S4; the parameters of the first four layers of the feature extraction network of step S2 are then fixed, and the dataset constructed in step S1 is used to train the parameters of the remaining network structure;
s62: training the RoI obtained in the step S4 according to the proportion of 1:3 of positive and negative samples;
s63: for the time series candidate region extraction network of step S3 and the time series behavior classification network of step S4, the classification task and the regression task are optimized at the same time:
classification task L_cls uses the softmax loss, and regression task L_reg uses the smooth L1 loss:

L({α_i}, {t_i}) = (1/N_cls) Σ_i L_cls(α_i, α_i*) + λ·(1/N_reg) Σ_i α_i*·L_reg(t_i, t_i*)

wherein N_cls and N_reg represent the number of samples selected for one training run and the number of candidate temporal segments used for regression, λ is a loss-balancing parameter, i is the index of a candidate temporal segment within a batch, α_i is the predicted likelihood that candidate temporal segment i contains human behavior, α_i* is the ground truth label, t_i is the relative offset of the predicted temporal segment with respect to the candidate temporal segment, and t_i* = {δc_i, δl_i} is the coordinate transformation between the ground truth and the candidate temporal segment, computed as:

δc_i = (c_i* − c_i) / l_i,  δl_i = log(l_i* / l_i)

where (c_i, l_i) are the center and length of the candidate segment and (c_i*, l_i*) those of the ground truth segment;
s64: based on the time sequence action detection network module constructed in steps S2-S4, the substation monitoring long video is processed using the model parameters obtained from the training of S61-S63, so as to intercept coarsely classified behavior category video clips.
2. The substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection according to claim 1, wherein in step S54, the first fully connected layer of the MLP has 512 neurons and is activated using the linear rectification function ReLU; the second fully connected layer has 32 neurons and the third has 1 neuron, activated using the Sigmoid function.
3. The substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormal detection as claimed in claim 1, wherein the process of training and operating the abnormal behavior detection network in step S7 includes:
s71: dividing the video segments into 32 groups of video segments containing single complete action by adopting a K-means frame clustering algorithm based on space-time continuity;
s72: using the video segment segmentation algorithm of step S71, each training video is segmented into 32 segments containing a single complete action, the segments are examples in MIL, and each video is a package in MIL: during training, randomly selecting 10 positive example bags and 10 negative example bags as mini-batch for training;
s73: extracting the spatiotemporal feature of each example segment by using the 3D convolution feature extraction network constructed in the step S2, and performing a full-connection operation to obtain a 4096-dimensional feature as a feature map required by subsequent multi-example learning;
s74: each segment is scored with the MLP constructed in step S54; the segment with the largest abnormal score in the positive bag is then selected as the potential abnormal sample, and the segment with the largest abnormal score in the negative bag as the non-abnormal sample, and the MLP model parameters are trained on these two samples; the objective is:

max_{i∈β_a} f(v_a^i) > max_{i∈β_n} f(v_n^i)

wherein β_a denotes a positive bag and v_a an abnormal sample; β_n denotes a negative bag and v_n a non-abnormal sample; f is the model prediction function;
a Hinge-loss function is adopted to enlarge the score gap between positive and negative examples, so that after training the model outputs high scores for abnormal samples and low scores for non-abnormal samples; the Hinge-loss function is:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i))

adding a sparsity constraint and a temporal smoothing constraint, the loss function becomes:

l(β_a, β_n) = max(0, 1 − max_{i∈β_a} f(v_a^i) + max_{i∈β_n} f(v_n^i)) + λ1 Σ_{i=1}^{n−1} (f(v_a^i) − f(v_a^{i+1}))² + λ2 Σ_{i=1}^{n} f(v_a^i)

to prevent model overfitting, an l2 regularization term is finally added, giving the final loss function:

L(w) = l(β_a, β_n) + ||w||_F

wherein w represents the model weights;
s75: based on the abnormal behavior detection network constructed in step S5, the behavior category video clips obtained in step S64 are detected using the model parameters obtained from the training of S72-S74, so as to identify whether a specific behavior clip contains abnormal behavior, as well as the category and precise temporal position of the abnormal behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010103140.7A CN111291699B (en) | 2020-02-19 | 2020-02-19 | Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111291699A CN111291699A (en) | 2020-06-16 |
CN111291699B true CN111291699B (en) | 2022-06-03 |
Family
ID=71024617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010103140.7A Active CN111291699B (en) | 2020-02-19 | 2020-02-19 | Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291699B (en) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111985333B (en) * | 2020-07-20 | 2023-01-17 | 中国科学院信息工程研究所 | Behavior detection method based on graph structure information interaction enhancement and electronic device |
CN111626273B (en) * | 2020-07-29 | 2020-12-22 | 成都睿沿科技有限公司 | Fall behavior recognition system and method based on atomic action time sequence characteristics |
CN111914778B (en) * | 2020-08-07 | 2023-12-26 | 重庆大学 | Video behavior positioning method based on weak supervision learning |
CN111652201B (en) * | 2020-08-10 | 2020-10-27 | 中国人民解放军国防科技大学 | Video data abnormity identification method and device based on depth video event completion |
CN111709411B (en) * | 2020-08-20 | 2020-11-10 | 深兰人工智能芯片研究院(江苏)有限公司 | Video anomaly detection method and device based on semi-supervised learning |
CN112307885A (en) * | 2020-08-21 | 2021-02-02 | 北京沃东天骏信息技术有限公司 | Model construction and training method and device, and time sequence action positioning method and device |
CN112487913A (en) * | 2020-11-24 | 2021-03-12 | 北京市地铁运营有限公司运营四分公司 | Labeling method and device based on neural network and electronic equipment |
CN112434615A (en) * | 2020-11-26 | 2021-03-02 | 天津大学 | Time sequence action detection method based on Tensorflow deep learning framework |
CN112487967A (en) * | 2020-11-30 | 2021-03-12 | 电子科技大学 | Scenic spot painting behavior identification method based on three-dimensional convolution network |
CN112737121A (en) * | 2020-12-28 | 2021-04-30 | 内蒙古电力(集团)有限责任公司包头供电局 | Intelligent video monitoring, analyzing, controlling and managing system for power grid |
CN113297972B (en) * | 2021-05-25 | 2022-03-22 | 国网湖北省电力有限公司检修公司 | Transformer substation equipment defect intelligent analysis method based on data fusion deep learning |
CN113159003A (en) * | 2021-05-27 | 2021-07-23 | 中国银行股份有限公司 | Bank branch abnormity monitoring method and device |
CN113392770A (en) * | 2021-06-16 | 2021-09-14 | 国网浙江省电力有限公司电力科学研究院 | Typical violation behavior detection method and system for transformer substation operating personnel |
CN113421236B (en) * | 2021-06-17 | 2024-02-09 | 同济大学 | Deep learning-based prediction method for apparent development condition of water leakage of building wall surface |
CN113516058B (en) * | 2021-06-18 | 2024-05-24 | 北京工业大学 | Live video group abnormal activity detection method and device, electronic equipment and medium |
CN113627386A (en) * | 2021-08-30 | 2021-11-09 | 山东新一代信息产业技术研究院有限公司 | Visual video abnormity detection method |
CN114092851A (en) * | 2021-10-12 | 2022-02-25 | 甘肃欧美亚信息科技有限公司 | Monitoring video abnormal event detection method based on time sequence action detection |
CN113992894A (en) * | 2021-10-27 | 2022-01-28 | 甘肃风尚电子科技信息有限公司 | Abnormal event identification system based on monitoring video time sequence action positioning and abnormal detection |
CN114283492B (en) * | 2021-10-28 | 2024-04-26 | 平安银行股份有限公司 | Staff behavior-based work saturation analysis method, device, equipment and medium |
CN114120180B (en) * | 2021-11-12 | 2023-07-21 | 北京百度网讯科技有限公司 | Time sequence nomination generation method, device, equipment and medium |
CN114565968A (en) * | 2021-11-29 | 2022-05-31 | 杭州好学童科技有限公司 | Learning environment action and behavior identification method based on learning table |
CN116453204B (en) * | 2022-01-05 | 2024-08-13 | 腾讯科技(深圳)有限公司 | Action recognition method and device, storage medium and electronic equipment |
CN114429676B (en) * | 2022-01-27 | 2023-07-25 | 山东纬横数据科技有限公司 | Personnel identity and behavior recognition system for disinfection supply room of medical institution |
CN114612868A (en) * | 2022-02-25 | 2022-06-10 | 广东创亿源智能科技有限公司 | Training method, training device and detection method of vehicle track detection model |
CN114676739B (en) * | 2022-05-30 | 2022-08-19 | 南京邮电大学 | Method for detecting and identifying time sequence action of wireless signal based on fast-RCNN |
CN115080748B (en) * | 2022-08-16 | 2022-11-11 | 之江实验室 | Weak supervision text classification method and device based on learning with noise label |
CN115424347A (en) * | 2022-09-02 | 2022-12-02 | 重庆邮电大学 | Intelligent identification method for worker work content of barber shop |
CN115690658B (en) * | 2022-11-04 | 2023-08-08 | 四川大学 | Priori knowledge-fused semi-supervised video abnormal behavior detection method |
CN116313018B (en) * | 2023-05-18 | 2023-09-15 | 北京大学第三医院(北京大学第三临床医学院) | Emergency system and method for skiing field and near-field hospital |
CN117710832A (en) * | 2024-01-04 | 2024-03-15 | 广州智寻科技有限公司 | Intelligent identification method for power grid satellite, unmanned aerial vehicle and video monitoring image |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10834436B2 (en) * | 2015-05-27 | 2020-11-10 | Arris Enterprises Llc | Video classification using user behavior from a network digital video recorder |
CN107506740B (en) * | 2017-09-04 | 2020-03-17 | 北京航空航天大学 | Human body behavior identification method based on three-dimensional convolutional neural network and transfer learning model |
CN108399380A (en) * | 2018-02-12 | 2018-08-14 | 北京工业大学 | A kind of video actions detection method based on Three dimensional convolution and Faster RCNN |
CN108734095B (en) * | 2018-04-10 | 2022-05-20 | 南京航空航天大学 | Motion detection method based on 3D convolutional neural network |
CN110084151B (en) * | 2019-04-10 | 2023-02-28 | 东南大学 | Video abnormal behavior discrimination method based on non-local network deep learning |
CN110263728B (en) * | 2019-06-24 | 2022-08-19 | 南京邮电大学 | Abnormal behavior detection method based on improved pseudo-three-dimensional residual error neural network |
2020-02-19: CN202010103140.7A filed in China; granted as CN111291699B (active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111291699B (en) | Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection | |
CN108216252B (en) | Subway driver vehicle-mounted driving behavior analysis method, vehicle-mounted terminal and system | |
CN108009473A (en) | Based on goal behavior attribute video structural processing method, system and storage device | |
CN104915655A (en) | Multi-path monitor video management method and device | |
CN107133569A (en) | The many granularity mask methods of monitor video based on extensive Multi-label learning | |
CN104717468B (en) | Cluster scene intelligent monitoring method and system based on the classification of cluster track | |
CN103246896A (en) | Robust real-time vehicle detection and tracking method | |
CN110222592B (en) | Construction method of time sequence behavior detection network model based on complementary time sequence behavior proposal generation | |
CN105426820A (en) | Multi-person abnormal behavior detection method based on security monitoring video data | |
CN107230267A (en) | Intelligence In Baogang Kindergarten based on face recognition algorithms is registered method | |
CN107657232A (en) | A kind of pedestrian's intelligent identification Method and its system | |
CN107944628A (en) | A kind of accumulation mode under road network environment finds method and system | |
CN117437599B (en) | Pedestrian abnormal event detection method and system for monitoring scene | |
Regazzoni et al. | A real-time vision system for crowding monitoring | |
CN113569766A (en) | Pedestrian abnormal behavior detection method for patrol of unmanned aerial vehicle | |
CN113076825A (en) | Transformer substation worker climbing safety monitoring method | |
Jiang et al. | A deep learning framework for detecting and localizing abnormal pedestrian behaviors at grade crossings | |
CN117197713A (en) | Extraction method based on digital video monitoring system | |
Wang et al. | Deep learning and multi-modal fusion for real-time multi-object tracking: Algorithms, challenges, datasets, and comparative study | |
Katariya et al. | A pov-based highway vehicle trajectory dataset and prediction architecture | |
CN113012193B (en) | Multi-pedestrian tracking method based on deep learning | |
CN117423157A (en) | Mine abnormal video action understanding method combining migration learning and regional invasion | |
CN106960183A (en) | A kind of image pedestrian's detection algorithm that decision tree is lifted based on gradient | |
CN116311082A (en) | Wearing detection method and system based on matching of key parts and images | |
Tang et al. | Multilevel traffic state detection in traffic surveillance system using a deep residual squeeze-and-excitation network and an improved triplet loss |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||