CN114373162B - Dangerous area personnel intrusion detection method and system for transformer substation video monitoring

Info

Publication number: CN114373162B
Application number: CN202111573235.6A
Authority: CN
Inventors: 朱建宝; 孙玉玮; 马青山; 俞鑫春; 邓伟超; 施烨; 叶超; 陈鹏; 葛春燕
Original assignee: Nantong Power Supply Co Of State Grid Jiangsu Electric Power Co
Current assignee: Nantong Power Supply Co Of State Grid Jiangsu Electric Power Co
Priority date: 2021-12-21
Filing date: 2021-12-21
Publication date: 2023-12-26
Anticipated expiration: 2041-12-21
Also published as: CN114373162A

Abstract

A dangerous area personnel intrusion detection method and system for transformer substation video monitoring that address the passive-monitoring problem of existing transformer substation video monitoring. Personnel and personnel channels are segmented separately by an improved PSPNet semantic segmentation model; the mask images obtained by semantic segmentation are binarized into black-and-white mask images; expansion post-processing then merges the fractured areas of each mask; finally, whether personnel have left the safe personnel channel and entered a dangerous area is judged from the intersection relation of the personnel and personnel channel mask images, and an alarm is issued accordingly. The method promotes intelligent transformer substation video monitoring.

Description

Dangerous area personnel intrusion detection method and system for transformer substation video monitoring

Technical Field

The invention belongs to the field of safety control, in particular to a transformer substation video monitoring processing method, and particularly relates to a dangerous area personnel intrusion detection method and system for transformer substation video monitoring.

Background

Video monitoring is widely applied in the production environment of a transformer substation, but still belongs to passive monitoring, and is often only used for post-analysis of accidents. Meanwhile, the transformer substation safety accidents often come from the fact that workers enter dangerous areas (non-personnel channels), real-time warning cannot be achieved by means of subjective judgment of monitoring personnel, and the working intensity of the monitoring personnel is too high.

In the prior art, some research has also been carried out on active video monitoring. Chinese patent application CN201910987198.X discloses a suspicious personnel intrusion detection method based on a video monitoring platform. Monitoring videos are collected for personnel detection, frames are extracted from each video segment, and the real targets in each picture of the data set are marked manually. Cluster analysis is then performed on the marked real targets using the K-means clustering algorithm to obtain different length-width combinations. A YOLOv3 network is trained on the manually marked data set, and the loss function, which consists of three parts, namely the regression loss of the target frame, the confidence loss and the classification loss of the target frame, is optimized by stochastic gradient descent. The final output of the network contains information at three scales, which are used to detect large, medium and small personnel respectively. A video frame is input and a forbidden area, called the warning area, is marked with a rectangular frame. Image data are read from the video recorder, the selected warning area is sent into the trained deep learning network for detection, and a predicted picture is output. If a red rectangular frame appears in the warning area, suspicious personnel have been detected, an intrusion is judged to have occurred, and an early warning is issued in combination with an alarm system.

As can be seen, the scheme of that patent application (CN201910987198.X) performs personnel intrusion detection mainly by a target detection method based on a YOLO network. However, the YOLO architecture is a single-step detector with relatively low accuracy, and as an image target detector it outputs a bounding rectangle of the target. The bounding rectangle differs considerably from the actual pixels of the target, so intrusion detection based on the rectangle easily causes misjudgment.

Patent application CN201910783166.8 discloses a method and a system for intrusion detection and alarm in a non-operation area of a transformer substation, and the main technical route is as follows: firstly, dividing a transformer substation area into an operation area and a non-operation area; positioning information of operators is obtained through a positioning system arranged on a transformer substation, and a movement track is displayed on an electronic map according to the positioning information; when the positioning system detects that an operator invades a non-operation area, an alarm system arranged on a transformer substation sends out alarm information, a video acquisition system arranged on the transformer substation acquires videos of the invaded sites, the videos of the invaded sites are played through a display, and the movement track of the corresponding operator is displayed through an electronic map, so that monitoring of the operation site of the transformer substation is realized.

The method relies on the normal operation of the sensors and on their positioning accuracy, and still belongs to passive monitoring. Although it can provide effective warning when staff intrude into dangerous areas, it is of no use for outside personnel who do not carry the positioning system, and staff can evade intrusion detection by, for example, damaging the positioning equipment.

According to the investigation result of the background technology, it is easy to find that the existing personnel intrusion detection method mainly has two problems:

(1) If the monitoring means rely on specific sensing or positioning equipment, monitoring is passive and only a limited group of people can be monitored;

(2) The target positioning result obtained by target detection is the rectangular frame in which the object is located, and intrusion detection based on this result causes frequent misjudgment because the rectangular frame differs from the actual object.

Therefore, the application provides a dangerous area personnel intrusion detection method based on an improved PSPNet semantic segmentation network model, which can actively discover dangers and give alarms in time after personnel enter a dangerous area based on semantic information obtained by a substation monitoring camera.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a dangerous area personnel intrusion detection method used in transformer substation video monitoring.

The invention adopts the following technical scheme.

The dangerous area personnel intrusion detection method for transformer substation video monitoring is characterized by comprising the following steps of:

(1) To meet the requirement of personnel intrusion detection, a classical semantic segmentation network, namely the PSPNet network model, is improved so that the network output is a prediction vector over the categories to which a pixel may belong, and the desired label is changed into a multi-hot vector in multi-hot coding format, that is, a single pixel is allowed to belong to several categories;

(2) After the semantic segmentation network has been improved, a video stream is exported from the substation monitoring equipment, and the network improved in step (1) is called to perform multidimensional semantic segmentation on each frame of the substation video monitoring scene in the video stream in real time, segmenting mask maps of personnel channels, personnel and background;

(3) Performing binarization operation on the mask map divided in the step (2) respectively, and performing expansion operation on the personnel mask map;

(4) Judging, according to the intersection relation of the personnel channel mask map obtained in step (3) and the personnel mask map after the expansion operation, whether personnel have left the safe personnel channel and entered a dangerous area.

The invention further comprises the following preferable schemes:

In step (1), the improved PSPNet network model structurally comprises an image feature extraction module, a global feature fusion module and a multidimensional mask prediction module; the image feature extraction module extracts image features of the transformer substation video monitoring scene; the global feature fusion module acquires global image information at the corresponding scales through pooling operations of different sizes and generates a fusion feature map by concatenating this information with the initial feature map; and the multidimensional mask prediction module takes the fusion feature map as input and generates a mask map with the same width and height as the original image.

In the improved PSPNet network model, after the image feature extraction module and the global feature fusion module, the fusion feature map is fed into the multidimensional mask prediction module, which consists of a 3×3 convolution group, one up-sampling layer, one dropout layer and a 1×1 convolution layer. It finally generates a mask map with the same width and height as the original image and a depth of 3, because the image is to be divided into three categories: personnel, personnel channels and background.

In step (2), multidimensional semantic segmentation is performed in real time on each frame of the substation video monitoring scene in the video stream by the improved PSPNet network model. The output is a three-dimensional matrix whose width and height equal those of the input image and whose thickness equals the number of segmentation classes; the feature vector at the ith row and jth column of the matrix corresponds to the pixel at the ith row and jth column of the original image and has the form x = [x₀, x₁, x₂], where x₀, x₁ and x₂ represent the characteristic values of background, personnel and personnel channel at that pixel, respectively. The prediction vector is normalized to p = [p₀, p₁, p₂] using the following normalization function, so that it represents the probabilities that the pixel is background, personnel and personnel channel:

wherein: delta is a number slightly greater than 0 that prevents the denominator from tending toward 0.

The probability distribution vector p = [p₀, p₁, p₂] obtained after semantic segmentation of the transformer substation video monitoring image represents the probability that a pixel belongs to each class.

Delta is within the range [10⁻⁵, 10⁻²].

The multidimensional mask prediction module of the improved PSPNet network model outputs a multi-layer mask map whose width and height are the same as those of the original image and whose number of layers equals the number of categories to be segmented.

After the predicted probability distribution p = [p₀, p₁, p₂] is obtained, the following loss evaluation function is constructed to evaluate the network loss and correct the network error:

wherein: y = [y₀, y₁, y₂] is the multi-hot encoded representation of the sample label, that is, y_i = 1 when the sample belongs to the ith class and y_i = 0 otherwise; w and h are the width and height of the mask map, and C is the number of sample classes.

In step (3), binarization operation is performed for each layer of the obtained multi-layer mask map:

wherein: p(x, y) represents the probability value at a position in the mask map, x and y are the coordinates of the pixel in the mask map, and sigma represents the segmentation threshold, generally taken as 0.5;

Therefore, the three output mask layers are converted into three binary mask layers. Each layer is a two-dimensional binary matrix whose width and height equal those of the original image and whose elements are only 0 or 1, representing whether each pixel in the original image belongs to the current category: an element of 0 means the pixel at the corresponding row and column of the image does not belong to the current category, and an element of 1 means it does.
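The binarization step can be sketched in a few lines of plain Python. This is a minimal illustration only: the function names `binarize` and `binarize_layers` are hypothetical, and treating a probability exactly equal to the threshold as belonging to the class (`>=`) is an assumption, since the text names the threshold sigma but the comparison itself is not reproduced.

```python
def binarize(prob_mask, sigma=0.5):
    """Threshold one probability layer into a 0/1 mask.

    Using ">=" at the threshold is an assumption; the source only
    names the segmentation threshold sigma (generally 0.5).
    """
    return [[1 if p >= sigma else 0 for p in row] for row in prob_mask]


def binarize_layers(prob_layers, sigma=0.5):
    """Apply the same threshold to every class layer independently."""
    return [binarize(layer, sigma) for layer in prob_layers]


# A tiny 2x2 probability layer for the "personnel" class (made-up values).
person_prob = [[0.1, 0.7], [0.5, 0.49]]
person_binary = binarize(person_prob)
print(person_binary)
```

Because each layer is thresholded on its own, one pixel may be 1 in both the personnel layer and the personnel channel layer, which is what the multi-hot label format permits.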

The person mask map is expanded as follows: for the person pixel region in the person mask map, that is, each connected region whose element value is 1, let the width span of the connected region be (w₁, w₂) and the height span be (h₁, h₂); the part of the connected region whose height coordinate lies in (h₁ + (h₂ − h₂)/8, h₂) is taken as the person standing area.

In step (4), if the sum of the areas of the person standing area and the person channel area after expansion equals the area of their union, the person has entered a dangerous area, and the monitoring system issues an alarm; if the area of the union is smaller than that sum, the person is still in the safe personnel channel.
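The extraction of a standing area from each connected region can be sketched as follows. Several points are assumptions rather than statements of the source: the printed lower bound contains the term (h₂ − h₂), which is zero, and the sketch assumes (h₂ − h₁) was intended; row indices are assumed to grow downward, with h₁ the top row and h₂ the bottom row of a region; regions are assumed to be 4-connected; and the bounds are treated as inclusive. The function names are hypothetical.

```python
from collections import deque


def connected_regions(mask):
    """Yield a list of (row, col) pixels for each 4-connected region of 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield region


def standing_area(mask, fraction=1 / 8):
    """Keep, per region, the pixels whose row lies in [h1 + (h2 - h1) * fraction, h2].

    The (h2 - h1) term is an assumed reading of the printed expression.
    """
    out = [[0] * len(mask[0]) for _ in mask]
    for region in connected_regions(mask):
        h1 = min(y for y, _ in region)  # top row of the region
        h2 = max(y for y, _ in region)  # bottom row of the region
        lower = h1 + (h2 - h1) * fraction
        for y, x in region:
            if lower <= y <= h2:
                out[y][x] = 1
    return out


# One region, 1 pixel wide, spanning rows 0..8.
person = [[1] for _ in range(9)]
print(standing_area(person))
```

The `fraction` parameter is exposed so that a different reading of the printed bound can be tried without changing the rest of the code.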

The application also discloses a dangerous area personnel intrusion detection system in transformer substation video monitoring by utilizing the dangerous area personnel intrusion detection method, which comprises a monitoring video deriving unit, a monitoring scene image multidimensional semantic segmentation unit, an image expansion unit and a personnel intrusion judging unit; the method is characterized in that:

the monitoring video export unit exports video streams from substation monitoring equipment and uploads the video streams to the monitoring scene image multidimensional semantic segmentation unit;

the monitoring scene image multidimensional semantic segmentation unit carries out multidimensional semantic segmentation on each frame of transformer substation video monitoring scene in the video stream in real time, and personnel channels and personnel are segmented;

the image expansion unit performs expansion operation on the segmented personnel channels and personnel respectively, and merges adjacent small areas;

the personnel invasion judging unit judges the relative position according to the intersection relation of the personnel channels and the personnel after the expansion operation and the merging of the adjacent small areas, and further judges whether the personnel leave the safe personnel channel or not and enter the dangerous area.

Compared with the prior art, the invention has the following beneficial effects:

1. the method provides a dangerous area personnel intrusion detection method used in transformer substation video monitoring, the problem is regarded as the intersection problem of personnel and personnel channel pixel areas, and personnel entering a dangerous area (non-personnel channel) can be rapidly judged.

2. The method abandons the common intrusion detection method based on target detection, and extracts the characteristic information based on the semantic segmentation network instead, so that the prediction result is accurate to the pixel level, and the precision and reliability of intrusion detection are greatly improved.

3. Traditional semantic segmentation networks expect a one-hot coded output and therefore cannot interpret overlapping targets in an image. To address this, the method improves the structure of the traditional semantic segmentation network: the network output is adapted to a multi-layer feature mask map whose number of layers corresponds to the number of categories to be segmented, the label coding format is changed to multi-hot coding, and the loss function and probability distribution function are redefined accordingly, ensuring that the model operates effectively.

4. The improved semantic segmentation network produces a per-category probability mask map for each picture. The method binarizes and expands this map to obtain a multi-layer, per-category binary mask map, selects a specific part of the personnel pixel area in the personnel binary mask map as the personnel standing area, and then achieves pixel-level early warning of personnel intrusion according to the intersection relation of the personnel standing area and the personnel channel area.

Drawings

Fig. 1 is a schematic flow chart of a dangerous area personnel intrusion detection method used in transformer substation video monitoring.

FIG. 2 is a schematic diagram of the present invention directed to semantic segmentation network architecture improvement.

Fig. 3 is a schematic view of a person standing area extracted from a person pixel area according to the present invention.

Fig. 4 is a schematic diagram of personnel intrusion detection based on mask pattern intersection according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. The embodiments described herein are merely some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art without making any inventive effort, are within the scope of the present invention.

As shown in fig. 1, the invention discloses a dangerous area personnel intrusion detection method for transformer substation video monitoring, which comprises the following steps:

(1) Traditional semantic segmentation networks can only predict a single category for a single pixel. To meet the requirement of personnel intrusion detection, the classical semantic segmentation network is improved, mainly as follows: 1) the network output is changed so that it is a prediction vector over the categories to which the pixel may belong; 2) the desired label is changed into a multi-hot vector in multi-hot coding format, that is, a single pixel is allowed to belong to several categories; 3) the confidence calculation for the categories of a single pixel is redefined to suit the multi-hot coding format; 4) the loss function is redefined to suit the changed formats of the prediction and the desired label.
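As a minimal illustration of the multi-hot label format, the sketch below builds a per-pixel label vector [background, personnel, personnel channel] from two binary annotation masks. The function name `multi_hot_labels` is hypothetical, and the rule that background is 1 only where neither of the other classes is present is an assumption; the text only states that a single pixel may belong to several categories.

```python
def multi_hot_labels(person_mask, channel_mask):
    """Build per-pixel multi-hot labels [background, person, channel].

    Both inputs are equally sized 2-D lists of 0/1 values. Treating
    background as "neither person nor channel" is an assumption.
    """
    labels = []
    for person_row, channel_row in zip(person_mask, channel_mask):
        row = []
        for person, channel in zip(person_row, channel_row):
            background = 1 if (person == 0 and channel == 0) else 0
            row.append([background, person, channel])
        labels.append(row)
    return labels


person = [[0, 1], [0, 0]]
channel = [[0, 1], [1, 0]]
labels = multi_hot_labels(person, channel)
# The top-right pixel is both person and channel, which a one-hot
# label could not express.
print(labels[0][1])
```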

In step (1), an improved PSPNet network model is adopted for semantic segmentation of the transformer substation video monitoring scene. Its structure comprises an image feature extraction module, a global feature fusion module and a multidimensional mask prediction module. The modules have the following characteristics: the feature extraction module is replaceable, with candidates including ResNet101, ResNet50 and ResNet18; taking the amount of computation into account, ResNet101 is selected as the feature extraction module to obtain the initial feature map, and dilated (atrous) convolution is used in it to avoid the information loss caused by pooling.

After feature extraction, the global feature fusion module uses four adaptive average pooling layers (nn.AdaptiveAvgPool2d) with a pooling size ratio of 1:2:3:6 to obtain global image information at four scales, uses four 1×1 convolution groups with learnable parameters for channel compression, up-samples the results by bilinear interpolation, and concatenates them with the initial feature map along the channel dimension to generate a fusion feature map with the same width, height and depth as the initial feature map;

The fusion feature map is then fed into the multidimensional mask prediction module, which consists of a 3×3 convolution group, one up-sampling layer, one dropout layer and a 1×1 convolution layer, and finally generates a mask map with the same width and height as the original image. The depth of the mask map is determined by the number of segmentation categories; in the invention the image is divided into personnel, personnel channels and background, so the depth of the mask map is 3. The loss function then uses the resulting mask map together with the ground-truth annotation mask to calculate the loss.
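The two modules described above can be sketched in PyTorch roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the class names `PyramidFusion` and `MaskHead`, the intermediate width `mid_channels`, the dropout rate `p_drop`, the up-sampling factor `scale` and the use of batch normalization and ReLU inside each convolution group are all assumptions. The concatenated output here has more channels than the input, following the worked example with sizes given elsewhere in the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidFusion(nn.Module):
    """Pool at several bin sizes, compress channels, up-sample, concatenate."""

    def __init__(self, in_channels=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(bins)  # 1/4 of the input for four bins
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(in_channels, branch_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])

    def forward(self, x):
        size = x.shape[-2:]
        feats = [x]
        for branch in self.branches:
            pooled = branch(x)
            feats.append(F.interpolate(pooled, size=size, mode="bilinear",
                                       align_corners=False))
        return torch.cat(feats, dim=1)


class MaskHead(nn.Module):
    """3x3 conv group -> up-sampling -> dropout -> 1x1 conv, as listed in the text."""

    def __init__(self, in_channels=4096, mid_channels=512, num_classes=3,
                 scale=8, p_drop=0.1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.drop = nn.Dropout2d(p_drop)
        self.classify = nn.Conv2d(mid_channels, num_classes, kernel_size=1)

    def forward(self, x):
        return self.classify(self.drop(self.up(self.conv(x))))


# Small channel counts keep the demonstration fast; eval() is used so that
# batch normalization accepts the 1x1 pooled branch with a batch of one.
fusion = PyramidFusion(in_channels=64).eval()
head = MaskHead(in_channels=128, mid_channels=16).eval()
with torch.no_grad():
    mask = head(fusion(torch.randn(1, 64, 12, 12)))
print(tuple(mask.shape))
```

With the default arguments the same classes would take a 2048-channel backbone feature map and emit a three-layer mask at eight times its spatial size.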

The improved PSPNet network model loss function adopts a multi-classification cross entropy loss function to classify the personnel channels and the personnel at the pixel level;

As shown in fig. 2, for a single pixel the improved semantic segmentation network finally produces a prediction vector x = [x₀, ..., x_{C-1}] whose length is the number of categories (in this application the prediction vector comprises x₀, x₁ and x₂, representing background, personnel and personnel channel at the pixel, respectively). The vector is first normalized to p = [p₀, ..., p_{C-1}] using the normalization function, so that it represents probabilities (that is, the probabilities that the pixel is background, personnel and personnel channel).

After the predicted probability distribution p = [p₀, ..., p_{C-1}] is obtained, the cross entropy loss is calculated on it, and the network parameters are updated by stochastic gradient descent with momentum to reduce the segmentation loss.

The feature vector obtained after the semantic segmentation is carried out on the transformer substation video monitoring image at this time represents the probability that a certain pixel point belongs to each category, the width and height of the dimension parameter of the finally obtained multi-layer mask image are the same as those of the original image, the number of layers is the same as that of the categories required to be segmented, and the image is segmented into the background, the personnel and the personnel channels in the patent, so the mask image is three layers. Assuming that the original image size is w×h×3, the mask pattern size is also w×h×3, and the numerical range in the mask pattern is 0-1, so that the three-layer binary mask pattern is obtained after binarizing each layer of the mask pattern.

(2) After finishing the semantic segmentation network improvement, a video stream is led out from the substation monitoring equipment, the network in the step (1) is called to carry out multidimensional semantic segmentation on each frame of substation video monitoring scene in the video stream in real time, and a personnel channel and a mask diagram of personnel are segmented;

In step (2), multidimensional semantic segmentation is performed in real time on each frame of the substation video monitoring scene in the video stream by the improved PSPNet network model. The output is a three-dimensional matrix whose width and height equal those of the input image and whose thickness equals the number of segmentation classes; the feature vector at the ith row and jth column of the matrix corresponds to the pixel at the ith row and jth column of the original image and has the form x = [x₀, x₁, x₂], where x₀, x₁ and x₂ represent the characteristic values of background, personnel and personnel channel at that pixel, respectively. The prediction vector is normalized to p = [p₀, p₁, p₂] using the normalization function, so that it represents the probabilities that the pixel is background, personnel and personnel channel:

wherein: delta is a number slightly greater than 0 that prevents the denominator from tending toward 0; its value range is [10⁻⁵, 10⁻²].

The probability distribution vector p = [p₀, p₁, p₂] obtained after semantic segmentation of the transformer substation video monitoring image represents the probability that a pixel belongs to each class.

wherein: y = [y₀, y₁, y₂] is the multi-hot encoded representation of the sample label, that is, y_i = 1 when the sample belongs to the ith class and y_i = 0 otherwise; w and h are the width and height of the mask map, and C is the number of sample classes, which is 3 in the present application.

(3) A binarization operation is performed on the multi-layer mask map obtained by semantic segmentation in step (2) (the number of layers of the mask map depends on the number of semantic segmentation categories) to obtain a multi-layer binary mask map. The mask maps corresponding to personnel and personnel channels are then each expanded using a (15, 15) expansion unit, which merges small fractured areas and repairs defects in the respective masks. As shown in fig. 3, for each connected region of the person pixel region in the person mask map, let its width span be (w₁, w₂) and its height span be (h₁, h₂); the part of the connected region whose height coordinate lies in (h₁ + (h₂ − h₂)/8, h₂) is taken as the personnel standing area;

(4) Judging, according to the intersection relation of the personnel channel mask map obtained in step (3) and the personnel standing area, whether personnel have left the safe personnel channel and entered the dangerous area.

As shown in fig. 4, if the sum of the areas of the personnel standing area and the personnel channel area equals the area of their union, personnel have entered the dangerous area, and the monitoring system issues an alarm; if the area of the union is smaller than that sum, personnel are still in the safe personnel channel.
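The expansion operation in step (3) can be sketched in plain Python, reading the "(15, 15) expansion unit" as a 15×15 square structuring element for morphological dilation. That reading, the function name `dilate` and the clipping at the image border are assumptions; in practice a library routine such as OpenCV's `cv2.dilate` would normally be used instead of explicit loops.

```python
def dilate(mask, size=15):
    """Dilate a 0/1 mask with a size x size square structuring element.

    Every pixel within size // 2 rows and columns of a 1 becomes 1.
    The neighbourhood is clipped at the image border.
    """
    rows, cols = len(mask), len(mask[0])
    radius = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    out[rr][cc] = 1
    return out


# Two fragments of one channel mask separated by a gap of nine pixels.
row = [0] * 20
row[2] = 1
row[12] = 1
merged = dilate([row])
print(merged[0])
```

In this example the two fragments grow toward each other and meet, which is the merging of fractured areas that the step describes; a 3×3 element would leave them separate.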
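The area rule of fig. 4 amounts to checking whether the two regions overlap at all, since the area of a union equals the sum of the two areas exactly when the intersection is empty. A minimal sketch follows; the function names are hypothetical, and returning "no intrusion" when no person is present in the frame is an added assumption that the text does not address.

```python
def area(mask):
    """Number of pixels set to 1."""
    return sum(sum(row) for row in mask)


def union_area(mask_a, mask_b):
    """Number of pixels set to 1 in at least one of the two masks."""
    return sum(
        1
        for row_a, row_b in zip(mask_a, mask_b)
        for a, b in zip(row_a, row_b)
        if a or b
    )


def has_left_channel(standing_mask, channel_mask):
    """True when the standing area and the channel area do not overlap."""
    if area(standing_mask) == 0:
        # Assumption: an empty standing area means nobody is in view.
        return False
    total = area(standing_mask) + area(channel_mask)
    return total == union_area(standing_mask, channel_mask)


channel = [[0, 1, 1]]
inside = [[0, 1, 0]]   # overlaps the channel
outside = [[1, 0, 0]]  # disjoint from the channel
print(has_left_channel(inside, channel), has_left_channel(outside, channel))
```

Under this rule a person whose standing area touches the channel by even one pixel counts as still being in the channel.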

the monitoring scene image multidimensional semantic segmentation unit carries out multidimensional semantic segmentation on each frame of transformer substation video monitoring scene in the video stream in real time, segments a personnel channel area mask map and a personnel area mask map, and carries out binarization to obtain a binarized personnel channel area and a binarized personnel area mask map;

The image expansion unit expands the segmented binarized personnel channel area and personnel area respectively using a (15, 15) expansion unit and merges adjacent small areas. For each connected region of the personnel pixel region in the personnel mask map, let its width span be (w₁, w₂) and its height span be (h₁, h₂); the part of the connected region whose height coordinate lies in (h₁ + (h₂ − h₂)/8, h₂) is taken as the personnel standing area;

the personnel invasion judging unit judges the relative position according to the intersection relation of the personnel standing area and the personnel passage area, so as to judge whether personnel leave the safe personnel passage or not and enter the dangerous area.

In the following, an example implementation is given using the widely used open source platform PyTorch; to reproduce the functions on another platform, modules with the same or similar functions as the corresponding PyTorch functions or toolkits may be selected.

First, a transformer substation monitoring scene data set is produced: the personnel and personnel channel areas in 200 images are marked manually, and the image samples are expanded to ten times the original number by data enhancement; the training set and testing set are divided in a ratio of 1:9. For marking, the semantic segmentation annotation tool Labelme is used to generate the real mask pictures required for training. At this point, the samples and real label pictures required for network training are ready.

Then, the feature extraction module ResNet101, the global feature fusion module (PSP Module) and the final multidimensional mask prediction module are built in sequence under the PyTorch framework. ResNet101 contains 101 convolution groups, each of which contains convolution, batch normalization and ReLU activation operations, as shown in FIG. 3. Convolution kernels of sizes 3×3 and 1×1 filter the image, extract useful information related to personnel and personnel channel features, and generate the initial feature map. As shown in fig. 2, the global feature fusion module uses adaptive average pooling layers at four scales to generate feature maps with a scale ratio of 1:2:3:6, compresses the number of channels of each to 1/4 of the original with a 1×1 convolution group, restores each to the scale of the initial feature map by bilinear interpolation, and concatenates them with the initial feature map to generate the final fusion feature map used for segmentation. The mask prediction module includes an up-sampling layer, a dropout layer and a 1×1 convolution layer, and further expands the feature map into a mask feature map of the same size as the original image. Specifically, for a substation monitoring image of size (720,720,3), the feature extraction module ResNet101 produces an initial feature map of size (90,90,2048) through multiple dilated convolutions and residual connections. In the global feature fusion module, feature maps of sizes (1,1,2048), (2,2,2048), (3,3,2048) and (6,6,2048) are obtained by adaptive average pooling layers (nn.AdaptiveAvgPool2d); their channels are compressed by 1×1 convolution layers with learnable parameters to give feature maps of sizes (1,1,512), (2,2,512), (3,3,512) and (6,6,512), which are then each resized to (90,90,512) by bilinear interpolation; finally, the initial feature map and the interpolated feature maps are concatenated along the channel dimension to obtain a fused feature map of size (90,90,4096). In the mask prediction module, a mask feature map of size (720,720,3) is obtained through a convolution layer, a dropout layer and an up-sampling layer.
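The sizes quoted in this worked example can be checked with simple arithmetic. The sketch below is only a bookkeeping aid: the function name `psp_shapes` is hypothetical, and the backbone output stride of 8 is inferred from the reduction of 720 to 90 rather than stated in the text.

```python
def psp_shapes(image_hw=(720, 720), stride=8, backbone_channels=2048,
               bins=(1, 2, 3, 6), num_classes=3):
    """Return the (height, width, channels) sizes at each stage."""
    h, w = image_hw
    fh, fw = h // stride, w // stride
    # Each pooled branch is compressed to 1/len(bins) of the backbone channels.
    branch_channels = backbone_channels // len(bins)
    # Concatenation adds every branch to the initial feature map.
    fused_channels = backbone_channels + len(bins) * branch_channels
    return {
        "initial": (fh, fw, backbone_channels),
        "pooled": [(b, b, backbone_channels) for b in bins],
        "compressed": [(b, b, branch_channels) for b in bins],
        "upsampled": [(fh, fw, branch_channels) for _ in bins],
        "fused": (fh, fw, fused_channels),
        "mask": (h, w, num_classes),
    }


shapes = psp_shapes()
for stage, size in shapes.items():
    print(stage, size)
```

The fused depth follows from 2048 + 4 × 512 = 4096, matching the figure given in the example.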

After obtaining the mask feature map, a loss function needs to be built in Pytorch, and cross entropy loss of the mask feature map and the real tag is calculated using a class function nn.

Wherein z = [z₀, ..., z_{C-1}] represents the feature vector output for a sample, that is, the mask feature map; C represents the number of sample classes, which is 3 in the present application.

Then, after the PSPNet weights pre-trained on the Cityscapes urban road traffic data set are loaded with PyTorch, the model weights are fine-tuned on the training set. Specifically, gradient descent with momentum is used, with the momentum term beta set to 0.9 and the weight decay coefficient set to 0.0001. Online data enhancement is also used, including mirroring, scaling the image by a factor between 0.5 and 2, and rotating the image by between -10 and 10 degrees, with the enhancement probability uniformly set to 50%, so that the trained network model is more robust for personnel and personnel channels in transformer substation scenes.
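The online data enhancement settings can be illustrated by a small parameter sampler. This is a sketch under assumptions: the function name `sample_augmentation` is hypothetical, each of the three enhancements is assumed to be switched on independently with probability 50%, and the scale factor and angle are assumed to be drawn uniformly from the stated ranges.

```python
import random


def sample_augmentation(rng, p=0.5):
    """Draw one set of online augmentation parameters.

    Each enhancement (mirror, scale, rotation) is applied
    independently with probability p; independence is an assumption.
    """
    params = {"mirror": False, "scale": 1.0, "angle_deg": 0.0}
    if rng.random() < p:
        params["mirror"] = True
    if rng.random() < p:
        params["scale"] = rng.uniform(0.5, 2.0)
    if rng.random() < p:
        params["angle_deg"] = rng.uniform(-10.0, 10.0)
    return params


rng = random.Random(0)  # fixed seed so repeated runs draw the same parameters
samples = [sample_augmentation(rng) for _ in range(1000)]
print(samples[0])
```

For segmentation training, the same sampled parameters would be applied to an image and to its label mask so that the two stay aligned.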

Finally, after the fine-tuned model weights are obtained, image frames are extracted from the substation video monitoring stream and input into the improved PSPNet network for semantic segmentation, yielding probability mask maps of the personnel and the personnel channel. Binary personnel and personnel channel mask maps are obtained through a binarization operation. To address broken connected domains in the mask maps, dilation post-processing is applied to each: a convolution kernel of size (15,15) is slid over the binary image, which merges the broken regions and enlarges the respective mask regions. For the person pixel region in the person mask map, let the width span of each connected domain be $(w_1, w_2)$ and the height span be $(h_1, h_2)$; the part of the connected domain whose height coordinate lies in $(h_1 + (h_2 - h_1)/8,\ h_2)$ is taken as the person standing area. The intrusion risk is judged from the intersection relation between the person standing area and the personnel channel area: if the sum of the areas of the two regions is equal to the area of their union, the two regions do not intersect, which indicates that the person has entered the dangerous area, and the monitoring system issues an alarm; if the area of the union is smaller than the sum of the areas, the person is still in the safe personnel channel and has not entered the dangerous area. The dangerous area personnel intrusion detection method for transformer substation video monitoring can effectively address the passive monitoring mode currently used in substation monitoring: it judges whether personnel have left the safe personnel channel and entered a dangerous area according to the intersection of the personnel and the personnel channel, issues an alarm accordingly, and advances the intelligence of substation video monitoring.
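The post-processing chain described above (binarization, expansion by morphological dilation, standing-area extraction, and the area test) can be sketched on small nested lists. Several simplifications are assumptions: the helper names are hypothetical, the threshold comparison is strict, the frame is assumed to contain a single person region so that no connected-domain labelling is needed, and a frame with no person is treated as no intrusion.

```python
def binarize(prob_map, threshold=0.5):
    """Turn a probability mask into a 0/1 mask (strict comparison assumed)."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


def dilate(mask, k):
    """Binary dilation with a k x k square kernel (k odd)."""
    h, w, r = len(mask), len(mask[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = 1
    return out


def standing_area(person_mask):
    """Keep the rows from h1 + (h2 - h1) / 8 to h2 of a single person region."""
    rows = [y for y, row in enumerate(person_mask) if any(row)]
    if not rows:
        return [[0] * len(row) for row in person_mask]
    h1, h2 = rows[0], rows[-1]
    top = h1 + (h2 - h1) / 8
    return [[v if top <= y <= h2 else 0 for v in row]
            for y, row in enumerate(person_mask)]


def area(mask):
    """Number of foreground pixels."""
    return sum(sum(row) for row in mask)


def intrusion(standing, channel):
    """True when the standing area and the channel do not intersect."""
    if area(standing) == 0:  # assumption: no person in frame, no alarm
        return False
    union = [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(standing, channel)]
    return area(standing) + area(channel) == area(union)


def detect_intrusion(person_prob, channel_prob, kernel=15):
    """Run the full chain on two probability masks of equal size."""
    person = dilate(binarize(person_prob), kernel)
    channel = dilate(binarize(channel_prob), kernel)
    return intrusion(standing_area(person), channel)


# 10 x 10 toy frame: the channel occupies the bottom two rows.
channel = [[0.9 if y >= 8 else 0.0 for _ in range(10)] for y in range(10)]
away = [[0.9 if (x == 4 and y <= 3) else 0.0 for x in range(10)] for y in range(10)]
inside = [[0.9 if (x == 4 and y >= 5) else 0.0 for x in range(10)] for y in range(10)]
print(detect_intrusion(away, channel, kernel=3))    # True
print(detect_intrusion(inside, channel, kernel=3))  # False
```

Comparing the area sum with the union area is equivalent to testing whether the intersection is empty, since the union area equals the sum of the areas minus the intersection area.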

The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

Computer program instructions for performing the operations of the present disclosure can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized with state information of the computer readable program instructions, and the electronic circuitry can execute the computer readable program instructions to implement aspects of the present disclosure.

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims

1. The dangerous area personnel intrusion detection method for transformer substation video monitoring is characterized by comprising the following steps of:

(1) Aiming at the requirement of personnel intrusion detection, a classical semantic segmentation network PSPNet network model is improved, and the structure of the improved classical semantic segmentation PSPNet network model comprises an image feature extraction module, a global feature fusion module and a multidimensional mask prediction module; the network output of the improved classical semantic segmentation PSPNet network model is a prediction vector over the categories to which each pixel belongs, the expected label is changed into a multi-hot vector in multi-hot coding format, and a single pixel is allowed to belong to different categories;

(2) After the improvement of the semantic segmentation network is finished, the video stream is exported from the substation monitoring equipment, and the semantic segmentation network improved in step (1) is called to perform multi-dimensional semantic segmentation on each frame of the substation video monitoring scene in the video stream in real time, obtaining the probability distribution function of a pixel point belonging to each category, wherein: $\delta$ is a number slightly larger than 0 that prevents the denominator from tending to 0, and the probability distribution vector obtained after semantic segmentation of the substation video monitoring image represents the probability that a pixel point belongs to each category; the mask maps of the personnel channel, the personnel and the background are segmented, and a loss evaluation function is constructed, wherein: $y = [y_0, y_1, y_2]$ is the multi-hot encoded representation of the sample label, i.e. $y_i = 1$ when the sample belongs to the $i$-th class, otherwise $y_i = 0$; $w$ and $h$ are the width and height of the mask map, and $C$ is the number of sample classes; the network loss is evaluated and the network error is corrected;

(3) Performing a binarization operation on the mask maps segmented in step (2) and a dilation operation on the personnel mask map, wherein when the sum of the areas of the personnel standing area and the personnel channel area after dilation is equal to the area of their union, it indicates that personnel have entered the dangerous area;

2. The method for detecting personnel intrusion in a hazardous area in video monitoring of a transformer substation according to claim 1, wherein:

3. The hazardous area personnel intrusion detection method for use in substation video monitoring according to claim 2, wherein:

the image feature extraction module and the global feature fusion module of the improved PSPNet network model feed the fused feature map into the multidimensional mask prediction module, wherein the multidimensional mask prediction module consists of a 3×3 convolution group, one up-sampling layer, one dropout layer and a 1×1 convolution layer, and finally generates a mask map with the same width and height as the original image; the depth of the mask map is 3, corresponding to the three categories to be segmented: personnel, personnel channel and background.

4. A hazardous area personnel intrusion detection method for use in video monitoring of a substation according to claim 1 or 3, characterized by: improved semantic segmentation network

In step (2), multi-dimensional semantic segmentation is performed in real time on each frame of the substation video monitoring scene in the video stream by the improved PSPNet network model; the output is a three-dimensional matrix whose width and height equal those of the input image and whose depth equals the number of segmentation classes; the feature vector at the $i$-th row and $j$-th column of the matrix corresponds to the pixel at the $i$-th row and $j$-th column of the original image and has the form $x = [x_0, x_1, x_2]$, where $x_0$, $x_1$, $x_2$ denote the feature values of background, personnel and personnel channel at that pixel, respectively; the prediction vector is normalized into $p = [p_0, p_1, p_2]$ using the normalization function, so that it represents the probability that the pixel is background, personnel, or personnel channel:

wherein: $\delta$ is a number slightly greater than 0 that prevents the denominator from tending to 0, and the probability distribution vector $p = [p_0, p_1, p_2]$ obtained after semantic segmentation of the substation video monitoring image represents the probability that a pixel belongs to each class.

5. The method for detecting personnel intrusion in a hazardous area in video monitoring of a transformer substation according to claim 4, wherein:

$\delta$ is within the range $[10^{-5}, 10^{-2}]$.

6. The method for detecting personnel intrusion in a hazardous area in video monitoring of a transformer substation according to claim 4, wherein:

7. The method for detecting personnel intrusion in a hazardous area in video monitoring of a transformer substation according to claim 4, wherein:

after the predicted probability distribution $p = [p_0, p_1, p_2]$ is obtained, the following loss evaluation function is constructed to evaluate the network loss and correct the network error, the loss evaluation function being:

wherein: $y = [y_0, y_1, y_2]$ is the multi-hot encoded representation of the sample label, with $y_i = 1$ when the sample belongs to the $i$-th class and $y_i = 0$ otherwise; $w$ and $h$ are the width and height of the mask map, and $C$ is the number of sample classes.

8. The hazardous area personnel intrusion detection method for use in video monitoring of a substation according to claim 1 or 7, wherein:

wherein: p (x, y) represents a corresponding probability value somewhere in the mask map, x, y represents pixel points in the mask map, sigma represents a segmentation threshold value, and 0.5 is taken;

9. The method for detecting personnel intrusion in a hazardous area in video monitoring of a substation according to claim 8, wherein:

the person mask map is dilated as follows: for the connected domains of the person pixel region in the person mask map, i.e. the elements with value 1 in the person mask map, let the width span of each connected domain be $(w_1, w_2)$ and the height span be $(h_1, h_2)$; the part of the connected domain whose height coordinate lies in $(h_1 + (h_2 - h_1)/8,\ h_2)$ is taken as the person standing area.

10. The hazardous area personnel intrusion detection method for use in video monitoring of a substation according to claim 1 or 9, wherein:

in the step (4), when personnel enter a dangerous area, the monitoring system is controlled to issue an alarm at the same time; if the area of the union is smaller than the sum of the areas, the personnel are still in the safe personnel channel.

11. A dangerous area personnel intrusion detection system in transformer substation video monitoring by using the dangerous area personnel intrusion detection method according to any one of claims 1-10, comprising a monitoring video deriving unit, a monitoring scene image multidimensional semantic segmentation unit, an image expansion unit and a personnel intrusion judging unit; the method is characterized in that: