CN116543333A - Target recognition method, training method, device, equipment and medium of power system


Info

Publication number: CN116543333A
Application number: CN202310467935.XA
Authority: CN
Other languages: Chinese (zh)
Prior art keywords: target, target area image, video frame, video
Inventors: 张云翔, 高圣溥
Current Assignee: Shenzhen Power Supply Bureau Co Ltd
Original Assignee: Shenzhen Power Supply Bureau Co Ltd (application filed by Shenzhen Power Supply Bureau Co Ltd)
Legal status: Pending


Classifications

    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 10/40: Extraction of image or video features
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The application relates to a target identification method, training method, device, equipment and medium of a power system. The method includes: acquiring a video to be identified of the power system; identifying a target video frame of the video to be identified through a background model to obtain a target area image corresponding to the target video frame, wherein the background model is determined according to a number of data frames preceding the target video frame in the video to be identified; and performing feature extraction and prediction on the target area image to obtain the category of the target to be identified in the target video frame. The method reduces the amount of computation, increases the speed at which the video to be identified of the power system is processed, and achieves a balance between the speed and the accuracy of foreign object identification in the power system.

Description

Target recognition method, training method, device, equipment and medium of power system
Technical Field
The application relates to the technical field of power grid operation and maintenance, and in particular to a target identification method, a training method, a device, equipment and a medium of a power system.
Background
A transformer substation is a hub of transmission and distribution in a power system. Most substation equipment operates in an open-air environment and is easily affected by intruding moving foreign objects, which can cause power line failures.
To keep the power system running normally, intruding moving foreign objects must be identified quickly so that foreign objects on power lines can be detected and cleared effectively and in time. Two-stage detection models are often applied to this detection task in practical engineering because of their high detection accuracy and strong robustness. A two-stage detection model first extracts a large number of candidate areas through image processing techniques and then performs classification detection on each candidate area to identify the target foreign object.
However, when a two-stage detection model is used to identify intruding moving foreign objects, many of the extracted candidate areas are redundant, which leads to a large amount of computation and a low identification speed.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target identification method, training method, device, equipment and medium of a power system capable of rapidly identifying intruding moving foreign objects.
In a first aspect, the present application provides a method for target identification in an electrical power system. The method comprises the following steps:
acquiring a video to be identified of a power system;
identifying a target video frame of the video to be identified through a background model to obtain a target area image corresponding to the target video frame, wherein the background model is determined according to a number of data frames preceding the target video frame in the video to be identified;
and performing feature extraction and prediction on the target area image to obtain the category of the target to be identified in the target video frame.
In one embodiment, performing feature extraction and prediction on the target area image to obtain the category of the foreign object to be identified in the target video frame includes:
performing a depth convolution operation on the target area image to obtain a depth convolution feature map of the target area image;
performing a point-by-point convolution operation on the depth convolution feature map of the target area image to obtain a point-by-point convolution feature map of the target area image;
and predicting the category of the foreign object to be identified in the target video frame according to the point-by-point convolution feature map of the target area image.
In one embodiment, performing the depth convolution operation on the target area image to obtain the depth convolution feature map of the target area image includes:
performing channel generalization on the target area image to obtain a plurality of channel images corresponding to the target area image;
performing a convolution operation on each channel image of the target area image with one convolution kernel to obtain a feature map corresponding to each channel image;
and splicing the feature maps to obtain the depth convolution feature map.
In one embodiment, performing the point-by-point convolution operation on the depth convolution feature map of the target area image to obtain the point-by-point convolution feature map of the target area image includes:
performing a convolution operation on the depth convolution feature map with a preset number of convolution kernels to obtain a preset number of output feature maps;
and obtaining the point-by-point convolution feature map according to the preset number of output feature maps.
In a second aspect, the present application provides a model training method, the method comprising:
acquiring a training sample, wherein the training sample comprises a video frame in a sample video, a foreign object position corresponding to the video frame and a foreign object category corresponding to the video frame;
performing foreign object identification on a video frame in the sample video by using a foreign object identification model to obtain a prediction result;
performing parameter optimization on the foreign object identification model according to the prediction result, the foreign object position corresponding to the video frame and the foreign object category corresponding to the video frame;
wherein the foreign object identification model is used for: for each video frame in the sample video, determining a background model according to at least the previous frame of the video frame, and acquiring, through the background model, a target area image that includes the foreign object to be identified in the video frame; performing feature extraction and prediction on the target area image to obtain the category of the foreign object to be identified in the target area image; and determining the prediction result according to the position of the target area image in the video frame and the category of the foreign object to be identified in the target area image.
In one embodiment, obtaining training samples includes:
acquiring a video frame in an initial sample video;
performing foreign object position labeling and foreign object category labeling on the video frames in the initial sample video to obtain annotated video frames;
and expanding the annotated video frame data with a deep convolutional generative adversarial network to obtain a training sample.
In a third aspect, the present application further provides an object recognition device of a power system. The device comprises:
the acquisition module is used for acquiring the video to be identified of the power system;
the extraction module is used for identifying a target video frame of the video to be identified through a background model to obtain a target area image corresponding to the target video frame, the background model being determined according to a number of data frames preceding the target video frame in the video to be identified;
and the classification module is used for performing feature extraction and prediction on the target area image to obtain the category of the target to be identified in the target video frame.
In a fourth aspect, the present application also provides a computer device. The computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to realize the target recognition method of the power system provided in the first aspect of the application or the model training method provided in the second aspect.
In a fifth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the object recognition method of the power system provided in the first aspect of the present application or the model training method provided in the second aspect.
In a sixth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the object recognition method of the power system provided in the first aspect of the application or the model training method provided in the second aspect.
According to the target identification method, training method, device, equipment, medium and product of the power system, a video to be identified of the power system is acquired; a target video frame of the video to be identified is identified through a background model to obtain a target area image corresponding to the target video frame, the background model being determined according to a number of data frames preceding the target video frame in the video to be identified; feature extraction and prediction are then performed on the target area image to obtain the category of the target to be identified in the target video frame; and the identification result of the target video frame is obtained from the position information of the target area image in the target video frame and the category information of the target to be identified. In the prior art, a large number of pre-selected boxes are generated, feature extraction and classification prediction must be performed on each of them, and redundant boxes are then screened out according to the results; for example, current R-CNN (Region-based Convolutional Neural Networks) series target detection algorithms generate pre-selected boxes with an RPN (Region Proposal Network), which entails a series of complex computations and redundant-box screening. By generating target regions through background model establishment and foreground detection instead, the present method optimizes how target regions are produced and computed, reducing the amount of computation and balancing identification speed and accuracy.
Drawings
FIG. 1 is an application environment diagram of a target identification method of a power system in one embodiment;
FIG. 2 is a flow chart of a method of identifying an object of a power system in one embodiment;
FIG. 3 is a flow chart of a method for identifying an object of a power system according to another embodiment;
FIG. 4 is a flow chart of a model training method in another embodiment;
FIG. 5 is a block diagram of a target recognition device of a power system in one embodiment;
fig. 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The target identification method of the power system provided by the embodiments of the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process; it may be integrated on the server 104 or placed on a cloud or other network server, and it is used to store the videos to be identified of the power system. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device or portable wearable device; the internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle device and the like, and the portable wearable device may be a smart watch, smart bracelet, headset or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, there is provided a target recognition method of a power system, which is described by taking an example that the method is applied to a server in fig. 1, and includes the following steps:
step 202, obtaining a video to be identified of a power system.
The video to be identified of the power system may be video collected online in a power system scene, for example video of a power system inspection area captured by an unmanned aerial vehicle or a tower-mounted camera; it may also be video related to the power system obtained offline, for example from a test dataset of videos to be identified.
Further, the video to be identified may be a video stream collected in real time, or it may be a manually synthesized video stream.
Because the power line environment of a power system is complex, foreign object intrusion happens easily. A foreign object in the embodiments of the application is an object that does not belong to the power system, in particular a fast-moving intruding object, such as an animal that strays into the open-air environment of a substation or a falling suspended object. In an environment where power system equipment operates at high speed, even a small foreign object entering the equipment can cause a serious safety accident. To ensure safe, reliable, efficient and stable operation of the power system, target identification must therefore be performed on the relevant areas so that intruding foreign objects posing safety hazards are discovered in time; to this end, the embodiments of the application first acquire the video to be identified of the power system.
For example, a substation camera captures a video to be identified of a substation inspection area and uploads it to a server; after obtaining the video to be identified, the server performs target identification on it.
Alternatively, the video to be identified of the power system may be obtained from a public dataset and used to test the target identification method of the power system.
Step 204, identifying the target video frame of the video to be identified through the background model, and obtaining a target area image corresponding to the target video frame.
The target video frame may be any frame of video frame in the video to be identified, or the target video frame may also be a key frame in the video to be identified.
Further, the background model is determined from a number of data frames preceding the target video frame in the video to be identified.
In one implementation, for a stable monitored scene with no moving objects and no illumination change, the gray value of each pixel in the video image generally follows a fixed random probability distribution. Real scenes, however, are complex and unpredictable and subject to environmental interference and noise, such as abrupt illumination changes, fluctuation of objects in the actual background image, and camera jitter. A background model can therefore be constructed, that is, the background can be reconstructed, by combining an initial background image with a number of data frames preceding the target video frame; the background image is updated in this way to obtain a target background image, which can be differenced against the target video frame to locate the target more accurately.
In one implementation, the background model may be built with Gaussian mixture modeling, the visual background extractor (ViBe) random background update method, and so on.
In one implementation, erosion and dilation may be applied to the established background model to eliminate isolated noise points and keep the model up to date.
The video to be identified consists of consecutive video frames. When a foreign object appears, that is, when a moving target to be identified enters the scene, the pixel blocks in some area of the video frame change, and as long as the foreign object keeps moving, the changed pixel blocks keep moving as well. Therefore, after the background model is determined, pixel blocks that change abruptly and move regularly are regarded as belonging to the target to be identified, and the target area where the target is located can be obtained through foreground detection. The target area may be a target box containing the target to be identified, and it represents the position identification information of the target.
For example, after the background model is established, a background image is obtained from it in real time, the target video frame is differenced against the background image, and the area in which the pixel difference exceeds a preset threshold is taken as the target area, thereby determining the position information of the target. Through this foreground detection method, which establishes a background model and differences it with the target video frame, a single optimal target box is extracted for each target as its position identification information in the target video frame.
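By way of non-limiting illustration, the following is a minimal Python sketch of the background-difference step described above, assuming 8-bit grayscale frames and a single moving target; the function name, default threshold and box format are assumptions for illustration, not part of the application.

```python
import numpy as np
from typing import Optional, Tuple

def extract_target_area(frame: np.ndarray, background: np.ndarray,
                        threshold: int = 30) -> Optional[Tuple[int, int, int, int]]:
    """Return one bounding box (x, y, w, h) covering the pixels where the
    target video frame differs from the background image by more than the
    preset threshold, or None when no foreground is detected."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    ys, xs = np.nonzero(diff > threshold)      # foreground pixel coordinates
    if xs.size == 0:
        return None
    x0, y0 = int(xs.min()), int(ys.min())
    return x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1
```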
In one implementation, a foreign object detection network (Foreign Objects Detection Network for Power Substation, FODN4PS) may be used to identify targets in the video to be identified of the power system. This algorithm is a simple and efficient target detection method whose network consists of a candidate target network and a target classification network. The candidate target network is determined based on a foreground-background segmentation model, and the background model can be established from the three frames preceding the target video frame. The candidate target network extracts target areas containing foreign objects, each object generating only one target area, and the target area images are sent to the target classification network for feature extraction.
Step 206, performing feature extraction and prediction on the target area image to obtain the category of the target to be identified in the target video frame.
After the position information of the target is acquired, that is, after the target area is obtained, feature extraction is performed on the target area image, the extracted features are classified, and the category information of the target in the target area image is identified.
In the above target identification method of the power system, the video to be identified of the power system is acquired; the target video frame is identified through a background model to obtain the corresponding target area image, the background model being determined according to a number of data frames preceding the target video frame in the video to be identified; feature extraction and prediction are then performed on the target area image to obtain the category of the target to be identified; and the identification result of the target video frame is obtained from the position information of the target area image and the category information of the target. In the prior art, a large number of pre-selected boxes are generated, feature extraction and classification prediction must be performed on each of them, and redundant boxes are then screened out according to the results; for example, current R-CNN (Region-based Convolutional Neural Networks) series target detection algorithms generate pre-selected boxes with an RPN (Region Proposal Network), which entails a series of complex computations and redundant-box screening. By generating target regions through background model establishment and foreground detection instead, the method optimizes how target regions are produced and computed, reducing the amount of computation and balancing identification speed and accuracy.
In one embodiment, performing feature extraction and prediction on the target area image to obtain the category of the foreign object to be identified in the target video frame includes:
Step A1, performing a depth convolution operation on the target area image to obtain a depth convolution feature map of the target area image.
Step A2, performing a point-by-point convolution operation on the depth convolution feature map of the target area image to obtain a point-by-point convolution feature map of the target area image.
Step A3, predicting the category of the foreign object to be identified in the target video frame according to the point-by-point convolution feature map of the target area image.
Here, the embodiments of the application use depth separable convolution (depthwise separable convolution, DSC) to perform feature extraction on the target area image. A depthwise separable convolution consists of a depth convolution (Depthwise Convolution, DW) and a point-by-point convolution (Pointwise Convolution, PW).
In conventional R-CNN based detectors, target detection is performed on high-resolution low-level feature maps with standard convolutions. Fusing the low-level features strengthens the acquisition of image features but inevitably adds computation. To reduce the amount of computation and speed up training, the embodiments of the application make the model lightweight with depthwise separable convolutions, greatly shortening the time needed for target identification and relieving the computational load on hardware. The standard convolution is decomposed into a depth convolution followed by a point-by-point convolution: first, a depth convolution operation is performed on the target area image to obtain its depth convolution feature map; then, a point-by-point convolution operation is performed on that feature map to obtain the point-by-point convolution feature map; finally, the category of the foreign object to be identified in the target video frame is predicted from the point-by-point convolution feature map.
In this embodiment, a depth convolution operation is performed on a target area image to obtain a depth convolution feature map of the target area image, a point-by-point convolution operation is performed on the depth convolution feature map of the target area image to obtain a point-by-point convolution feature map of the target area image, and according to the point-by-point convolution feature map of the target area image, the type of foreign matters to be identified in a target video frame is predicted, so that the calculated amount can be reduced, the feature extraction speed of the target area image can be improved, and target identification can be performed rapidly.
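As a concrete illustration of this decomposition, the following PyTorch sketch builds the depthwise-then-pointwise pipeline; the class name and the default kernel size are assumptions for illustration, not the application's exact network.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Standard convolution factored into a depth convolution (one kernel
    per input channel, groups=in_channels) followed by a point-by-point
    1x1 convolution that fuses channels and sets the output channel count."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# e.g. a 3-channel target area image mapped to 64 feature channels
features = DepthwiseSeparableConv(3, 64)(torch.randn(1, 3, 224, 224))
```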
In one embodiment, performing the depth convolution operation on the target area image to obtain the depth convolution feature map of the target area image includes:
Step B1, performing channel generalization on the target area image to obtain a plurality of channel images corresponding to the target area image.
Illustratively, after the target area image is acquired, it is decomposed into three channels and then generalized into a plurality of channels.
Step B2, performing a convolution operation on each channel image of the target area image with one convolution kernel to obtain a feature map corresponding to each channel image.
Step B3, splicing the feature maps to obtain the depth convolution feature map.
Since the number of output channels of a convolution operation equals the number of convolution kernels, and the depth convolution uses one kernel per channel, each single channel also has one output channel after the convolution operation.
Illustratively, one convolution kernel is applied to each channel of the input target area image, and the outputs of all the kernels are spliced to form the final output of the depth convolution.
For example, if the number of channels of the target area image is N, applying one convolution kernel to each of the N channels yields N single-channel feature maps, which are then spliced in order to obtain a depth convolution feature map with N channels.
In this embodiment, channel generalization is performed on the target area image to obtain a plurality of channel images; a convolution operation with one kernel per channel image yields a feature map for each channel; and the feature maps are spliced into the depth convolution feature map.
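The per-channel convolution and splicing can be checked against a single grouped convolution in a few lines; the sketch below, with an assumed N = 4 channels, is illustrative only.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 32, 32)       # target area image with N = 4 channels
kernels = torch.randn(4, 1, 3, 3)   # one 3x3 convolution kernel per channel

# One kernel per channel, then splice the N single-channel feature maps
spliced = torch.cat([F.conv2d(x[:, i:i + 1], kernels[i:i + 1], padding=1)
                     for i in range(4)], dim=1)

# The same result as a single depthwise (grouped) convolution call
assert torch.allclose(spliced, F.conv2d(x, kernels, padding=1, groups=4))
```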
In one embodiment, performing the point-by-point convolution operation on the depth convolution feature map of the target area image to obtain the point-by-point convolution feature map of the target area image includes:
Step C1, performing a convolution operation on the depth convolution feature map with a preset number of convolution kernels to obtain a preset number of output feature maps.
Step C2, obtaining the point-by-point convolution feature map according to the preset number of output feature maps.
The preset number of convolution kernels serves two purposes: it sets the number of output channels, and it fuses the channels of the depth convolution feature map produced by the depth convolution.
In one implementation, a preset number of convolution kernels of size 1×1×N, where N is the number of channels of the depth convolution feature map, are used to perform an ordinary standard convolution. Each kernel convolves the N channels of the depth convolution feature map and combines them with a weighted sum in the depth direction, producing one output feature map; the preset number of output feature maps are taken as the channels of the point-by-point convolution feature map.
In this embodiment, a convolution operation is performed on the depth convolution feature map with a preset number of convolution kernels to obtain a preset number of output feature maps, from which the point-by-point convolution feature map is obtained; in this way the channel features of the depth convolution feature map are fused and the number of output channels can be changed.
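A short sketch of the point-by-point convolution, with assumed sizes S = 4 and T = 8, shows both effects at once: channel fusion and a changed channel count.

```python
import torch
import torch.nn.functional as F

dw_map = torch.randn(1, 4, 32, 32)   # depth convolution feature map, S = 4
kernels = torch.randn(8, 4, 1, 1)    # T = 8 kernels of size 1 x 1 x S

# Each 1x1 kernel forms a weighted combination of the S channels at every
# position, so the output is a T-channel point-by-point feature map.
pw_map = F.conv2d(dw_map, kernels)
print(pw_map.shape)                  # torch.Size([1, 8, 32, 32])
```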
In one embodiment, as shown in fig. 3, there is provided a target recognition method of a power system, including:
and step 302, acquiring videos to be identified in the transformer substation through a transformer substation camera.
Step 304, selecting a key frame in the video to be identified as a target video frame.
And 306, establishing a background model and foreground detection according to the first 3 frames of the target video frame by adopting a vibe algorithm to obtain a target area image of the target video frame.
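For illustration, a heavily simplified sketch of a ViBe-style random background update model is given below, assuming grayscale frames; the class name, sample count, matching radius and subsampling factor are assumptions, and full ViBe also diffuses updates into neighbouring pixels, which is omitted here.

```python
import numpy as np

class SimpleViBe:
    """Simplified ViBe background model: keep n_samples gray values per
    pixel; a pixel is background when it matches at least min_matches of
    its stored samples within a given radius."""
    def __init__(self, first_frame, n_samples=20, radius=20, min_matches=2):
        noise = np.random.randint(-10, 11, (n_samples,) + first_frame.shape)
        self.samples = np.clip(first_frame.astype(np.int16) + noise, 0, 255)
        self.radius, self.min_matches = radius, min_matches

    def segment(self, frame):
        """Return a boolean foreground mask and update the model in place."""
        matches = (np.abs(self.samples - frame.astype(np.int16))
                   <= self.radius).sum(axis=0)
        foreground = matches < self.min_matches
        # Randomly refresh one stored sample at roughly 1/16 of the
        # background pixels, so the model tracks slow scene changes
        refresh = ~foreground & (np.random.rand(*frame.shape) < 1 / 16)
        self.samples[np.random.randint(len(self.samples))][refresh] = frame[refresh]
        return foreground
```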
Step 308, performing a depth convolution operation on the target area image to obtain a depth convolution feature map of the target area image.
Illustratively, after channel generalization of the target area image, the input feature map corresponding to the target area image has size Di × Di × S, where Di is the side length of the square-normalized target area image and S is the number of channels of the input feature map. A convolution operation is performed on each channel with one convolution kernel of size Dw × Dw, so the computation amount V1 of the depth convolution operation is V1 = S × Di² × Dw².
Step 310, performing a point-by-point convolution operation on the depth convolution feature map of the target area image to obtain the point-by-point convolution feature map of the target area image.
Illustratively, if the number of channels of the depth convolution feature map is S, performing point-by-point convolution with T convolution kernels of size 1 × 1 × S yields T output feature maps, and the computation amount V2 of the point-by-point convolution operation is V2 = S × T × Di².
Compared with the ordinary standard convolution, whose computation amount is S × T × Di² × Dw², the convolution computation reduction ratio G in the embodiment of the application is G = (V1 + V2) / (S × T × Di² × Dw²) = 1/T + 1/Dw².
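The ratio can be verified numerically; the values of S, T, Di and Dw below are illustrative assumptions, not figures from the application.

```python
S, T, Di, Dw = 32, 64, 56, 3               # assumed sizes for the check

V1 = S * Di**2 * Dw**2                     # depth convolution cost
V2 = S * T * Di**2                         # point-by-point convolution cost
V_std = S * T * Di**2 * Dw**2              # ordinary standard convolution cost

G = (V1 + V2) / V_std
assert abs(G - (1 / T + 1 / Dw**2)) < 1e-12
print(G)                                   # ~0.1267: roughly 8x less computation
```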
Step 312, predicting the category of the foreign object to be identified in the target video frame by the classifier according to the point-by-point convolution feature map of the target area image.
Step 314, combining the position information of the target area image in the target video frame with the category information of the foreign object to be identified to obtain the identification result of the target video frame.
In this embodiment, the video to be identified of the power system is acquired and a key frame is selected as the target video frame. A background model is established from the 3 frames preceding the target video frame and foreground detection is performed to obtain the target area image of the target video frame. A depth convolution operation on the target area image yields its depth convolution feature map, a point-by-point convolution operation on that map yields the point-by-point convolution feature map, and a classifier predicts from it the category of the foreign object to be identified in the target video frame. The identification result of the target video frame is then obtained from the position information of the target area image and the category information of the target to be identified. The way the target area is generated and computed is thus optimized, feature extraction is performed with depthwise separable convolution, the amount of computation is reduced, and a balance between detection speed and accuracy for moving foreign objects is achieved.
Based on the same inventive concept, an embodiment of the application further provides a model training method. The solution provided by the model training method is implemented similarly to the method described above, so for the specific limitations in the model training method embodiments below, reference may be made to the limitations of the target identification method of the power system above; they are not repeated here.
In one embodiment, as shown in FIG. 4, a model training method is provided, comprising:
step 402, a training sample is obtained.
The training samples comprise video frames in the sample video, foreign object positions corresponding to the video frames and foreign object categories corresponding to the video frames.
Step 404, performing foreign object identification on the video frames in the sample video with a foreign object identification model to obtain a prediction result.
The foreign object identification model is used for: for each video frame in the sample video, determining a background model according to at least the previous frame of the video frame; acquiring, through the background model, a target area image that includes the foreign object to be identified in the video frame; performing feature extraction and prediction on the target area image to obtain the category of the foreign object to be identified in the target area image; and determining the prediction result according to the position of the target area image in the video frame and the category of the foreign object to be identified in the target area image.
Illustratively, the foreign object identification model may use the foreign object detection network (Foreign Objects Detection Network for Power Substation, FODN4PS), which consists of a candidate target network and a target classification network. The candidate target network extracts target area images from the video frames of the sample video; it is determined based on a foreground-background segmentation model, and the background model can be established from the 3 frames preceding the target video frame. The candidate target network extracts target areas containing foreign objects, each object generating only one target area, and the target area images are sent to the target classification network. The target classification network performs feature extraction, through depthwise separable convolution, and classification on the target area image to obtain the prediction result. In the embodiments of the application, the target classification network contains a main classifier and an auxiliary classifier; the extracted features are additionally classified by the auxiliary classifier, and the auxiliary classification loss is added to the total loss function with a weight λ, which may take the value 0.3.
Step 406, performing parameter optimization on the foreign object identification model according to the prediction result, the foreign object position corresponding to the video frame and the foreign object category corresponding to the video frame.
The foreign object position and foreign object category corresponding to the video frame are the ground-truth labels of the training sample.
Illustratively, a loss function is computed from the prediction result and the ground-truth labels, and the parameters of the foreign object identification model are optimized accordingly.
During training, the loss function may take the following form:
L = Lcls(pi, pi*) + λ · Lclsa(pi, pi*)
where Lcls is the classification loss of the main classifier over the predicted target area, Lclsa is the classification loss of the auxiliary classifier, pi is the predicted target probability, pi* indicates whether the target is real (1 for a real target, 0 for a false target), and λ is the weight of the auxiliary loss.
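A minimal sketch of combining the two classifier losses is given below, assuming cross-entropy classification losses; the function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def total_loss(main_logits: torch.Tensor, aux_logits: torch.Tensor,
               target: torch.Tensor, lam: float = 0.3) -> torch.Tensor:
    """Main classifier loss Lcls plus the auxiliary classifier loss
    Lclsa weighted by lambda (0.3 in the embodiment above)."""
    main = F.cross_entropy(main_logits, target)   # Lcls
    aux = F.cross_entropy(aux_logits, target)     # Lclsa
    return main + lam * aux
```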
Through the candidate target network and the target classification network, FODN4PS obtains the category and position information of the foreign object and maps them back onto the original image. Compared with complex two-stage target detection algorithms, the target areas processed by FODN4PS are more precise, and precision improves further once the target classification network has been trained.
According to the above method, the way the target area is generated and computed is optimized during training, feature extraction is performed with depthwise separable convolution, and the amount of computation is reduced while the training speed is improved, thereby achieving a balance between detection speed and accuracy for moving foreign objects.
In one embodiment, obtaining a training sample includes:
Step D1, acquiring video frames in an initial sample video.
Step D2, performing foreign object position labeling and foreign object category labeling on the video frames in the initial sample video to obtain annotated video frames.
Illustratively, the position of the foreign object is labeled as (x, y, w, h), where (x, y) are the coordinates of the center point of the foreign object region, w is the width of the region and h is its height; its category is labeled L, denoting an intruding foreign object.
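For example, an annotated video frame might be stored as below; the field names and file name are illustrative assumptions, and the helper converts the centre-based label to the corner format some training tools expect.

```python
annotation = {
    "frame": "substation_000123.jpg",      # illustrative file name
    "bbox": (412.0, 318.5, 64.0, 48.0),    # (x, y, w, h): centre, width, height
    "category": "L",                       # intruding foreign object
}

def center_to_corners(x: float, y: float, w: float, h: float):
    """Convert an (x, y, w, h) centre-format label to (x_min, y_min, x_max, y_max)."""
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2
```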
Step D3, expanding the annotated video frame data with a deep convolutional generative adversarial network to obtain a training sample.
In one implementation, considering that bad weather increases the difficulty of identification, the annotated video frames can be blurred so that the model copes better with severe weather conditions.
In one implementation, the annotated video frame data is expanded with at least one of flipping, color space transformation, cropping, rotation, sharpening, blurring and blending.
In one implementation, a deep convolutional generative adversarial network (DCGAN) may also be used to increase the number of samples, which improves recognition accuracy relative to data expansion with conventional image processing.
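As a sketch of the DCGAN side of this expansion, the generator below maps a noise vector to a 64×64 image in the standard DCGAN layout; the layer widths and latent size are common defaults rather than values from the application, and the discriminator and training loop are omitted.

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Minimal DCGAN generator: transposed convolutions with batch norm
    and ReLU turn a latent vector into a 64x64 RGB image."""
    def __init__(self, z_dim: int = 100, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0), nn.BatchNorm2d(base * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1), nn.BatchNorm2d(base * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1), nn.BatchNorm2d(base * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1), nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)        # z: (batch, z_dim, 1, 1) -> (batch, 3, 64, 64)

fake_frames = DCGANGenerator()(torch.randn(8, 100, 1, 1))  # 8 synthetic samples
```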
In this embodiment, video frames in an initial sample video are acquired and labeled with foreign object positions and categories to obtain annotated video frames, and the annotated video frame data is expanded with a deep convolutional generative adversarial network to obtain training samples, which improves recognition accuracy and training effectiveness.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in those flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same time but may be executed at different moments, and which need not be executed sequentially but may be executed in turn or alternately with other steps or with sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the application further provides a target identification device of a power system for implementing the above target identification method of the power system. The solution provided by the device is implemented similarly to the method described above, so for the specific limitations in the device embodiments below, reference may be made to the limitations of the target identification method of the power system above; they are not repeated here.
In one embodiment, as shown in fig. 5, there is provided an object recognition apparatus of a power system, including: an acquisition module 502, an extraction module 504, and a classification module 506, wherein:
the acquiring module 502 is configured to acquire a video to be identified of the power system.
The extraction module 504 is configured to identify a target video frame of the video to be identified through a background model to obtain a target area image corresponding to the target video frame, the background model being determined according to a number of data frames preceding the target video frame in the video to be identified.
The classification module 506 is used for performing feature extraction and prediction on the target area image to obtain the category of the target to be identified in the target video frame.
In one embodiment, the extraction module 504, when performing feature extraction and prediction on the target area image to obtain the category of the foreign object to be identified in the target video frame, is further configured to: perform a depth convolution operation on the target area image to obtain a depth convolution feature map of the target area image; perform a point-by-point convolution operation on the depth convolution feature map to obtain a point-by-point convolution feature map of the target area image; and predict the category of the foreign object to be identified in the target video frame according to the point-by-point convolution feature map of the target area image.
In one embodiment, the extraction module 504, when performing the depth convolution operation on the target area image, is further configured to: perform channel generalization on the target area image to obtain a plurality of channel images corresponding to the target area image; perform a convolution operation on each channel image with one convolution kernel to obtain a feature map corresponding to each channel image; and splice the feature maps to obtain the depth convolution feature map.
In one embodiment, the extraction module 504, when performing the point-by-point convolution operation on the depth convolution feature map of the target area image, is further configured to: perform a convolution operation on the depth convolution feature map with a preset number of convolution kernels to obtain a preset number of output feature maps; and obtain the point-by-point convolution feature map according to the preset number of output feature maps.
The respective modules in the target recognition device of the above-described power system may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing power system data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for target recognition or model training of a power system.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that implementing all or part of the methods described above may be accomplished by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may comprise the flows of the embodiments of the methods described above. Any reference to memory, database or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory and the like. Volatile memory may include random access memory (RAM), external cache memory and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this description.
The above examples represent only a few embodiments of the present application; they are described in some detail, but are not therefore to be construed as limiting the scope of the application. It should be noted that those of ordinary skill in the art may make modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method of target identification for an electrical power system, the method comprising:
acquiring a video to be identified of a power system;
identifying a target video frame of the video to be identified through a background model to obtain a target area image corresponding to the target video frame, wherein the background model is determined according to a number of data frames preceding the target video frame in the video to be identified;
and performing feature extraction and prediction on the target area image to obtain the category of the target to be identified in the target video frame.
2. The method according to claim 1, wherein the performing feature extraction and prediction on the target area image to obtain a category of the foreign object to be identified in the target video frame includes:
performing depth convolution operation on the target area image to obtain a depth convolution feature map of the target area image;
performing point-by-point convolution operation on the depth convolution feature map of the target area image to obtain a point-by-point convolution feature map of the target area image;
and predicting the category of the foreign object to be identified in the target video frame according to the point-by-point convolution feature map of the target area image.
3. The method according to claim 2, wherein performing a depth convolution operation on the target area image to obtain a depth convolution feature map of the target area image includes:
performing channel generalization on the target area image to obtain a plurality of channel images corresponding to the target area image;
performing a convolution operation on each channel image of the target area image with one convolution kernel to obtain a feature map corresponding to each channel image;
and splicing the feature maps to obtain a depth convolution feature map.
4. The method according to claim 2, wherein performing a point-by-point convolution operation on the depth convolution feature map of the target area image to obtain a point-by-point convolution feature map of the target area image includes:
performing a convolution operation on the depth convolution feature map with a preset number of convolution kernels to obtain the preset number of output feature maps;
and obtaining the point-by-point convolution feature map according to the preset number of output feature maps.
5. A method of model training, the method comprising:
acquiring a training sample, wherein the training sample comprises a video frame in a sample video, a foreign object position corresponding to the video frame and a foreign object category corresponding to the video frame;
performing foreign object identification on a video frame in the sample video by using a foreign object identification model to obtain a prediction result;
according to the prediction result, the foreign object position corresponding to the video frame and the foreign object category corresponding to the video frame, performing parameter optimization on the foreign object identification model;
wherein the foreign object identification model is used for: for the video frame in the sample video, determining a background model according to at least one previous frame of the video frame, and acquiring, through the background model, a target area image comprising the foreign object to be identified in the video frame; performing feature extraction and prediction on the target area image to obtain the category of the foreign object to be identified in the target area image; and determining the prediction result according to the position of the target area image in the video frame and the category of the foreign object to be identified in the target area image.
6. The method of claim 5, wherein the obtaining training samples comprises:
acquiring a video frame in an initial sample video;
performing foreign object position labeling and foreign object category labeling on the video frames in the initial sample video to obtain annotated video frames;
and expanding the annotated video frame data with a deep convolutional generative adversarial network to obtain the training sample.
7. An object recognition apparatus of an electric power system, the apparatus comprising:
the acquisition module is used for acquiring the video to be identified of the power system;
the extraction module is used for identifying the target video frame of the video to be identified through a background model to obtain a target area image corresponding to the target video frame, the background model being determined according to a number of data frames preceding the target video frame in the video to be identified;
and the classification module is used for performing feature extraction and prediction on the target area image to obtain the category of the target to be identified in the target video frame.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the target identification method of the power system of any one of claims 1 to 4 or of the model training method of any one of claims 5 to 6.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the target identification method of the power system of any one of claims 1 to 4 or of the model training method of any one of claims 5 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the target identification method of the power system of any one of claims 1 to 4 or of the model training method of any one of claims 5 to 6.
CN202310467935.XA 2023-04-19 2023-04-19 Target recognition method, training method, device, equipment and medium of power system Pending CN116543333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310467935.XA CN116543333A (en) 2023-04-19 2023-04-19 Target recognition method, training method, device, equipment and medium of power system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310467935.XA CN116543333A (en) 2023-04-19 2023-04-19 Target recognition method, training method, device, equipment and medium of power system

Publications (1)

Publication Number Publication Date
CN116543333A (en) 2023-08-04

Family

ID=87446302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310467935.XA Pending CN116543333A (en) 2023-04-19 2023-04-19 Target recognition method, training method, device, equipment and medium of power system

Country Status (1)

Country Link
CN (1) CN116543333A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117557108A (en) * 2024-01-10 2024-02-13 中国南方电网有限责任公司超高压输电公司电力科研院 Training method and device for intelligent identification model of power operation risk


Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN106682697B (en) End-to-end object detection method based on convolutional neural network
US20180114071A1 (en) Method for analysing media content
CN113936256A (en) Image target detection method, device, equipment and storage medium
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN111738054B (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN110826429A (en) Scenic spot video-based method and system for automatically monitoring travel emergency
CN113850242B (en) Storage abnormal target detection method and system based on deep learning algorithm
CN110176024B (en) Method, device, equipment and storage medium for detecting target in video
CN112052837A (en) Target detection method and device based on artificial intelligence
CN107563299B (en) Pedestrian detection method using RecNN to fuse context information
CN110852222A (en) Campus corridor scene intelligent monitoring method based on target detection
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN113239914B (en) Classroom student expression recognition and classroom state evaluation method and device
CN111291826A (en) Multi-source remote sensing image pixel-by-pixel classification method based on correlation fusion network
CN111768415A (en) Image instance segmentation method without quantization pooling
Jiang et al. A self-attention network for smoke detection
CN112528974A (en) Distance measuring method and device, electronic equipment and readable storage medium
CN110942456B (en) Tamper image detection method, device, equipment and storage medium
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system
CN116453056A (en) Target detection model construction method and transformer substation foreign matter intrusion detection method
CN111652181B (en) Target tracking method and device and electronic equipment
CN113012107A (en) Power grid defect detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination