CN117746100A

CN117746100A - Few-sample power system defect detection method based on prompt and contrast learning

Info

Publication number: CN117746100A
Application number: CN202311558555.3A
Authority: CN
Inventors: 邱臻; 韩翊; 聂峥; 马得国; 项文波; 王海军; 陈亮; 金锖; 叶锴; 戴瑞金; 郑乐
Original assignee: Zhejiang Huayun Information Technology Co Ltd
Current assignee: Zhejiang Huayun Information Technology Co Ltd
Priority date: 2023-11-21
Filing date: 2023-11-21
Publication date: 2024-03-22

Abstract

The invention discloses a few-sample power system defect detection method based on prompt and contrastive learning. The method first establishes a defect data set, then constructs a deep neural network model and trains it with prompt learning and contrastive learning, and finally uses the trained model to detect defects in a power system. It targets the difficulty of training a deep detection network when power system defect samples are few and hard to obtain. Prompt and contrastive learning with a large image-text model reduce the training difficulty of the defect detection model, let the model learn more effectively from limited samples, and improve the detection precision of the defect detector, providing an efficient solution for few-sample defect detection in power systems.

Description

Few-sample power system defect detection method based on prompt and contrast learning

Technical Field

The invention relates to the technical field of defect detection of power systems, in particular to a few-sample power system defect detection method based on prompt and contrast learning.

Background

Electric power plays an indispensable role in economic development, improvement of people's livelihoods, engineering construction and other fields, and ensuring the stable operation of the power system is of great importance. As science and technology advance, more and more power companies detect power system defects by image recognition. Traditional image recognition algorithms rely on features extracted manually on the basis of prior knowledge and task understanding. Such algorithms, however, have limited feature extraction capability and poor extensibility: they cannot extract deep features well and require extensive manual design.

Chinese patent publication No. CN 113239994A, published on August 10, 2021, discloses a grid defect detection method, device, storage medium and electronic equipment based on the YOLOv4-tiny algorithm. Using stochastic gradient descent with warm restarts (SGDR), the method trains an improved YOLOv4-tiny network structure on a sample set to obtain a target detection network, which then detects a grid image to be verified to produce the grid defect detection result.

Disclosure of Invention

The invention aims to provide a few-sample power system defect detection method based on prompt and contrastive learning that solves the prior-art problem that a deep detection network is difficult to train when power system defect samples are few and hard to obtain, thereby providing an efficient solution for few-sample defect detection in power systems.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A few-sample power system defect detection method based on prompt and contrastive learning comprises the following steps:

S1, establishing a defect data set, wherein the defect data set comprises an industrial defect detection data set, a power system data set and a prompt set;

S2, constructing a deep neural network model, and training the deep neural network model based on prompt learning and contrastive learning;

S3, performing defect detection on the power system by using the trained deep neural network model.

This scheme provides a few-sample power system defect detection method based on prompt and contrastive learning. In few-sample learning, traditional deep learning methods struggle to learn sufficient feature representations because training samples are insufficient. Prompt learning integrates expert knowledge or context information into the training process, which improves how well the model understands and uses the information contained in a small number of samples. Contrastive learning, by learning to distinguish between different samples, strengthens the model's ability to extract key features and is particularly effective for visual image features. Training the model with prompt learning and contrastive learning therefore allows it to be trained more effectively when data are insufficient, reducing the training difficulty of the defect detection model and improving the detection precision of the defect detector.

Preferably, the industrial defect detection data set in S1 is a non-power-related data set, and establishing the power system data set specifically comprises:

S11, acquiring power system images, including defect-free images and images containing various defects;

S12, labeling the power system images to obtain the power system data set;

The prompt set comprises prompt templates, and establishing the prompt templates specifically comprises:

generating corresponding states according to the labels of the general industrial defect detection data set and the power system data set, i.e. whether a picture contains a defect and, if so, the defect type;

generating the prompt templates according to the size of the object in the picture and the lighting condition of the picture, and combining them with the corresponding states.

The industrial defect detection data set contains images of various industrial or general environments rather than images specific to power system facilities or equipment. It provides a wider range of samples, helps the model learn to recognize different types of defects and scenes, and enhances its generalization. Because both model training and the final detection result depend on the accuracy of image labeling, acquiring and labeling power system images ensures the quality and accuracy of the power system data set, so that the model can be trained successfully and the final detection results are accurate and reliable.

Preferably, the deep neural network model in S2 includes:

a text encoding network of the large image-text model, used for converting the prompt text into a feature vector $F_T$;

an image-text matching sub-network, used for calculating the matching score between the prompt text feature vector and the image to be detected;

and a defect segmentation sub-network, used for outputting a pixel-level defect segmentation image based on the prompt text feature vector and the image to be detected.

The text encoding network of the large image-text model is based on a pre-trained language model (such as BERT or GPT) and converts natural language text (such as a description of a defect) into a high-dimensional feature vector, so that the text information is represented in a mathematical form a computer can process; this provides the basis for the subsequent image-text matching. The image-text matching sub-network calculates a matching score between the image feature vector and the text feature vector; a high score indicates that the image probably contains the specific defect described by the text. Using the result of the preceding image-text matching, the defect segmentation sub-network classifies each pixel of the image, judging whether it belongs to a defect region, and thereby accurately locates the defect.

Preferably, the text encoding network of the large image-text model is a pre-trained model. Models such as BERT and GPT learn broad features of language by pre-training on large-scale data sets and can then be fine-tuned for specific tasks. Using such a pre-trained text encoder as part of the model accelerates training and improves the model's understanding of text information, thereby improving the accuracy of defect detection.

Preferably, the construction process of the image-text matching sub-network further comprises:

S21, constructing an image encoder network for extracting a feature map from the image to be detected;

S22, converting the feature map through a fully connected layer into a feature vector $F_I$ with the same dimension as the text feature vector;

S23, outputting the predicted image-text matching score as the cosine similarity:

$$\text{score} = \cos(F_I, F_T) = \frac{F_I \cdot F_T}{\lVert F_I \rVert \, \lVert F_T \rVert}$$

By constructing the image-text matching sub-network and computing the matching score between the image feature vector and the text feature vector with cosine similarity, where a high score indicates that the image probably contains the specific defect described by the text, the correlation between the prompt text and the image is judged effectively and the accuracy of defect detection is improved.
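Steps S21 to S23 can be illustrated with a minimal, dependency-free Python sketch. It assumes the image encoder has already produced a small feature map (nested lists of shape C x H x W), flattens it, applies a fully connected layer with hypothetical weights `weight` and `bias` to reach the text-feature dimension, and returns the cosine similarity with a given text feature `f_t`. All function names are illustrative; a real implementation would use a deep learning framework and the ResNet50 or Transformer encoder described in the text.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def flatten(feature_map: List[Matrix]) -> Vector:
    """Flatten a C x H x W feature map (nested lists) into one vector."""
    return [v for channel in feature_map for row in channel for v in row]


def fully_connected(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b; W has one row per output (text-feature) dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity Cos(a, b) = a.b / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def match_score(feature_map: List[Matrix], weight: Matrix, bias: Vector, f_t: Vector) -> float:
    """S22 + S23: project the image feature map to F_I, then score = Cos(F_I, F_T)."""
    f_i = fully_connected(flatten(feature_map), weight, bias)
    return cosine(f_i, f_t)


if __name__ == "__main__":
    fmap = [[[1.0, 0.0], [0.0, 1.0]]]                  # one channel, 2 x 2
    w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]   # hypothetical 4 -> 2 projection
    print(match_score(fmap, w, [0.0, 0.0], [1.0, 1.0]))  # F_I = [1, 1], same direction as F_T
```

In step S32 such a score would be computed once per prompt text, and the prompt with the highest score kept.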

Preferably, the image encoder network is a feature extraction network based on a convolutional neural network or on a Transformer network. The image encoder network strengthens the model's ability to extract features from power system images; whether it is based on a convolutional neural network or a Transformer, it can extract rich image features for the subsequent defect segmentation.

Preferably, the defect segmentation sub-network further comprises:

an image encoder network;

and an image decoder network, used for outputting a pixel-level defect segmentation image by combining the encoded image feature map with the prompt text feature vector.

Preferably, the image decoder network is composed of residual modules from the ResNet network structure and upsampling modules. The upsampling module increases the spatial dimensions (width and height) of the data. In an image segmentation task, for example, residual modules can build a deep feature extractor, and upsampling modules then gradually restore the spatial resolution of the image to generate a fine segmentation map. This combination lets the network restore image details effectively while preserving depth and complexity, so that pixel-level defect segmentation is performed accurately.
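The pairing of a residual module with an upsampling module can be shown on a single-channel 2-D grid. This is a toy sketch, not the patented decoder: the residual branch `f` is any shape-preserving function supplied by the caller (standing in for the convolutions inside a ResNet residual block), and nearest-neighbour interpolation is an assumed choice of upsampling, since the text does not name one.

```python
from typing import Callable, List

Grid = List[List[float]]


def residual_module(x: Grid, f: Callable[[Grid], Grid]) -> Grid:
    """Residual connection: output = x + F(x); F must preserve the grid shape."""
    fx = f(x)
    return [[a + b for a, b in zip(row_x, row_f)] for row_x, row_f in zip(x, fx)]


def upsample_nearest(x: Grid, scale: int = 2) -> Grid:
    """Nearest-neighbour upsampling: each value becomes a scale x scale block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def decoder_stage(x: Grid, f: Callable[[Grid], Grid], scale: int = 2) -> Grid:
    """One decoder stage: refine with a residual module, then enlarge height and width."""
    return upsample_nearest(residual_module(x, f), scale)


if __name__ == "__main__":
    def halve(g: Grid) -> Grid:
        """Hypothetical stand-in for a learned residual branch."""
        return [[0.5 * v for v in row] for row in g]

    print(decoder_stage([[1.0, 2.0]], halve))
    # → [[1.5, 1.5, 3.0, 3.0], [1.5, 1.5, 3.0, 3.0]]
```

Stacking several such stages doubles the resolution each time, which is how a decoder can step a low-resolution feature map back up towards the input image size for pixel-level segmentation.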

Preferably, the training step of the deep neural network model in S2 is as follows:

A. Training the image-text matching sub-network:

A1, freezing the network parameters of the text encoder;

A2, training the image encoder network and the fully connected layer by contrastive learning:

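A21, adopting the contrastive loss as the training loss function. It consists of two directional terms, where $L_i$ is the contrastive loss of the image features relative to the prompt texts and $L_t$ is the contrastive loss of the prompt texts relative to the image features, defined as:

$$L_i = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\left(\cos\left(F_I^{(i)}, F_T^{(i)}\right)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\cos\left(F_I^{(i)}, F_T^{(j)}\right)/\tau\right)}$$

$$L_t = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\left(\cos\left(F_T^{(i)}, F_I^{(i)}\right)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\cos\left(F_T^{(i)}, F_I^{(j)}\right)/\tau\right)}$$

where $N$ is the number of image-text pairs in a training batch, $F_I^{(i)}$ is the feature vector of the i-th image, $F_T^{(i)}$ is the feature vector of the i-th text, $F_I^{(j)}$ is the feature vector of the j-th image, $F_T^{(j)}$ is the feature vector of the j-th text, and $\tau$ is a learnable temperature parameter;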
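As a hedged illustration of the contrastive loss in step A21, the following pure-Python sketch computes the image-to-text term $L_i$ and the text-to-image term $L_t$ for a batch of paired feature vectors, each as the mean negative log-softmax of cosine similarities divided by the temperature. Two points are assumptions not stated in the text: the two terms are averaged (as in CLIP), and τ is passed in as a fixed number (default 0.07, CLIP's initial value), whereas in training it would be a learnable parameter updated with the network weights.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def directional_loss(queries: List[Vector], keys: List[Vector], tau: float) -> float:
    """-1/N * sum_i log softmax_j(cos(q_i, k_j) / tau), evaluated at the matching j = i."""
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [cosine(queries[i], keys[j]) / tau for j in range(n)]
        top = max(logits)  # log-sum-exp trick for numerical stability
        log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_denominator - logits[i]
    return total / n


def contrastive_loss(image_feats: List[Vector], text_feats: List[Vector], tau: float = 0.07) -> float:
    """Symmetric image-text contrastive loss; pair i is (image_feats[i], text_feats[i])."""
    l_i = directional_loss(image_feats, text_feats, tau)  # L_i: image -> prompt text
    l_t = directional_loss(text_feats, image_feats, tau)  # L_t: prompt text -> image
    return 0.5 * (l_i + l_t)  # assumption: CLIP-style average of the two terms


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    matched = contrastive_loss(imgs, [[1.0, 0.0], [0.0, 1.0]], tau=1.0)
    swapped = contrastive_loss(imgs, [[0.0, 1.0], [1.0, 0.0]], tau=1.0)
    print(matched < swapped)  # → True
```

Minimising this loss pulls each image towards its own prompt text and pushes it away from the other texts in the batch, which is the matched-versus-unmatched behaviour the training description relies on.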

A22, training with the Adam optimizer;

training ends when the training error reaches the expected value, yielding the optimal parameters of the network model;
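Steps A22 and B22 pair the Adam optimizer with a stopping rule: training ends once the error reaches an expected value. The sketch below shows the standard Adam update (first- and second-moment estimates with bias correction) together with that stopping rule, applied to a one-parameter toy loss. The `target` threshold, the default hyperparameters and the function name `adam_train` are assumptions; the text gives no values.

```python
import math
from typing import Callable, List, Tuple

Params = List[float]


def adam_train(
    loss_fn: Callable[[Params], float],
    grad_fn: Callable[[Params], Params],
    params: Params,
    lr: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    target: float = 1e-6,
    max_steps: int = 10_000,
) -> Tuple[Params, int]:
    """Minimise loss_fn with Adam; stop once the loss reaches `target` (the 'expected value')."""
    p = list(params)
    m = [0.0] * len(p)  # first-moment (momentum) estimate
    v = [0.0] * len(p)  # second-moment (squared-gradient) estimate
    for t in range(1, max_steps + 1):
        g = grad_fn(p)
        for k in range(len(p)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)  # bias correction
            v_hat = v[k] / (1 - beta2 ** t)
            p[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter adaptive step
        if loss_fn(p) <= target:
            return p, t
    return p, max_steps


if __name__ == "__main__":
    # Toy loss (x - 3)^2 with gradient 2(x - 3), starting from x = 0.
    params, steps = adam_train(lambda q: (q[0] - 3.0) ** 2, lambda q: [2.0 * (q[0] - 3.0)], [0.0])
    print(params, steps)
```

In the patented method the parameters would be the trainable network weights (image encoder and fully connected layer in stage A, image decoder in stage B) and the loss the contrastive or Focal plus Dice loss.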

B. Training the defect segmentation sub-network:

B1, freezing the parameters of the image encoder network and the text encoder network;

B2, training the image decoder network:

B21, using the sum of the Focal loss and the Dice loss as the training loss function:

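$$\text{Loss} = \text{FocalLoss} + \text{DiceLoss}$$

where FocalLoss is:

$$\text{FocalLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\alpha\, g_i (1-p_i)^{\gamma}\log p_i + (1-\alpha)(1-g_i)\, p_i^{\gamma}\log(1-p_i)\right]$$

and DiceLoss is:

$$\text{DiceLoss} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i + \epsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \epsilon}$$

where $\alpha \in [0, 1]$ is a balance factor that compensates for the imbalance between positive and negative samples, $\gamma$ is a weighting factor that makes the network focus on hard, error-prone samples, $N$ is the total number of image pixels, $p_i$ is the predicted value of the i-th pixel, $g_i$ is the ground-truth label of the i-th pixel, and $\epsilon$ is a small smoothing constant;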
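The loss of step B21 can be checked with a small pure-Python sketch over flattened pixel lists. It assumes the standard binary Focal loss and the soft Dice loss; the default values α = 0.25, γ = 2 and smoothing constant 1, and the clamping of predictions away from 0 and 1 to keep the logarithms finite, are assumptions rather than values from the text.

```python
import math
from typing import Sequence


def focal_loss(p: Sequence[float], g: Sequence[float], alpha: float = 0.25,
               gamma: float = 2.0, clip: float = 1e-7) -> float:
    """Binary focal loss averaged over the N pixels; g holds 0/1 ground-truth labels."""
    total = 0.0
    for pi, gi in zip(p, g):
        pi = min(max(pi, clip), 1.0 - clip)  # keep log() finite
        total -= alpha * gi * (1.0 - pi) ** gamma * math.log(pi)
        total -= (1.0 - alpha) * (1.0 - gi) * pi ** gamma * math.log(1.0 - pi)
    return total / len(p)


def dice_loss(p: Sequence[float], g: Sequence[float], eps: float = 1.0) -> float:
    """1 - (2*sum(p*g) + eps) / (sum(p) + sum(g) + eps); eps also avoids 0/0."""
    inter = sum(pi * gi for pi, gi in zip(p, g))
    return 1.0 - (2.0 * inter + eps) / (sum(p) + sum(g) + eps)


def segmentation_loss(p: Sequence[float], g: Sequence[float], alpha: float = 0.25,
                      gamma: float = 2.0, eps: float = 1.0) -> float:
    """Loss = FocalLoss + DiceLoss."""
    return focal_loss(p, g, alpha, gamma) + dice_loss(p, g, eps)


if __name__ == "__main__":
    truth = [1, 0, 1, 0]
    print(segmentation_loss([0.9, 0.1, 0.8, 0.2], truth))  # confident, correct: small loss
    print(segmentation_loss([0.5, 0.5, 0.5, 0.5], truth))  # uncertain: larger loss
```

The Focal term down-weights easy pixels through the $(1-p_i)^{\gamma}$ and $p_i^{\gamma}$ factors, while the Dice term scores overlap over the whole map, which helps when defect pixels are a small fraction of the image.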

B22, training with the Adam optimizer;

training ends when the calculated error reaches the expected value, yielding the optimal parameters of the network model.

In this training process, the text encoder (such as BERT or GPT) has been pre-trained on a large-scale data set, so its parameters already represent language information well; freezing them reduces the computation during training and prevents overfitting to a specific data set. Contrastive learning trains the model to distinguish matched from unmatched image-text pairs, strengthening its ability to associate image content with text descriptions: the contrastive loss pulls matched image-text pairs closer together in the feature space and pushes unmatched pairs farther apart. The Adam optimizer is a widely used optimization algorithm that combines momentum with adaptive learning rates and helps training converge faster. Combining prompt and contrastive learning on the large image-text model with specific loss functions, the optimizer and the strategy of freezing specific network parameters reduces the training difficulty of the defect detection model, strengthens its ability to recognize defects, and improves training and detection efficiency.

Preferably, step S3 comprises the following sub-steps:

S31, inputting the power image to be detected and the prompt texts into the trained deep neural network model to obtain matching scores and image segmentation maps;

S32, sorting the obtained matching scores and taking the highest-scoring result as the defect prediction result;

S33, if the prompt text state corresponding to the prediction result is defect-free, no segmentation result is output; if the corresponding prompt text state is defective, a segmentation threshold k is selected, every point in the segmentation map whose value is greater than k is judged to be a defect point and set to 1, and all other points are set to 0, finally yielding a binary defect segmentation mask. By setting the segmentation threshold and using the prompt text state, the method can quickly and accurately identify and locate power system defects in practical applications and generate an easily interpreted defect segmentation mask.
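Steps S31 to S33 can be sketched as follows, assuming the trained model has already returned, for each prompt text, a matching score and a segmentation map. The tuple layout `(prompt, is_defect_state, score, seg_map)` and the function name `detect` are hypothetical, and the threshold is applied per pixel; pixels exactly equal to k are set to 0, a detail the text leaves open.

```python
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]
Result = Tuple[str, bool, float, Grid]  # (prompt text, state is defective?, score, segmentation map)


def detect(results: Sequence[Result], k: float) -> Tuple[str, Optional[List[List[int]]]]:
    """S32: keep the highest-scoring prompt; S33: threshold its map if its state is defective."""
    prompt, is_defect, _score, seg_map = max(results, key=lambda r: r[2])
    if not is_defect:
        return prompt, None  # defect-free state: no segmentation output
    mask = [[1 if value > k else 0 for value in row] for row in seg_map]
    return prompt, mask


if __name__ == "__main__":
    outputs = [
        ("a flawless image", False, 0.21, [[0.1, 0.2], [0.1, 0.1]]),
        ("a damaged image", True, 0.83, [[0.1, 0.7], [0.6, 0.3]]),
    ]
    print(detect(outputs, k=0.5))  # → ('a damaged image', [[0, 1], [1, 0]])
```

The threshold k trades recall against precision on the mask; the text does not fix its value.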

The beneficial effects of the invention are as follows:

1. The invention improves on prior-art training methods for defect detection models. Because traditional deep learning methods struggle to learn sufficient feature representations when training samples are insufficient, prompt learning integrates expert knowledge or context information into the training process to improve how the model understands and uses the information in a small number of samples, and contrastive learning strengthens the model's ability to extract key features by learning to distinguish different samples. Training the model with prompt learning and contrastive learning therefore trains it more effectively when data are insufficient, reduces the training difficulty of the defect detection model, improves the detection precision of the defect detector, and makes defect detection more flexible and adaptable to different conditions.

2. The invention combines states with templates to generate text prompts that guide defect detection. This provides richer and more specific information and helps the deep neural network understand and handle the specific defect detection task more accurately. When a new equipment type or a new working environment appears, new text prompts can be generated simply by updating the combinations of states and templates, without major adjustment or retraining of the whole deep learning model, which improves the extensibility of the system.

The foregoing is only an overview of the technical solution of the present invention. Specific embodiments are described below so that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the invention become more readily apparent.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures.

FIG. 1 is a flow chart of the few-sample power system defect detection method based on prompt and contrastive learning in one embodiment.

FIG. 2 is a schematic diagram of a deep neural network model architecture in one embodiment.

FIG. 3 is a schematic diagram of prompt learning with image features and text features in one embodiment.

FIG. 4 is a schematic diagram of the image data output by the defect segmentation sub-network in one embodiment.

Detailed Description

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and examples. The detailed description is merely a preferred embodiment intended to illustrate the invention, not to limit its scope; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the invention.

Before discussing the exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts operations (or steps) as a sequential process, many of the operations (or steps) can be performed in parallel, concurrently, or at the same time. Furthermore, the order of the operations may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures; the processes may correspond to methods, functions, procedures, subroutines, and the like.

Embodiment: As shown in FIG. 1, a few-sample power system defect detection method based on prompt and contrastive learning comprises the following steps S1 to S3, wherein:

S1, establishing a defect data set comprising an industrial defect detection data set, a power system data set and a prompt set. The industrial defect detection data set is the open-source MVTec AD data set, a non-power-related data set, i.e. one not specific to the facilities or equipment of a power system; it provides a wider range of samples and helps the model learn to recognize different types of defects and scenes. Establishing the power system data set specifically comprises:

S12, labeling the power system images to obtain the power system data set, the labeling specifically comprising:

S121, labeling whether each image is defective;

S122, performing pixel-level labeling and defect-type labeling on each defective image to obtain a defect region mask map.

The prompt set comprises prompt templates, and establishing the prompt templates specifically comprises:

generating the corresponding states according to the labels of the general industrial defect detection data set and the power system data set, i.e. whether a picture is defective and the defect type; more specifically, depending on the task description, the state prompt texts include state words such as "undamaged", "defect-free", "damaged" and "defective";

generating the prompt templates, a predefined list, according to the size of the object in the picture, the lighting condition of the picture and the like; more specifically, they are general templates such as "a {state} image", "a large {state} image" and "a bright {state} image".
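Combining states with templates amounts to a Cartesian product of the two lists. The sketch below uses hypothetical English stand-ins for the state words together with the three templates quoted in the text (the embodiment uses a Chinese text encoder, so its actual prompts would presumably be Chinese). Each generated prompt keeps a flag recording whether its state is defective, which step S33 later needs.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical state words, grouped by whether they describe a defect.
STATES: Dict[bool, List[str]] = {
    False: ["defect-free", "flawless"],
    True: ["damaged", "defective"],
}
# Templates following the examples in the text; {state} is filled in below.
TEMPLATES: List[str] = ["a {state} image", "a large {state} image", "a bright {state} image"]


def build_prompts(states: Dict[bool, List[str]] = STATES,
                  templates: List[str] = TEMPLATES) -> List[Tuple[str, bool]]:
    """Return every (prompt text, is_defect_state) combination of a state word and a template."""
    return [
        (template.format(state=word), is_defect)
        for is_defect, words in states.items()
        for word, template in product(words, templates)
    ]


if __name__ == "__main__":
    prompts = build_prompts()
    print(len(prompts))  # → 12
    print(prompts[0])    # → ('a defect-free image', False)
```

Supporting a new device type or environment would only mean extending `STATES` or `TEMPLATES` and regenerating the prompts, which is the extensibility the text attributes to the state and template scheme.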

S2, constructing a deep neural network model and training it based on prompt learning and contrastive learning, comprising the following steps:

A. Training the image-text matching sub-network:

A1, freezing the network parameters of the text encoder;

A21, adopting the contrastive loss as the training loss function;

A22, training with the Adam optimizer;

B. Training the defect segmentation sub-network:

B1, freezing the parameters of the image encoder network and the text encoder network;

B2, training the image decoder network:

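$$\text{Loss} = \text{FocalLoss} + \text{DiceLoss}$$

where FocalLoss is:

$$\text{FocalLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\alpha\, g_i (1-p_i)^{\gamma}\log p_i + (1-\alpha)(1-g_i)\, p_i^{\gamma}\log(1-p_i)\right]$$

and DiceLoss is:

$$\text{DiceLoss} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i + \epsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \epsilon}$$

where $\alpha \in [0, 1]$ is a balance factor that compensates for the imbalance between positive and negative samples, $\gamma$ is a weighting factor that makes the network focus on hard, error-prone samples, $N$ is the total number of image pixels, $p_i$ is the predicted value of the i-th pixel, $g_i$ is the ground-truth label of the i-th pixel, and $\epsilon$ is a small smoothing constant;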

B22, training with the Adam optimizer;

training ends when the calculated error reaches the expected value, yielding the optimal parameters of the network model.

Specifically, in this embodiment, the deep neural network model, shown in FIG. 2, comprises the text encoding network of the large image-text model, the image-text matching sub-network and the defect segmentation sub-network, wherein the text encoding network adopts the prompt text encoder network of the RBT3 Chinese pre-trained large model.

Further, in this embodiment, as shown in FIG. 3, the inputs of the image-text matching sub-network are the prompt text feature vector and the image to be detected, and its output is the matching score between them. It includes an image encoder network built on ResNet50, which as a feature extractor can learn complex, abstract image features and is easy to integrate with other types of network structures.

Further, in this embodiment, as shown in FIG. 4, the inputs of the defect segmentation sub-network are the prompt text feature vector and the image to be detected, and its output is a pixel-level defect segmentation image; it includes the prompt text encoder network and the image encoder network described above, and further includes an image decoder network.

Further, in this embodiment, the image decoder network is built from residual modules of ResNet and suitable upsampling modules, which increase the spatial dimensions (width and height) of the data. In an image segmentation task, for example, the residual modules build a deep feature extractor and the upsampling modules gradually restore the spatial resolution of the image to generate a fine segmentation map, which lets the network restore image details effectively while maintaining depth and complexity.

S3, performing defect detection on the power system with the trained deep neural network model, comprising the following steps:

S31, defect detection: the power image to be detected and the prompt texts are input into the trained model; each pair of prompt text and power image yields a score, which represents how well the prompt text matches the image, and an image segmentation map;

S32, result screening: the scores output in step S31 are sorted, and the highest-scoring result is taken as the defect prediction result;

S33, segmentation thresholding: if the prompt text state corresponding to the prediction result of step S32 is defect-free, no segmentation result is output; if it is defective, a segmentation threshold k is selected, every point in the segmentation map whose value is greater than k is judged to be a defect point and set to 1, and all other points are set to 0, finally yielding a binary defect segmentation mask.

The above embodiments are preferred embodiments of the present invention and are not intended to limit its scope, which includes but is not limited to these embodiments; all equivalent changes in shape, structure and method made according to the present invention fall within its scope of protection.

Claims

1. A few-sample power system defect detection method based on prompt and contrastive learning, characterized by comprising the following steps:

2. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 1, wherein the industrial defect detection data set in S1 is a non-power-related data set, and establishing the power system data set specifically comprises:

and generating the prompt templates according to the size of the object in the picture and the lighting condition of the picture, and combining them with the corresponding states.

3. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 1, wherein the deep neural network model in S2 comprises:

and the defect segmentation sub-network is used for outputting a pixel-level defect segmentation image based on the prompt text feature vector and the image to be detected.

4. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 3, wherein the text encoding network of the large image-text model is a pre-trained model.

5. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 3, wherein the construction process of the image-text matching sub-network further comprises:

S22, converting the feature map through a fully connected layer into a feature vector $F_I$ with the same dimension as the text feature vector;

$$\text{score} = \cos(F_I, F_T).$$

6. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 5, wherein the image encoder network is a feature extraction network based on a convolutional neural network or a Transformer network.

7. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 3 or 5, wherein the defect segmentation sub-network further comprises:

an image encoder network;

8. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 7, wherein the image decoder network is composed of residual modules of the ResNet network structure and upsampling modules.

9. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 1, 3 or 5, wherein the training of the deep neural network model in S2 comprises:

A. training the image-text matching sub-network:

A1, freezing the network parameters of the text encoder;

A21, adopting the contrastive loss as the training loss function,

where $N$ is the number of image-text pairs in a training batch, $F_I^{(i)}$ and $F_T^{(i)}$ are the feature vectors of the i-th image and the i-th text, $F_I^{(j)}$ and $F_T^{(j)}$ are the feature vectors of the j-th image and the j-th text, and $\tau$ is a learnable temperature parameter;

A22, training with the Adam optimizer;

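B. training the defect segmentation sub-network:

B1, freezing the parameters of the image encoder network and the text encoder network;

B2, training the image decoder network with the loss:

$$\text{Loss} = \text{FocalLoss} + \text{DiceLoss}$$

where FocalLoss is:

$$\text{FocalLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\alpha\, g_i (1-p_i)^{\gamma}\log p_i + (1-\alpha)(1-g_i)\, p_i^{\gamma}\log(1-p_i)\right]$$

and DiceLoss is:

$$\text{DiceLoss} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i + \epsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \epsilon}$$

B22, training with the Adam optimizer.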

10. The few-sample power system defect detection method based on prompt and contrastive learning according to claim 1, wherein step S3 comprises the following sub-steps:

S33, if the prompt text state corresponding to the prediction result is defect-free, no segmentation result is output; if it is defective, a segmentation threshold k is selected, every point in the segmentation map whose value is greater than k is judged to be a defect point and set to 1, and all other points are set to 0, finally yielding a binary defect segmentation mask.