CN116563410A - Method for generating electric spark images of electrical equipment based on a two-stage generative adversarial network

Info

Publication number
CN116563410A
Authority
CN (China)
Prior art keywords
image, loss, model, stage, spark
Legal status
Pending
Application number
CN202310575158.0A
Other languages
Chinese (zh)
Inventors
杨圣洪
徐方明
李肯立
蔡宇辉
杨志邦
余思洋
唐伟
段明星
吕婷
Current Assignee
Hunan Kuangan Network Technology Co., Ltd.
Original Assignee
Hunan Kuangan Network Technology Co., Ltd.
Application filed by Hunan Kuangan Network Technology Co., Ltd.; published as CN116563410A.


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/0475 Generative networks
    • G06N 3/08 Learning methods
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses a method for generating electric spark images of electrical equipment based on a two-stage generative adversarial network, comprising the following steps: collecting a real image dataset and gathering text descriptions; constructing an electric spark image background-stripping module and producing a background-free low-resolution electric spark image set; constructing a feature extraction module; constructing a two-stage generative adversarial network model for training, in which the first-stage nbgGAN model generates background-free low-resolution images and the second-stage bgGAN model generates high-resolution images with background; training the two-stage generative adversarial network model to obtain a trained model; and generating electric spark images of various fault types with the trained electrical-equipment spark image generation model. The invention enhances the feature representation of electric sparks, improves the recognizability of spark features in images, helps the model learn accurate spark features, and improves the stability of the generated images.

Description

Method for generating electric spark images of electrical equipment based on a two-stage generative adversarial network
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a method for generating electric spark images of electrical equipment based on a two-stage generative adversarial network.
Background
In power-plant environments, machine failures are typically caused by electric sparks arising from short circuits or overheated wiring. Such failures not only cause property damage but may also threaten personal safety. Because electric sparks are sudden, instantaneous and unsteady, they are hardly noticeable even when the equipment is attended. Research on spark detection algorithms is therefore particularly important.
At present, research data on spark detection algorithms are limited; the most significant problem is the lack of publicly available, high-quality spark image datasets. Furthermore, spark detection requires a large amount of spark image data for training, but acquiring such data demands substantial time, capital and labor. Meanwhile, the performance of current spark detection algorithms is affected by many factors, such as lighting, noise, current and voltage. It is therefore crucial to provide high-quality spark images with more varied features while reducing the cost of data acquisition and increasing the diversity of the dataset, thereby improving the performance and reliability of spark detection algorithms.
A generative adversarial network (GAN) is a deep-learning generative model composed of two neural networks: a generator and a discriminator. The generator learns the distribution of the training data to produce samples resembling real data, while the discriminator judges whether a given sample is real or generated. The two networks are trained against each other: the generator aims to produce samples ever closer to real data, and the discriminator aims to distinguish real from fake data as accurately as possible. Because it can produce high-quality images, the GAN has been widely studied and applied in the field of image generation. During image generation, the generator learns the probability distribution of the image data and the discriminator judges whether an image is real. Through repeated iterations, the parameters of the generator and discriminator are gradually optimized until the generator can produce high-quality samples increasingly close to real images. GANs have made considerable progress in image generation. For example, DCGAN performs well in generating high-resolution realistic images, Pix2Pix implements semantic image-to-image translation, and CycleGAN implements translation between image domains. In addition, variants such as WGAN, BEGAN and StyleGAN have been proposed and are widely applied in image generation.
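The adversarial game described here is compact enough to sketch in code. The following PyTorch fragment is purely illustrative and not part of the patent; the generator G, discriminator D, optimizers and noise dimension are placeholders:

```python
import torch
import torch.nn as nn

def gan_step(G, D, real_batch, z_dim, opt_G, opt_D):
    """One alternating update: train D to separate real from fake,
    then train G to fool D (non-saturating form)."""
    bce = nn.BCEWithLogitsLoss()
    b = real_batch.size(0)
    z = torch.randn(b, z_dim)

    # Discriminator step: real images labelled 1, generated images labelled 0
    opt_D.zero_grad()
    d_loss = bce(D(real_batch), torch.ones(b, 1)) + \
             bce(D(G(z).detach()), torch.zeros(b, 1))
    d_loss.backward()
    opt_D.step()

    # Generator step: try to make D score the fakes as real
    opt_G.zero_grad()
    g_loss = bce(D(G(z)), torch.ones(b, 1))
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```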
GAN models now appear in many state-of-the-art solutions for generating high-resolution images. Super-resolution approaches take a low-resolution image as input and produce the corresponding high-resolution image through a GAN model. Conditional approaches use conditioning information such as image labels and text descriptions to guide the generation of the corresponding high-resolution image. Another common approach is based on transfer learning: a pre-trained GAN first generates low-resolution images, which are then converted into high-resolution images by a transfer-learning method.
A representative high-resolution image generation scheme based on cascaded GANs is the stacked generative adversarial network StackGAN++. Its specific steps are as follows. Text-to-image synthesis is first achieved with a two-stage GAN architecture, StackGAN-v1. In the first stage, the Stage-I GAN sketches the rough shapes and colors of the scene from a given text description, producing a low-resolution image; in the second stage, the Stage-II GAN takes the low-resolution image and the text description as input and generates a high-resolution image. To increase the diversity of the generator output, StackGAN++ introduces a conditional noise vector consisting of two parts, a random noise vector and a condition vector, and controls image generation through these two parts. To make the generator focus on the important parts of the image and produce more realistic results, StackGAN++ introduces a block attention mechanism that divides the image into blocks, computes an attention weight for each block, and generates the different parts of the image according to these weights.
The stacked generative adversarial network StackGAN++ suffers from the following drawbacks. First, because its multiple generators and discriminators are all trained from the same conditional noise, the generated images may lack diversity, ending up too similar and insufficiently varied or creative. Second, since StackGAN++ generates images from conditions, it is difficult to directly control certain properties of the output, such as color, texture and shape, so the generated images may not meet specific application requirements. Furthermore, because StackGAN++ contains multiple generators and discriminators and lacks an effective feature-enhancement operation, the generated images may exhibit instabilities such as blurring, distortion and noise, which limit its usability in some practical applications. Finally, since each of the multiple generators and discriminators needs time to complete its task, training is relatively long and convergence is slow.
Disclosure of Invention
To address the problems of low quality of generated spark images and indistinct spark features, the invention provides an electric spark background-stripping module that reduces the interference of background information in image generation. The invention further provides a two-stage generative adversarial network model that generates high-resolution images with distinct spark features. In addition, to address training efficiency in GAN-based image generation, the invention introduces a self-attention mechanism into the GAN model to accelerate convergence. Against the shortcomings of the StackGAN++ model, the invention proposes a method for generating electric spark images of electrical equipment based on a two-stage generative adversarial network model, with the following advantages. First, in the stage of acquiring the image dataset and text descriptions, spark images of various fault types are collected, while condition vectors, random noise and additionally introduced feature vectors serve as generator inputs to ensure the diversity of the generated images. Second, a feature extraction module is added at the image preprocessing stage; combined with the extracted attribute features, it allows direct control over generating spark images with the attributes the application requires. In addition, an electric spark background-stripping module is provided that produces background-free spark images, reducing the interference of background information with the extraction of important features and the generation of the target image. Meanwhile, a two-stage generative adversarial network model is designed: the first-stage model removes background interference and generates more realistic background-free spark images; the second-stage model focuses on the background information and the finer spark details ignored by the first stage, and uses a zero-shot classifier to help the generator produce spark images with the target attributes, improving the stability of the generated images. Finally, self-attention modules are introduced into the discriminator of the second-stage model and into the zero-shot classifier to improve recognition and classification efficiency and speed up model convergence.
The method for generating electric spark images of electrical equipment based on a two-stage generative adversarial network disclosed by the invention comprises the following steps:
S1, acquiring a real image dataset and gathering text descriptions: collecting electric spark text descriptions and environment-information text descriptions, and collecting high-resolution electric spark image sets of various fault types;
S2, constructing an electric spark image background-stripping module and producing a background-free low-resolution electric spark image set: processing the high-resolution image set to obtain the corresponding low-resolution image set, and obtaining the background-free low-resolution image set through preset region rules;
S3, constructing a feature extraction module: the feature extraction module extracts color, shape and texture features from the input image as additional input vectors of the model;
S4, constructing a two-stage generative adversarial network model for training, wherein the first-stage nbgGAN model generates background-free low-resolution images and the second-stage bgGAN model generates high-resolution images with background;
S5, training the two-stage generative adversarial network model to obtain the trained model: based on the model constructed in step S4, when the model is judged to have converged, the converged two-stage model is recorded as the electric spark image generation model for electrical equipment;
S6, generating electric spark images of various fault types using the trained electric spark image generation model.
Further, step S1 of acquiring a real image dataset and gathering text descriptions specifically includes:
collecting, for each electric spark fault type, an electric spark text description d containing color, shape and texture characteristics and an environment-information text description env, providing text information for the image generation model;
simultaneously collecting an image set I_H of high-resolution electric sparks of various fault types in a power grid environment, providing image information for the image generation model, wherein the fault types include insulation faults, poor contact, foreign-matter ingress, arc faults and short-circuit faults;
then normalizing all the collected images to obtain a high-resolution electric spark image dataset I_0 of uniform size; the tensor size of a high-resolution electric spark image is H × W × C; each image is labeled with the category label of its fault type and an attribute label is defined for each image; in this process, each real spark image is labeled 1.
Further, step S2 specifically includes:
applying bilinear interpolation and downsampling to the high-resolution electric spark image dataset I_0 obtained in S1 to obtain the low-resolution electric spark image set I_L; the tensor size of a low-resolution spark image is (H/r) × (W/r) × C, where r is the downsampling ratio;
stripping the image background, thereby reducing the interference of background information and accelerating model convergence;
inputting I_L into the electric spark image background-stripping module to obtain the background-free low-resolution electric spark image set I_1.
The region rules of the electric spark image background-stripping module are as follows:
1) The electric spark has high saturation, i.e. its saturation value is not lower than a preset saturation threshold: S ≥ S_T;
2) The electric spark has high luminance, i.e. the luminance value of its gray-scale map is not lower than a preset luminance threshold: L ≥ L_T;
3) The electric spark has bright colors, i.e. at least one color component of the spark image is not lower than its preset threshold: R ≥ R_T or G ≥ G_T or B ≥ B_T;
4) The region where the electric spark is located exhibits large pixel-value variation, so the edge-enclosed spark image can be extracted through the edge information of the image, removing the background information.
Further, the specific steps of extracting the electric spark image are as follows:
first, binarizing the input image I to obtain a binary image Bi(I), then applying a morphological gradient operation to Bi(I) to obtain the spark edge information e(T(Bi(I)));
to remove edge noise, applying an opening operation to the binary image to obtain O(Bi(I));
next, setting the edge area of the image to white and the background area to black, yielding the processed image H(O(Bi(I)));
dilating the image with the edge information e(T(Bi(I))) to obtain a wider edge region;
performing connected-region analysis on the dilated binary image to obtain the region Ω of the spark image enclosed by the edges;
finally, applying the bitwise AND operation AND(I, Ω) to the original image I and the binarized region, removing the background not enclosed by edges;
dividing the extracted electric spark image by adopting the following formula:
wherein S, L, R, G, B is the saturation component, the luminance component, the red component, the green component, and the blue component of the image, S T 、L T 、R T 、G T 、B T For a saturation component threshold, a brightness component threshold, a red component threshold, a green component threshold AND a blue component threshold which are preset, AND represents bitwise AND operation, I is an input image, bi (-) represents binarization processing, T (-) represents morphological gradient operation, O (-) represents operation of open operation, E (-) represents obtaining edge information, H (-) is operation of setting an edge area to white AND setting a background area to black, E (x, y) represents expansion operation of x by using information y, C (-) is connected area analysis, AND omega is an area obtained after C (-) operation.
Further, S3 specifically includes:
color feature extraction: extracting color features with a color histogram to obtain the additional color vector E_c;
shape feature extraction: extracting shape features with an edge detection algorithm to obtain the additional shape vector E_s;
texture feature extraction: extracting texture features with texture descriptors; herein a local binary pattern is used, yielding the additional texture vector E_t.
Further, the model construction of step S4 proceeds as follows:
step S41: constructing the text enhancement module T1 of the first-stage nbgGAN model and the text enhancement module T2 of the second-stage bgGAN model, so as to improve the continuity of the data manifold in the text-embedding latent space and help the generators learn text features;
step S42: constructing an image enhancement module to prevent overfitting; in the image enhancement module, the input background-free low-resolution spark image I_1 and the spark sample I_g1 generated by the generator G1 undergo the same data-augmentation strategy;
step S43: constructing the generator G1 of the first-stage nbgGAN model;
step S44: constructing the discriminator D1 of the first-stage nbgGAN model;
step S45: constructing the generator G2 of the second-stage bgGAN model;
step S46: introducing a self-attention module and constructing the discriminator D2 of the second-stage bgGAN model;
step S47: introducing a self-attention mechanism and constructing the zero-shot classifier of the second-stage bgGAN model.
The beneficial effects of the invention are as follows:
The invention provides an electric spark background-stripping module that, through preset region rules, removes the interference of irrelevant background information and enhances the feature representation of the electric spark, improving the recognizability of spark features in the image. The background-free spark images produced by this module serve as real-sample inputs of the first-stage nbgGAN model, helping the model capture accurate spark features and thereby improving the stability of the images generated by the two-stage adversarial network.
The invention provides a two-stage generative adversarial network model based on a self-attention mechanism: the first-stage GAN generates only the target image with the important features, the second-stage GAN generates the complete high-resolution image, and a self-attention loss module is added to improve recognition and classification efficiency. The first-stage nbgGAN model takes as input background-free spark images and generated samples built from multi-dimensional information such as the text condition vector, image attribute feature vectors and random noise, and generates more realistic background-free spark images. In the data acquisition and feature extraction stages, spark images of various fault types are collected and image features are extracted as generator inputs, ensuring the diversity of the generated images and allowing direct control over generating spark images with the attributes the application requires. The second-stage bgGAN model takes a condition vector carrying environment information as input and focuses on the background information and finer spark details ignored by the first stage, generating high-resolution spark images with background. Meanwhile, the zero-shot classifier helps the generator produce spark images with the target attributes, improving the stability of the generated images. In addition, self-attention modules are introduced into the discriminator of the second-stage bgGAN model and the zero-shot classifier, attending to the more important spark information in the background images and accelerating model convergence.
StackGAN++ generates images from an input random vector and therefore cannot precisely control certain features of the generated image, such as color and texture. The proposed method generates images from the random vector together with additionally introduced feature vectors, so the generated images can be controlled to carry the important features the application requires, making generation more controllable.
The quality of images generated by StackGAN++ is not sufficiently stable, and obvious defects and imperfections can occur. By means of the background-stripping module, the image enhancement module and related techniques, the proposed method reduces the imperfections caused by noise interference or overfitting and improves the quality and stability of the generated images.
The StackGAN++ model contains multiple generator and discriminator networks, leading to long training times. The proposed model introduces a self-attention module into the discriminator and the zero-shot classifier and adds a self-attention loss value as guidance, improving recognition and classification efficiency.
Drawings
FIG. 1 is a flow chart of a spark image generation method of the present invention;
FIG. 2 is a flow chart of the feature extraction module of the present invention;
FIG. 3 is a structure diagram of the two-stage generative adversarial network model of the present invention;
FIG. 4 is a structure diagram of the first-stage nbgGAN and second-stage bgGAN models of the present invention;
FIG. 5 is a training diagram of the two-stage generative adversarial network model of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings, without limiting it in any way; any alteration or substitution based on the teachings of the invention falls within its protection scope. The following detailed description refers to the accompanying drawings.
As shown in FIG. 1, the electric spark image generation method of the present invention comprises the following steps:
s1, acquiring a real image data set and collecting text description. Collecting an electric spark text description d containing color, shape and texture characteristics and an environment information text description env of each electric spark fault type, and providing text information for an image generation model. Image set I of high-resolution electric sparks of various fault types in power grid environment is collected simultaneously H Providing image information for an image generation model, wherein various fault types include: insulation failure, poor contact, foreign object ingress, arc failure and short circuit failure. All the obtained image data sets are then normalized to obtain a high resolution spark image data set I of the same size 0 . The tensor size of the high-resolution electric spark image is H.W.C. Each image is labeled as a category label of the type of fault to which it belongs, and an attribute label is defined for each image. In this process, each real spark image is marked 1.
S2, constructing the electric spark image background-stripping module and producing the background-free low-resolution electric spark image set. Bilinear interpolation and downsampling are applied to the high-resolution electric spark image dataset I_0 obtained in S1, yielding the low-resolution electric spark image set I_L. The tensor size of a low-resolution spark image is (H/r) × (W/r) × C, where r is the downsampling ratio; herein r = 4. Since background information interferes with feature extraction and with the image-generation process of the adversarial network, the image background must be stripped, reducing this interference and accelerating model convergence. I_L is input into the electric spark image background-stripping module to obtain the background-free low-resolution spark image set I_1.
The region rules of the electric spark image background-stripping module are as follows:
1) The electric spark has high saturation, i.e. its saturation value is not lower than a preset saturation threshold: S ≥ S_T;
2) The electric spark has high luminance, i.e. the luminance value of its gray-scale map is not lower than a preset luminance threshold: L ≥ L_T;
3) The electric spark has bright colors, i.e. at least one color component of the spark image is not lower than its preset threshold: R ≥ R_T or G ≥ G_T or B ≥ B_T;
4) The region where the electric spark is located exhibits large pixel-value variation, so the edge-enclosed spark image can be extracted through the edge information of the image, removing the background information. The specific steps are as follows. First, the input image I is binarized to obtain the binary image Bi(I), and a morphological gradient operation is applied to Bi(I) to obtain the spark edge information e(T(Bi(I))). To remove edge noise, an opening operation is applied to the binary image, yielding O(Bi(I)). Next, the edge area of the image is set to white and the background area to black, yielding the processed image H(O(Bi(I))). The image is dilated using the edge information e(T(Bi(I))) to obtain a wider edge region. Connected-region analysis is performed on the dilated binary image to obtain the region Ω of the spark image enclosed by the edges. Finally, the bitwise AND operation AND(I, Ω) is applied to the original image I and the binarized region, removing the background not enclosed by edges.
The image is segmented with the following rule: a pixel of AND(I, Ω) is kept as spark if S ≥ S_T and L ≥ L_T and (R ≥ R_T or G ≥ G_T or B ≥ B_T), where Ω = C(E(H(O(Bi(I))), e(T(Bi(I)))));
wherein S, L, R, G, B are the saturation, luminance, red, green and blue components of the image; S_T, L_T, R_T, G_T, B_T are the preset saturation, luminance, red, green and blue component thresholds; AND denotes the bitwise AND operation; I is the input image; Bi(·) denotes binarization; T(·) denotes the morphological gradient operation; O(·) denotes the opening operation; e(·) denotes obtaining edge information; H(·) sets the edge area to white and the background area to black; E(x, y) denotes dilating x using the information y; C(·) is connected-region analysis; and Ω is the region obtained after the C(·) operation.
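As an illustration of these region rules, the following OpenCV sketch combines the threshold rules 1)-3) with the edge-based rule 4). The threshold values and morphology kernel are hypothetical, since the patent leaves S_T, L_T, R_T, G_T and B_T as preset values:

```python
import cv2
import numpy as np

# Hypothetical thresholds; the patent only requires them to be preset.
S_T, L_T, R_T, G_T, B_T = 120, 180, 200, 200, 200

def strip_background(img_bgr):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(img_bgr)

    # Rules 1-3: high saturation, high luminance, bright colour components
    color_mask = ((hsv[..., 1] >= S_T) & (gray >= L_T) &
                  ((r >= R_T) | (g >= G_T) | (b >= B_T))).astype(np.uint8) * 255

    # Rule 4: edge-based extraction, mirroring Bi, T, O, E(x, y) and C
    _, bi = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Bi(I)
    kernel = np.ones((3, 3), np.uint8)
    opened = cv2.morphologyEx(bi, cv2.MORPH_OPEN, kernel)          # O(Bi(I)), denoise
    edges = cv2.morphologyEx(opened, cv2.MORPH_GRADIENT, kernel)   # edge information
    dilated = cv2.dilate(edges, kernel, iterations=2)              # wider edge region
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    omega = np.zeros_like(gray)
    cv2.drawContours(omega, contours, -1, 255, thickness=cv2.FILLED)  # region Omega

    mask = cv2.bitwise_and(color_mask, omega)                      # AND(I, Omega)
    return cv2.bitwise_and(img_bgr, img_bgr, mask=mask)
```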
S3, constructing the feature extraction module. As shown in FIG. 2, the characteristics of the generated image are controlled by extracting the color, shape and texture features of the input image and using them as additional input vectors. The specific operations are as follows, with an illustrative sketch after the list:
color feature extraction: extracting color features by using a color histogram to obtain an extra color vector E c
And (3) extracting shape features: the shape feature is extracted by adopting an edge detection algorithm, and the extra shape vector E is obtained by adopting a Canny edge detection algorithm s
Texture feature extraction: extracting texture features using texture feature descriptors, text using Local Binary Pattern (LBP), resulting in an additional texture vector E t
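A minimal sketch of the three extractors is given below. The histogram bin counts, Canny thresholds, resize dimensions and LBP parameters are assumptions, since the patent does not fix them (OpenCV and scikit-image assumed):

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def extract_features(img_bgr, bins=16):
    """Colour (histogram), shape (Canny edges) and texture (LBP) vectors;
    the vector sizes here are illustrative, not specified by the patent."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    e_c = cv2.calcHist([hsv], [0, 1], None, [bins, bins],
                       [0, 180, 0, 256]).flatten()            # E_c: colour histogram
    e_c /= e_c.sum() + 1e-8

    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                         # Canny edge map
    e_s = cv2.resize(edges, (16, 16)).flatten() / 255.0       # E_s: shape vector

    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    e_t, _ = np.histogram(lbp, bins=10, range=(0, 10))        # E_t: LBP histogram
    e_t = e_t / (e_t.sum() + 1e-8)
    return e_c, e_s, e_t
```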
S4, constructing the two-stage generative adversarial network model for training. As shown in FIG. 3, the network structure is modified and designed on the basis of two cascaded DCGANs, so as to simultaneously generate high-resolution spark images and classify the spark fault types. The two-stage model comprises the following modules: the first-stage nbgGAN model consists of the text enhancement module T1, the generator G1, the image enhancement module and the discriminator D1, and generates background-free low-resolution spark images; the second-stage bgGAN model consists of the text enhancement module T2, the generator G2, the discriminator D2 and a zero-shot classifier with a self-attention mechanism, and generates high-resolution spark images with background while classifying the fault category of the spark image.
The steps of constructing the model are as follows:
step 1: as shown in fig. 4, a text enhancement module T1 of the first-level nggan model and a text enhancement module T2 of the second-level bgGAN model are constructed to improve the data stream continuity of the text embedding potential space, so that the generator can learn text features conveniently. Firstly, processing the electric spark text description d and the environment information text description env collected in the S1 by a pre-trained text editor to obtain electric spark text embedding And ambient information text embedding->Then, will->Input to text enhancement module T1 and will +.>And->Input into a text enhancement module T2 to obtain condition vectors c respectively 1 Condition vector c 2
The text enhancement module T1 operates as follows. First, the text embedding φ_d is fed into two fully connected layers to produce two vectors, used as the mean μ_1 and standard deviation σ_1 of a Gaussian distribution. A random vector ε_1 is then sampled from the standard normal distribution and, following the Gaussian reparameterization, converted into the condition vector c_1 = μ_1 + σ_1 ⊙ ε_1. The text enhancement module T2 operates similarly: the condition vector c_2 is obtained like c_1, but two other fully connected layers produce the mean μ_2 and standard deviation σ_2, and the Gaussian distribution is generated from the spark text embedding φ_d and the environment-information text embedding φ_env together. Likewise, a random vector ε_2 is sampled from the normal distribution and converted into the condition vector c_2 = μ_2 + σ_2 ⊙ ε_2.
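This reparameterization step can be sketched as follows; the embedding and condition dimensions are assumptions:

```python
import torch
import torch.nn as nn

class TextEnhance(nn.Module):
    """Conditioning sketch: map a text embedding to (mu, sigma) and
    reparameterize c = mu + sigma * eps. Dimensions are assumptions."""
    def __init__(self, embed_dim=256, cond_dim=128):
        super().__init__()
        self.fc_mu = nn.Linear(embed_dim, cond_dim)
        self.fc_logvar = nn.Linear(embed_dim, cond_dim)

    def forward(self, phi):
        mu = self.fc_mu(phi)                          # mean of the Gaussian
        sigma = torch.exp(0.5 * self.fc_logvar(phi))  # standard deviation
        eps = torch.randn_like(sigma)                 # eps ~ N(0, I)
        return mu + sigma * eps, mu, sigma            # c = mu + sigma * eps
```

Under the same assumptions, T2 would be the same module with a doubled input width, called as `TextEnhance(2 * embed_dim, cond_dim)(torch.cat([phi_d, phi_env], dim=1))`.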
Step 2: the image enhancement module is constructed to prevent overfitting. In the image enhancement module, the input background-free low-resolution spark image I_1 and the spark sample I_g1 generated by the generator G1 undergo the same data-augmentation strategy, which includes: (1) randomly selecting and occluding a square region of one quarter of the original image area, (2) adding Gaussian noise to the image, and (3) covering a [-1/8, 1/8] range of the image with 0.
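A possible reading of this augmentation strategy is sketched below; the noise scale and the interpretation of item (3) as zeroing a centered band of the image width are assumptions:

```python
import torch

def augment(img):
    """Shared augmentation for I_1 and I_g1 (NCHW tensor in [-1, 1] assumed)."""
    _, _, h, w = img.shape
    out = img.clone()
    # (1) occlude a random square; side h // 2 gives one quarter of the
    #     area for square images (an assumption)
    side = h // 2
    y = torch.randint(0, h - side + 1, (1,)).item()
    x = torch.randint(0, w - side + 1, (1,)).item()
    out[:, :, y:y + side, x:x + side] = 0
    # (2) add Gaussian noise; the scale 0.05 is an assumption
    out = out + 0.05 * torch.randn_like(out)
    # (3) zero a centered band spanning the [-1/8, 1/8] range of the width;
    #     this reading of rule (3) is an assumption
    band = max(1, w // 8)
    out[:, :, :, w // 2 - band:w // 2 + band] = 0
    return out.clamp(-1, 1)
```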
Step 3: the generator G1 of the first-stage nbgGAN model is constructed. First, the condition vector c_1, the random noise vector z and the color, shape and texture features E_c, E_s, E_t extracted by the feature extraction module are concatenated into one input vector. This vector is then fed into the generator G1 to generate the background-free low-resolution spark sample I_g1, where each generated spark image is labeled 0. The generator G1 consists of 4 transposed convolutional layers, each comprising upsampling and convolution operations.
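An illustrative G1 under these constraints might look as follows; the channel widths and the 4x4-to-64x64 progression are assumptions, while the count of four transposed-conv layers comes from the text:

```python
import torch
import torch.nn as nn

class GeneratorG1(nn.Module):
    """G1 sketch: 4 transposed-conv layers, 4x4 -> 64x64 (sizes assumed)."""
    def __init__(self, in_dim, base=512):
        super().__init__()
        self.base = base
        self.fc = nn.Linear(in_dim, base * 4 * 4)
        def up(cin, cout):
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.net = nn.Sequential(
            up(base, base // 2), up(base // 2, base // 4),
            up(base // 4, base // 8),
            nn.ConvTranspose2d(base // 8, 3, 4, 2, 1), nn.Tanh())

    def forward(self, c1, z, e_c, e_s, e_t):
        x = torch.cat([c1, z, e_c, e_s, e_t], dim=1)  # splice inputs into one vector
        x = self.fc(x).view(-1, self.base, 4, 4)
        return self.net(x)                            # background-free sample I_g1
```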
Step 4: the discriminator D1 of the first-stage nbgGAN model is constructed. First, the low-resolution background-free spark image I_1 and the sample I_g1 generated by G1 are passed through the image enhancement module to obtain enhanced images of size W_1 × H_1. After a series of downsampling blocks, the enhanced image becomes a feature map with spatial dimensions M_d × M_d. Meanwhile, the text embedding is compressed to N_d dimensions by a fully connected layer and spatially replicated into a tensor of size M_d × M_d × N_d. The filter map of the image is then concatenated with the text tensor along the channel dimension. The resulting tensor is fed into a 1 × 1 convolutional layer that jointly learns image and text features. Finally, a fully connected layer produces the decision score. The downsampling part of the discriminator D1 consists of 5 convolutional layers, each comprising a convolution, batch normalization and a LeakyReLU activation, except that the first convolutional layer has no batch normalization.
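A sketch of D1 along these lines is shown below; an input size of 128 x 128, M_d = 4, N_d = 128 and the channel widths are assumptions:

```python
import torch
import torch.nn as nn

class DiscriminatorD1(nn.Module):
    """D1 sketch: 5 conv layers (no BatchNorm on the first), text embedding
    compressed, spatially replicated and fused by a 1x1 conv. Sizes assumed."""
    def __init__(self, text_dim=256, n_d=128, m_d=4):
        super().__init__()
        self.m_d = m_d
        def down(cin, cout, bn=True):
            layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1)]
            if bn:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return nn.Sequential(*layers)
        self.down = nn.Sequential(down(3, 64, bn=False), down(64, 128),
                                  down(128, 256), down(256, 512), down(512, 512))
        self.compress = nn.Linear(text_dim, n_d)
        self.fuse = nn.Conv2d(512 + n_d, 512, kernel_size=1)
        self.score = nn.Linear(512 * m_d * m_d, 1)

    def forward(self, img, phi):                      # img: 128x128 (assumed)
        f = self.down(img)                            # -> 4x4 feature map
        t = self.compress(phi)                        # compress text to N_d dims
        t = t[:, :, None, None].expand(-1, -1, self.m_d, self.m_d)  # replicate
        f = torch.relu(self.fuse(torch.cat([f, t], dim=1)))  # joint 1x1 conv
        return self.score(f.flatten(1))               # decision score
```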
Step 5: the generator G2 of the second-stage bgGAN model is constructed. The background-free low-resolution spark sample I_g1 generated in step 3 and the condition vector c_2 are input into the generator G2 to generate the high-resolution spark sample I_g2 with background; each generated spark image is labeled 0. The specific operations are as follows. First, the N_g-dimensional condition vector c_2 is spatially replicated into a tensor of size M_g × M_g × N_g. The spark sample is then passed through several downsampling blocks until the output spatial size is M_g × M_g. Next, the image features and text features are concatenated along the channel dimension and fed into 4 residual blocks, which learn multimodal representations across image and text features. Finally, a series of upsampling layers generates the high-resolution image I_g2 of size W_2 × H_2. The downsampling blocks are all convolutional layers, the first without batch normalization; each residual block consists of convolution, batch normalization and a ReLU activation; the upsampling layers are all transposed convolutional layers, each comprising upsampling and convolution operations.
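The encode / residual-fusion / decode shape of G2 can be sketched as follows; the channel widths, a 64 x 64 input and a 128 x 128 output are assumptions, while the four residual blocks come from the text:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class GeneratorG2(nn.Module):
    """G2 sketch: downsample I_g1, fuse the replicated condition vector c_2,
    4 residual blocks, then upsample. Sizes are assumptions."""
    def __init__(self, cond_dim=128, ch=256):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, ch // 2, 4, 2, 1), nn.LeakyReLU(0.2, True),  # no BN first
            nn.Conv2d(ch // 2, ch, 4, 2, 1), nn.BatchNorm2d(ch),
            nn.LeakyReLU(0.2, True))
        self.res = nn.Sequential(*[ResBlock(ch + cond_dim) for _ in range(4)])
        self.up = nn.Sequential(
            nn.ConvTranspose2d(ch + cond_dim, ch // 2, 4, 2, 1),
            nn.BatchNorm2d(ch // 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch // 2, ch // 4, 4, 2, 1),
            nn.BatchNorm2d(ch // 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch // 4, 3, 4, 2, 1), nn.Tanh())

    def forward(self, i_g1, c2):                      # i_g1: 64x64 low-res sample
        f = self.down(i_g1)                           # -> M_g x M_g feature map
        t = c2[:, :, None, None].expand(-1, -1, f.size(2), f.size(3))
        f = self.res(torch.cat([f, t], dim=1))        # multimodal residual learning
        return self.up(f)                             # -> 128x128 image I_g2
```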
Step 6: a self-attention module is introduced and the discriminator D2 of the second-stage bgGAN model is constructed. Similar to step 4, the background high-resolution spark image I_0 of size W_2 × H_2 and the sample I_g2 generated by G2 are passed through a series of downsampling blocks to obtain a feature map with spatial dimensions M_d × M_d. The differences are that this step uses more downsampling blocks and adds a self-attention module so as to focus on the important spark information in the background high-resolution image. Meanwhile, the text embedding is compressed to N_d dimensions by a fully connected layer and spatially replicated into a tensor of size M_d × M_d × N_d. The filter map of the image is then concatenated with the text tensor along the channel dimension, and the resulting tensor is fed into a 1 × 1 convolutional layer that jointly learns image and text features. Finally, a fully connected layer produces the decision score. The downsampling part of the discriminator D2 consists of 7 convolutional layers, each comprising a convolution, batch normalization and a LeakyReLU activation, except that the first convolutional layer has no batch normalization.
Step 7: a self-attention mechanism is introduced and the zero-shot classifier of the second-stage bgGAN model is constructed. The generated sample I_g2 output by the generator G2 and the high-resolution background spark image I_0 are input into the feature extraction module, which extracts their respective color, shape and texture features and produces the feature vectors E_generate_c, E_generate_s, E_generate_t and E_real_c, E_real_s, E_real_t. Then, for each feature type, the vectors of the generated sample I_g2 and the real sample I_0 are concatenated, yielding the concatenated feature vectors E_cat_c, E_cat_s and E_cat_t. These are mapped to a high-dimensional space through a fully connected layer and input into the multi-head self-attention module; the output feature vectors produced by this module are sent into the classifier, which maps them to the corresponding spark-image fault types. The classifier consists of one fully connected layer.
The multi-head attention module performs the following operations. First, it receives a three-dimensional tensor as input, where the first dimension is the batch size, the second the position of each pixel in the feature map, and the third the dimension of the feature vector. The input feature tensor then undergoes three different linear transformations, producing three new vectors Q, K and V of identical dimension, equal to that of the input features. The module computes the dot products between Q and K and normalizes them into a similarity matrix, used as a weighting matrix representing the influence of each position on the others. V is then weighted and summed with the similarity matrix to obtain the self-attention output. Finally, the self-attention output is residually connected to the input feature vector and normalized, yielding the final feature representation.
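These operations correspond to a standard scaled dot-product attention block. A single-head sketch is given below (a multi-head version would split the feature dimension into heads); the scaling factor is an assumption:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Sketch of the described attention step on (batch, positions, dim)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)   # three linear transforms
        sim = torch.softmax(q @ k.transpose(1, 2) * self.scale,
                            dim=-1)                 # normalized similarity matrix
        out = sim @ v                               # weighted sum of V
        return self.norm(x + out)                   # residual connection + norm
```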
S5, the high-resolution electric spark image dataset I_0 and text description d obtained in step S1, the background-free low-resolution spark image I_1 obtained in S2, and the color, shape and texture vectors E_c, E_s, E_t extracted in S3 are input into the two-stage generative adversarial network model for training. As shown in FIG. 5, training comprises two parts: pre-training and fine-tuning. In pre-training, the second-stage bgGAN model is first fixed while the discriminator D1 and generator G1 of the first-stage nbgGAN model are trained alternately. Next, the first-stage nbgGAN model is fixed while the discriminator D2, the zero-shot classifier and the generator G2 of the second-stage bgGAN model are trained alternately. In fine-tuning, the two GAN models are trained jointly. When the model is judged to have converged, the converged two-stage model is recorded as the electric spark image generation model for electrical equipment.
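The pre-train-then-fine-tune schedule can be outlined as follows; the step closures and epoch counts are placeholders, not values from the patent:

```python
def set_trainable(module, flag):
    """Freeze or unfreeze a sub-network during pre-training."""
    for p in module.parameters():
        p.requires_grad = flag

def train_two_stage(stage1, stage2, loader, step_stage1, step_stage2, step_joint,
                    epochs_pre=100, epochs_ft=50):
    """Pre-train each GAN with the other frozen, then fine-tune jointly.
    step_* are caller-supplied closures doing one alternating D/G update."""
    set_trainable(stage2, False)                 # pre-train stage 1 (nbgGAN)
    for _ in range(epochs_pre):
        for batch in loader:
            step_stage1(batch)                   # alternate D1, then G1
    set_trainable(stage2, True)
    set_trainable(stage1, False)                 # pre-train stage 2 (bgGAN)
    for _ in range(epochs_pre):
        for batch in loader:
            step_stage2(batch)                   # alternate D2, classifier, G2
    set_trainable(stage1, True)                  # joint fine-tuning
    for _ in range(epochs_ft):
        for batch in loader:
            step_joint(batch)
```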
The two-stage generative adversarial network model trains the first-stage nbgGAN model with two loss functions: the generation loss L_G1 and the discrimination loss L_D1. The generation loss comprises the adversarial loss L_C_G1 and the conditional loss L_C1_MSE; the discrimination loss comprises the adversarial loss L_C_D1 and the binary cross-entropy loss L_cross_D1.
The adversarial loss of the first-stage nbgGAN model is defined as follows: the generator G1 tries to generate realistic low-resolution background-free images, while the discriminator D1 tries to distinguish real images from generated ones; the discriminator D1 and generator G1 are trained alternately by maximizing the loss L_C_D1 and minimizing the loss L_C_G1. Meanwhile, a KL divergence is introduced to constrain the difference between the distribution generated by the generator and the real data distribution. The formulas are:
L_C_D1 = E_{I_1 ~ p_data}[log D1(S(I_1), c_1)] + E_{z ~ p_z}[log(1 - D1(S(G1(z, c_1, E_c, E_s, E_t)), c_1))],
L_C_G1 = E_{z ~ p_z}[log(1 - D1(S(G1(z, c_1, E_c, E_s, E_t)), c_1))] + k_1 · D_KL(p_data || p_g),
wherein E(·) denotes the expected value, S(·) denotes the image enhancement operation, I_1 is the background-free low-resolution real image, d is the text description, c_1 is the condition vector, z is a noise vector randomly sampled from a Gaussian distribution, and E_c, E_s, E_t are the color, shape and texture vectors, respectively. D_KL(p_data || p_g) is the KL divergence between the real data distribution p_data(x) and the generator distribution p_g(x). k_1 is a regularization hyper-parameter balancing the original loss in L_C_G1 against the KL divergence; herein k_1 = 1. μ_1(·) and σ_1(·) are the mean and standard deviation sampled from the text enhancement module T1. The conditional loss L_C1_MSE is defined as follows: the generator G1 tries to minimize the mean-square error between the generated image and the real image. The formula is:
L_C1_MSE = (1/N) Σ_{i=1}^{N} (1/(W·H)) Σ_{w=1}^{W} Σ_{h=1}^{H} (I_i(w, h) - P_i(w, h))²,
wherein I_i(w, h) is the pixel value of the real image of the i-th sample at position (w, h), P_i(w, h) is the pixel value of the generated image of the i-th sample at position (w, h), N is the number of samples, and W and H are the width and height of the image, respectively.
The binary cross-entropy loss L_cross_D1 is defined as follows: the discriminator D1 tries to minimize, for each sample, the distance between the true class probability distribution and the model's predicted distribution, accelerating convergence during gradient-descent optimization. The binary cross-entropy loss is:
L_cross_D1 = -[y · log(p) + (1 - y) · log(1 - p)],
wherein y ∈ {0, 1} is the true label and p is the probability of the positive class predicted by the model.
Thus, the generation loss L_G1 is:
L_G1 = min(L_C_G1) + min(L_C1_MSE),
and the discrimination loss L_D1 is:
L_D1 = max(L_C_D1) + min(L_cross_D1).
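For illustration, these stage-1 losses can be assembled as in the sketch below; the use of logit-based BCE and the conditioning-augmentation form of the KL term (against N(0, I)) are assumptions, as the patent states the KL constraint only at the distribution level:

```python
import torch
import torch.nn.functional as F

def g1_loss(d_fake_logits, fake_img, real_img, mu1, sigma1, k1=1.0):
    """Stage-1 generator loss sketch: adversarial + KL regularizer + MSE.
    The KL form against N(0, I) is an assumption."""
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))        # adversarial part
    kl = 0.5 * torch.mean(mu1.pow(2) + sigma1.pow(2)
                          - torch.log(sigma1.pow(2) + 1e-8) - 1)  # KL term
    mse = F.mse_loss(fake_img, real_img)                      # L_C1_MSE
    return adv + k1 * kl + mse

def d1_loss(d_real_logits, d_fake_logits):
    """Stage-1 discriminator loss: real labelled 1, generated labelled 0."""
    real = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return real + fake
```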
two-stage generation antagonism network model the second stage bgGAN model is trained using three loss functions: generating loss L G2 Zero sample classification loss L C And identifying loss L D2 . Wherein generating the penalty includes countering the penalty L C_G2 Loss of condition L C2_MSE And classifier directed penalty weights, zero sample classification penalty including classification penalty L c;s Countering loss L adv And self-attention loss L att Identifying loss includes countering loss L C_D2 Binary cross entropy loss L cross_D2 And pay attention toLoss of force L A_D2
The adversarial loss of the second-stage bgGAN model is defined as follows: the generator G2 tries to generate realistic high-resolution images with background while taking into account the image and background information ignored by the first-stage nbgGAN model, and the discriminator D2 tries to distinguish real images from generated ones; the discriminator D2 and generator G2 are trained alternately by maximizing the loss L_C_D2 and minimizing the loss L_C_G2. The formulas are:
L_C_D2 = E_{I_0 ~ p_data}[log D2(I_0, c_2)] + E_{I_g1 ~ p_G1}[log(1 - D2(G2(I_g1, c_2), c_2))],
L_C_G2 = E_{I_g1 ~ p_G1}[log(1 - D2(G2(I_g1, c_2), c_2))] + k_2 · D_KL(p_data || p_g),
wherein I_0 is the high-resolution real image with background, I_g1 is the sample generated by the generator G1, c_2 is the condition vector, and μ_2(·) and σ_2(·) are the mean and standard deviation sampled from the text enhancement module T2.
D_KL(p_data || p_g) is the KL divergence between the real data distribution p_data(x) and the generator distribution p_g(x), and k_2 is a regularization hyper-parameter; herein k_2 = 1.
The conditional loss L_C2_MSE in the generation loss L_G2 and the cross-entropy loss L_cross_D2 in the discrimination loss L_D2 are obtained in the same manner as for the first-stage nbgGAN model.
The attention module of the discriminator D2 contributes an attention loss, which minimizes the distance between the self-attention output and the input image:
L_A_D2 = ||X - A||_F,
wherein X is the input image, A is the output of the self-attention mechanism, and ||·||_F is the Frobenius norm of a matrix.
The classification accuracy of the zero-shot classifier guides the generator G2 to generate samples of the correct class. The zero-shot classification loss L_C is defined as follows: the zero-shot classifier with the multi-head self-attention module tries to minimize the classification error; it comprises the classification loss L_cls, the adversarial loss L_adv and the self-attention loss L_att:
L_cls = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{K} y(i, j) · log p(i, j),
L_adv = -(1/N) Σ_{i=1}^{N} log D(G(z_i)),
L_att = -(1/N) Σ_{i=1}^{N} w_i · log C(x)_i,
wherein N is the number of samples, K is the number of classes of the classifier, y(i, j) indicates whether the i-th sample belongs to the j-th class, p(i, j) is the probability that the i-th sample produced by the generator belongs to the j-th class, D is the discriminator, and z_i is the noise vector of the i-th sample. C(x)_i is the predicted probability of the discriminator for the i-th position of the input image x, and w_i is the self-attention weight of that position; w_i is computed from the input features and is typically normalized with a softmax function:
w_i = exp(s_i) / Σ_j exp(s_j),
where s_i is the attention score of the i-th position.
thus, the generation loss LG2 is as follows:
LG2=min(L C_G2 )+min(L C2_MSE )+c*L C
where c is used to balance the original loss function in generator G2 with the total loss function of the zero sample classifier.
The discrimination loss L_D2 is:
L_D2 = max(L_C_D2) + min(L_cross_D2) + min(L_A_D2).
the zero sample class loss total loss function is defined as:
L C =L cls +l*L adv +λ*L att
where l, λ is the hyper-parameter, l is the weight used to balance the classification loss function and the counterloss function, λ represents the loss weight of the self-attention mechanism.
S6, the trained electric spark image generation model is used to generate high-resolution electric spark images of fault types including insulation faults, poor contact, foreign-matter ingress, arc faults and short-circuit faults, so as to expand a high-quality electric spark image dataset and save the time, capital and labor costs of acquiring spark images. The expanded high-quality spark image dataset can be used in industrial applications, scientific research, safety inspection and other fields.
The word "preferred" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as "preferred" is not necessarily to be construed as advantageous over other aspects or designs. Rather, use of the word "preferred" is intended to present concepts in a concrete fashion. The term "or" as used in this application is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clear from the context, "X uses a or B" is intended to naturally include any of the permutations. That is, if X uses A; x is B; or X uses both A and B, then "X uses A or B" is satisfied in any of the foregoing examples.
Moreover, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art upon reading and understanding this specification and the annexed drawings. The present disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular regard to the various functions performed by the above described components (e.g., elements), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component that performs the specified function of the described component (i.e., that is functionally equivalent), even though not structurally equivalent to the disclosed structure that performs the function in the herein illustrated exemplary implementations of the disclosure. Furthermore, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for a given or particular application. Moreover, to the extent that the terms "includes", "has", "contains" or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising".
The functional units in the embodiments of the invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. The above-mentioned devices or systems may perform the methods in the corresponding method embodiments.
In summary, the foregoing embodiment is one implementation of the present invention, but the implementation of the present invention is not limited to this embodiment; any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principles of the present invention is an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. The electric equipment spark image generation method based on the two-stage generation countermeasure network is characterized by comprising the following steps of:
S1, acquiring a real image data set and collecting text description: collecting electric spark text description and environment information text description, and collecting high-resolution electric spark image sets of various fault types;
s2, constructing an electric spark image background stripping module, and manufacturing a background-free low-resolution electric spark image set: processing the high-resolution image set to obtain a corresponding low-resolution image set, and obtaining a background-free low-resolution image set by setting a region rule;
s3, constructing a feature extraction module: the feature extraction module extracts color, shape and texture features in the input image as additional input vectors of the model;
S4, constructing a two-stage generation countermeasure network model for training, wherein the first-stage nbgGAN model is used for generating a background-free low-resolution image, and the second-stage bgGAN model is used for generating a high-resolution image with background;
S5, training the two-stage generation countermeasure network model to obtain a trained model: based on the two-stage generation countermeasure network model obtained in step S4, when the model is judged to have converged, the converged model is recorded as the electrical equipment spark image generation model;
s6, generating electric spark images of various fault types by using the trained electric spark image generation model of the electric equipment.
2. The two-stage generation countermeasure network-based electrical equipment spark image generation method of claim 1, wherein the capturing of the real image dataset and the gathering of the text description in step S1 specifically includes:
collecting, for each electric spark fault type, an electric spark text description d containing color, shape and texture characteristics and an environment information text description env, providing text information for the image generation model;
simultaneously collecting a high-resolution electric spark image set I_H covering various fault types in the power grid environment, providing image information for the image generation model, wherein the fault types include: insulation failure, poor contact, foreign matter ingress, arc failure and short-circuit failure;
all the obtained image data are then normalized to obtain a high-resolution spark image data set I_0 of uniform size; the tensor size of a high-resolution spark image is H × W × C, where H is the height, W is the width and C is the number of channels; each image is labeled with a category label for the fault type to which it belongs, and an attribute label is defined for each image, in which process each real spark image is labeled 1.
3. The electrical equipment spark image generation method based on the two-stage generation countermeasure network according to claim 2, wherein step S2 specifically includes:
For the high-resolution electric spark image data set I_0 obtained in S1, bilinear interpolation and downsampling are adopted, and the low-resolution electric spark image set I_L is obtained after processing; the tensor size of a low-resolution spark image is (H/r) × (W/r) × C, where r is the downsampling ratio;
stripping the image background, thereby reducing background information interference and accelerating model convergence;
I_L is input into the spark image background stripping module to obtain the background-free low-resolution spark image set I_1;
The region rule of the spark image background stripping module is as follows:
1) The electric spark has the characteristic of high saturation, i.e. its saturation value S is not lower than a preset saturation threshold S_T: S ≥ S_T;
2) The electric spark has the characteristic of high brightness, i.e. the brightness value L of its gray-level image is not lower than a preset brightness threshold L_T: L ≥ L_T;
3) The electric spark has the characteristic of bright color, i.e. at least one color component R, G, B of the spark image is not lower than the corresponding preset threshold R_T, G_T, B_T: R ≥ R_T or G ≥ G_T or B ≥ B_T;
4) The area where the electric spark is located has the characteristic of large variation amplitude of pixel values, so that the electric spark image surrounded by the edge in the image can be extracted through the edge information of the image, and the background information is removed.
4. The electric device spark image generation method based on the two-stage generation countermeasure network according to claim 3, wherein the specific steps of extracting the electric spark image are as follows:
firstly, binarization processing is performed on the input image I to obtain a binary image Bi(I), and a morphological gradient operation is then performed on Bi(I) to obtain the spark edge information e(T(Bi(I)));
in order to remove edge noise, an open operation is performed on the binary image to obtain O(Bi(I));
next, the edge area of the image is set to white and the background area to black, yielding the processed image H(O(Bi(I)));
an expansion operation is performed on the image using the edge information e(T(Bi(I))) to obtain a wider edge region;
connected-region analysis is performed on the expanded binary image to obtain the region Ω of the spark image enclosed by the edges;
finally, a bitwise AND operation AND(I, Ω) is performed between the original image I and the binarized region, removing the background not enclosed by the edges;
the extracted electric spark image is segmented by the following formula:
I_1 = AND(I, Ω), Ω = C(E(H(O(Bi(I))), e(T(Bi(I))))), subject to S ≥ S_T AND L ≥ L_T AND (R ≥ R_T OR G ≥ G_T OR B ≥ B_T),
wherein S, L, R, G, B are the saturation component, luminance component, red component, green component and blue component of the image; S_T, L_T, R_T, G_T, B_T are the preset saturation, luminance, red, green and blue component thresholds; AND denotes the bitwise AND operation; I is the input image; Bi(·) denotes binarization processing; T(·) denotes the morphological gradient operation; O(·) denotes the open operation; e(·) denotes obtaining edge information; H(·) is the operation of setting the edge area to white and the background area to black; E(x, y) denotes the expansion of x using the information y; C(·) is connected-region analysis; and Ω is the region obtained after the C(·) operation.
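By way of non-limiting illustration only, the background-stripping pipeline of claims 3-4 could be sketched as follows in Python with OpenCV; the threshold values, the Otsu binarization and the 3 × 3 structuring element are assumptions of this sketch, not values prescribed by the claims.

```python
import cv2
import numpy as np

def strip_background(img_bgr, s_t=120, l_t=160, rgb_t=(180, 180, 180)):
    """Sketch of the spark background-stripping region rules 1)-4)."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(img_bgr)

    # Rules 1)-3): high saturation, high brightness, at least one bright color channel.
    color_mask = ((hsv[..., 1] >= s_t) & (gray >= l_t) &
                  ((r >= rgb_t[0]) | (g >= rgb_t[1]) | (b >= rgb_t[2])))

    # Rule 4): Bi(I), morphological gradient T for edges, opening O to denoise,
    # dilation E to widen edges, connected-region analysis C for the region Omega.
    _, bi = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    edges = cv2.morphologyEx(bi, cv2.MORPH_GRADIENT, kernel)   # e(T(Bi(I)))
    opened = cv2.morphologyEx(bi, cv2.MORPH_OPEN, kernel)      # O(Bi(I))
    dilated = cv2.dilate(cv2.bitwise_or(opened, edges), kernel)
    _, labels = cv2.connectedComponents(dilated)               # C(.)
    omega = (labels > 0).astype(np.uint8) * 255                # edge-enclosed region

    mask = cv2.bitwise_and(omega, color_mask.astype(np.uint8) * 255)
    return cv2.bitwise_and(img_bgr, img_bgr, mask=mask)        # AND(I, Omega)
```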
5. The two-stage generation countermeasure network-based electric device spark image generation method according to claim 1, wherein S3 specifically includes:
color feature extraction: color features are extracted using a color histogram to obtain an additional color vector E_c;
shape feature extraction: shape features are extracted using an edge detection algorithm to obtain an additional shape vector E_s;
texture feature extraction: texture features are extracted using a texture feature descriptor, here a local binary pattern, to obtain an additional texture vector E_t.
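As an illustrative sketch only (the histogram bin counts, Canny thresholds and LBP parameters are hypothetical choices, not fixed by the claim), the feature extraction module could be written as:

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def extract_features(img_bgr):
    """Sketch of the feature extraction module: color histogram (E_c),
    edge-based shape descriptor (E_s), LBP texture histogram (E_t)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)

    # Color: per-channel histogram, flattened and normalized.
    e_c = np.concatenate([cv2.calcHist([img_bgr], [i], None, [32], [0, 256]).ravel()
                          for i in range(3)])
    e_c /= e_c.sum() + 1e-8

    # Shape: Canny edge map pooled onto a coarse grid.
    edges = cv2.Canny(gray, 100, 200)
    e_s = cv2.resize(edges, (8, 8), interpolation=cv2.INTER_AREA).ravel() / 255.0

    # Texture: uniform local binary pattern histogram.
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    e_t, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    return e_c, e_s, e_t
```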
6. The two-stage generation countermeasure network-based electric device spark image generation method according to claim 1, wherein the model construction of step S4 comprises the following steps:
Step S41: constructing the text enhancement module T1 of the first-stage nbgGAN model and the text enhancement module T2 of the second-stage bgGAN model, so as to improve the continuity of the latent space of the text embeddings and make it easier for the generators to learn text features;
Step S42: constructing an image enhancement module to prevent overfitting; in the image enhancement module, the same data enhancement strategy is applied to the input background-free low-resolution spark image I_1 and the spark sample I_g1 generated by the generator G1;
Step S43: constructing the generator G1 of the first-stage nbgGAN model;
Step S44: constructing the recognizer D1 of the first-stage nbgGAN model;
Step S45: constructing the generator G2 of the second-stage bgGAN model;
Step S46: introducing a self-attention module and constructing the recognizer D2 of the second-stage bgGAN model;
Step S47: introducing a self-attention mechanism and constructing the zero-sample classifier of the second-stage bgGAN model.
7. The two-stage generation countermeasure network-based electric device spark image generation method according to claim 1, wherein step S41 includes: constructing the text enhancement module T1 of the first-stage nbgGAN model and the text enhancement module T2 of the second-stage bgGAN model, so as to improve the continuity of the latent space of the text embeddings and make it easier for the generators to learn text features: first, the electric spark text description d and the environment information text description env collected in S1 are processed by a pre-trained text encoder to obtain a spark text embedding and an environment-information text embedding; then, the spark text embedding is input into the text enhancement module T1, and the spark text embedding together with the environment-information text embedding is input into the text enhancement module T2, obtaining the condition vectors c_1 and c_2 respectively;
The specific operation in the text enhancement module T1 is as follows: first, the text embedding is input into two fully connected layers to generate two vectors, which serve as the mean μ_1 and the variance σ_1 of a Gaussian distribution; then, a random vector ε_1 is sampled from the standard normal distribution, and finally the formula c_1 = μ_1 + σ_1 ⊙ ε_1 is used to convert the random vector ε_1 into the condition vector c_1 of the Gaussian distribution;
The specific operation in the text enhancement module T2 is as follows: the condition vector c_2 is obtained in the same way as the condition vector c_1, but two other fully connected layers are used to generate the mean μ_2 and the variance σ_2, and the Gaussian distribution is generated jointly from the spark text embedding and the environment-information text embedding; a random vector ε_2 is sampled from the standard normal distribution, and the formula c_2 = μ_2 + σ_2 ⊙ ε_2 is used to convert the random vector ε_2 into the condition vector c_2 of the Gaussian distribution;
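A minimal PyTorch sketch of such a conditioning-augmentation module follows (the layer dimensions are hypothetical; the claims do not fix them). T2 differs only in that the spark and environment embeddings are concatenated before the two fully connected layers:

```python
import torch
import torch.nn as nn

class TextEnhancement(nn.Module):
    """Conditioning augmentation: c = mu + sigma * eps, with eps ~ N(0, I)."""
    def __init__(self, embed_dim=256, cond_dim=128):
        super().__init__()
        self.fc_mu = nn.Linear(embed_dim, cond_dim)      # mean head
        self.fc_logvar = nn.Linear(embed_dim, cond_dim)  # log-variance head

    def forward(self, text_embedding):
        mu = self.fc_mu(text_embedding)
        sigma = torch.exp(0.5 * self.fc_logvar(text_embedding))  # std from log-var
        eps = torch.randn_like(sigma)        # sample from standard normal
        return mu + sigma * eps              # condition vector c

# T2 variant: concatenate the two embeddings first (phi_d, phi_env hypothetical names).
# c2 = TextEnhancement(2 * 256, 128)(torch.cat([phi_d, phi_env], dim=-1))
```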
Step S42 includes: constructing an image enhancement module to prevent overfitting; in the image enhancement module, the same data enhancement strategy is applied to the input background-free low-resolution spark image I_1 and the spark sample I_g1 generated by the generator G1, comprising: randomly selecting, within the image area, a square region one quarter the size of the original image and masking it; adding Gaussian noise to the image; and covering a [-1/8, 1/8] range of the image with 0;
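Purely as an illustration, one plausible reading of this strategy is sketched below; in particular, the [-1/8, 1/8] rule is interpreted here as zeroing a band of one quarter of the image height around a random row, which the claim does not spell out:

```python
import torch

def augment(img, noise_std=0.05):
    """Sketch of the shared enhancement strategy for I_1 and I_g1.
    img: tensor of shape (batch, channels, h, w)."""
    _, _, h, w = img.shape
    out = img.clone()

    # 1) Mask a random square whose area is one quarter of the image.
    sh, sw = h // 2, w // 2                     # (h/2)*(w/2) = h*w/4
    y = torch.randint(0, h - sh + 1, (1,)).item()
    x = torch.randint(0, w - sw + 1, (1,)).item()
    out[:, :, y:y + sh, x:x + sw] = 0

    # 2) Additive Gaussian noise.
    out = out + noise_std * torch.randn_like(out)

    # 3) Zero a band spanning a [-1/8, 1/8] offset around a random row (assumption).
    c = torch.randint(h // 8, h - h // 8, (1,)).item()
    out[:, :, c - h // 8:c + h // 8, :] = 0
    return out
```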
Step S43 includes: constructing the generator G1 of the first-stage nbgGAN model;
first, the condition vector c_1, the random noise vector z, and the color feature E_c, shape feature E_s and texture feature E_t extracted by the feature extraction module are concatenated to form an input vector;
the input vector is then input into the generator G1 to generate background-free low-resolution spark samples I_g1, where each generated spark image is labeled 0; the generator G1 consists of 4 transposed convolution layers, each comprising upsampling and convolution operations;
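For illustration, a sketch of such a generator in PyTorch (the channel widths, the 4 × 4 seed map and all dimensions are assumptions of this sketch, not prescribed values):

```python
import torch
import torch.nn as nn

def up_block(cin, cout):
    # One "transposed convolution layer" of G1: upsample then convolve.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(cin, cout, 3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True))

class GeneratorG1(nn.Module):
    def __init__(self, cond_dim=128, z_dim=100, feat_dim=96, ngf=64):
        super().__init__()
        in_dim = cond_dim + z_dim + feat_dim          # [c1, z, E_c, E_s, E_t]
        self.fc = nn.Linear(in_dim, ngf * 8 * 4 * 4)  # project to a 4x4 seed map
        self.blocks = nn.Sequential(                  # 4 up-blocks: 4 -> 64 spatial
            up_block(ngf * 8, ngf * 4),
            up_block(ngf * 4, ngf * 2),
            up_block(ngf * 2, ngf),
            up_block(ngf, ngf))
        self.to_img = nn.Sequential(nn.Conv2d(ngf, 3, 3, padding=1), nn.Tanh())

    def forward(self, c1, z, feats):
        x = torch.cat([c1, z, feats], dim=1)
        x = self.fc(x).view(x.size(0), -1, 4, 4)
        return self.to_img(self.blocks(x))            # background-free low-res I_g1
```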
Step S44 includes: constructing the recognizer D1 of the first-stage nbgGAN model: first, the background-free low-resolution spark image I_1 and the sample I_g1 generated by G1 are input into the image enhancement module to obtain enhanced images of size W_1 × H_1; after passing through a series of downsampling blocks, the enhanced image becomes a feature map with spatial dimension M_d × M_d;
at the same time, the text embedding is input into a fully connected layer and compressed to N_d dimensions, and after spatial replication a tensor of dimension M_d × M_d × N_d is obtained;
then, the filter maps of the image are concatenated with the text tensor along the channel dimension; the resulting tensor is further input into a 1 × 1 convolution layer to jointly learn the features of the image and the text;
finally, a fully connected layer is used to generate the decision score; the downsampling block of the recognizer D1 consists of 5 convolutional neural network layers, each comprising a convolution operation, a batch normalization operation and a LeakyReLU activation function, the first convolution layer having no batch normalization operation;
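A sketch of such a recognizer in PyTorch; the 128 × 128 input (giving M_d = 4 after five stride-2 convolutions) and all widths are hypothetical:

```python
import torch
import torch.nn as nn

class RecognizerD1(nn.Module):
    """Sketch of D1: 5 conv layers (no BatchNorm on the first), text projection,
    channel-wise concatenation, 1x1 conv, then a decision score."""
    def __init__(self, ndf=64, text_dim=256, n_d=128, m_d=4):
        super().__init__()
        layers, cin = [], 3
        for i, cout in enumerate([ndf, ndf * 2, ndf * 4, ndf * 8, ndf * 8]):
            layers.append(nn.Conv2d(cin, cout, 4, stride=2, padding=1))
            if i > 0:                       # first conv layer: no batch norm
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            cin = cout
        self.down = nn.Sequential(*layers)  # e.g. 128x128 -> 4x4 feature map
        self.text_fc = nn.Linear(text_dim, n_d)
        self.joint = nn.Conv2d(ndf * 8 + n_d, ndf * 8, 1)   # 1x1 conv over concat
        self.score = nn.Linear(ndf * 8 * m_d * m_d, 1)

    def forward(self, img, text_embedding):
        f = self.down(img)                                   # M_d x M_d feature map
        t = self.text_fc(text_embedding)
        t = t[:, :, None, None].expand(-1, -1, f.size(2), f.size(3))
        j = self.joint(torch.cat([f, t], dim=1))
        return self.score(j.flatten(1))                      # decision score
```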
Step S45 includes: constructing the generator G2 of the second-stage bgGAN model: the background-free low-resolution spark sample I_1 and the condition vector c_2 are input into the generator G2 to generate a high-resolution spark sample I_g2 with background, each generated spark image being labeled 0; the specific operation is as follows: first, the N_g-dimensional condition vector c_2 is spatially replicated to form a tensor of size M_g × M_g × N_g;
then, the spark sample I_1 is fed through a number of downsampling blocks until the output spatial size is M_g × M_g;
then, the image features and the text features are concatenated along the channel dimension and input into 4 residual blocks to learn multi-modal representations across image and text features;
finally, a series of upsampling layers is used to generate a high-resolution image I_g2 of size W_2 × H_2; the series of downsampling blocks are all convolution layers, the first convolution layer having no batch normalization operation; each residual block consists of convolution, batch normalization and a ReLU activation function; the series of upsampling layers are all transposed convolution layers, each comprising upsampling and convolution operations;
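A sketch of such a generator; the downsampling ratio, channel widths and the 64 → 16 → 256 spatial path are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)  # residual connection

class GeneratorG2(nn.Module):
    """Sketch of G2: downsample I_1, fuse the spatially replicated c_2,
    4 residual blocks, then upsample to a high-resolution image with background."""
    def __init__(self, ngf=64, n_g=128):
        super().__init__()
        self.down = nn.Sequential(                        # no BatchNorm on first conv
            nn.Conv2d(3, ngf, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ngf, ngf * 2, 4, 2, 1), nn.BatchNorm2d(ngf * 2), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(ngf * 2 + n_g, ngf * 2, 3, padding=1)
        self.res = nn.Sequential(*[ResBlock(ngf * 2) for _ in range(4)])
        up = lambda cin, cout: [nn.Upsample(scale_factor=2),
                                nn.Conv2d(cin, cout, 3, padding=1),
                                nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
        self.up = nn.Sequential(*up(ngf * 2, ngf * 2), *up(ngf * 2, ngf),
                                *up(ngf, ngf), *up(ngf, ngf),
                                nn.Conv2d(ngf, 3, 3, padding=1), nn.Tanh())

    def forward(self, i1, c2):
        f = self.down(i1)                                 # M_g x M_g feature map
        c = c2[:, :, None, None].expand(-1, -1, f.size(2), f.size(3))
        return self.up(self.res(self.fuse(torch.cat([f, c], dim=1))))
```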
Step S46 includes: introducing a self-attention module and constructing the recognizer D2 of the second-stage bgGAN model; the high-resolution spark image with background I_0 of size W_2 × H_2 and the sample I_g2 generated by G2 are passed through a series of downsampling blocks to obtain a feature map with spatial dimension M_d × M_d; at the same time, the text embedding is input into a fully connected layer, compressed to N_d dimensions, and then spatially replicated to produce a tensor of dimension M_d × M_d × N_d;
then, the filter maps of the image are concatenated with the text tensor along the channel dimension, and the resulting tensor is input into a 1 × 1 convolution layer to jointly learn the features of the image and the text;
finally, a fully connected layer is used to generate the decision score, wherein the downsampling block of the recognizer D2 consists of 7 convolutional neural network layers, each comprising a convolution operation, a batch normalization operation and a LeakyReLU activation function, the first convolution layer having no batch normalization operation;
Step S47: introducing a self-attention mechanism and constructing the zero-sample classifier of the second-stage bgGAN model: the generated sample I_g2 output by the generator G2 and the high-resolution spark image with background I_0 are input into the feature extraction module to extract color, shape and texture features, generating the feature vectors E_generate_c, E_generate_s, E_generate_t and E_real_c, E_real_s, E_real_t;
then, the corresponding feature vectors of the generated sample I_g2 and the real sample I_0 are concatenated to obtain the concatenated feature vectors E_cat_c, E_cat_s, E_cat_t;
then, the concatenated feature vectors are mapped to a high-dimensional space through a fully connected layer and input into a multi-head self-attention module; the output feature vectors produced by the module are fed into a classifier consisting of one fully connected layer, which classifies them and maps them to the corresponding spark image fault types;
wherein the multi-head attention module performs the following operations: first, it receives a three-dimensional tensor as input, in which the first dimension represents the batch size, the second dimension represents the position of each pixel in the feature map, and the third dimension represents the dimension of the feature vector; then, the input feature tensor undergoes three different linear transformations to obtain three new vectors Q, K, V, whose dimensions are identical and equal to the dimension of the input features; the module then computes the dot product between Q and K and normalizes it to obtain a similarity matrix, which serves as a weighting matrix representing the degree of influence of each position on the other positions; V is weighted and summed using the similarity matrix to obtain the final self-attention output; finally, the self-attention output is residually connected to the input feature vector and a normalization operation is applied to obtain the final feature vector representation.
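For illustration, the described computation maps directly onto PyTorch's built-in multi-head attention (the head count and dimensions here are hypothetical):

```python
import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    """Q/K/V projections, scaled dot-product similarity, weighted sum over V,
    residual connection and layer normalization, as described in step S47."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                    # x: (batch, positions, dim)
        out, weights = self.attn(x, x, x)    # Q = K = V = x after linear maps
        return self.norm(x + out), weights   # residual + normalization
```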
8. The two-stage generation countermeasure network-based electric device spark image generation method according to claim 1, wherein the training process of the two-stage generation countermeasure network model in step S5 includes two parts: pre-training and fine-tuning; in the pre-training, the second-stage bgGAN model is first fixed, and the recognizer D1 and the generator G1 of the first-stage nbgGAN model are trained alternately;
next, the first-stage nbgGAN model is fixed, and the recognizer D2, the zero-sample classifier and the generator G2 of the second-stage bgGAN model are trained alternately;
in the fine-tuning, the two GAN models are trained jointly, and when the model is judged to have converged, the converged two-stage generation countermeasure network model is recorded as the electrical equipment spark image generation model.
9. The two-stage generation countermeasure network-based electrical device spark image generation method of claim 8, wherein the two-stage generation countermeasure network model trains the first-stage nbgGAN model using two loss functions: the generation loss L_G1 and the recognition loss L_D1, wherein the generation loss includes the adversarial loss L_C_G1 and the conditional loss L_C1_MSE, and the recognition loss includes the adversarial loss L_C_D1 and the binary cross-entropy loss L_cross_D1;
the adversarial losses of the first-stage nbgGAN model: the generator G1 attempts to generate realistic low-resolution background-free images, and the recognizer D1 attempts to distinguish real images from generated images; the recognizer D1 and the generator G1 are trained alternately by maximizing the loss function L_C_D1 and minimizing the loss function L_C_G1, while a KL divergence is introduced to constrain the difference between the data distribution generated by the generator and the true data distribution;
the conditional loss L_C1_MSE: the generator G1 tries to minimize the mean squared error between the generated image and the real image;
the binary cross-entropy loss L_cross_D1: the recognizer D1 tries to minimize, for each sample, the distance between the true class probability distribution and the model's predicted probability distribution, thereby accelerating convergence during gradient-descent optimization;
the two-stage generation countermeasure network model trains the second-stage bgGAN model using three loss functions: the generation loss L_G2, the zero-sample classification loss L_C and the recognition loss L_D2; wherein the generation loss includes the adversarial loss L_C_G2, the conditional loss L_C2_MSE and a classifier-guided loss weight; the zero-sample classification loss includes the classification loss L_cls, the adversarial loss L_adv and the self-attention loss L_att; and the recognition loss includes the adversarial loss L_C_D2, the binary cross-entropy loss L_cross_D2 and the attention loss L_A_D2;
the adversarial losses of the second-stage bgGAN model: the generator G2 attempts to generate realistic high-resolution images with background, taking into account the image and background information ignored in the first-stage nbgGAN model, and the recognizer D2 attempts to distinguish real images from generated images; the recognizer D2 and the generator G2 are trained alternately by maximizing the loss function L_C_D2 and minimizing the loss function L_C_G2;
the conditional loss L_C2_MSE in the generation loss L_G2 and the cross-entropy loss L_cross_D2 in the recognition loss L_D2 are obtained in the same manner as in the first-stage nbgGAN model;
the attention module of the recognizer D2 provides the attention loss, which minimizes the distance between the output data of the self-attention mechanism and the input image;
the classification accuracy of the zero-sample classifier guides the generator G2 to generate samples of the correct class; the zero-sample classification loss L_C means that the zero-sample classifier, which incorporates the multi-head self-attention module, attempts to minimize the classification error, and it includes the classification loss L_cls, the adversarial loss L_adv and the self-attention loss L_att.
10. The two-stage generation countermeasure network-based electrical device spark image generation method according to claim 9, characterized in that the two-stage generation countermeasure network model trains the first-stage nbgGAN model using two loss functions: the generation loss L_G1 and the recognition loss L_D1, wherein the generation loss includes the adversarial loss L_C_G1 and the conditional loss L_C1_MSE, and the recognition loss includes the adversarial loss L_C_D1 and the binary cross-entropy loss L_cross_D1;
the loss function L_C_D1 is formulated as:
L_C_D1 = E[log D1(S(I_1), d)] + E[log(1 − D1(S(G1(z, c_1, E_c, E_s, E_t)), d))];
the loss function L_C_G1 is formulated as:
L_C_G1 = E[log(1 − D1(S(G1(z, c_1, E_c, E_s, E_t)), d))] + k_1 · D_KL(p_data ∥ p_g);
wherein E(·) represents the expected value, S(·) represents the image enhancement operation, I_1 is the background-free low-resolution real image, d is the text description, c_1 is the condition vector, z is a noise vector randomly sampled from a given Gaussian distribution, and E_c, E_s, E_t are the color vector, shape vector and texture vector respectively; D_KL(p_data ∥ p_g) is the KL divergence between the real data distribution p_data(x) and the data distribution p_g(x) produced by the generator; k_1 is a regularization hyper-parameter used to balance the original loss term of the loss function L_C_G1 against the KL divergence; μ_1(·) and σ_1(·) are the mean and standard deviation sampled from the text enhancement module T1, respectively;
the conditional loss L_C1_MSE is formulated as:
L_C1_MSE = (1/N) Σ_{i=1..N} (1/(W·H)) Σ_{w=1..W} Σ_{h=1..H} (I_i(w, h) − P_i(w, h))²;
wherein I_i(w, h) represents the pixel value of the real image of the i-th sample at position (w, h), P_i(w, h) represents the pixel value of the generated image of the i-th sample at position (w, h), N is the number of samples, and W and H represent the width and height of the image respectively;
the binary cross-entropy loss L_cross_D1 is formulated as:
L_cross_D1 = −[y · log p + (1 − y) · log(1 − p)];
wherein y is the real label, y ∈ {0, 1}, and p is the probability of the positive class predicted by the model; thus, the generation loss L_G1 is:
L_G1 = min(L_C_G1) + min(L_C1_MSE),
and the recognition loss L_D1 is:
L_D1 = max(L_C_D1) + min(L_cross_D1);
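An illustrative sketch of how these stage-one losses might be assembled in PyTorch; the adversarial terms are written in the usual cross-entropy-with-logits form, and the weight k1 is a hypothetical value:

```python
import torch
import torch.nn.functional as F

def stage1_losses(d_real_score, d_fake_score, fake_img, real_img,
                  y_pred, y_true, kl, k1=2.0):
    """Sketch of L_D1 and L_G1 from the formulas above (k1 hypothetical)."""
    # Adversarial terms: D1 pushes real scores up and fake scores down.
    l_c_d1 = (F.binary_cross_entropy_with_logits(d_real_score, torch.ones_like(d_real_score))
              + F.binary_cross_entropy_with_logits(d_fake_score, torch.zeros_like(d_fake_score)))
    # G1 pushes fake scores up, plus the KL regularizer weighted by k1.
    l_c_g1 = F.binary_cross_entropy_with_logits(d_fake_score,
                                                torch.ones_like(d_fake_score)) + k1 * kl
    l_c1_mse = F.mse_loss(fake_img, real_img)              # conditional pixel loss
    l_cross_d1 = F.binary_cross_entropy(y_pred, y_true)    # binary cross entropy
    return l_c_d1 + l_cross_d1, l_c_g1 + l_c1_mse          # (L_D1, L_G1)
```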
the loss function L_C_D2 is formulated as:
L_C_D2 = E[log D2(I_0, d)] + E[log(1 − D2(G2(I_g1, c_2), d))];
the loss function L_C_G2 is formulated as:
L_C_G2 = E[log(1 − D2(G2(I_g1, c_2), d))] + k_2 · D_KL(p_data ∥ p_g);
wherein I_0 is the high-resolution real image with background, I_g1 is the sample generated by the generator G1, c_2 is the condition vector, μ_2(·) and σ_2(·) represent the mean and standard deviation sampled from the text enhancement module T2 respectively, D_KL(p_data ∥ p_g) is the KL divergence between the real data distribution p_data(x) and the data distribution p_g(x) produced by the generator, and k_2 is a regularization hyper-parameter;
the attention loss is formulated as:
L_A_D2 = ‖X − A‖_F²;
wherein X represents the input image, A represents the output data of the self-attention mechanism, and ‖·‖_F is the Frobenius norm of a matrix;
the classification loss L_cls, the adversarial loss L_adv and the self-attention loss L_att are formulated as:
L_cls = −(1/N) Σ_{i=1..N} Σ_{j=1..K} y(i, j) · log p(i, j);
L_adv = −(1/N) Σ_{i=1..N} log D(G(z_i));
L_att = −Σ_i w_i · log C(x)_i;
wherein N is the number of samples, K is the number of classes of the classifier, y(i, j) indicates whether the i-th sample belongs to the j-th class, p(i, j) represents the probability that the i-th sample produced by the generator belongs to the j-th class, D is the recognizer, z_i represents the noise vector of the i-th sample, C(x)_i represents the recognizer's predicted probability for the i-th position of the input image x, and w_i represents the self-attention weight of that position;
the generation loss LG2 is as follows:
LG2=min(L C_G2 )+min(L C2_MSE )+c*L C
wherein c is used to balance the original loss function in generator G2 with the total loss function of the zero sample classifier;
the recognition loss L_D2 is as follows:
L_D2 = max(L_C_D2) + min(L_cross_D2) + min(L_A_D2);
the total zero-sample classification loss function is defined as:
L_C = L_cls + l · L_adv + λ · L_att,
where l and λ are hyper-parameters: l is the weight used to balance the classification loss function against the adversarial loss function, and λ represents the loss weight of the self-attention mechanism.
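A compact sketch of the stage-two loss assembly, under the same assumptions as the stage-one sketch (the weights c, l and λ are hypothetical values):

```python
import torch

def stage2_losses(l_c_g2, l_c2_mse, l_c_d2, l_cross_d2, l_cls, l_adv, l_att,
                  x_feat, attn_out, c=1.0, l_w=0.5, lam=0.1):
    """Assemble L_G2, L_D2 and L_C from the component losses above."""
    # Attention loss of D2: squared Frobenius distance between input and attention output.
    l_a_d2 = torch.linalg.matrix_norm(x_feat - attn_out, ord="fro").pow(2).mean()
    l_c = l_cls + l_w * l_adv + lam * l_att   # L_C = L_cls + l*L_adv + lambda*L_att
    l_g2 = l_c_g2 + l_c2_mse + c * l_c        # generation loss L_G2
    l_d2 = l_c_d2 + l_cross_d2 + l_a_d2       # recognition loss L_D2
    return l_g2, l_d2, l_c
```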
CN202310575158.0A 2023-05-22 2023-05-22 Electrical equipment electric spark image generation method based on two-stage generation countermeasure network Pending CN116563410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310575158.0A CN116563410A (en) 2023-05-22 2023-05-22 Electrical equipment electric spark image generation method based on two-stage generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310575158.0A CN116563410A (en) 2023-05-22 2023-05-22 Electrical equipment electric spark image generation method based on two-stage generation countermeasure network

Publications (1)

Publication Number Publication Date
CN116563410A true CN116563410A (en) 2023-08-08

Family

ID=87496248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310575158.0A Pending CN116563410A (en) 2023-05-22 2023-05-22 Electrical equipment electric spark image generation method based on two-stage generation countermeasure network

Country Status (1)

Country Link
CN (1) CN116563410A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409376A (en) * 2023-12-15 2024-01-16 南京中鑫智电科技有限公司 Infrared online monitoring method and system for high-voltage sleeve
CN117409376B (en) * 2023-12-15 2024-05-10 南京中鑫智电科技有限公司 Infrared online monitoring method and system for high-voltage sleeve
CN117593595A (en) * 2024-01-18 2024-02-23 腾讯科技(深圳)有限公司 Sample augmentation method and device based on artificial intelligence and electronic equipment
CN117593595B (en) * 2024-01-18 2024-04-23 腾讯科技(深圳)有限公司 Sample augmentation method and device based on artificial intelligence and electronic equipment

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
Shao et al. Feature learning for image classification via multiobjective genetic programming
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
KR102138657B1 (en) Apparatus and method for robust face recognition via hierarchical collaborative representation
CN116563410A (en) Electrical equipment electric spark image generation method based on two-stage generation countermeasure network
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN110109060A (en) A kind of radar emitter signal method for separating and system based on deep learning network
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN110674874A (en) Fine-grained image identification method based on target fine component detection
CN105469080B (en) A kind of facial expression recognizing method
CN112347888A (en) Remote sensing image scene classification method based on bidirectional feature iterative fusion
CN112464983A (en) Small sample learning method for apple tree leaf disease image classification
Hossain et al. Recognition and solution for handwritten equation using convolutional neural network
Akhand et al. Convolutional Neural Network based Handwritten Bengali and Bengali-English Mixed Numeral Recognition.
CN113378949A (en) Dual-generation confrontation learning method based on capsule network and mixed attention
Saidane et al. Robust binarization for video text recognition
CN110728238A (en) Personnel re-detection method of fusion type neural network
CN114170426A (en) Algorithm model for classifying rare tumor category small samples based on cost sensitivity
Huang et al. Multi-Teacher Single-Student Visual Transformer with Multi-Level Attention for Face Spoofing Detection.
CN112016592B (en) Domain adaptive semantic segmentation method and device based on cross domain category perception
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
CN112365451A (en) Method, device and equipment for determining image quality grade and computer readable medium
CN111126173A (en) High-precision face detection method
CN113379001B (en) Processing method and device for image recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination