CN115439849B - Instrument digital identification method and system based on dynamic multi-strategy GAN network - Google Patents

Instrument digital identification method and system based on dynamic multi-strategy GAN network

Info

Publication number
CN115439849B
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211211597.5A
Other languages
Chinese (zh)
Other versions
CN115439849A (en)
Inventor
陈俊宇
胡振华
顾吉轩
滕旭阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202211211597.5A priority Critical patent/CN115439849B/en
Publication of CN115439849A publication Critical patent/CN115439849A/en
Application granted granted Critical
Publication of CN115439849B publication Critical patent/CN115439849B/en
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The invention discloses a meter digital identification method and system based on a dynamic multi-strategy GAN network. The method comprises the following steps: step 1: processing the collected image data set and extracting image features; step 2: training the network on the images; step 3: recognizing the images in real time; step 4: when the accuracy is smaller than a set value, updating the GAN network model offline. The invention greatly improves how well the meter digital identification model adapts to diverse meter pictures and can raise the accuracy of the model.

Description

Instrument digital identification method and system based on dynamic multi-strategy GAN network
Technical Field
The invention belongs to the technical field of meter digit recognition and relates in particular to a meter digital identification method and system based on a dynamic multi-strategy GAN network.
Background
With the continuous development of technology, intelligent billing systems and intelligent data-analysis systems keep emerging. Compared with traditional manual reading of meter digits, such intelligent systems are efficient and largely automated, reduce labor cost and shorten the statistics cycle. Up to now, however, some intelligent systems still require manual operation, and their accuracy and efficiency still leave much room for improvement. Meter digit recognition based on deep learning can make intelligent billing and data-analysis systems more intelligent, minimize manual involvement and comprehensively improve recognition efficiency, achieving far more with far less effort. The user only needs to submit a photo of the meter according to the workflow, and the back-end data-processing system automatically recognizes the dial digits in the photo, realizing intelligent statistics.
However, because meters are diverse and users shoot photos arbitrarily, deep-learning-based digit recognition still faces many problems: the meter photo submitted by a user may show position offset, blur or brightness imbalance, and may even be flipped, occluded or partially missing, all of which cause the final model to fail so that the digits on the meter cannot be recognized correctly. The invention provides a technical scheme for meter digital identification based on a dynamic multi-strategy GAN network, which effectively prevents quality problems in user-uploaded pictures from affecting the recognition of the meter digits.
At present, the generative adversarial network (Generative Adversarial Network, GAN) is an artificial-intelligence technique widely applied in image recognition and natural-language processing. Compared with traditional deep-learning models it has great advantages in both recognition speed and recognition accuracy. However, faced with complex and diverse meters and the uncertainty of user-submitted photos, a GAN with a single learning strategy cannot adaptively learn the patterns of the various image-recognition tasks. To better fit the distribution of the target image data, a dynamic, self-learning GAN needs to be designed. After comparing the picture produced by the generator with a real picture, the discriminator computes the optimization parameters and back-propagates them to the generator, forcing the generator to learn to produce more realistic pictures; this cycle repeats until high-quality pictures closest to the real data distribution are obtained.
The attention mechanism (Attention Mechanism), as a representative recognition strategy, is also a method for extracting salient image regions in deep learning. Most models divide attention mechanisms into three types, spatial domain, channel domain and mixed domain, according to how and where the attention weights are applied, and the required domain can be chosen in practice. However, faced with the many ways in which images become hard to recognize due to meter diversity and arbitrary user shooting, a single strategy can only solve one class of saliency deficiency; if only a single domain is used, the adaptive requirements of meter recognition cannot be met, the model accuracy drops and the loss function grows.
Disclosure of Invention
In view of this situation in the prior art, the invention discloses a meter digital identification method and system based on a dynamic multi-strategy GAN network.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
the instrument digital identification method based on the dynamic multi-strategy GAN network comprises the following steps:
step 1: processing the collected image data set and extracting image features;
step 2: training the network on the images;
step 3: recognizing the images in real time;
step 4: when the accuracy is smaller than a set value, updating the GAN network model offline.
Preferably, step 1 is specifically as follows:
step 1.1: collecting an image;
step 1.2: selecting images with different levels of noise, and processing the images so that they become high quality, namely with no obvious position offset, no blurring, no brightness imbalance, no flipping, no occlusion and no missing regions;
step 1.3: adjusting the size of the pictures to 224×224 to obtain the picture set to be trained and the high-quality picture set;
step 1.4: using the convolutional layers of a pretrained VGG-16 model as the convolution kernels for feature extraction; VGG-16 is a classical convolutional neural network (CNN) that mainly uses 3×3 convolution kernels, which increases the network depth for the same receptive field and thereby improves the learning performance of the neural network; these kernels extract the feature atlases X1 and X2 from the picture set to be trained and the high-quality picture set, respectively;
where C', H', W' denote the dimension (channel count), height and width of the image before convolution, and C, H, W denote the dimension, height and width of the image after convolution.
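For illustration, this feature-extraction step could be sketched with PyTorch and torchvision as below; taking all of vgg16.features as the cut point and the IMAGENET1K_V1 weights name are assumptions (the patent only states that the convolutional layers of a pretrained VGG-16 are used, and the weights argument syntax depends on the torchvision version):

```python
import torch
from torchvision import models

# Convolutional part of a pretrained VGG-16 used as the feature extractor
# for both the training picture set and the high-quality picture set.
vgg16 = models.vgg16(weights="IMAGENET1K_V1")
extractor = vgg16.features.eval()

@torch.no_grad()
def extract_features(batch):         # batch: (N, 3, 224, 224) float tensor
    return extractor(batch)          # feature atlas X1 or X2: (N, C, H, W)
```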
Preferably, step 2 is specifically as follows:
step 2.1: a mixed attention module composed of three networks is adopted: SENet (Squeeze-and-Excitation Networks), which models the correlations between feature channels and strengthens important features to improve accuracy; DCN (Deformable Convolutional Networks), which adapts better to geometric deformation of the image by varying the receptive field; and CCNet (Criss-Cross Networks), which introduces a novel CCA module so that each pixel gathers the context information of the surrounding pixels on its criss-cross path and eventually captures the long-range dependencies of all pixels. The input feature atlas X1 passes through the three networks in parallel; the mixed attention module is defined as follows:
The first branch is SENet, which automatically learns the importance of the features of different channels. The specific steps are as follows:
First, a squeeze operation is applied to the c-th feature map x_c of the feature atlas X1: the whole spatial feature on one channel is encoded into a single global feature, implemented by global average pooling, so that the features are compressed along the spatial dimensions. The formula is as follows:
z_c = F_sq(x_c) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} x_c(i, j)
where z_c represents the numerical distribution of the c-th feature map, i.e. its global information.
An excitation operation is then performed, which mainly captures the correlations between channels. To reduce complexity and improve generalization, two fully connected layers are introduced. The formula is as follows:
s = F_ex(z, W) = σ(g(z, W)) = σ(W2 · ReLU(W1 · z))
where z is the output of the squeeze operation, W1 and W2 are the weights of the two fully connected layers, and r is the scaling ratio, set to 16. W1·z is the first fully connected layer and reduces the dimensionality; ReLU() is a common activation function and the output dimension is kept unchanged; W2·ReLU(W1·z) is the second fully connected layer and restores the previous dimensionality; σ is the sigmoid activation function; the output s is the feature-map weight learned through the preceding fully connected layers.
Finally, the activation value s_c learned for each channel in the excitation operation is multiplied by the original feature x_c, so that the weight coefficient of each channel of the image can be learned. The formula is as follows:
x'_c = F_scale(x_c, s_c) = s_c · x_c
where x'_c ∈ X'1, and X'1 is the feature atlas output by the first branch.
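A minimal PyTorch sketch of this squeeze-and-excitation branch, assuming the reduction ratio r = 16 stated above; the class and layer names are illustrative only:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation branch: global average pooling, two fully
    connected layers with reduction r, sigmoid gating, channel-wise scaling."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)
        self.fc2 = nn.Linear(channels // r, channels)

    def forward(self, x):                                    # x: (B, C, H, W)
        z = x.mean(dim=(2, 3))                               # squeeze: z_c per channel
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z)))) # excitation
        return x * s.view(x.size(0), -1, 1, 1)               # x'_c = s_c * x_c
```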
The second branch is the DCN, which learns offsets with a parallel network so that the sampling points of the convolution kernel on the input feature map are shifted. The specific steps are as follows:
In the deformable convolution, both the deformable convolution operation and the pooling operation are two-dimensional and are performed on the same channel. The convolution kernel R is augmented with an offset; for each position p0 on the feature map the process is as follows:
y(p0) = Σ_{p_n ∈ R} w(p_n) · x(p0 + p_n + Δp_n)
where p = p0 + p_n + Δp_n, p_n enumerates the positions listed in the convolution kernel R, and w is the deformable-convolution weight. The offset Δp_n is obtained through learning and is usually a floating-point number, so the pixel value at a non-integer coordinate position on the input feature map is obtained by bilinear interpolation of x(p). The formula is as follows:
x(p) = Σ_q G(q, p) · x(q)
where q enumerates the integer coordinates on the input feature map x, p is a floating-point coordinate on x, and G(·,·) is the bilinear-interpolation kernel. The branch finally yields the feature map x(p) ∈ X'2.
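A sketch of this branch using torchvision's deformable convolution operator; the 3×3 kernel size and channel-preserving layout are assumptions, and the offset-predicting convolution stands in for the parallel network described above:

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBranch(nn.Module):
    """A parallel conv predicts per-position offsets Δp_n; DeformConv2d then
    samples the input at p0 + p_n + Δp_n (bilinear interpolation is handled
    internally by the operator)."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2 * k * k, kernel_size=k, padding=k // 2)
        self.dcn = DeformConv2d(channels, channels, kernel_size=k, padding=k // 2)

    def forward(self, x):                    # x: (B, C, H, W)
        return self.dcn(x, self.offset(x))   # feature map X'2
```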
The third branch is CCNet, which captures context information. The specific steps are as follows:
A criss-cross attention module CCA (Criss-Cross Attention Module) is introduced. The CCA module first applies two 1×1 convolutions to the feature map x to generate the feature maps Q and K, and then generates the attention map A from Q and K through an affinity operation. The formula is as follows:
d_{i,u} = Q_u · Ω_{i,u}^T
where each position u in the spatial dimension of the feature map Q yields a vector Q_u ∈ R^{C'}; similarly, feature vectors are extracted from K at the positions lying in the same row or column as u, giving the set Ω_u ∈ R^{(H+W−1)×C'}, with Ω_{i,u} ∈ R^{C'} denoting the i-th element of Ω_u; d_{i,u} ∈ D denotes the degree of correlation between the features Q_u and Ω_{i,u}. The attention map A is obtained by applying softmax to D; softmax is a common activation function that maps values from the (−∞, +∞) range into the (0, 1) interval.
After these operations, a further 1×1 convolution is applied to the initial feature map x to generate the feature map V for feature adaptation. Feature vectors are extracted from V to obtain the set V_u, and then the criss-cross feature vectors Φ_u at position u, i.e. the vectors lying in the same row or column as position u. Finally an aggregation operation is performed to collect the long-range context information. The formula is as follows:
x'_u = Σ_i A_{i,u} · Φ_{i,u} + x_u
where A_{i,u} · Φ_{i,u} is the element-wise multiplication of the corresponding elements, and the context information is added to the local feature x to enhance the local-feature and pixel-wise representation.
In the whole CCNet, the feature map x passes through a recurrent criss-cross attention module RCCA, formed by two CCA modules connected in series, to extract the global context information; the extracted global context information is then concatenated with the feature map x, finally yielding the feature map X'3.
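The criss-cross attention computation can be sketched as follows; the channel-reduction factor of 8 and the learnable residual scale gamma are assumptions, and for brevity this sketch attends over H + W positions rather than the H + W − 1 positions of the original CCA (the current position is counted twice):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """For each position u, attend over the positions in the same row and
    column, then add the aggregated context back onto the input feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // reduction, 1)
        self.k = nn.Conv2d(channels, channels // reduction, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        row = torch.einsum("bchw,bchv->bhwv", q, k)        # affinity along each row
        col = torch.einsum("bchw,bcgw->bhwg", q, k)        # affinity along each column
        attn = F.softmax(torch.cat([row, col], dim=-1), dim=-1)
        a_row, a_col = attn[..., :W], attn[..., W:]
        out = torch.einsum("bhwv,bchv->bchw", a_row, v) \
            + torch.einsum("bhwg,bcgw->bchw", a_col, v)    # aggregation
        return self.gamma * out + x                        # context added to x
```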
Step 2.2: for the weight superposition of the mixed attention mechanism, adopting genetic algorithm iteration to obtain a solution with better weight distribution; the group initialization adopts a method for generating random numbers to generate 5 groups of random weights with the numerical range between 0.3 and 3Where i is the i generation population of the genetic algorithm, α is the first branch weight of the mixed attention module, β is the second branch weight of the mixed attention module, and γ is the third branch weight of the mixed attention module. Calculating a cross entropy loss function according to the extraction condition and the extraction effect of each group of weights on the picture characteristics, determining the corresponding fitness value of the cross entropy loss function, constructing a wheel disc according to each group of fitness conditions, selecting 2 groups of the wheel disc as father by a roulette wheel disc mode, and performing cross operation between the selected groups, wherein the cross method comprises the following steps:
wherein the method comprises the steps ofAnd rand e U (0, 1), η=4;
rand is a random number between 0 and 1, eta is a self-defined distribution factor, and the probability that the offspring approaches the parent is determined. And the probability is set to be 0.5% of variation, and the variation mode is as follows:
wherein k is a variation constant, and r is a random number;
the constant change of the weight is realized through the process, and the sum of the three attention weights is ensured to be equal to 3 through normalization, namely alpha iii =3;
Finally obtaining two sub-generation weights; selecting a group of F' = (alpha) with high fitness nnn ) And weighting the feature map to obtain a mixed attention result feature map:
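An illustrative sketch of this weight-evolution loop; the exact crossover and mutation formulas are not reproduced from the patent, so a textbook simulated-binary crossover with η = 4 and a simple random-reset mutation stand in for them, and the fitness values are assumed to come from evaluating the cross-entropy loss elsewhere:

```python
import random

def roulette_select(population, fitness, k=2):
    # Roulette-wheel selection of k parents, proportional to fitness.
    total = sum(fitness)
    parents = []
    for _ in range(k):
        r, acc = random.uniform(0, total), 0.0
        for individual, f in zip(population, fitness):
            acc += f
            if acc >= r:
                parents.append(individual)
                break
    return parents

def sbx_crossover(p1, p2, eta=4.0):
    # Simulated-binary crossover; eta controls how close children stay to parents.
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        u = random.random()
        beta = (2 * u) ** (1 / (eta + 1)) if u <= 0.5 else (1 / (2 * (1 - u))) ** (1 / (eta + 1))
        c1.append(0.5 * ((1 + beta) * a + (1 - beta) * b))
        c2.append(0.5 * ((1 - beta) * a + (1 + beta) * b))
    return c1, c2

def mutate(weights, prob=0.005, low=0.3, high=3.0):
    # With 0.5% probability, reset a weight to a random value in [0.3, 3].
    return [random.uniform(low, high) if random.random() < prob else w for w in weights]

def normalize_to_three(weights):
    # Keep alpha + beta + gamma = 3 after crossover and mutation.
    s = sum(weights)
    return [3.0 * w / s for w in weights]

# Initial population: 5 groups of (alpha, beta, gamma) drawn from [0.3, 3].
population = [[random.uniform(0.3, 3.0) for _ in range(3)] for _ in range(5)]
```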
step 2.3: the loss function is formed by combining a softmax loss function and a cross-entropy loss function. The feature map X″ output by the mixed attention module and the high-quality picture feature atlas X2 are fed simultaneously into the discriminator of the GAN network for comparison. The specific steps are as follows:
The softmax loss function is calculated first, as follows:
f(z_k) = e^{z_k} / Σ_{j=1..c} e^{z_j}
where z is the fully-connected-layer output obtained from the feature map X″ produced by the mixed attention module, z_k denotes the k-th value of the fully connected layer, c is the number of classes, and k ∈ {1, 2, 3, …, c};
The cross-entropy loss function is then calculated as follows:
L = −Σ_c y_c · log f(z_c)
where f(z_c) is the output of the softmax function and y_c is the true value given by the high-quality picture sample X2; the final loss function is thus obtained through this calculation.
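A minimal sketch of this discriminator loss in PyTorch, assuming the comparison with the high-quality set reduces to class labels for the fully connected output z (how the labels are formed is not detailed in the text):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(z, target):
    """z: (N, c) fully connected outputs; target: (N,) class indices derived
    from the high-quality picture set. Softmax followed by cross-entropy."""
    log_probs = torch.log(F.softmax(z, dim=1))   # log f(z_k)
    return F.nll_loss(log_probs, target)         # mean of -y_c * log f(z_c)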
Preferably, step 3 is specifically as follows:
step 3.1: upsampling the feature map X″ through the deconvolution layer of a fully convolutional network (Fully Convolutional Networks, FCN) to obtain an image set of size 224×224;
step 3.2: segmenting the meter head from the image with the Pyramid Scene Parsing Network (PSPNet), a fully convolutional network; the PSPNet structure divides the obtained feature layer into grids of different sizes and applies average pooling inside each grid, thereby aggregating the context information of different regions.
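A sketch of the pyramid pooling idea used here; the grid sizes (1, 2, 3, 6) follow the original PSPNet paper and are an assumption, since the patent only states that the feature layer is divided into grids of different sizes and average-pooled:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Average-pool the feature map into several grid sizes, upsample each
    pooled map back to the input resolution and concatenate everything."""
    def __init__(self, grid_sizes=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([nn.AdaptiveAvgPool2d(g) for g in grid_sizes])

    def forward(self, x):                         # x: (B, C, H, W)
        h, w = x.shape[2:]
        pooled = [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                align_corners=False) for stage in self.stages]
        return torch.cat([x] + pooled, dim=1)     # aggregated multi-region context
```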
Step 3.3: identifying the digital in the meter head by adopting a pretrained convolutional neural network VGG-16 integral model, wherein a convolutional layer in the VGG-16 basic structure adopts a 3X 3 convolutional kernel stack, a pooling layer adopts a 2X 2 window and the step length is 2,3 full connection layers; and outputting a recognition result after the soft-max layer softmax normalization function.
Preferably, the step 4 is specifically as follows:
step 4.1: after a period of time, performing a review and calculating the accuracy δ of digit recognition on the reviewed images; when the accuracy δ > 92%, the model is not updated; otherwise the mis-recognized pictures are corrected into high-quality pictures and added to the high-quality picture set, and steps 1.4 and 2 are repeated on the new high-quality picture set to retrain the whole network and obtain a completely new weight distribution;
step 4.2: after a period of time, randomly sampling part of the images for network training, and calculating the digit-recognition accuracy and the loss-function values of these images; when the accuracy δ ≤ 92%, the 50 images with the largest loss values are extracted, adjusted and put into the high-quality picture set, and steps 1.4 and 2 are repeated to train again and obtain a completely new weight distribution;
step 4.3: after every set number of recognized pictures, calculating the accuracy δ; when the accuracy δ of image digit recognition is ≤ 92%, those pictures are used as the picture set to be trained and steps 1 and 2 are repeated.
Preferably, in step 4.3 the high-quality picture set is also updated by checking the effect of the pictures after image enhancement, by adding new meter image types, or by building separate high-quality picture sets for different types.
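The update triggers in steps 4.1 to 4.3 amount to a threshold check plus hard-example selection; a small sketch, with the 92% threshold and the 50-image cut taken from the text and the data structures assumed:

```python
def needs_offline_update(num_correct, num_total, threshold=0.92):
    # Retrain only when the reviewed accuracy delta falls to the threshold or below.
    return num_total > 0 and num_correct / num_total <= threshold

def hardest_examples(images, losses, k=50):
    # Pick the k images with the largest loss values for manual correction
    # and insertion into the high-quality picture set.
    ranked = sorted(zip(losses, images), key=lambda pair: pair[0], reverse=True)
    return [img for _, img in ranked[:k]]
```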
The invention also discloses a system based on the above meter digital identification method, which comprises the following modules:
an image collection and feature extraction module: processing the collected image data set and extracting image features;
a network training module: training the network on the images;
a real-time recognition module: recognizing the images in real time;
an offline update module: updating the model offline when the accuracy is smaller than the set value.
The invention provides a multi-strategy mixed attention mechanism model: the criss-cross attention network in the spatial domain (Criss-Cross Network, CCNet, which introduces a novel CCA module so that each pixel gathers the context information of the surrounding pixels on its criss-cross path and eventually captures the long-range dependencies of all pixels), the deformable convolution network (Deformable Convolutional Networks, DCN, which adapts better to geometric deformation of the image by varying the receptive field) and the squeeze-and-excitation network in the channel domain (Squeeze-and-Excitation Networks, SENet, which models the correlations between feature channels to strengthen important features) are combined into a mixed attention module; the weight of each attention mechanism is dynamically optimized during training by a genetic algorithm (Genetic Algorithm, GA), yielding an approximately optimal distribution of the attention weights; finally the weighted parts are summed to obtain the enhanced image. The invention greatly improves how well the meter digital identification model adapts to diverse meter pictures and can raise the accuracy of the model.
Drawings
Fig. 1 is a flow chart of a method for identifying instrument numbers based on a dynamic multi-strategy GAN network.
FIG. 2 is a flow chart of the mixed attention module of the present invention.
Fig. 3 is a schematic diagram of the mixed attention module SENet flow of the present invention.
Fig. 4 is a schematic diagram of the mixed attention module CCNet flow of the present invention.
Fig. 5 is a block diagram of the meter digital identification system based on the dynamic multi-strategy GAN network of the invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example 1
As shown in figs. 1-4, the meter digital identification method based on the dynamic multi-strategy GAN network in this embodiment specifically includes the following steps:
stage 1: image dataset processing, specifically as follows:
step 1.1: and (5) image collection. The pictures of the embodiment come from the sites of the properties of the enterprises in certain countries in Beijing to take the pictures of the meters in real time.
Step 1.2: high-quality pictures. Images with different levels of noise are collected; they are manually denoised and cropped, and parameters such as contrast, saturation and exposure are adjusted so that the images become high quality.
Step 1.3: image resizing. The third-party Python image-processing library PIL (Python Imaging Library) is used to resize the pictures in batches, uniformly to 224×224, which facilitates feature extraction and input to the picture-enhancement network module, yielding the picture set to be trained and the high-quality picture set.
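For illustration, the batch resize in step 1.3 might look like the following with Pillow; the directory names and the JPEG extension are placeholders:

```python
from pathlib import Path
from PIL import Image

def resize_batch(src_dir, dst_dir, size=(224, 224)):
    # Batch-resize meter photos to 224x224 before feature extraction.
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        Image.open(path).convert("RGB").resize(size).save(dst / path.name)

resize_batch("raw_meter_photos", "resized_meter_photos")
```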
Step 1.4: image feature extraction. Feature extraction means that convolving the image yields a corresponding feature map, and the information of the image features is obtained through the action of several convolution kernels. The invention uses the convolutional layers of a pretrained VGG-16 model as the convolution kernels for feature extraction; VGG-16 is a classical convolutional neural network (CNN) that mainly uses 3×3 convolution kernels, which increases the network depth for the same receptive field and thereby improves the learning performance of the neural network. These kernels extract the feature atlases X1 and X2 from the picture set to be trained and the high-quality picture set.
Where C', H', W' denote the dimension (channel count), height and width of the image before convolution, and C, H, W denote the dimension, height and width of the image after convolution.
Stage 2: the network training of the image is specifically as follows:
step 2.1: mix attention settings. The present embodiment employs a mixed attention module composed of three networks of SENet, DCN and CCNet, which focus on channel characteristics of an input image, DCN and CCNet focus on spatial characteristics of an input image, wherein DCN focuses on the relationship between adjacent pixels of an image, and CCNet focuses on global but at the same time focuses on image emphasis information. Input feature atlas X 1 Respectively pass through the three networks in parallel. The above-described mixed attention module is defined as follows:
the first branch is SENET which can automatically learn the importance degrees of different channel characteristics, and the specific steps are as follows:
first for feature atlas X 1 C-th feature map x of (2) c Performing extrusion (Squeeze) operations, i.e. one channelThe whole space feature code is a global feature, and is realized by global average pooling, so that the purpose of feature compression along the space dimension is achieved, and the formula is as follows:
wherein z is c The numerical distribution of the c-th feature map, i.e., global information.
An Excitation operation is then performed, which captures the correlation mainly between channels. In order to reduce complexity and improve generalization capability, two fully connected layers are introduced, the formula is as follows:
s=F ex (z,W)=σ(g(z,W))=σ(W 2 ReLU(W 1 z))
wherein z is the output of the extrusion operation, W 1 And W is 2 As the weight of the material to be weighed,r is the scaling parameter 16.W (W) 1 z is the first full-connection layer process, plays a role in reducing the dimension, and ReLU () is a common activation function, so that the output dimension is kept unchanged; w (W) 2 ReLU(W 1 z) is the second full-connection layer process, restores to the previous dimension, sigma is a sigmoid activation function, and output s is the feature map weight obtained through the previous full-connection layer learning;
finally, the activation value s of each channel learned in the excitation operation is used for c Multiplying by the original feature x c The weight coefficient of each channel of the image can be learned, and the formula is as follows:
x′ c =F scale (x c ,s c )=s c ·x c
wherein x' c ∈X′ 1 ,X′ 1 I.e. the feature atlas output after the first branch.
The second branch is the DCN, which learns offsets with a parallel network so that the sampling points of the convolution kernel on the input feature map are shifted. The specific steps are as follows:
In the deformable convolution, both the deformable convolution operation and the pooling operation are two-dimensional and are performed on the same channel. The convolution kernel R is augmented with an offset; for each position p0 on the feature map the process is as follows:
y(p0) = Σ_{p_n ∈ R} w(p_n) · x(p0 + p_n + Δp_n)
where p = p0 + p_n + Δp_n, p_n enumerates the positions listed in the convolution kernel R, and w is the deformable-convolution weight. The offset Δp_n is obtained through learning and is usually a floating-point number, so the pixel value at a non-integer coordinate position on the input feature map is obtained by bilinear interpolation of x(p). The formula is as follows:
x(p) = Σ_q G(q, p) · x(q)
where q enumerates the integer coordinates on the input feature map x, p is a floating-point coordinate on x, and G(·,·) is the bilinear-interpolation kernel. The branch finally yields the feature map x(p) ∈ X'2.
The third branch is CCNet, which captures context information more efficiently and effectively. The specific steps are as follows:
To model the long-range contextual dependencies of local feature representations with lightweight computation and memory, a criss-cross attention module (Criss-Cross Attention Module, CCA) is introduced. The CCA module collects context information in the horizontal and vertical directions to enhance the pixel-wise representational ability.
The CCA module first applies two 1×1 convolutions to the feature map x to generate the feature maps Q and K, and then generates the attention map A from Q and K through an affinity operation. The formula is as follows:
d_{i,u} = Q_u · Ω_{i,u}^T
where each position u in the spatial dimension of the feature map Q yields a vector Q_u ∈ R^{C'}; similarly, feature vectors are extracted from K at the positions lying in the same row or column as u, giving the set Ω_u ∈ R^{(H+W−1)×C'}, with Ω_{i,u} ∈ R^{C'} denoting the i-th element of Ω_u; d_{i,u} ∈ D denotes the degree of correlation between the features Q_u and Ω_{i,u}. The attention map A is obtained by applying softmax to D; softmax is a common activation function that maps values from the (−∞, +∞) range into the (0, 1) interval.
After these operations, a further 1×1 convolution is applied to the initial feature map x to generate the feature map V for feature adaptation. Feature vectors are extracted from V to obtain the set V_u, and then the criss-cross feature vectors Φ_u at position u, i.e. the vectors lying in the same row or column as position u. Finally an aggregation operation is performed to collect the long-range context information. The formula is as follows:
x'_u = Σ_i A_{i,u} · Φ_{i,u} + x_u
where A_{i,u} · Φ_{i,u} is the element-wise multiplication of the corresponding elements, and the context information is added to the local feature x to enhance the local-feature and pixel-wise representation; the module therefore has a wide contextual view and enhances the expressiveness of the features.
In the whole CCNet, the feature map x passes through a recurrent criss-cross attention module (Recurrent Criss-Cross Attention Module, RCCA), formed by two criss-cross attention modules connected in series, to extract the global context information; the extracted global context information is then concatenated with the feature map x, yielding the feature map X'3.
Steps 1.4 to 2.1 constitute the generator part of the GAN network.
Step 2.2: adaptive weight allocation for the mixed attention. For the weight superposition of the mixed attention mechanism, the invention iterates a genetic algorithm to obtain a better distribution of the weights. The population is initialized by generating random numbers: 5 groups of random weights (α_i, β_i, γ_i) with values between 0.3 and 3, where i denotes the i-th generation of the genetic algorithm, α is the weight of the first branch of the mixed attention module, β the weight of the second branch and γ the weight of the third branch. For each group of weights, a loss function is calculated according to its extraction of the picture features and determines the corresponding fitness value; a roulette wheel is built from the fitness of every group, 2 groups are selected as parents by roulette-wheel selection, and a crossover operation is performed between the selected groups (the same crossover is applied to β and γ). The crossover method is as follows:
where rand ∈ U(0, 1) is a random number between 0 and 1 and η = 4 is a user-defined distribution factor that determines how closely the offspring approach the parents. A mutation probability of 0.5% is set, and the mutation (likewise applied to β and γ) is performed as follows:
where k is a mutation constant and r is a random number.
Through this process the weights change continually, and normalization ensures that the three attention weights always sum to 3, i.e. α_i + β_i + γ_i = 3.
Two offspring weight groups are finally obtained; the group F' = (α_n, β_n, γ_n) with the higher fitness is selected and used to weight the feature maps, giving the mixed-attention result feature map X″ = α_n·X'1 + β_n·X'2 + γ_n·X'3.
this step is part of the evolution of the GAN network parameters.
Step 2.3: loss-function calculation. This embodiment uses a combination of the softmax loss function and the cross-entropy loss function (Cross-Entropy Loss). The feature map X″ output by the mixed attention module and the high-quality picture feature atlas X2 are fed simultaneously into the discriminator of the GAN network for comparison. The specific steps are as follows:
The softmax loss function is calculated first, as follows:
f(z_k) = e^{z_k} / Σ_{j=1..c} e^{z_j}
where z is the fully-connected-layer output obtained from the feature map X″ produced by the mixed attention module, z_k denotes the k-th value of the fully connected layer, c is the number of classes, and k ∈ {1, 2, 3, …, c}.
The cross-entropy loss function is then calculated as follows:
L = −Σ_c y_c · log f(z_c)
where f(z_c) is the output of the softmax function and y_c is the true value given by the high-quality picture sample X2; the final loss function is thus obtained through this calculation.
This step is the discriminator part of the GAN network.
Stage 3: the real-time identification of the image is specifically as follows:
step 3.1: up-sampling. Upsampling the feature map X "through the deconvolution layer of the full convolutional neural network (Fully Convolutional Networks, FCN) to obtain an image set of 224X 224 in size;
step 3.2: PSPNet (Pyramid Scene Parsing Network) model. The method comprises the steps of dividing an image instrument head into grids with different sizes by adopting a full convolution neural network PSPNet, dividing the acquired feature layer into grids with different sizes by adopting a PSPNet structure, and respectively carrying out average pooling in each grid so as to realize the aggregation of the context information of different areas.
Step 3.3: identifying the digital in the meter head by adopting a pretrained convolutional neural network VGG-16 integral model, wherein a convolutional layer in the VGG-16 basic structure adopts a 3X 3 convolutional kernel stack, a pooling layer adopts a 2X 2 window and the step length is 2,3 full connection layers; and outputting a recognition result after the soft-max layer softmax normalization function.
Stage 4: offline updating of the GAN network model parameters, specifically as follows:
Step 4.1: periodic manual review. After a period of use, manual recognition is used to review the digit recognition, and the accuracy δ of digit recognition on the reviewed images is calculated. When the accuracy δ > 92%, the model is not updated; otherwise the mis-recognized pictures are manually corrected into high-quality pictures and added to the high-quality picture set, and steps 1.4 and 2 are repeated on the new high-quality picture set to retrain the whole network and obtain a completely new weight distribution.
Step 4.2: periodic sampling of images for network training. After a period of time, part of the images (1000 in this embodiment) are randomly sampled for network training, and the digit-recognition accuracy and the loss-function values of these images are calculated; when the accuracy δ ≤ 92%, the 50 images with the largest loss values are extracted, manually adjusted and put into the high-quality picture set, and steps 1.4 and 2 are repeated to train again and obtain a completely new weight distribution.
Step 4.3: after every 1000 recognized pictures, the accuracy δ is calculated; when the accuracy δ of image digit recognition is ≤ 92%, those pictures are used as the picture set to be trained and steps 1 and 2 are repeated to retrain the GAN network. In addition, by checking the effect of the pictures after image enhancement, the high-quality picture set can be updated, for example by adding new meter image types or by building separate high-quality picture sets for different types.
Example 2
As shown in fig. 5, this embodiment discloses a system based on the meter digital identification method described in embodiment 1, which includes the following modules:
an image collection and feature extraction module: processing the collected image data set and extracting image features;
a network training module: training the network on the images;
a real-time recognition module: recognizing the images in real time;
an offline update module: updating the model offline when the accuracy is smaller than the set value.
The foregoing description covers only the preferred embodiments of the invention and the technical principles employed. Those skilled in the art will understand that the invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in detail through the above embodiments, it is not limited to them and may be embodied in many other equivalent forms without departing from the concept of the invention, the scope of which is defined by the appended claims.

Claims (5)

1. The meter digital identification method based on the dynamic multi-strategy GAN network is characterized by comprising the following steps:
step 1: processing the collected image data set and extracting image features;
step 2: training the network on the images;
step 3: recognizing the images in real time;
step 4: when the accuracy is smaller than a set value, updating the GAN network model offline;
wherein step 1 is specifically as follows:
step 1.1: collecting an image;
step 1.2: selecting images with different levels of noise, and processing the images so that they become high quality;
step 1.3: adjusting the size of the pictures to 224×224 to obtain the picture set to be trained and the high-quality picture set;
step 1.4: using the convolutional layers of a pretrained VGG-16 model as the convolution kernels for feature extraction, the convolution kernels extracting the feature atlases X1 and X2 from the picture set to be trained and the high-quality picture set;
wherein C', H', W' represent the dimension, height and width of the image before convolution, respectively, and C, H, W represent the dimension, height and width of the image after convolution, respectively;
wherein step 2 is specifically as follows:
step 2.1: a mixed attention module formed by combining the three networks SENet, DCN and CCNet is adopted, and the input feature atlas X1 passes through the three networks in parallel; the mixed attention module is defined as follows:
the first branch is SENet, which automatically learns the importance of the features of different channels; the specific steps are as follows:
first, a squeeze operation is applied to the c-th feature map x_c of the feature atlas X1, i.e. the whole spatial feature on one channel is encoded into a global feature, implemented by global average pooling so as to compress the features along the spatial dimensions, with the following formula:
wherein z_c represents the numerical distribution of the c-th feature map, i.e. its global information;
then an excitation operation is performed, introducing two fully connected layers, with the following formula:
s = F_ex(z, W) = σ(g(z, W)) = σ(W2 · ReLU(W1 · z))
wherein z is the output of the squeeze operation, W1 and W2 are the weights of the two fully connected layers, and r is the scaling ratio, set to 16; W1·z is the first fully connected layer and reduces the dimensionality; ReLU() is a common activation function and the output dimension is kept unchanged; W2·ReLU(W1·z) is the second fully connected layer and restores the previous dimensionality; σ is the sigmoid activation function; the output s is the feature-map weight learned through the preceding fully connected layers;
finally, the activation value s_c learned for each channel in the excitation operation is multiplied by the original feature x_c, so that the weight coefficient of each channel of the image can be learned, with the following formula:
x'_c = F_scale(x_c, s_c) = s_c · x_c
wherein x'_c ∈ X'1, and X'1 is the feature atlas output by the first branch;
the second branch is the DCN, which learns offsets with a parallel network so that the sampling points of the convolution kernel on the input feature map are shifted; the specific steps are as follows:
in the deformable convolution, both the deformable convolution operation and the pooling operation are two-dimensional and performed on the same channel; the convolution kernel R is augmented with an offset, and for each position p0 on the feature map the process is as follows:
wherein p = p0 + p_n + Δp_n, p_n enumerates the positions listed in the convolution kernel R, and w is the deformable-convolution weight; the offset Δp_n is obtained through learning and is usually a floating-point number, so the pixel value at a non-integer coordinate position on the input feature map is obtained by bilinear interpolation of x(p), with the following formula:
wherein q enumerates the integer coordinates on the input feature map x, p is a floating-point coordinate on x, and G(·,·) is the bilinear-interpolation kernel; the branch finally yields the feature map x(p) ∈ X'2;
the third branch is CCNet, which captures context information; the specific steps are as follows:
a criss-cross attention module CCA is introduced; the CCA module first applies two 1×1 convolutions to the feature map x to generate the feature maps Q and K, and then generates the attention map A from Q and K through an affinity operation, with the following formula:
wherein each position u in the spatial dimension of the feature map Q yields a vector Q_u ∈ R^{C'}; similarly, feature vectors are extracted from K to obtain the set Ω_u ∈ R^{(H+W−1)×C'}, with Ω_{i,u} ∈ R^{C'} denoting the i-th element of Ω_u; d_{i,u} ∈ D denotes the degree of correlation between the features Q_u and Ω_{i,u}; the attention map A is obtained by applying softmax to D, a common activation function that maps values in the (−∞, +∞) range to values in the (0, 1) interval;
after the above operations, a further 1×1 convolution is applied to the initial feature map x to generate the feature map V for feature adaptation; feature vectors are extracted from V to obtain the set V_u, and then the criss-cross feature vectors Φ_u at position u, i.e. the vectors lying in the same row or column as position u; finally an aggregation operation is performed to collect the long-range context information, with the following formula:
wherein A_{i,u} · Φ_{i,u} is the element-wise multiplication of the corresponding elements, and the context information is added to the local feature x to enhance the local-feature and pixel-wise representation;
in the whole CCNet, the feature map x passes through a recurrent criss-cross attention module RCCA, formed by two CCA modules connected in series, to extract the global context information; the extracted global context information is then concatenated with the feature map x, finally yielding the feature map X'3;
step 2.2: for the weight superposition of the mixed attention mechanism, a genetic algorithm is iterated to obtain a better distribution of the weights; the population is initialized by generating random numbers: 5 groups of random weights (α_i, β_i, γ_i) with values between 0.3 and 3, wherein i is the i-th generation of the genetic algorithm, α is the weight of the first branch of the mixed attention module, β is the weight of the second branch and γ is the weight of the third branch; for each group of weights, a cross-entropy loss function is calculated according to its extraction of the picture features and determines the corresponding fitness value; a roulette wheel is built from the fitness of every group, 2 groups are selected as parents by roulette-wheel selection, and a crossover operation is performed between the selected groups; the crossover method is as follows:
wherein rand ∈ U(0, 1) is a random number between 0 and 1 and η = 4 is a user-defined distribution factor that determines how closely the offspring approach the parents; a mutation probability of 0.5% is set, and the mutation is performed as follows:
wherein k is a mutation constant and r is a random number;
through this process the weights change continually, and normalization ensures that the three attention weights always sum to 3, i.e. α_i + β_i + γ_i = 3;
two offspring weight groups are finally obtained; the group F' = (α_n, β_n, γ_n) with the higher fitness is selected and used to weight the feature maps, giving the mixed-attention result feature map;
step 2.3: a loss function formed by combining a softmax loss function and a cross-entropy loss function; the feature map X″ output by the mixed attention module and the high-quality picture feature atlas X2 are fed simultaneously into the discriminator of the GAN network for comparison; the specific steps are as follows:
the softmax loss function is calculated first, as follows:
wherein z is the fully-connected-layer output of the feature map X″ output by the mixed attention module, z_k denotes the k-th value of the fully connected layer, c is the number of classes, and k ∈ {1, 2, 3, …, c};
the cross-entropy loss function is then calculated as follows:
wherein f(z_c) is the output of the softmax function and y_c is the true value given by the high-quality picture sample X2; the final loss function is thus obtained through this calculation.
2. The meter digital identification method based on the dynamic multi-strategy GAN network as claimed in claim 1, wherein step 3 is specifically as follows:
step 3.1: upsampling the feature map X″ through the deconvolution layer of a fully convolutional network to obtain an image set of size 224×224;
step 3.2: segmenting the meter head from the image with a fully convolutional network, whose structure divides the obtained feature layer into grids of different sizes and applies average pooling inside each grid, thereby aggregating the context information of different regions;
step 3.3: recognizing the digits in the meter head with the complete pretrained convolutional neural network VGG-16, in whose basic structure the convolutional layers use stacked 3×3 convolution kernels, the pooling layers use a 2×2 window with stride 2, and there are 3 fully connected layers; the recognition result is output after the softmax normalization function of the softmax layer.
3. The meter digital identification method based on the dynamic multi-strategy GAN network as claimed in claim 2, wherein step 4 is specifically as follows:
step 4.1: after a period of time, performing a review and calculating the accuracy δ of digit recognition on the reviewed images; when the accuracy δ > 92%, the model is not updated; otherwise the mis-recognized pictures are corrected into high-quality pictures and added to the high-quality picture set, and steps 1.4 and 2 are then repeated on the new high-quality picture set to retrain the whole network and obtain a completely new weight distribution;
step 4.2: after a period of time, randomly sampling part of the images for network training, and calculating the digit-recognition accuracy and the loss-function values of these images; when the accuracy δ ≤ 92%, the 50 images with the largest loss values are extracted, adjusted and put into the high-quality picture set, and steps 1.4 and 2 are repeated to train again and obtain a completely new weight distribution;
step 4.3: after every set number of recognized pictures, calculating the accuracy δ; when the accuracy δ of image digit recognition is ≤ 92%, those pictures are used as the picture set to be trained and steps 1 and 2 are repeated.
4. The meter digital identification method based on the dynamic multi-strategy GAN network as claimed in claim 3, wherein in step 4.3 the high-quality picture set is updated by checking the effect of the pictures after image enhancement, by adding new meter image types, or by building separate high-quality picture sets for different types.
5. A system based on the meter digital identification method of any one of claims 1-4, comprising the following modules:
an image collection and feature extraction module: processing the collected image data set and extracting image features;
a network training module: training the network on the images;
a real-time recognition module: recognizing the images in real time;
an offline update module: updating the model offline when the accuracy is smaller than the set value.
CN202211211597.5A 2022-09-30 2022-09-30 Instrument digital identification method and system based on dynamic multi-strategy GAN network Active CN115439849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211211597.5A CN115439849B (en) 2022-09-30 2022-09-30 Instrument digital identification method and system based on dynamic multi-strategy GAN network


Publications (2)

Publication Number / Publication Date
CN115439849A (en) 2022-12-06
CN115439849B (en) 2023-09-08

Family

ID=84251574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211211597.5A Active CN115439849B (en) 2022-09-30 2022-09-30 Instrument digital identification method and system based on dynamic multi-strategy GAN network

Country Status (1)

Country Link
CN (1) CN115439849B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036875B (en) * 2023-07-11 2024-04-26 南京航空航天大学 Infrared weak and small moving target generation algorithm based on fusion attention GAN

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830271A (en) * 2018-06-13 2018-11-16 深圳市云识科技有限公司 A kind of digital displaying meter Recognition of Reading method based on convolutional neural networks
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
CN114266898A (en) * 2022-01-11 2022-04-01 辽宁石油化工大学 Liver cancer identification method based on improved EfficientNet
CN114782669A (en) * 2022-01-07 2022-07-22 西安理工大学 Digital instrument automatic identification, positioning and reading method based on deep learning


Also Published As

Publication number Publication date
CN115439849A (en) 2022-12-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant