CN115439849A

CN115439849A - Instrument digital identification method and system based on dynamic multi-strategy GAN network

Info

Publication number: CN115439849A
Application number: CN202211211597.5A
Authority: CN
Inventors: 陈俊宇; 胡振华; 顾吉轩; 滕旭阳
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2022-09-30
Filing date: 2022-09-30
Publication date: 2022-12-06
Anticipated expiration: 2042-09-30
Also published as: CN115439849B

Abstract

The invention discloses an instrument digital identification method and system based on a dynamic multi-strategy GAN network, wherein the method comprises the following steps: step 1: processing the collected image data set, and extracting image features; step 2: training a network of images; and 3, step 3: identifying the image in real time; and 4, step 4: and when the accuracy is smaller than a set value, performing offline updating on the GAN network model. The method greatly improves the adaptability of the digital recognition model of the instrument to diversified instrument pictures, and can improve the precision of the digital recognition model of the instrument.

Description

Instrument digital identification method and system based on dynamic multi-strategy GAN network

Technical Field

The invention belongs to the technical field of instrument digital target identification, and particularly relates to an instrument digital identification method and system based on a dynamic multi-strategy GAN network.

Background

With the continuous development of science and technology, nowadays, various intelligent charging systems and intelligent data analysis systems are developed. Compared with the traditional digital statistics of manual instruments, the intelligent system has the characteristics of high efficiency, intellectualization and the like, and meanwhile, the labor cost can be reduced, and the statistical period is shortened. However, at present, part of intelligent systems still need manual operation, and the accuracy and the efficiency also have great improvement space. And the instrument digital recognition based on the deep learning technology can help the intelligent charging and data analysis system to further improve the intelligence, reduce the manual participation to the maximum extent, comprehensively improve the recognition efficiency and achieve the effect of double the result with half the effort. A user only needs to submit an instrument photo according to the process, and the background data processing system can automatically recognize the dial plate number in the photo, so that intelligent statistics is realized.

However, due to the diversity of the instruments and the random photographing of users, the digital recognition technology based on deep learning also has many problems, for example, the instrument photo taken by the user has image position deviation, image blurring and brightness imbalance, and even still has image turning, image shielding and image defect, which all cause the final model to fail and can not correctly recognize the numbers on the instruments. Therefore, the invention provides a technical scheme for identifying the number of the instrument based on the dynamic multi-strategy GAN network, which can effectively avoid the influence on the number identification of the instrument caused by the quality problem of the picture uploaded by a user.

At present, a Generative Adaptive Networks (GAN) is an artificial intelligence technology that is widely applied in the fields of image recognition and natural language processing. Compared with the traditional deep learning model, the technology has great advantages in both image recognition speed and recognition accuracy. However, in the face of complex and various instruments and uncertainty of photo submission of users, the GAN network with a single learning strategy cannot adaptively learn various image recognition task patterns. In order to better adapt to the target image data distribution, it is necessary to design a dynamic GAN network capable of self-learning. The Discriminator (Discriminator) compares the picture generated by the Generator (Generator) with the real picture, calculates the optimized parameters and reversely transmits the optimized parameters to the Generator, so that the Generator is forced to learn and generate a more vivid picture, and the picture is continuously circulated to obtain a high-quality picture which is close to the distribution of the real data to the maximum extent.

An Attention Mechanism (Attention Mechanism) is a representative of an identification strategy and is also an image salient region extraction method in deep learning. According to different attention weight applying modes and positions, most models divide an attention mechanism into a space domain, a channel domain and a mixed domain, and only a required domain is selected in actual use. However, in view of the above various reasons that images are difficult to identify due to the diversity of the meters and the random photographing of users, a single policy often can only solve the problem of a certain type of saliency deficiency, so that if a single domain is used, the adaptive requirement of the system on the meter identification cannot be met, and results such as reduced model precision and increased loss function are generated.

Disclosure of Invention

Aiming at the current situation of the prior art, the invention discloses a method and a system for identifying the number of an instrument based on a dynamic multi-strategy GAN network.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

the instrument digital identification method based on the dynamic multi-strategy GAN network comprises the following steps:

step 1: processing the collected image data set, and extracting image features;

step 2: training a network of images;

and step 3: identifying the image in real time;

and 4, step 4: and when the accuracy is smaller than a set value, performing offline updating on the GAN network model.

Preferably, step 1 is specifically as follows:

step 1.1: collecting an image;

step 1.2: selecting images with different levels of noise, and processing the images to ensure that the images tend to have high quality, namely the images have no obvious position deviation, no image blur, no brightness imbalance, no image turnover, no shading and no defect;

step 1.3: adjusting the size of the picture to 224 multiplied by 224 to obtain a picture set to be trained

And high quality picture sets

Step 1.4: the convolution kernel in the pre-training VGG-16 model is used as a convolution kernel for feature extraction, the VGG-16 is a classic network in a Convolution Neural Network (CNN), and a convolution kernel of 3 x 3 is mainly used, so that the network depth is improved and the learning performance of the neural network is further improved under the condition of the same perception field

And high quality picture sets

Feature map set X of ₁ And X ₂ 。

Wherein C ', H ', W ' represent the dimension, height and width of the image before convolution, respectively, and C, H, W represent the dimension, height and width of the image after convolution, respectively.

Preferably, step 2 is specifically as follows:

step 2.1: the method adopts a mixed attention module formed by combining three Networks, namely SENet (stress-and-Excitation Networks), DCN (Deformable convolution network) and CCNet (cross-attention Networks), wherein the three Networks are used for enhancing important features by modeling correlation among feature channels so as to improve accuracy by enhancing the important features), and the CCNet (cross-attention Networks) are used for acquiring context information of surrounding pixels on a cross path by introducing a novel CCA module, and each pixel can finally capture remote dependence of all pixels, and an input feature atlas X is combined with the mixed attention module ₁ Respectively pass through the three in parallelA network; the mixed attention module is defined as follows:

the first branch is SEnet for automatically learning the importance degrees of different channel characteristics, and the specific steps are as follows:

firstly, a feature map set X is set ₁ C-th feature map x in (1) _c Performing an extrusion (Squeeze) operation, namely encoding the whole spatial feature on one channel into a global feature, and realizing the feature compression along the spatial dimension by adopting global average pooling, wherein the formula is as follows:

wherein z is _c And representing the value distribution of the c-th feature map, namely global information.

An Excitation (Excitation) operation follows, which mainly captures the correlation between channels. In order to reduce complexity and improve generalization capability, two full-connection layers are introduced, and the formula is as follows:

s＝F _ex (z,W)＝σ(g(z,W))＝σ(W ₂ ReLU(W ₁ z))

wherein z is the output of the pressing operation, W ₁ And W ₂ In order to be the weight, the weight is,

r is the scaling parameter 16.W ₁ z is a first full connection layer process and plays a role in dimension reduction, and ReLU () is a common activation function and keeps the output dimension unchanged; w is a group of ₂ ReLU(W ₁ z) is a second full-connection layer process, the dimension is recovered to the previous dimension, sigma is a sigmoid activation function, and output s is a feature graph weight obtained through the previous full-connection layer learning;

finally, the activation values s of all channels learned in the excitation operation are calculated _c Multiplication by the original feature x _c The weight coefficient of each channel of the image can be learned, and the formula is as follows:

x′ _c ＝F _scale (x _c ,s _c )＝s _c ·x _c

wherein x' _c ∈X′ ₁ ，X′ ₁ Namely the feature map set output after passing through the first branch.

The second branch is a DCN which is based on parallel network learning offset so that the convolution kernel is offset at the sampling point of the input characteristic diagram, and the specific steps are as follows:

in the deformable convolution, the deformable convolution operation and the pooling operation are both two-dimensional and are performed on the same channel, and the convolution kernel R is expanded by adding an offset to each position p on the feature map ₀ The following steps are changed:

wherein p = p ₀ +p _n +Δp _n ，p _n Is an enumeration of the listed positions in the convolution kernel R, w is a deformable convolution parameter, and an offset Δ p _n The floating point number is obtained through learning, so that the pixel value of the non-integer coordinate position on the input feature map can be obtained through bilinear interpolation on x (p), and the formula is as follows:

wherein q is an integer coordinate on the input feature map x, and p is a floating point coordinate on the input feature map x; g () is a bilinear interpolation function, and finally, a feature map X (p) epsilon X 'is obtained' ₂ ；

The third branch is CCNet for capturing context information, and the specific steps are as follows:

introducing a cross-Attention Module CCA (Criss-cross Attention Module), firstly performing two 1 × 1 convolutions on the feature map x to generate feature maps Q and K, and further generating an Attention map A by performing Affinity (Affinity) operation on the feature maps Q and K, wherein the formula is as follows:

wherein a vector Q can be obtained for each position u in the spatial dimension of the feature map Q _u ∈R ^C′ And similarly, extracting the characteristic vector from the K to obtain a set omega _u ∈R ^{(H+W-1)×C′} ，Ω _i,u ∈R ^C′ Represents omega _u The ith element of (1), d _i,u E.g. D represents the characteristic Q _u And omega _i,u Softmax is a common activation function that maps values in the range (- ∞, + ∞) to values in the (0, 1) interval.

After the above operations are completed, the initial feature map x is convolved by 1 × 1 to generate a feature map V for feature adaptation, and feature vectors are extracted from V to obtain a set V _u Then, a cross feature vector phi at the u position is obtained _u These vectors are located in the same row or column with the position u, and finally an Aggregation operation is performed to collect the remote context information, the formula is as follows:

wherein A is _i,u And phi _i,u Is a bitwise multiplication of the corresponding elements and context information is added to the local feature x to enhance the local feature and the representation in pixel-wise manner.

In the whole CCNet, a feature diagram X is subjected to global context information extraction through a cyclic cross attention module RCCA formed by combining two series CCA modules, then the extracted global context information is spliced with the feature diagram X, and finally a feature diagram X 'is obtained' ₃ ；

Step 2.2: aiming at weight superposition of a mixed attention mechanism, iteration is performed by adopting a genetic algorithm to obtain a weight distribution optimal solution; the family group initialization adopts a method of generating random numbers to generate 5 groups of random weights with the numerical value ranging from 0.3 to 3

Wherein i is the ith generation population of genetic algorithm, and alpha is the mixed noteThe first branch weight of the attention module, β is the second branch weight of the mixed attention module, and γ is the third branch weight of the mixed attention module. Calculating a cross entropy loss function according to the extraction condition and the extraction effect of each group of weight on the picture characteristics, determining a corresponding fitness value, constructing a roulette wheel according to each group of fitness conditions, selecting 2 groups as parents in a roulette wheel mode, and performing cross operation between the selected groups, wherein the cross method comprises the following steps:

wherein

And rand ∈ U (0, 1), η =4;

rand is a random number between 0 and 1, eta is a self-defined distribution factor, and the probability that the offspring approaches the parent is determined. And setting the variation with the probability of 0.5 percent, wherein the variation mode is as follows:

wherein k is a variation constant and r is a random number;

the constant change of the weight is realized through the process, and the sum of the three attention weights is ensured to be equal to 3 through normalization, namely alpha ⁱ +β ⁱ +γ ⁱ ＝3；

Finally, two offspring weights are obtained; selecting a group of F' = (alpha) with high fitness _n ,β _n ,γ _n ) And weighting the feature map to obtain a feature map of the mixed attention result:

step 2.3: a loss function formed by combining a softmax loss function and a cross entropy loss function is adopted; output feature map X' of mixed attention module and feature map X of high quality picture ₂ Meanwhile, the data are input into a discriminator of the GAN network for comparison, and the specific steps are as follows:

firstly, calculating a softmax loss function, wherein the formula is as follows:

wherein z is the output result of the full link layer of the output characteristic diagram X' of the mixed attention module, z _k The kth value of the full link layer is shown, c is the classification number, and k belongs to {1,2,3, \ 8230;, c };

then, a cross entropy loss function is calculated, and the formula is as follows:

wherein f (z) _c ) As an output result of the softmax loss function, y _c Is a truth value X of a high-quality picture sample ₂ And calculating to obtain the final loss function.

Preferably, step 3 is specifically as follows:

step 3.1: upsampling the feature map X' through an deconvolution layer of a full Convolutional neural network (FCN) to obtain an image set with the size of 224X 224;

step 3.2: a full convolution neural Network (PSPNet) is adopted for segmenting the image instrument header, the PSPNet structure divides the acquired feature layer into grids with different sizes, and then average pooling is respectively carried out inside each grid, so that context information of different areas is aggregated.

Step 3.3: identifying the number in the meter header by adopting a pre-trained convolutional neural network VGG-16 integral model, stacking the convolutional layer in the basic framework of the VGG-16 by adopting a 3 multiplied by 3 convolutional kernel, and adopting a 2 multiplied by 2 window and 2,3 full-connection layers as the step length in the pooling layer; and outputting the recognition result after passing through a softmax normalization function of a soft-max layer.

Preferably, step 4 is specifically as follows:

step 4.1: rechecking after a period of time, and calculating the correct rate delta of digital identification in the rechecked image; when the accuracy delta is larger than 92%, the model is not updated; otherwise, the error picture is adjusted to be a high-quality picture, a high-quality picture set is added, and then the step 1.4 and the step 2 are repeated according to the new high-quality picture set to retrain the whole network so as to obtain brand-new weight distribution;

step 4.2: randomly extracting partial images for network training after a period of time, calculating the accuracy of digital identification of the partial images and the loss function value of the images, when the accuracy delta is less than or equal to 92%, extracting the first 50 images with large loss function values, adjusting the images, putting the images into a high-quality image set, and repeating the step 1.4 and the step 2 for training again to obtain brand new weight distribution;

step 4.3: and (3) calculating the accuracy rate delta after the pictures with the set number are identified each time, and repeating the steps 1 and 2 by taking the pictures as the picture set to be trained when the image number identification accuracy rate delta is less than or equal to 92%.

Preferably, step 4.3: and updating the high-quality picture set by checking the picture effect after the image enhancement, and increasing the types of instrument images or respectively manufacturing different types of high-quality picture sets.

The invention also discloses a system based on the instrument digital identification method, which comprises the following modules:

an image collection and feature advancement module: processing the collected image data set, and extracting image features;

a network training module: training a network of images;

a real-time identification module: identifying the image in real time;

an offline update module: and when the accuracy is smaller than the set value, performing off-line updating on the model.

The invention provides a multi-strategy hybrid attention mechanism model, namely a cross attention network (Criss-cross Networks, CCNet) in a spatial domain is introduced to acquire context information of surrounding pixels on a cross path by introducing a novel CCA module, each pixel can finally capture remote dependency relationship of all pixels), a Deformable Convolutional network (DCN, which can better adapt to geometric deformation of an image through change of a receptive field) and a squeezing and Excitation network (Squeeze-and-Excitation Networks, SENET) in a channel domain are combined into a hybrid attention mechanism module, important features are strengthened by modeling correlation among characteristic channels so as to improve accuracy, and weights of various attention mechanisms are dynamically optimized in a training process through a Genetic Algorithm (GA), so that an approximately optimal solution distributed by the attention mechanism is obtained; and finally, adding the weights of all parts to obtain an enhanced image. The invention greatly improves the adaptation degree of the digital recognition model of the instrument to diversified instrument pictures and can improve the precision of the digital recognition model of the instrument.

Drawings

Fig. 1 is a flow chart of a meter number identification method based on a dynamic multi-policy GAN network according to the present invention.

FIG. 2 is a flow diagram of a hybrid attention module of the present invention.

FIG. 3 is a flow diagram of the mixed attention module SENET of the present invention.

Fig. 4 is a schematic flow diagram of the hybrid attention module CCNet of the present invention.

Fig. 5 is a block diagram of a meter number identification system based on a dynamic multi-policy GAN network according to the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Example 1

As shown in fig. 1 to 4, the method for identifying a meter number based on a dynamic multi-policy GAN network in this embodiment includes the following specific steps:

stage 1: image dataset processing, as follows:

step 1.1: and (4) collecting images. The picture of the embodiment is from the field real-time instrument picture taken by each house of a certain national enterprise in Beijing.

Step 1.2: high quality pictures. The method comprises the steps of collecting images with different levels of noise, denoising and cutting the images through manual operation, and adjusting parameters such as contrast, saturation and exposure of the images to enable the images to tend to be high in quality.

Step 1.3: and (5) adjusting the size of the image. The method comprises the steps of using a Python third-party image processing Library PIL (Python Imaging Library) Library to adjust the size of pictures in batches, uniformly modifying the size of the pictures into 224 multiplied by 224, facilitating feature extraction and inputting of a picture enhancement network module, and obtaining a picture set to be trained

And high quality picture sets

Step 1.4: and (5) extracting image features. The feature extraction means that a corresponding feature map can be obtained by convolving an image by a convolution kernel, and information of image features can be obtained by the action of a plurality of convolution kernels. The invention adopts the convolution layer in the pre-training VGG-16 model as the convolution kernel for feature extraction, the VGG-16 is a classic network in the Convolution Neural Network (CNN), and 3 x 3 convolution kernels are mainly used, so that the network depth is improved and the learning performance of the neural network is further improved under the condition of the same perception field

And high quality picture sets

Feature map set X of ₁ And X ₂ 。

And (2) stage: network training of images is specifically as follows:

step 2.1: mixed attention settings. In the embodiment, a mixed attention module formed by combining three networks of SENEt, DCN and CCNet is adopted, the SENEt emphasizes the channel characteristics of the input image, the DCN and the CCNet emphasize the spatial characteristics of the input image, the DCN emphasizes the relationship between adjacent pixel points of the image, and the CCNet emphasizes the overall situation but focuses on the image key information at the same time. Input feature set X ₁ Respectively through the three networks in parallel. The hybrid attention module is defined as follows:

the first branch is SEnet capable of automatically learning the importance degrees of different channel features, and the specific steps are as follows:

firstly, a feature map set X is set ₁ C-th feature map x in (1) _c Performing an extrusion (Squeeze) operation, i.e. encoding the whole spatial feature on one channel into a global feature, and implementing by using global average pooling to achieve the purpose of compressing the feature along the spatial dimension, wherein the formula is as follows:

s＝F _ex (z,W)＝σ(g(z,W))＝σ(W ₂ ReLU(W ₁ z))

r is the scaling parameter 16.W is a group of ₁ z is the first fully-connected layer process, which acts as a dimension-reduction, reLU ()Keeping the output dimension unchanged for a commonly used activation function; w ₂ ReLU(W ₁ z) is a second full-connection layer process, the dimension is recovered to the previous dimension, sigma is a sigmoid activation function, and output s is a feature graph weight obtained through the previous full-connection layer learning;

finally, the activation value s of each channel learned in the excitation operation is calculated _c Multiplication by the original feature x _c The weight coefficient of each channel of the image can be learned, and the formula is as follows:

x′ _c ＝F _scale (x _c ,s _c )＝s _c ·x _c

The second branch is a DCN which learns offset based on a parallel network so as to enable a convolution kernel to offset at a sampling point of an input feature map, and the DCN comprises the following specific steps:

wherein q is an integer coordinate on the input feature map x, and p is a floating point coordinate on the input feature map x; g () isBilinear interpolation function to obtain feature map X (p) epsilon X' ₂ ；

The third branch is CCNet with more efficient and effective capture of context information, which comprises the following steps:

to model remote context dependencies of local feature representations using lightweight computing and memory, we introduce a cross-Attention Module (CCA). The CCA module collects context information in the horizontal and vertical directions to enhance the per-pixel functionality.

The CCA module first performs two 1 × 1 convolutions on the feature map x to generate feature maps Q and K, and further generates an attention map a by performing Affinity (Affinity) operation on Q and K, and the formula is as follows:

After the operation is finished, performing 1 × 1 convolution on the initial feature map x to generate a feature map V for feature adaptation, and extracting feature vectors from V to obtain a set V _u Then, a cross feature vector phi at the u position is obtained _u These vectors are located in the same row or column with the position u, and finally an Aggregation operation is performed to collect remote context information, as follows:

wherein A is _i,u And phi _i,u Is toThe elements are multiplied by each other in a bit mode, and context information is added to the local feature x to enhance the representation of the local feature and the pixel mode, so that the method has a wide context view and improves feature expression.

In the whole CCNet, a feature map X is subjected to global context information extraction through a cyclic cross-Attention Module (RCCA) formed by combining two serially-connected RCAs (cross-Attention modules), and then the extracted global context information and the feature map X are spliced to obtain a feature map X' ₃ 。

The above steps 1.4 to 2.1 are part of the GAN network generator.

Step 2.2: hybrid attention adaptive weight assignment. Aiming at weight superposition of a mixed attention mechanism, the method adopts genetic algorithm iteration to obtain a weight distribution optimal solution. The family group initialization adopts a method of generating random numbers to generate 5 groups of random weights with the numerical value ranging from 0.3 to 3

Wherein i is the ith generation population of the genetic algorithm, α is the first branch weight of the mixed attention module, β is the second branch weight of the mixed attention module, and γ is the third branch weight of the mixed attention module. Calculating a loss function according to the extraction condition and the extraction effect of each group of weight on the picture characteristics, determining a corresponding fitness value, constructing a roulette wheel according to each group of fitness conditions, selecting 2 groups as parents in a roulette wheel mode, and performing cross operation between the selected groups, wherein the cross method comprises the following steps:

(for beta, gamma)

Wherein

And rand ∈ U (0, 1), η =4.

(for beta, gamma)

Where k is a variation constant and r is a random number.

The constant change of the weight is realized through the process, and the sum of the three attention weights is ensured to be equal to 3 through normalization, namely alpha ⁱ +β ⁱ +γ ⁱ ＝3。

Two child weights are finally obtained. Selecting a group of F' = (alpha) with high fitness _n ,β _n ,γ _n ) And weighting the feature map to obtain a feature map of the mixed attention result:

this step is the GAN network parameter evolution section.

Step 2.3: and (4) calculating a loss function. In this embodiment, a loss function formed by combining a softmax loss function and a Cross-entropy loss function (Cross-entropy loss function) is adopted. Output feature map X' of mixed attention module and feature map X of high quality picture ₂ Meanwhile, the method is input into a discriminator of the GAN network for comparison, and comprises the following specific steps:

wherein z is the output result of the fully connected layer of the output characteristic diagram X' of the mixed attention module, z _k Represents the kth value of the full link layer, c is the number of classes, k belongs to {1,2,3, \ 8230;, c }.

wherein f (z) _c ) As an output result of the softmax loss function, y _c Is a truth value X of a high-quality picture sample ₂ So as to calculate and obtain the final loss function.

This step is the GAN network arbiter section.

And (3) stage: the real-time identification of the image is as follows:

step 3.1: and (4) upsampling. Upsampling the feature map X' through an deconvolution layer of a full Convolutional neural network (FCN) to obtain an image set with the size of 224X 224;

step 3.2: PSPNet (Pyramid Scene matching Network) model. A full convolution neural network PSPNet is adopted for segmenting the image instrument header, the PSPNet structure divides the acquired feature layer into grids with different sizes, and then average pooling is respectively carried out inside each grid, so that context information of different areas is aggregated.

Step 3.3: identifying the number in the meter header by adopting a pre-trained convolutional neural network VGG-16 integral model, stacking the convolutional layer in the basic framework of the VGG-16 by adopting a 3 multiplied by 3 convolutional kernel, and adopting a 2 multiplied by 2 window and 2,3 full-connection layers as the step length in the pooling layer; and outputting the recognition result after the softmax normalization function of the soft-max layer.

And (4) stage: offline updating parameters of the GAN network model, specifically as follows:

step 4.1: and (5) performing manual reinspection at regular intervals. After the digital image is used for a period of time, a manual identification mode is utilized for rechecking, and the correct rate delta of digital identification in a rechecked image is calculated. When the accuracy delta is larger than 92%, the model is not updated; otherwise, the error picture is manually adjusted to be a high-quality picture, a high-quality picture set is added, and then the step 1.4 and the step 2 are repeated according to the new high-quality picture set to retrain the whole network so as to obtain brand-new weight distribution.

Step 4.2: and periodically extracting images for network training. After a period of time, randomly extracting partial images (set to 1000 in this embodiment) to perform network training, calculating the accuracy of digital recognition of the partial images and the loss function value of the images, when the accuracy δ is less than or equal to 92%, extracting the first 50 images with large loss function values, placing the images into a high-quality image set after manual adjustment, and repeating step 1.4 and step 2 to perform training again to obtain brand new weight distribution.

Step 4.3: and (3) after 1000 images are identified each time, calculating the accuracy rate delta, and repeating the steps 1 and 2 to retrain the GAN network by taking the images as the image set to be trained when the image number identification accuracy rate delta is less than or equal to 92%. In addition, by checking the picture effect after the image enhancement, the high-quality picture set can be updated, such as adding the instrument image types, or respectively manufacturing different high-quality picture sets.

Example 2

As shown in fig. 5, the present embodiment discloses a system based on the method for identifying a meter number in embodiment 1, which includes the following modules:

a network training module: training a network of images;

a real-time identification module: identifying the image in real time;

The foregoing is considered as illustrative only of the preferred embodiments of the invention and accompanying technical principles. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. The instrument digital identification method based on the dynamic multi-strategy GAN network is characterized by comprising the following steps of:

step 1: processing the collected image data set and extracting image features;

step 2: training a network of images;

and step 3: identifying the image in real time;

2. The method for identifying the number of the meters based on the dynamic multi-strategy GAN network as claimed in claim 1, wherein the step 1 is as follows:

step 1.1: collecting an image;

step 1.2: selecting images with different levels of noise, and processing the images to ensure that the images tend to have high quality;

And high quality picture sets

Step 1.4: adopting the convolution layer in the pre-training VGG-16 model as a convolution kernel for feature extraction, wherein the convolution kernel is used for extracting a picture set to be trained

And high quality picture sets

Feature map set X of ₁ And X ₂

3. The method for identifying the number of the meter based on the dynamic multi-strategy GAN network as claimed in claim 2, wherein the step 2 is as follows:

step 2.1: inputting a feature map set X by adopting a mixed attention module formed by combining three networks of SENEt, DCN and CCNet ₁ Respectively passing through the three networks in parallel; the hybrid attention module is defined as follows:

the first branch is SEnet for automatically learning the importance degrees of different channel features, and the specific steps are as follows:

firstly, a feature map set X is set ₁ C-th feature map x in (1) _c Performing extrusion operation, namely encoding the whole spatial feature on one channel into a global feature, and realizing the feature compression along the spatial dimension by adopting global average pooling, wherein the formula is as follows:

wherein z is _c Representing the value distribution of the c characteristic diagram, namely global information;

then, excitation operation is carried out, and two full connection layers are introduced, wherein the formula is as follows:

s＝F _ex (z,W)＝σ(g(z,W))＝σ(W ₂ ReLU(W ₁ z))

r is a scaling parameter 16; w is a group of ₁ z is the first fully-connected layer process, which plays the role of dimension reduction, reLU () is the commonly used activation functionCounting, keeping the output dimension unchanged; w is a group of ₂ ReLU(W ₁ z) is a second full-connection layer process, the dimension is recovered to the previous dimension, sigma is a sigmoid activation function, and output s is a feature graph weight obtained through the previous full-connection layer learning;

x′ _c ＝F _scale (x _c ,s _c )＝s _c ·x _c

wherein x' _c ∈X′ ₁ ，X′ ₁ The feature graph set is output after passing through the first branch;

in the deformable convolution, the deformable convolution operation and the pooling operation are both two-dimensional and are performed on the same channel, and the convolution kernel R is expanded by adding an offset for each position p on the feature map ₀ The following steps are changed:

wherein q is an integer coordinate on the input feature map x, and p is a floating point coordinate on the input feature map x; g () is a two-wireA linear interpolation function is carried out to finally obtain a feature map X (p) epsilon X' ₂ ；

introducing a cross attention module CCA, firstly performing two 1 × 1 convolutions on the feature diagram x to generate feature diagrams Q and K, and further generating an attention diagram A by performing affinity operation on the Q and the K, wherein the formula is as follows:

wherein a vector Q can be obtained for each position u in the spatial dimension of the feature map Q _u ∈R ^C′ And similarly, extracting the characteristic vector from the K to obtain a set omega _u ∈R ^{(H+W-1)×C′} ，Ω _i,u ∈R ^C′ Represents omega _u The ith element of (2), d _i,u E.g. D represents the characteristic Q _u And omega _i,u Softmax is a common activation function, mapping values in the range (— infinity, + ∞) into values in a (0, 1) interval;

after the operation is finished, performing 1 × 1 convolution on the initial feature map x to generate a feature map V for feature adaptation, and extracting feature vectors from V to obtain a set V _u Then, a cross feature vector phi at the u position is obtained _u And the vectors are positioned in the same row or the same column with the position u, and finally, the aggregation operation is carried out to collect the remote context information, wherein the formula is as follows:

wherein A is _i,u And phi _i,u Is the multiplication of corresponding elements in bit, context information is added to the local feature x to enhance the representation of the local feature and the pixel mode;

in the whole CCNet, a feature diagram x is subjected to global context information extraction through a circulating cross attention module RCCA formed by combining two CCA modules connected in series, and then the extraction is carried outSplicing the global context information and the feature map X to obtain a feature map X' ₃ ；

Wherein i is the group of the ith generation of the genetic algorithm, α is the first branch weight of the mixed attention module, β is the second branch weight of the mixed attention module, and γ is the third branch weight of the mixed attention module; calculating a cross entropy loss function according to the extraction condition and the extraction effect of each group of weight on the picture characteristics, determining a corresponding fitness value, constructing a roulette wheel according to each group of fitness conditions, selecting 2 groups as parents in a roulette wheel mode, and performing cross operation between the selected groups, wherein the cross method comprises the following steps:

wherein

And rand ∈ U (0, 1), η =4;

rand is a random number between 0 and 1, eta is a self-defined distribution factor, and the probability that the offspring approaches the parent is determined; and setting the variation with the probability of 0.5 percent, wherein the variation mode is as follows:

wherein k is a variation constant and r is a random number;

step 2.3: a loss function formed by combining a softmax loss function and a cross entropy loss function is adopted; blending the attention module output profile X' with the high quality picture profile X ₂ Meanwhile, the method is input into a discriminator of the GAN network for comparison, and comprises the following specific steps:

wherein z is the output result of the fully connected layer of the output characteristic diagram X' of the mixed attention module, z _k The kth value of the full link layer is shown, c is the classification number, and k belongs to {1,2,3, \ 8230;, c };

wherein f (z) _c ) As an output result of the softmax loss function, y _c Is a high quality picture sample truth value X ₂ And calculating to obtain the final loss function.

4. The method for identifying the number of the meter based on the dynamic multi-strategy GAN network as claimed in claim 3, wherein the step 3 is as follows:

step 3.1: the feature map X' is subjected to up-sampling through an deconvolution layer of a full convolution neural network to obtain an image set with the size of 224 multiplied by 224;

step 3.2: the segmentation of the image instrument header adopts a full convolution neural network, the full convolution neural network structure divides the obtained characteristic layer into grids with different sizes, and then average pooling is respectively carried out inside each grid, so that context information of different areas is aggregated;

5. The method for identifying the meter number based on the dynamic multi-policy GAN network as claimed in claim 4, wherein the step 4 is as follows:

6. The method for identifying the number of the meters based on the dynamic multi-strategy GAN network as claimed in claim 5, wherein the step 4.3: and updating the high-quality picture set by checking the picture effect after the image enhancement, and increasing the types of instrument images or respectively manufacturing different types of high-quality picture sets.

7. A system based on the digital identification method of the instrument in any one of the claims 1-6, which is characterized by comprising the following modules:

an image collection and feature advancement module: processing the collected image data set and extracting image features;

a network training module: training a network of images;

a real-time identification module: identifying the image in real time;

an offline update module: and when the accuracy is less than the set value, performing offline updating on the model.