CN115439849A - Instrument digital identification method and system based on dynamic multi-strategy GAN network - Google Patents

Instrument digital identification method and system based on dynamic multi-strategy GAN network

Info

Publication number
CN115439849A
Authority
CN
China
Prior art keywords
feature
image
feature map
follows
network
Prior art date
Legal status
Granted
Application number
CN202211211597.5A
Other languages
Chinese (zh)
Other versions
CN115439849B (en)
Inventor
陈俊宇
胡振华
顾吉轩
滕旭阳
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202211211597.5A
Publication of CN115439849A
Application granted
Publication of CN115439849B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an instrument digital identification method and system based on a dynamic multi-strategy GAN network, wherein the method comprises the following steps: step 1: processing the collected image data set and extracting image features; step 2: training the network on the images; step 3: identifying the images in real time; step 4: when the accuracy is smaller than a set value, performing offline updating of the GAN network model. The method greatly improves the adaptability of the meter digit recognition model to diversified meter pictures and can improve its accuracy.

Description

Instrument digital identification method and system based on dynamic multi-strategy GAN network
Technical Field
The invention belongs to the technical field of instrument digital target identification, and particularly relates to an instrument digital identification method and system based on a dynamic multi-strategy GAN network.
Background
With the continuous development of science and technology, various intelligent billing systems and intelligent data analysis systems have emerged. Compared with traditional manual meter-reading statistics, such intelligent systems are efficient and automated, reduce labor costs and shorten the statistical cycle. At present, however, some intelligent systems still require manual operation, and their accuracy and efficiency leave considerable room for improvement. Meter digit recognition based on deep learning can make intelligent billing and data analysis systems more intelligent, minimize manual involvement and comprehensively improve recognition efficiency. A user only needs to submit a meter photo according to the prescribed procedure, and the back-end data processing system automatically recognizes the dial readings in the photo, thereby achieving intelligent statistics.
However, due to the diversity of meters and the arbitrary way in which users take photos, deep-learning-based digit recognition still faces many problems: the meter photos taken by users may suffer from position deviation, blurring and brightness imbalance, and even flipping, occlusion and missing regions, all of which can cause the final model to fail to recognize the digits on the meter correctly. The invention therefore provides a technical scheme for meter digit identification based on a dynamic multi-strategy GAN network, which effectively avoids the influence of quality problems in user-uploaded pictures on meter digit identification.
At present, the Generative Adversarial Network (GAN) is an artificial intelligence technique widely applied in the fields of image recognition and natural language processing. Compared with traditional deep learning models, it has great advantages in both recognition speed and recognition accuracy. However, faced with complex and diverse meters and the uncertainty of user-submitted photos, a GAN with a single learning strategy cannot adaptively learn the patterns of the various image recognition tasks; in order to better fit the distribution of the target image data, a dynamic, self-learning GAN needs to be designed. In a GAN, the discriminator compares the picture produced by the generator with the real picture, computes the optimization signal and back-propagates it to the generator, forcing the generator to learn to produce more realistic pictures; this cycle is repeated until the generated pictures approach the real data distribution as closely as possible.
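For orientation, the adversarial loop described above can be summarized by the following minimal PyTorch-style sketch of one generator/discriminator update; the module names, the binary cross-entropy objective and the single-logit discriminator output are illustrative assumptions, not the specific networks and losses defined later in this disclosure.

```python
import torch
import torch.nn as nn

def gan_step(generator, discriminator, opt_g, opt_d, low_q, high_q):
    """One adversarial update: low_q are degraded meter photos, high_q the
    corresponding high-quality targets; the discriminator outputs one logit."""
    bce = nn.BCEWithLogitsLoss()

    # --- discriminator: score real pictures high, generated pictures low ---
    opt_d.zero_grad()
    d_real = discriminator(high_q)
    d_fake = discriminator(generator(low_q).detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    d_loss.backward()
    opt_d.step()

    # --- generator: try to make the discriminator score its output as real ---
    opt_g.zero_grad()
    g_fake = discriminator(generator(low_q)))
    g_loss = bce(g_fake, torch.ones_like(g_fake))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Repeating this step drives the generator toward pictures that the discriminator can no longer distinguish from the high-quality set.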
An attention mechanism is a representative recognition strategy and a method for extracting salient image regions in deep learning. Depending on where and how the attention weights are applied, most models divide attention mechanisms into the spatial domain, the channel domain and the mixed domain, and only the required domain is selected in actual use. However, given the causes listed above that make images hard to identify (the diversity of meters and users' arbitrary photographing), a single strategy usually solves only one type of saliency deficiency; using a single domain therefore cannot meet the system's adaptive requirements for meter identification, and leads to reduced model accuracy and an increased loss function.
Disclosure of Invention
Aiming at the current situation of the prior art, the invention discloses a method and a system for identifying the number of an instrument based on a dynamic multi-strategy GAN network.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
the instrument digital identification method based on the dynamic multi-strategy GAN network comprises the following steps:
step 1: processing the collected image data set and extracting image features;
step 2: training the network on the images;
step 3: identifying the images in real time;
step 4: when the accuracy is smaller than a set value, performing offline updating of the GAN network model.
Preferably, step 1 is specifically as follows:
step 1.1: collecting an image;
step 1.2: selecting images with different levels of noise and processing them so that they tend toward high quality, i.e. no obvious position deviation, no blurring, no brightness imbalance, no flipping, no occlusion and no defect;
step 1.3: adjusting the picture size to 224 × 224 to obtain the picture set to be trained and the high-quality picture set;
step 1.4: using the convolution layers of the pre-trained VGG-16 model as the convolution kernels for feature extraction; VGG-16 is a classic convolutional neural network (CNN) that mainly uses 3 × 3 convolution kernels, which increases the network depth, and hence the learning capability of the network, for the same receptive field; convolving the picture set to be trained and the high-quality picture set yields the feature map sets X_1 and X_2:
F_conv: R^{C'×H'×W'} → R^{C×H×W},  X_1, X_2 ⊂ R^{C×H×W}
wherein C', H', W' respectively denote the channel dimension, height and width of the image before convolution, and C, H, W respectively denote the channel dimension, height and width of the image after convolution.
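A minimal sketch of step 1.4, reusing the convolutional layers of a pre-trained VGG-16 as a fixed feature extractor; the torchvision weights enum and the omission of ImageNet mean/std normalization are simplifying assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Pre-trained VGG-16; only its convolutional part is used for feature extraction.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(vgg.features.children())).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # step 1.3: uniform 224 x 224 input
    transforms.ToTensor(),           # (ImageNet normalization omitted for brevity)
])

def extract_features(path: str) -> torch.Tensor:
    """Return the C x H x W feature map of one meter picture."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return feature_extractor(img).squeeze(0)   # e.g. 512 x 7 x 7 for VGG-16
```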
Preferably, step 2 is specifically as follows:
step 2.1: the method adopts a mixed attention module formed by combining three networks, namely SENet (Squeeze-and-Excitation Networks, which strengthens important features by modeling the correlation among feature channels so as to improve accuracy), DCN (Deformable Convolutional Network) and CCNet (Criss-Cross Attention Network, which introduces a novel CCA module to acquire the context information of surrounding pixels on criss-cross paths, so that each pixel can finally capture the long-range dependence of all pixels); the input feature map set X_1 is passed through the three networks in parallel; the mixed attention module is defined as follows:
The first branch is SENet, which automatically learns the importance of different channel features. The specific steps are as follows:
First, a squeeze operation is performed on the c-th feature map x_c of the feature map set X_1, i.e. the entire spatial feature on one channel is encoded into a global feature; this is implemented with global average pooling, which compresses the feature along the spatial dimension:
z_c = F_sq(x_c) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} x_c(i, j)
where z_c represents the value distribution of the c-th feature map, i.e. the global information.
Next comes an excitation operation, which mainly captures the correlation between channels. To reduce complexity and improve generalization, two fully connected layers are introduced:
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 ReLU(W_1 z))
where z is the output of the squeeze operation, W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} are the weights, and r is the scaling parameter, set to 16. W_1 z is the first fully connected layer and reduces the dimension; ReLU() is a commonly used activation function and keeps the output dimension unchanged; W_2 ReLU(W_1 z) is the second fully connected layer and restores the original dimension; σ is the sigmoid activation function; the output s is the feature map weight learned through the two fully connected layers.
Finally, the activation value s_c of each channel learned in the excitation operation is multiplied by the original feature x_c, so that the weight coefficient of each channel of the image is learned:
x'_c = F_scale(x_c, s_c) = s_c · x_c
where x'_c ∈ X'_1, and X'_1 is the feature map set output by the first branch.
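A minimal PyTorch-style sketch of the first branch described above (squeeze, excitation with reduction ratio r = 16, and channel-wise rescaling); the layer names are illustrative.

```python
import torch
import torch.nn as nn

class SEBranch(nn.Module):
    """Squeeze-and-Excitation branch: global average pooling (squeeze),
    two fully connected layers (excitation), then channel-wise rescaling."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),  # W1: dimension reduction
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),  # W2: restore dimension
            nn.Sigmoid(),                        # sigma
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        z = x.mean(dim=(2, 3))                   # squeeze: z_c over H x W
        s = self.fc(z).view(b, c, 1, 1)          # excitation: per-channel weights
        return s * x                             # scale: x'_c = s_c * x_c
```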
The second branch is the DCN, which learns offsets with a parallel network so that the convolution kernel is shifted at the sampling points of the input feature map. The specific steps are as follows:
In the deformable convolution, both the deformable convolution operation and the pooling operation are two-dimensional and are performed on the same channel. The regular convolution kernel grid R is augmented with an offset for each output position p_0 on the feature map:
y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)
where p = p_0 + p_n + Δp_n, p_n enumerates the positions in the convolution kernel grid R, w is the deformable convolution weight, and the offset Δp_n is a floating point number obtained by learning; the pixel value at a non-integer coordinate of the input feature map is therefore obtained by bilinear interpolation on x(p):
x(p) = Σ_q G(q, p) · x(q)
where q ranges over the integer coordinates of the input feature map x, p is a floating point coordinate on the input feature map x, and G() is the bilinear interpolation function; the branch finally outputs the feature map X'_2, i.e. x(p) ∈ X'_2.
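A sketch of the second branch under the assumption that torchvision's DeformConv2d is used: a parallel convolution predicts the 2·k·k offsets Δp_n, which shift the sampling points of the deformable convolution.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBranch(nn.Module):
    """Deformable-convolution branch: a parallel conv predicts the 2D offsets
    Delta p_n for every kernel position, which shift the sampling points."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        # 2 * k * k offsets (an x and a y shift per kernel position)
        self.offset = nn.Conv2d(channels, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(channels, channels, kernel_size=k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset(x)          # learned floating-point offsets
        return self.deform(x, offsets)    # bilinear sampling at shifted positions
```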
The third branch is CCNet, which captures context information. The specific steps are as follows:
A criss-cross attention module (CCA) is introduced. Two 1 × 1 convolutions are first applied to the feature map x to generate feature maps Q and K, and an attention map A is then generated by an affinity operation on Q and K:
d_{i,u} = Q_u · Ω_{i,u}^T
where, for each position u in the spatial dimension of the feature map Q, a vector Q_u ∈ R^{C'} is obtained; similarly, the feature vectors in the same row and column as u are extracted from K to form the set Ω_u ∈ R^{(H+W-1)×C'}; Ω_{i,u} ∈ R^{C'} is the i-th element of Ω_u, and d_{i,u} ∈ D is the degree of correlation between Q_u and Ω_{i,u}. The attention map A is obtained by applying softmax to D; softmax is a commonly used activation function that maps values in the range (-∞, +∞) to the interval (0, 1).
After this operation, a 1 × 1 convolution is applied to the initial feature map x to generate a feature map V for feature adaptation. Feature vectors are extracted from V to obtain the set V_u, and the criss-cross feature vectors Φ_u at position u, lying in the same row or column as u, are obtained; finally an aggregation operation collects the long-range context information:
x'_u = Σ_{i=0}^{H+W-1} A_{i,u} · Φ_{i,u} + x_u
where A_{i,u} · Φ_{i,u} is a bit-wise multiplication of corresponding elements, and the context information is added to the local feature x to enhance the local feature and the pixel-wise representation.
In the whole CCNet, the feature map x is passed through a recurrent criss-cross attention module (RCCA) formed by two CCA modules connected in series to extract global context information, which is then concatenated with the feature map; the result is the feature map X'_3.
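A simplified PyTorch-style sketch of the criss-cross attention described above; unlike the published CCNet it does not mask the duplicated centre position, and the channel-reduction factor of 8 is an assumption. Applying the module twice in series approximates the RCCA.

```python
import torch
import torch.nn as nn

class CrissCrossAttention(nn.Module):
    """Each pixel attends to all pixels in its own row and column."""
    def __init__(self, c: int):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)   # query (feature map Q)
        self.k = nn.Conv2d(c, c // 8, 1)   # key   (feature map K)
        self.v = nn.Conv2d(c, c, 1)        # value (feature map V)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # affinity along columns: (b*w, h, h)
        q_col = q.permute(0, 3, 2, 1).reshape(b * w, h, -1)
        k_col = k.permute(0, 3, 2, 1).reshape(b * w, h, -1)
        e_col = torch.bmm(q_col, k_col.transpose(1, 2))
        # affinity along rows: (b*h, w, w)
        q_row = q.permute(0, 2, 3, 1).reshape(b * h, w, -1)
        k_row = k.permute(0, 2, 3, 1).reshape(b * h, w, -1)
        e_row = torch.bmm(q_row, k_row.transpose(1, 2))
        # joint softmax over the H + W criss-cross positions
        e_col = e_col.reshape(b, w, h, h).permute(0, 2, 1, 3)   # (b, h, w, h)
        e_row = e_row.reshape(b, h, w, w)                       # (b, h, w, w)
        attn = torch.softmax(torch.cat([e_col, e_row], dim=-1), dim=-1)
        a_col, a_row = attn[..., :h], attn[..., h:]
        # aggregation: weighted sum of the criss-cross values, plus the input
        v_col = v.permute(0, 3, 2, 1).reshape(b * w, h, -1)
        v_row = v.permute(0, 2, 3, 1).reshape(b * h, w, -1)
        out_col = torch.bmm(a_col.permute(0, 2, 1, 3).reshape(b * w, h, h), v_col)
        out_row = torch.bmm(a_row.reshape(b * h, w, w), v_row)
        out_col = out_col.reshape(b, w, h, c).permute(0, 3, 2, 1)
        out_row = out_row.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return self.gamma * (out_col + out_row) + x

# RCCA-style usage: two serial passes of the same module
# cca = CrissCrossAttention(512); out = cca(cca(feature_map))
```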
Step 2.2: aiming at weight superposition of a mixed attention mechanism, iteration is performed by adopting a genetic algorithm to obtain a weight distribution optimal solution; the family group initialization adopts a method of generating random numbers to generate 5 groups of random weights with the numerical value ranging from 0.3 to 3
Figure BDA0003875292190000044
Wherein i is the ith generation population of genetic algorithm, and alpha is the mixed noteThe first branch weight of the attention module, β is the second branch weight of the mixed attention module, and γ is the third branch weight of the mixed attention module. Calculating a cross entropy loss function according to the extraction condition and the extraction effect of each group of weight on the picture characteristics, determining a corresponding fitness value, constructing a roulette wheel according to each group of fitness conditions, selecting 2 groups as parents in a roulette wheel mode, and performing cross operation between the selected groups, wherein the cross method comprises the following steps:
Figure BDA0003875292190000051
wherein
Figure BDA0003875292190000052
And rand ∈ U (0, 1), η =4;
rand is a random number between 0 and 1, eta is a self-defined distribution factor, and the probability that the offspring approaches the parent is determined. And setting the variation with the probability of 0.5 percent, wherein the variation mode is as follows:
Figure BDA0003875292190000053
wherein k is a variation constant and r is a random number;
the constant change of the weight is realized through the process, and the sum of the three attention weights is ensured to be equal to 3 through normalization, namely alpha iii =3;
Figure BDA0003875292190000054
Finally, two offspring weights are obtained; selecting a group of F' = (alpha) with high fitness nnn ) And weighting the feature map to obtain a feature map of the mixed attention result:
Figure BDA0003875292190000055
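A minimal sketch of the weight-evolution loop of step 2.2, assuming a caller-supplied fitness function (for example the inverse of the cross-entropy loss obtained with a candidate (α, β, γ)); the SBX-style crossover, additive mutation and normalization follow the reconstruction above, and the helper names are illustrative.

```python
import random

ETA, MUT_P, MUT_K = 4, 0.005, 0.1   # distribution factor, mutation prob./constant

def sbx_pair(p1: float, p2: float):
    """Simulated binary crossover of one weight value (applied per alpha/beta/gamma)."""
    u = random.random()
    lam = (2 * u) ** (1 / (ETA + 1)) if u <= 0.5 else (1 / (2 * (1 - u))) ** (1 / (ETA + 1))
    return 0.5 * ((1 + lam) * p1 + (1 - lam) * p2), 0.5 * ((1 - lam) * p1 + (1 + lam) * p2)

def normalize(w):
    """Rescale so that alpha + beta + gamma = 3."""
    s = sum(w)
    return tuple(3 * v / s for v in w)

def evolve(population, fitness):
    """One generation: roulette selection, crossover, mutation, normalization."""
    fits = [fitness(w) for w in population]
    total = sum(fits)
    def pick():                                  # roulette-wheel selection
        r, acc = random.uniform(0, total), 0.0
        for w, f in zip(population, fits):
            acc += f
            if acc >= r:
                return w
        return population[-1]
    pa, pb = pick(), pick()
    children = list(zip(*(sbx_pair(a, b) for a, b in zip(pa, pb))))   # two offspring
    children = [tuple(v + MUT_K * random.random() if random.random() < MUT_P else v
                      for v in child) for child in children]
    children = [normalize(c) for c in children]
    return max(children, key=fitness)            # keep the fitter offspring F'

# usage: start from 5 random weight groups in [0.3, 3]
pop = [normalize(tuple(random.uniform(0.3, 3) for _ in range(3))) for _ in range(5)]
```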
step 2.3: a loss function combining a softmax function and a cross-entropy loss is adopted. The output feature map X' of the mixed attention module and the feature map X_2 of the high-quality pictures are simultaneously input into the discriminator of the GAN network for comparison. The specific steps are as follows:
First, the softmax function is calculated:
f(z_k) = e^{z_k} / Σ_{j=1}^{c} e^{z_j}
where z is the fully connected layer output of the mixed attention module output feature map X', z_k denotes the k-th value of the fully connected layer, c is the number of classes, and k ∈ {1, 2, 3, …, c};
then the cross-entropy loss is calculated:
L = - Σ_{c} y_c · log f(z_c)
where f(z_c) is the output of the softmax function and y_c is the ground-truth label of the high-quality picture samples X_2, from which the final loss function is calculated.
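A minimal sketch of the discriminator objective of step 2.3, assuming one-hot ground-truth labels; with integer class indices, torch.nn.functional.cross_entropy would compute the same quantity directly.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(logits: torch.Tensor, labels_one_hot: torch.Tensor) -> torch.Tensor:
    """Softmax + cross-entropy as in step 2.3: logits is the fully connected
    output for the mixed-attention feature map X', labels_one_hot the labels
    derived from the high-quality picture features X_2."""
    log_probs = F.log_softmax(logits, dim=1)            # f(z_k) in log space
    return -(labels_one_hot * log_probs).sum(dim=1).mean()  # L = -sum_c y_c log f(z_c)
```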
Preferably, step 3 is specifically as follows:
step 3.1: the feature map X' is upsampled through the deconvolution layer of a fully convolutional network (FCN) to obtain an image set of size 224 × 224;
step 3.2: the image meter header is segmented with PSPNet (Pyramid Scene Parsing Network), whose structure divides the obtained feature layer into grids of different sizes and then performs average pooling inside each grid, thereby aggregating context information from different regions;
step 3.3: the digits in the meter header are recognized with a pre-trained convolutional neural network VGG-16 model; in the basic VGG-16 framework the convolution layers are stacked with 3 × 3 convolution kernels, the pooling layers use a 2 × 2 window with a stride of 2, and 3 fully connected layers follow; the recognition result is output after the softmax normalization function of the soft-max layer.
Preferably, step 4 is specifically as follows:
step 4.1: rechecking after a period of time and calculating the digit recognition accuracy δ on the rechecked images; when the accuracy δ is greater than 92%, the model is not updated; otherwise the mis-recognized pictures are adjusted into high-quality pictures and added to the high-quality picture set, and then step 1.4 and step 2 are repeated with the new high-quality picture set to retrain the whole network and obtain a brand-new weight assignment;
step 4.2: after a period of time, some images are randomly extracted for network training, and the digit recognition accuracy on these images and their loss function values are calculated; when the accuracy δ ≤ 92%, the first 50 images with the largest loss function values are extracted, adjusted and placed into the high-quality picture set, and step 1.4 and step 2 are repeated to train again and obtain a brand-new weight assignment;
step 4.3: the accuracy δ is calculated after every set number of pictures has been identified, and when the image digit recognition accuracy δ ≤ 92%, these pictures are taken as the picture set to be trained and steps 1 and 2 are repeated.
Preferably, in step 4.3: the high-quality picture set is updated by checking the picture effect after image enhancement, adding more types of meter images or separately building high-quality picture sets of different types.
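A minimal sketch of the accuracy-triggered offline update of step 4; the data structure of the recheck results and the helper names are illustrative assumptions.

```python
def offline_update(recheck_results, picture_pool, retrain_fn, threshold=0.92, top_k=50):
    """Accuracy-triggered offline update (steps 4.1-4.3): recheck_results is a list
    of (image, correct: bool, loss: float); retrain_fn repeats steps 1.4 and 2 on an
    enlarged high-quality picture set."""
    accuracy = sum(ok for _, ok, _ in recheck_results) / len(recheck_results)
    if accuracy > threshold:
        return accuracy, False                       # model kept as-is
    # take the worst pictures (largest loss), adjust them to high quality, retrain
    worst = sorted(recheck_results, key=lambda r: r[2], reverse=True)[:top_k]
    picture_pool.extend(img for img, _, _ in worst)
    retrain_fn(picture_pool)
    return accuracy, True
```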
The invention also discloses a system based on the instrument digital identification method, which comprises the following modules:
an image collection and feature extraction module: processing the collected image data set and extracting image features;
a network training module: training the network on the images;
a real-time identification module: identifying the image in real time;
an offline update module: and when the accuracy is smaller than the set value, performing off-line updating on the model.
The invention provides a multi-strategy mixed attention mechanism model: a criss-cross attention network (CCNet) in the spatial domain, which introduces a novel CCA module to gather the context information of surrounding pixels on criss-cross paths so that every pixel can ultimately capture long-range dependencies on all pixels; a deformable convolutional network (DCN), which adapts better to geometric deformation of the image by changing the receptive field; and a squeeze-and-excitation network (SENet) in the channel domain, which strengthens important features by modeling the correlation between feature channels so as to improve accuracy. These are combined into a mixed attention module, and the weights of the individual attention mechanisms are dynamically optimized during training by a genetic algorithm (GA) to obtain an approximately optimal assignment of the attention weights; finally, the weighted parts are summed to obtain the enhanced image. The invention greatly improves the adaptability of the meter digit recognition model to diversified meter pictures and can improve its accuracy.
Drawings
Fig. 1 is a flow chart of the meter number identification method based on a dynamic multi-strategy GAN network according to the present invention.
FIG. 2 is a flow diagram of the mixed attention module of the present invention.
FIG. 3 is a flow diagram of the mixed attention module branch SENet of the present invention.
Fig. 4 is a schematic flow diagram of the mixed attention module branch CCNet of the present invention.
Fig. 5 is a block diagram of the meter number identification system based on a dynamic multi-strategy GAN network according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Example 1
As shown in fig. 1 to 4, the method for identifying meter numbers based on a dynamic multi-strategy GAN network in this embodiment includes the following specific steps:
stage 1: image dataset processing, as follows:
step 1.1: image collection. The pictures in this embodiment are real-time meter photos taken on site at the individual residences served by a state-owned enterprise in Beijing.
Step 1.2: high-quality pictures. Images containing different levels of noise are collected, denoised and cropped manually, and parameters such as contrast, saturation and exposure are adjusted so that the images tend toward high quality.
Step 1.3: image resizing. The Python third-party image processing library PIL (Python Imaging Library) is used to resize the pictures in batches, uniformly changing their size to 224 × 224 to facilitate feature extraction and input to the picture enhancement network module, giving the picture set to be trained and the high-quality picture set.
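A minimal sketch of the PIL batch resizing of step 1.3; the directory layout and the .jpg filter are assumptions.

```python
from pathlib import Path
from PIL import Image

def resize_batch(src_dir: str, dst_dir: str, size=(224, 224)) -> None:
    """Batch-resize meter photos to a uniform 224 x 224 with PIL."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for p in Path(src_dir).glob("*.jpg"):
        Image.open(p).convert("RGB").resize(size).save(out / p.name)
```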
Step 1.4: image feature extraction. Feature extraction means that convolving an image with a convolution kernel yields a corresponding feature map, and the information of the image features is obtained through the action of multiple convolution kernels. The invention uses the convolution layers of the pre-trained VGG-16 model as the convolution kernels for feature extraction; VGG-16 is a classic convolutional neural network (CNN) that mainly uses 3 × 3 convolution kernels, which increases the network depth, and hence the learning capability of the network, for the same receptive field. Convolving the picture set to be trained and the high-quality picture set yields the feature map sets X_1 and X_2:
F_conv: R^{C'×H'×W'} → R^{C×H×W},  X_1, X_2 ⊂ R^{C×H×W}
wherein C', H', W' respectively denote the channel dimension, height and width of the image before convolution, and C, H, W respectively denote the channel dimension, height and width of the image after convolution.
Stage 2: network training of the images, specifically as follows:
step 2.1: mixed attention settings. In the embodiment, a mixed attention module formed by combining three networks of SENEt, DCN and CCNet is adopted, the SENEt emphasizes the channel characteristics of the input image, the DCN and the CCNet emphasize the spatial characteristics of the input image, the DCN emphasizes the relationship between adjacent pixel points of the image, and the CCNet emphasizes the overall situation but focuses on the image key information at the same time. Input feature set X 1 Respectively through the three networks in parallel. The hybrid attention module is defined as follows:
the first branch is SEnet capable of automatically learning the importance degrees of different channel features, and the specific steps are as follows:
firstly, a feature map set X is set 1 C-th feature map x in (1) c Performing an extrusion (Squeeze) operation, i.e. encoding the whole spatial feature on one channel into a global feature, and implementing by using global average pooling to achieve the purpose of compressing the feature along the spatial dimension, wherein the formula is as follows:
Figure BDA0003875292190000084
wherein z is c And representing the value distribution of the c-th feature map, namely global information.
Next comes an excitation operation, which mainly captures the correlation between channels. To reduce complexity and improve generalization, two fully connected layers are introduced:
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 ReLU(W_1 z))
where z is the output of the squeeze operation, W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} are the weights, and r is the scaling parameter, set to 16. W_1 z is the first fully connected layer and reduces the dimension; ReLU() is a commonly used activation function and keeps the output dimension unchanged; W_2 ReLU(W_1 z) is the second fully connected layer and restores the original dimension; σ is the sigmoid activation function; the output s is the feature map weight learned through the two fully connected layers.
Finally, the activation value s_c of each channel learned in the excitation operation is multiplied by the original feature x_c, so that the weight coefficient of each channel of the image is learned:
x'_c = F_scale(x_c, s_c) = s_c · x_c
where x'_c ∈ X'_1, and X'_1 is the feature map set output by the first branch.
The second branch is the DCN, which learns offsets with a parallel network so that the convolution kernel is shifted at the sampling points of the input feature map. The specific steps are as follows:
In the deformable convolution, both the deformable convolution operation and the pooling operation are two-dimensional and are performed on the same channel. The regular convolution kernel grid R is augmented with an offset for each output position p_0 on the feature map:
y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)
where p = p_0 + p_n + Δp_n, p_n enumerates the positions in the convolution kernel grid R, w is the deformable convolution weight, and the offset Δp_n is a floating point number obtained by learning; the pixel value at a non-integer coordinate of the input feature map is therefore obtained by bilinear interpolation on x(p):
x(p) = Σ_q G(q, p) · x(q)
where q ranges over the integer coordinates of the input feature map x, p is a floating point coordinate on the input feature map x, and G() is the bilinear interpolation function; the branch finally outputs the feature map X'_2, i.e. x(p) ∈ X'_2.
The third branch is CCNet, which captures context information more efficiently and effectively. The specific steps are as follows:
To model long-range context dependencies of the local feature representations with lightweight computation and memory, a criss-cross attention module (CCA) is introduced. The CCA module collects context information in the horizontal and vertical directions to enhance the pixel-wise representations.
The CCA module first applies two 1 × 1 convolutions to the feature map x to generate feature maps Q and K, and then generates an attention map A by an affinity operation on Q and K:
d_{i,u} = Q_u · Ω_{i,u}^T
where, for each position u in the spatial dimension of the feature map Q, a vector Q_u ∈ R^{C'} is obtained; similarly, the feature vectors in the same row and column as u are extracted from K to form the set Ω_u ∈ R^{(H+W-1)×C'}; Ω_{i,u} ∈ R^{C'} is the i-th element of Ω_u, and d_{i,u} ∈ D is the degree of correlation between Q_u and Ω_{i,u}. The attention map A is obtained by applying softmax to D; softmax is a commonly used activation function that maps values in the range (-∞, +∞) to the interval (0, 1).
After this operation, a 1 × 1 convolution is applied to the initial feature map x to generate a feature map V for feature adaptation. Feature vectors are extracted from V to obtain the set V_u, and the criss-cross feature vectors Φ_u at position u, lying in the same row or column as u, are obtained; finally an aggregation operation collects the long-range context information:
x'_u = Σ_{i=0}^{H+W-1} A_{i,u} · Φ_{i,u} + x_u
where A_{i,u} · Φ_{i,u} is a bit-wise multiplication of corresponding elements, and the context information is added to the local feature x to enhance the local feature and the pixel-wise representation, giving a wide contextual view and improved feature expression.
In the whole CCNet, the feature map X is passed through a recurrent criss-cross attention module (RCCA) formed by two CCA modules connected in series to extract global context information, which is then concatenated with the feature map X to obtain the feature map X'_3.
The above steps 1.4 to 2.1 are part of the GAN network generator.
Step 2.2: adaptive weight assignment for the mixed attention. For the weight superposition of the mixed attention mechanism, the method iterates a genetic algorithm to obtain a near-optimal weight assignment. The population is initialized by generating 5 groups of random weights in the range 0.3 to 3:
F_i = (α_i, β_i, γ_i)
where i denotes the i-th generation of the genetic algorithm, α is the weight of the first branch of the mixed attention module, β the weight of the second branch, and γ the weight of the third branch. A loss function is calculated from the feature extraction result obtained with each group of weights and converted into a corresponding fitness value; a roulette wheel is constructed from the fitness values, 2 groups are selected as parents by roulette-wheel selection, and a crossover operation is performed between the selected groups. The crossover method (shown for α; β and γ are handled in the same way) is:
α_i^{c1} = 0.5 [(1 + λ) α_i^{p1} + (1 - λ) α_i^{p2}],  α_i^{c2} = 0.5 [(1 - λ) α_i^{p1} + (1 + λ) α_i^{p2}]
with
λ = (2 · rand)^{1/(η+1)} if rand ≤ 0.5, otherwise λ = (1 / (2 (1 - rand)))^{1/(η+1)}
and rand ∈ U(0, 1), η = 4.
rand is a random number between 0 and 1, and η is a user-defined distribution factor that determines the probability that the offspring approach the parents. A mutation with probability 0.5% is set, of the form (applied to α, β and γ alike)
α_i' = α_i + k · r
where k is a mutation constant and r is a random number.
The weights are changed continuously through this process, and normalization ensures that the sum of the three attention weights always equals 3, i.e. α_i + β_i + γ_i = 3:
α_i ← 3 α_i / (α_i + β_i + γ_i), and likewise for β_i and γ_i.
Two offspring weight groups are finally obtained. The group F' = (α_n, β_n, γ_n) with the higher fitness is selected and used to weight the feature maps, giving the feature map of the mixed attention result:
X' = α_n X'_1 + β_n X'_2 + γ_n X'_3
this step is the GAN network parameter evolution section.
Step 2.3: loss function calculation. In this embodiment, a loss function combining a softmax function and a cross-entropy loss function is adopted. The output feature map X' of the mixed attention module and the feature map X_2 of the high-quality pictures are simultaneously input into the discriminator of the GAN network for comparison. The specific steps are as follows:
First, the softmax function is calculated:
f(z_k) = e^{z_k} / Σ_{j=1}^{c} e^{z_j}
where z is the fully connected layer output of the mixed attention module output feature map X', z_k denotes the k-th value of the fully connected layer, c is the number of classes, and k ∈ {1, 2, 3, …, c}.
Then the cross-entropy loss is calculated:
L = - Σ_{c} y_c · log f(z_c)
where f(z_c) is the output of the softmax function and y_c is the ground-truth label of the high-quality picture samples X_2, from which the final loss function is calculated.
This step is the GAN network arbiter section.
Stage 3: real-time identification of the image, specifically as follows:
Step 3.1: upsampling. The feature map X' is upsampled through the deconvolution layer of a fully convolutional network (FCN) to obtain an image set of size 224 × 224.
Step 3.2: PSPNet (Pyramid Scene Parsing Network) model. The image meter header is segmented with the fully convolutional PSPNet, whose structure divides the obtained feature layer into grids of different sizes and then performs average pooling inside each grid, thereby aggregating context information from different regions.
Step 3.3: the digits in the meter header are recognized with a pre-trained convolutional neural network VGG-16 model; in the basic VGG-16 framework the convolution layers are stacked with 3 × 3 convolution kernels, the pooling layers use a 2 × 2 window with a stride of 2, and 3 fully connected layers follow; the recognition result is output after the softmax normalization function of the soft-max layer.
Stage 4: offline updating of the GAN network model parameters, specifically as follows:
Step 4.1: periodic manual rechecking. After the system has been in use for a period of time, the results are rechecked by manual identification and the digit recognition accuracy δ on the rechecked images is calculated. When the accuracy δ is greater than 92%, the model is not updated; otherwise the mis-recognized pictures are manually adjusted into high-quality pictures and added to the high-quality picture set, and step 1.4 and step 2 are then repeated with the new high-quality picture set to retrain the whole network and obtain a brand-new weight assignment.
Step 4.2: periodic extraction of images for network training. After a period of time, some images (1000 in this embodiment) are randomly extracted for network training, and the digit recognition accuracy on these images and their loss function values are calculated. When the accuracy δ ≤ 92%, the first 50 images with the largest loss function values are extracted, manually adjusted, placed into the high-quality picture set, and step 1.4 and step 2 are repeated to train again and obtain a brand-new weight assignment.
Step 4.3: after every 1000 images have been identified, the accuracy δ is calculated; when the image digit recognition accuracy δ ≤ 92%, these images are taken as the picture set to be trained and steps 1 and 2 are repeated to retrain the GAN network. In addition, by checking the picture effect after image enhancement, the high-quality picture set can be updated, for example by adding more types of meter images or by building separate high-quality picture sets for different types.
Example 2
As shown in fig. 5, the present embodiment discloses a system based on the method for identifying a meter number in embodiment 1, which includes the following modules:
an image collection and feature extraction module: processing the collected image data set and extracting image features;
a network training module: training the network on the images;
a real-time identification module: identifying the image in real time;
an offline update module: and when the accuracy is smaller than the set value, performing off-line updating on the model.
The foregoing is considered as illustrative only of the preferred embodiments of the invention and accompanying technical principles. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (7)

1. The instrument digital identification method based on the dynamic multi-strategy GAN network is characterized by comprising the following steps of:
step 1: processing the collected image data set and extracting image features;
step 2: training the network on the images;
step 3: identifying the images in real time;
step 4: when the accuracy is smaller than a set value, performing offline updating of the GAN network model.
2. The method for identifying the number of the meters based on the dynamic multi-strategy GAN network as claimed in claim 1, wherein the step 1 is as follows:
step 1.1: collecting an image;
step 1.2: selecting images with different levels of noise, and processing the images to ensure that the images tend to have high quality;
step 1.3: adjusting the picture size to 224 × 224 to obtain the picture set to be trained and the high-quality picture set;
step 1.4: adopting the convolution layers of the pre-trained VGG-16 model as the convolution kernels for feature extraction, and convolving the picture set to be trained and the high-quality picture set to obtain the feature map sets X_1 and X_2:
F_conv: R^{C'×H'×W'} → R^{C×H×W},  X_1, X_2 ⊂ R^{C×H×W}
wherein C', H', W' respectively denote the channel dimension, height and width of the image before convolution, and C, H, W respectively denote the channel dimension, height and width of the image after convolution.
3. The method for identifying the number of the meter based on the dynamic multi-strategy GAN network as claimed in claim 2, wherein the step 2 is as follows:
step 2.1: adopting a mixed attention module formed by combining the three networks SENet, DCN and CCNet, and passing the input feature map set X_1 through the three networks in parallel; the mixed attention module is defined as follows:
the first branch is SENet, which automatically learns the importance of different channel features, and the specific steps are as follows:
first, a squeeze operation is performed on the c-th feature map x_c of the feature map set X_1, i.e. the entire spatial feature on one channel is encoded into a global feature using global average pooling, which compresses the feature along the spatial dimension:
z_c = F_sq(x_c) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} x_c(i, j)
wherein z_c represents the value distribution of the c-th feature map, i.e. the global information;
then an excitation operation is performed, introducing two fully connected layers:
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 ReLU(W_1 z))
wherein z is the output of the squeeze operation, W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} are the weights, and r is the scaling parameter, set to 16; W_1 z is the first fully connected layer and reduces the dimension; ReLU() is a commonly used activation function and keeps the output dimension unchanged; W_2 ReLU(W_1 z) is the second fully connected layer and restores the original dimension; σ is the sigmoid activation function; the output s is the feature map weight learned through the two fully connected layers;
finally, the activation value s_c of each channel learned in the excitation operation is multiplied by the original feature x_c, so that the weight coefficient of each channel of the image is learned:
x'_c = F_scale(x_c, s_c) = s_c · x_c
wherein x'_c ∈ X'_1, and X'_1 is the feature map set output by the first branch;
the second branch is the DCN, which learns offsets with a parallel network so that the convolution kernel is shifted at the sampling points of the input feature map, and the specific steps are as follows:
in the deformable convolution, both the deformable convolution operation and the pooling operation are two-dimensional and are performed on the same channel; the regular convolution kernel grid R is augmented with an offset for each output position p_0 on the feature map:
y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)
wherein p = p_0 + p_n + Δp_n, p_n enumerates the positions in the convolution kernel grid R, w is the deformable convolution weight, and the offset Δp_n is a floating point number obtained by learning, so the pixel value at a non-integer coordinate of the input feature map is obtained by bilinear interpolation on x(p):
x(p) = Σ_q G(q, p) · x(q)
wherein q ranges over the integer coordinates of the input feature map x, p is a floating point coordinate on the input feature map x, and G() is the bilinear interpolation function; the branch finally outputs the feature map X'_2, i.e. x(p) ∈ X'_2;
The third branch is CCNet for capturing context information, and the specific steps are as follows:
introducing a cross attention module CCA, firstly performing two 1 × 1 convolutions on the feature diagram x to generate feature diagrams Q and K, and further generating an attention diagram A by performing affinity operation on the Q and the K, wherein the formula is as follows:
Figure FDA0003875292180000023
wherein a vector Q can be obtained for each position u in the spatial dimension of the feature map Q u ∈R C′ And similarly, extracting the characteristic vector from the K to obtain a set omega u ∈R (H+W-1)×C′ ,Ω i,u ∈R C′ Represents omega u The ith element of (2), d i,u E.g. D represents the characteristic Q u And omega i,u Softmax is a common activation function, mapping values in the range (— infinity, + ∞) into values in a (0, 1) interval;
after the operation is finished, performing 1 × 1 convolution on the initial feature map x to generate a feature map V for feature adaptation, and extracting feature vectors from V to obtain a set V u Then, a cross feature vector phi at the u position is obtained u And the vectors are positioned in the same row or the same column with the position u, and finally, the aggregation operation is carried out to collect the remote context information, wherein the formula is as follows:
Figure FDA0003875292180000031
wherein A is i,u And phi i,u Is the multiplication of corresponding elements in bit, context information is added to the local feature x to enhance the representation of the local feature and the pixel mode;
in the whole CCNet, a feature diagram x is subjected to global context information extraction through a circulating cross attention module RCCA formed by combining two CCA modules connected in series, and then the extraction is carried outSplicing the global context information and the feature map X to obtain a feature map X' 3
Step 2.2: aiming at weight superposition of a mixed attention mechanism, iteration is performed by adopting a genetic algorithm to obtain a weight distribution optimal solution; the family group initialization adopts a method of generating random numbers to generate 5 groups of random weights with the numerical value ranging from 0.3 to 3
Figure FDA0003875292180000032
Wherein i is the group of the ith generation of the genetic algorithm, α is the first branch weight of the mixed attention module, β is the second branch weight of the mixed attention module, and γ is the third branch weight of the mixed attention module; calculating a cross entropy loss function according to the extraction condition and the extraction effect of each group of weight on the picture characteristics, determining a corresponding fitness value, constructing a roulette wheel according to each group of fitness conditions, selecting 2 groups as parents in a roulette wheel mode, and performing cross operation between the selected groups, wherein the cross method comprises the following steps:
Figure FDA0003875292180000033
wherein
Figure FDA0003875292180000034
And rand ∈ U (0, 1), η =4;
rand is a random number between 0 and 1, eta is a self-defined distribution factor, and the probability that the offspring approaches the parent is determined; and setting the variation with the probability of 0.5 percent, wherein the variation mode is as follows:
Figure FDA0003875292180000035
wherein k is a variation constant and r is a random number;
the constant change of the weight is realized through the process, and the sum of the three attention weights is ensured to be equal to 3 through normalization, namely alpha iii =3;
Figure FDA0003875292180000036
Finally, two offspring weights are obtained; selecting a group of F' = (alpha) with high fitness nnn ) And weighting the feature map to obtain a feature map of the mixed attention result:
Figure FDA0003875292180000037
step 2.3: a loss function combining a softmax function and a cross-entropy loss is adopted; the output feature map X' of the mixed attention module and the feature map X_2 of the high-quality pictures are simultaneously input into the discriminator of the GAN network for comparison, and the specific steps are as follows:
first, the softmax function is calculated:
f(z_k) = e^{z_k} / Σ_{j=1}^{c} e^{z_j}
wherein z is the fully connected layer output of the mixed attention module output feature map X', z_k denotes the k-th value of the fully connected layer, c is the number of classes, and k ∈ {1, 2, 3, …, c};
then the cross-entropy loss is calculated:
L = - Σ_{c} y_c · log f(z_c)
wherein f(z_c) is the output of the softmax function and y_c is the ground-truth label of the high-quality picture samples X_2, thereby obtaining the final loss function.
4. The method for identifying the number of the meter based on the dynamic multi-strategy GAN network as claimed in claim 3, wherein the step 3 is as follows:
step 3.1: the feature map X' is upsampled through the deconvolution layer of a fully convolutional network to obtain an image set of size 224 × 224;
step 3.2: the image meter header is segmented with a fully convolutional network whose structure divides the obtained feature layer into grids of different sizes and then performs average pooling inside each grid, thereby aggregating context information from different regions;
step 3.3: the digits in the meter header are recognized with a pre-trained convolutional neural network VGG-16 model; in the basic VGG-16 framework the convolution layers are stacked with 3 × 3 convolution kernels, the pooling layers use a 2 × 2 window with a stride of 2, and 3 fully connected layers follow; the recognition result is output after the softmax normalization function of the soft-max layer.
5. The method for identifying the meter number based on the dynamic multi-strategy GAN network as claimed in claim 4, wherein step 4 is as follows:
step 4.1: rechecking after a period of time and calculating the digit recognition accuracy δ on the rechecked images; when the accuracy δ is greater than 92%, the model is not updated; otherwise the mis-recognized pictures are adjusted into high-quality pictures and added to the high-quality picture set, and then step 1.4 and step 2 are repeated with the new high-quality picture set to retrain the whole network and obtain a brand-new weight assignment;
step 4.2: after a period of time, some images are randomly extracted for network training, and the digit recognition accuracy on these images and their loss function values are calculated; when the accuracy δ ≤ 92%, the first 50 images with the largest loss function values are extracted, adjusted and placed into the high-quality picture set, and step 1.4 and step 2 are repeated to train again and obtain a brand-new weight assignment;
step 4.3: the accuracy δ is calculated after every set number of pictures has been identified, and when the image digit recognition accuracy δ ≤ 92%, these pictures are taken as the picture set to be trained and steps 1 and 2 are repeated.
6. The method for identifying the meter number based on the dynamic multi-strategy GAN network as claimed in claim 5, wherein in step 4.3: the high-quality picture set is updated by checking the picture effect after image enhancement, adding more types of meter images or separately building high-quality picture sets of different types.
7. A system based on the digital identification method of the instrument in any one of the claims 1-6, which is characterized by comprising the following modules:
an image collection and feature extraction module: processing the collected image data set and extracting image features;
a network training module: training the network on the images;
a real-time identification module: identifying the image in real time;
an offline update module: and when the accuracy is less than the set value, performing offline updating on the model.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211211597.5A | 2022-09-30 | 2022-09-30 | Instrument digital identification method and system based on dynamic multi-strategy GAN network

Publications (2)

Publication Number | Publication Date
CN115439849A | 2022-12-06
CN115439849B | 2023-09-08

Family

ID=84251574

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202211211597.5A | Instrument digital identification method and system based on dynamic multi-strategy GAN network | 2022-09-30 | 2022-09-30 | Active (CN115439849B)

Country Status (1)

Country | Link
CN | CN115439849B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Title
CN108830271A * | 2018-06-13 | 2018-11-16 | A kind of digital displaying meter Recognition of Reading method based on convolutional neural networks
WO2021115159A1 * | 2019-12-09 | 2021-06-17 | Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
CN114782669A * | 2022-01-07 | 2022-07-22 | Digital instrument automatic identification, positioning and reading method based on deep learning
CN114266898A * | 2022-01-11 | 2022-04-01 | Liver cancer identification method based on improved EfficientNet

Cited By (2)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN117036875A * | 2023-07-11 | 2023-11-10 | 南京航空航天大学 | Infrared weak and small moving target generation algorithm based on fusion attention GAN
CN117036875B | 2023-07-11 | 2024-04-26 | 南京航空航天大学 | Infrared weak and small moving target generation algorithm based on fusion attention GAN

Also Published As

Publication number Publication date
CN115439849B (en) 2023-09-08


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant