CN113673509A - Instrument detection and classification method based on image text - Google Patents

Instrument detection and classification method based on image text

Info

Publication number
CN113673509A
CN113673509A (application CN202110855223.6A)
Authority
CN
China
Prior art keywords
layer
convolution
network
training
combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110855223.6A
Other languages
Chinese (zh)
Other versions
CN113673509B (en)
Inventor
田联房
王昭霖
杜启亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Original Assignee
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Zhuhai Institute of Modern Industrial Innovation of South China University of Technology filed Critical South China University of Technology SCUT
Priority to CN202110855223.6A priority Critical patent/CN113673509B/en
Publication of CN113673509A publication Critical patent/CN113673509A/en
Application granted granted Critical
Publication of CN113673509B publication Critical patent/CN113673509B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an instrument detection and classification method based on image text, which comprises the following steps: 1) construct an instrument positioning data set, train an improved YOLO network, and use the network to output dial images; 2) construct a character detection data set, train an improved EAST network, and use the network to output character images; 3) construct a character recognition data set, train a CRNN network, and use the network to output character information; 4) construct a text classification data set, train a TextCNN network, and use the network to output the instrument type. The invention uses neural networks to detect instruments and to detect and recognize the text on them. It achieves higher precision and better generalization across different backgrounds, accurately detects instruments of different sizes without restrictions on acquisition angle or distance, and, by using the character information on the instruments, solves the problem that instrument positions can be recognized but instrument types are hard to distinguish, so that the method both detects instruments and identifies their types.

Description

Instrument detection and classification method based on image text
Technical Field
The invention relates to the technical field of image processing and neural networks, in particular to an instrument detection and classification method based on an image text.
Background
Instruments serve as monitoring devices and mainly include pressure instruments, temperature instruments, flow instruments, electrical instruments and electronic measuring instruments. They are widely used in many aspects of industrial production and social life and bring great convenience to both. Compared with manual classification, automatic classification based on image processing and neural networks has a wider application range and higher classification efficiency, and it is gradually becoming mainstream as image processing and neural network technologies develop.
At present, research on instrument classification mainly focuses on training neural networks to classify images of different instrument types. This approach has drawbacks: every instrument type to be recognized must appear in the training data, different instrument types often look very similar in images, and deep networks therefore do not distinguish them reliably. Research on recognizing the characters on instruments has mainly relied on traditional image processing, obtaining character information through a pipeline of filtering, graying, thresholding, edge detection and template matching. With the rapid development of image processing and neural network technology in recent years, locating, recognizing and classifying characters with neural networks has become practical. Such a method comprises three algorithms: text detection, which localizes character information with quadrilateral boxes through a neural network; text recognition, which reads the text information on the instrument; and text classification, which classifies the text information to obtain the instrument type.
Considering the above, an instrument detection and classification method that is both real-time and highly accurate has considerable practical value.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an instrument detection and classification method based on image text. The method uses neural networks to detect instruments and to detect and recognize the text on them; it offers higher precision and better generalization under different backgrounds, accurately detects instruments of different sizes without restrictions on acquisition angle or distance, and, by using the character information on the instruments, solves the problem that instrument positions can be recognized but instrument types are hard to distinguish, so that the method both detects instruments and identifies their types.
In order to achieve this purpose, the technical scheme provided by the invention is as follows: an instrument detection and classification method based on image text comprises the following steps (a high-level sketch of the four-stage pipeline is given after step 4):
1) marking dial positions on instrument images to construct an instrument positioning data set, dividing the data set into a training set and a test set, loading training parameters and training the improved YOLO network with the training set, obtaining the optimal improved YOLO network after training, then inputting the test set into the optimal improved YOLO network and outputting and cropping the dial images; the improvement to the YOLO network is that the backbone is replaced with a lightweight mobile network so as to reduce network parameters and computation and increase running speed;
2) marking the character positions in the dial images cropped in step 1) to construct a character detection data set, dividing the data set into a training set and a test set, then loading training parameters and training the improved EAST network with the training set, obtaining the optimal improved EAST network after training, inputting the test set into the optimal improved EAST network, outputting the character positions and cropping them into character images; the improvements to the EAST network are that the backbone network is changed to VGG to improve detection accuracy, and that the output-layer structure of the prediction module is modified so that only the head element is used to predict a vertex, which improves prediction for long text;
3) marking the character information in the character image cut in the step 2) to construct a character recognition data set, dividing the character recognition data set into a training set and a testing set, then loading training parameters to train the CRNN by using the training set, obtaining an optimal CRNN after training is finished, inputting the testing set into the optimal CRNN, and outputting the character information;
4) splicing the character information output in the step 3) into a text, labeling the instrument type corresponding to the text to construct a text classification data set, dividing the text classification data set into a training set and a test set, loading training parameters to train a TextCNN network by using the training set, obtaining an optimal TextCNN network after the training is finished, and inputting the test set into the optimal TextCNN network to output the instrument type corresponding to the text.
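For illustration, a minimal Python sketch of how the four trained networks are chained at inference time is given below; the model objects and their predict_dial / predict_text_boxes / recognize / classify methods are hypothetical placeholders, not APIs defined by the invention.

```python
# Hypothetical end-to-end pipeline for one input image (steps 1-4 above).
def crop(img, box):
    """Crop an image to an axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return img[y1:y2, x1:x2]

def classify_meter(image, yolo_model, east_model, crnn_model, textcnn_model):
    """Return (dial_box, meter_type) for a single camera image."""
    dial_box = yolo_model.predict_dial(image)                 # step 1: locate the dial
    dial_img = crop(image, dial_box)
    text_boxes = east_model.predict_text_boxes(dial_img)      # step 2: detect text regions
    strings = [crnn_model.recognize(crop(dial_img, b))        # step 3: read each region
               for b in text_boxes]
    meter_type = textcnn_model.classify(" ".join(strings))    # step 4: classify spliced text
    return dial_box, meter_type
```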
Further, in step 1), various instrument images are first collected with a camera in different environments and preprocessed by filtering and image enhancement; abnormal data are then removed, namely data with surface dirt, extreme illumination or incomplete captures; the remaining data are labeled with the dial positions to construct the instrument positioning data set, which is divided into a training set and a test set.
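The patent names only "filtering and image enhancement" as the preprocessing operations; the sketch below shows one plausible choice (bilateral filtering plus CLAHE contrast enhancement), whose parameters are assumptions for illustration.

```python
import cv2

def preprocess_meter_image(img_bgr):
    """Denoise and enhance a raw meter photo before labelling/training.

    Bilateral filtering + CLAHE on the luminance channel is an assumed
    concrete choice; the patent does not specify the filters used.
    """
    # edge-preserving smoothing to suppress sensor noise
    smoothed = cv2.bilateralFilter(img_bgr, d=9, sigmaColor=75, sigmaSpace=75)
    # contrast-limited adaptive histogram equalisation on the L channel
    lab = cv2.cvtColor(smoothed, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```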
Further, in step 1), the specific conditions of the improved YOLO network are as follows:
a. constructing a feature extraction network according to the requirements of real-time performance and high precision:
the first layer is a combined convolution module 1-A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the second layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the third layer is a combined convolution module 1-C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fifth layer is a combined convolution module 1-C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the sixth layer is a combined convolution module 1-B which consists of a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the seventh layer is a combined convolution module 1-C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the eighth layer is a combined convolution module 1-D which consists of five combined convolution modules 1-B;
the ninth layer is a combined convolution module 1-C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the tenth layer is a combined convolution module 1-B which consists of a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
b. constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network;
b1, inputting the data as the tenth layer output of the feature extraction network, wherein the large-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and has the following structure:
the first layer is a combined convolution module 1-D which consists of five combined convolution modules 1-B;
the second layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the third layer is a convolution layer;
b2, inputting the eighth-layer output of the feature extraction network and the first-layer output of the large-size target prediction network, wherein the medium-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and the structure of the medium-size target prediction network is as follows:
the first layer is an input fusion module 1-E, which consists of a combined convolution module 1-B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 1-D which consists of five combined convolution modules 1-B;
the third layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a convolution layer;
b3, inputting the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, wherein the small-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and the structure of the small-size target prediction network is as follows:
the first layer is an input fusion module 1-E, which consists of a combined convolution module 1-B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 1-D which consists of five combined convolution modules 1-B;
the third layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a convolution layer;
finally, the outputs of the large-size target prediction network, the medium-size target prediction network and the small-size target prediction network pass through a non-maximum suppression layer to obtain the predicted target positions and classes;
c. setting loss functions, including a center-coordinate loss function, a width-height loss function, a confidence loss function and a category loss function;
the center-coordinate loss function is formulated as follows:
Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)
where Loss_xy is the center-coordinate loss, mark_object is the flag indicating whether the anchor box contains an object, w and h are the width and height of the anchor box, Loss_log is the binary cross-entropy loss, xy_true is the true center coordinate and xy_predict is the predicted center coordinate;
the width-height loss function is formulated as follows:
Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2
where Loss_wh is the width-height loss, wh_true is the true width-height value and wh_predict is the predicted width-height value;
the confidence loss function is formulated as follows:
Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore
where Loss_confidence is the confidence loss, c_predict is the confidence value of the prediction box and mark_ignore is the flag for anchor boxes whose IOU is below the threshold;
the category loss function is formulated as follows:
Loss_cls = mark_object * Loss_log(cls_true, cls_predict)
where Loss_cls is the category loss, cls_true is the true category and cls_predict is the predicted category;
the total loss function is formulated as follows:
Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f
where Loss is the total loss and num_f is the total number of inputs as a floating-point number;
loading training parameters to train the improved YOLO network, wherein the training parameters are set as follows: setting the training optimizer to Adam, the initial learning rate to 0.001, the maximum training period to 500 and the batch size to 8; evaluating training accuracy on a validation set at set intervals, marking training as complete when the maximum training period is reached or the mean intersection-over-union meets the requirement, and saving the optimal network after training is completed;
and inputting the test set into an optimal improved YOLO network to obtain the dial position and the dial image.
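A minimal TensorFlow sketch of the loss terms defined above is given below; the use of element-wise binary cross-entropy for Loss_log follows the formulas, while the argument layout, tensor shapes and reduction are assumptions.

```python
import tensorflow as tf

def yolo_loss_terms(mark_object, mark_ignore, w, h,
                    xy_true, xy_pred, wh_true, wh_pred,
                    conf_pred, cls_true, cls_pred, num_f):
    """Per-anchor loss terms as formulated above; all inputs are tensors
    broadcastable over the anchor grid (shapes are an assumption)."""
    bce = tf.keras.backend.binary_crossentropy   # Loss_log
    scale = 2.0 - w * h                          # small boxes weighted more (w, h assumed in [0, 1])
    loss_xy = mark_object * scale * bce(xy_true, xy_pred)
    loss_wh = 0.5 * mark_object * scale * tf.square(wh_true - wh_pred)
    loss_conf = (mark_object * bce(mark_object, conf_pred)
                 + (1.0 - mark_object) * bce(mark_object, conf_pred) * mark_ignore)
    loss_cls = mark_object * bce(cls_true, cls_pred)
    total = (tf.reduce_sum(loss_xy) + tf.reduce_sum(loss_wh)
             + tf.reduce_sum(loss_conf) + tf.reduce_sum(loss_cls)) / num_f
    return total
```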
Further, in step 2), the details of the modified EAST network are as follows:
a. constructing a feature extraction network, wherein the structure is as follows:
the first layer is a combined convolution module 2-B which consists of two combined convolution modules 2-A and a maximum pooling layer, and the combined convolution module 2-A consists of a zero padding layer, a convolution layer and an active layer;
the second layer is a combined convolution module 2-B which consists of two combined convolution modules 2-A and a maximum pooling layer;
the third layer is a combined convolution module 2-C which consists of three combined convolution modules 2-A and a maximum pooling layer;
the fourth layer is a combined convolution module 2-C which consists of three combined convolution modules 2-A and a maximum pooling layer;
the fifth layer is a combined convolution module 2-C which consists of three combined convolution modules 2-A and a maximum pooling layer;
b. constructing a feature fusion network, wherein the structure is as follows:
the first layer is an input fusion module 2-G which consists of an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 2-E which consists of two batch normalization layers, a combined convolution module 2-D and a combined convolution module 2-A; the combined convolution module 2-D consists of a zero padding layer, a convolution layer and an active layer;
the third layer is an input fusion module 2-G which consists of an up-sampling layer and a tensor splicing layer;
the fourth layer is a combined convolution module 2-E which consists of two batch normalization layers, a combined convolution module 2-D and a combined convolution module 2-A;
the fifth layer is an input fusion module 2-G which consists of an up-sampling layer and a tensor splicing layer;
the sixth layer is a combined convolution module 2-F which consists of three batch normalization layers, a combined convolution module 2-D and two combined convolution modules 2-A;
c. constructing a prediction network, wherein the structure is as follows:
the first layer is divided into three branches, and the first branch consists of a combined convolution module 2-D; the second branch consists of a combined convolution module 2-D; the third branch consists of a combined convolution module 2-D;
the second layer is an input fusion module which is formed by splicing three branches of the first layer;
d. setting loss functions, including a class loss function, a geometry loss function and an angle loss function;
the class loss function is formulated as follows:
L_s = -β·Y*·log(Ŷ) - (1 - β)·(1 - Y*)·log(1 - Ŷ)
where L_s is the class loss, β is the balancing weight, Ŷ is the predicted class and Y* is the real class;
the geometry loss function is formulated as follows:
L_AABB = -log IoU(R̂, R*)
where L_AABB is the geometry loss, R̂ is the geometry of the predicted quadrilateral text box AABB, R* is the geometry of the real quadrilateral text box AABB, and IoU is the intersection-over-union;
the angle loss function is formulated as follows:
L_θ(θ̂, θ*) = 1 - cos(θ̂ - θ*)
where L_θ(θ̂, θ*) is the angle loss, θ̂ is the predicted rotation angle and θ* is the real rotation angle;
loading training parameters to train the improved EAST network, wherein the training parameters are set as follows: setting the training optimizer to Adam, the initial learning rate to 0.001, the maximum training period to 500 and the batch size to 8; evaluating training accuracy on a validation set at set intervals, marking training as complete when the maximum training period is reached or the mean intersection-over-union meets the requirement, and saving the optimal network after training is completed;
and inputting the test set into an optimal improved EAST network to obtain a text position, and cutting the text position into a character image.
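For illustration, the sketch below computes the three loss terms as formulated above; the decomposition of the AABB geometry into four distances (top, bottom, left, right) follows the original EAST formulation and is an assumption here.

```python
import tensorflow as tf

def east_losses(y_true_score, y_pred_score, geo_true, geo_pred,
                theta_true, theta_pred, beta):
    """Class, geometry and angle losses as formulated above (a sketch;
    tensor layouts are assumptions). geo_* hold four AABB distances."""
    eps = 1e-6
    # balanced cross-entropy on the score map
    l_s = -(beta * y_true_score * tf.math.log(y_pred_score + eps)
            + (1.0 - beta) * (1.0 - y_true_score) * tf.math.log(1.0 - y_pred_score + eps))
    # IoU loss between predicted and real axis-aligned boxes
    d1_t, d2_t, d3_t, d4_t = tf.unstack(geo_true, axis=-1)   # top, bottom, left, right
    d1_p, d2_p, d3_p, d4_p = tf.unstack(geo_pred, axis=-1)
    area_t = (d1_t + d2_t) * (d3_t + d4_t)
    area_p = (d1_p + d2_p) * (d3_p + d4_p)
    h_i = tf.minimum(d1_t, d1_p) + tf.minimum(d2_t, d2_p)
    w_i = tf.minimum(d3_t, d3_p) + tf.minimum(d4_t, d4_p)
    inter = h_i * w_i
    union = area_t + area_p - inter
    l_aabb = -tf.math.log((inter + eps) / (union + eps))
    # angle loss
    l_theta = 1.0 - tf.cos(theta_pred - theta_true)
    return l_s, l_aabb, l_theta
```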
Further, in step 3), the specific situation of the CRNN network is as follows:
a. constructing a feature extraction network, wherein the structure is as follows:
the first layer is a combined convolution module 3-A which consists of a zero filling layer, a convolution layer and an active layer;
the second layer is a maximum pooling layer;
the third layer is a combined convolution module 3-A which consists of a zero filling layer, a convolution layer and an activation layer;
the fourth layer is a maximum pooling layer;
the fifth layer is a combined convolution module 3-B which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the sixth layer is a combined convolution module 3-A which consists of a zero filling layer, a convolution layer and an activation layer;
the seventh layer is a maximum pooling layer;
the eighth layer is a combined convolution module 3-B which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the ninth layer is a combined convolution module 3-A which consists of a zero filling layer, a convolution layer and an activation layer;
the tenth layer is a maximum pooling layer;
the eleventh layer is a combined convolution module 3-C, which consists of a zero-padding layer, a convolution layer, a batch normalization layer and an activation layer;
b. constructing a prediction network, wherein the structure is as follows:
the first layer is a cyclic convolution module, which consists of a bidirectional LSTM;
the second layer is a full connection layer;
the third layer is a cyclic convolution module which consists of a bidirectional LSTM;
the fourth layer is a full connection layer;
c. setting a decoder to convert the output sequence into character information;
d. setting the loss function as the CTC (Connectionist Temporal Classification) loss function;
the CTC loss function is formulated as follows:
L_CTC = -ln ∏_{(x,z)∈S} p(z|x) = -Σ_{(x,z)∈S} ln p(z|x)
where L_CTC is the CTC loss, p(z|x) is the probability of the output sequence z given the input x, and S is the training set;
loading training parameters to train the CRNN, wherein the training parameters are set as follows: setting the training optimizer to Adam, the initial learning rate to 0.0001, the maximum training period to 100, and the batch size to 32; evaluating training accuracy on a validation set at set intervals, marking training as complete when the maximum training period is reached or the recognition accuracy meets the requirement, and saving the optimal network after training is completed;
and inputting the test set into the optimal CRNN network to obtain character information.
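A minimal sketch of the CTC loss and a greedy (best-path) decoder for the CRNN output is shown below, using the standard Keras CTC helpers; tensor shapes and the charset argument are assumptions.

```python
import tensorflow as tf

def crnn_ctc_loss(y_true, y_pred, input_length, label_length):
    """CTC loss for the CRNN output (assumed shapes: y_pred is
    (batch, time_steps, num_classes) softmax output, y_true is
    (batch, max_label_len) integer labels)."""
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)

def greedy_decode(y_pred, input_length, charset):
    """Best-path CTC decoding of the network output into character strings."""
    decoded, _ = tf.keras.backend.ctc_decode(y_pred, input_length, greedy=True)
    out = []
    for seq in decoded[0].numpy():
        out.append("".join(charset[i] for i in seq if i >= 0))  # -1 entries are padding
    return out
```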
Further, in step 4), the details of the TextCNN network are as follows:
a. a network structure is constructed as follows:
the first layer is an embedding layer;
the second layer is a convolution module;
the third layer is a maximum pooling layer;
the fourth layer consists of a full connection layer, a Dropout layer and an activation layer;
the fifth layer consists of a full connecting layer and an activation layer;
b. setting the loss function as the multi-class cross entropy, formulated as follows:
L_CrossEntropy = -Σ_{i=1}^{n} y_i · log(ŷ_i)
where L_CrossEntropy is the loss, n is the number of classes, y_i is the true probability of class i and ŷ_i is the predicted probability of class i;
loading training parameters to train the TextCNN network, wherein the training parameters are set as follows: setting the training optimizer to Adam, the initial learning rate to 0.001, the number of iterations to 1000 and the batch size to 64; evaluating training accuracy on a validation set at set intervals, marking training as complete when the maximum number of iterations is reached and the accuracy meets the requirement, and saving the optimal network after training is completed;
and inputting the test set into an optimal TextCNN network to obtain a corresponding instrument type.
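A minimal Keras sketch of the five-layer TextCNN described above follows; the embedding dimension, filter count, kernel size, hidden width and dropout rate are not specified in the patent and are assumptions.

```python
from tensorflow.keras import layers, models, optimizers

def build_textcnn(vocab_size, seq_len, embed_dim=64, num_meter_types=10,
                  num_filters=128, kernel_size=3, dropout=0.5):
    """TextCNN with the five layers listed above (embedding, convolution,
    max pooling, FC + Dropout + activation, FC + activation)."""
    inp = layers.Input(shape=(seq_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inp)                   # first layer
    x = layers.Conv1D(num_filters, kernel_size, activation="relu")(x)  # second layer
    x = layers.GlobalMaxPooling1D()(x)                                 # third layer
    x = layers.Dense(64)(x)                                            # fourth layer: FC
    x = layers.Dropout(dropout)(x)                                     #               Dropout
    x = layers.Activation("relu")(x)                                   #               activation
    out = layers.Dense(num_meter_types, activation="softmax")(x)       # fifth layer
    model = models.Model(inp, out)
    # optimizer and loss as stated above (Adam, lr 0.001, multi-class cross entropy)
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```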
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the invention realizes instrument positioning and character detection and identification on the instrument by using the neural network, and has higher precision and better generalization capability under different backgrounds compared with the traditional image processing method.
2. The invention can predict targets with different sizes by using the improved YOLO network, comprehensively selects the prediction frame with the most suitable size, can accurately detect instruments with different sizes, and has no limitation on acquisition angle and distance.
3. Compared with other target detection networks, the improved EAST network has better detection performance and higher detection speed on characters, and can have better detection effect on characters shot at different angles.
4. The invention splices the character information on the instruments into a text, which solves the problem in machine vision that instrument positions can be recognized but instrument types are hard to distinguish, so that the method both detects instruments and identifies their types.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a block diagram of an improved YOLO network.
Fig. 3 is a block diagram of an improved EAST network.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in fig. 1, the image-text-based instrument detection and classification method provided by this embodiment comprises the following steps:
1) collecting instrument images captured in different real scenes, removing interference data affected by blur, extreme angles or partially missing instruments that would hinder recognition, labeling the dial positions in the remaining data with the open-source labeling tool labelImg to construct an instrument positioning data set, dividing the data set into a training set and a test set, loading training parameters and training the improved YOLO network with the training set, obtaining the optimal improved YOLO network after training, then inputting the test set into the optimal improved YOLO network, and outputting and cropping the dial images; the improvement to the YOLO network is that the backbone is replaced with a lightweight mobile network so as to reduce network parameters and computation and increase running speed.
An improved YOLO network is designed according to the specific application scene and the characteristics of the recognized objects. Unless otherwise stated, every activation layer in this step uses the Leaky ReLU activation function. The design comprises the following steps:
a. constructing a feature extraction network
A feature extraction network is constructed according to the requirements of real-time performance and high precision. The feature extraction network is mainly composed of a plurality of combined convolution modules.
The structure of the feature extraction network is as follows:
the input image is 416 × 416 × 3.
The first layer is the combined convolution module 1-a, shown in fig. 2 (a). The module first passes through the zero-padding layer with an output of 418 x 3. Then the convolution kernel is (3,3), the step length is 2, the filter number is 32, and the output is 208 multiplied by 32.
The second layer is the combined convolution module 1-B, shown in fig. 2 (B). The module first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, padding is used to make the input and output sizes consistent, and the output is 208 × 208 × 32. And after the convolution, batch normalization and activation layers, the convolution kernel is (1,1), the step size is 1, the number of filters is 64, the input and output sizes are consistent by using filling, and the output is 208 multiplied by 64.
The third layer is a combined convolution module 1-C, as shown in fig. 2 (C). The module first passes through the zero-padding layer and the output is 210 x 64. And then the obtained product is subjected to deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 2, and the output is 104 multiplied by 64. And finally, after convolution, batch normalization layer and activation layer, the convolution kernel is (1,1), the step size is 1, the number of filters is 128, the input and output sizes are consistent by using filling, and the output is 104 multiplied by 128.
The fourth layer is a combined convolution module 1-B, as shown in fig. 2 (B). The module first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, the input and output sizes are consistent by using padding, and the output is 104 × 104 × 128. And then the filter passes through a convolution layer, a batch normalization layer and an activation layer, the convolution kernel is (1,1), the step size is 1, the number of filters is 128, the input and output sizes are consistent by using filling, and the output is 104 multiplied by 128.
The fifth layer is a combined convolution module 1-C, as shown in fig. 2 (C). The module first passes through the zero-padding layer with an output of 106 x 128. And then the obtained product is subjected to deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 2, and the output is 52 multiplied by 128. And finally, after convolution, batch normalization layer and activation layer, the convolution kernel is (1,1), the step size is 1, the number of filters is 256, the input and output sizes are consistent by using filling, and the output is 52 multiplied by 256.
The sixth layer is a combined convolution module 1-B, shown in fig. 2 (B). The module first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, padding is used to make the input and output sizes consistent, and the output is 52 × 52 × 256. And then the filter passes through a convolution layer, a batch normalization layer and an activation layer, the convolution kernel is (1,1), the step size is 1, the number of filters is 256, the input and output sizes are consistent by using filling, and the output is 52 multiplied by 256.
The seventh layer is a combined convolution module 1-C, as shown in fig. 2 (C). The module first passes through the zero-padding layer and the output is 54 x 256. And then the obtained product is subjected to deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 2, and the output is 26 multiplied by 256. And finally, after convolution, batch normalization layer and activation layer, the convolution kernel is (1,1), the step size is 1, the number of filters is 512, the input and output sizes are consistent by using filling, and the output is 26 multiplied by 512.
The eighth layer is the combined convolution module 1-D, as shown in FIG. 2 (D). The modules pass through five combined convolution modules 1-B in sequence, as shown in fig. 2 (B). In each of the combined convolution modules 1-B, the input first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, the input and output sizes are consistent by using padding, and the output is 26 × 26 × 512. And (3) performing convolution, batch normalization and activation, wherein the convolution kernel is (1,1), the step size is 1, the number of filters is 512, the input and output sizes are consistent by using filling, and the output is 26 multiplied by 512. After sequentially passing through the same combined convolution modules 1-B, the output is 26 × 26 × 512.
The ninth layer is a combined convolution module 1-C, as shown in FIG. 2 (C). The module first passes through the zero-padding layer and the output is 28 x 512. And then the obtained product is subjected to deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step length is 2, and the output is 13 multiplied by 512. And finally, performing convolution, batch normalization and activation layers, wherein the convolution kernel is (1,1), the step length is 1, the number of filters is 1024, the input and output sizes are consistent by using filling, and the output is 13 multiplied by 1024.
The tenth layer is the combined convolution module 1-B, as shown in fig. 2 (B). The module first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, the input and output sizes are consistent by using padding, and the output is 13 × 13 × 1024. And then the convolution, batch normalization and activation layers are carried out, the convolution kernel is (1,1), the step length is 1, the number of filters is 1024, the input and output sizes are consistent by using filling, and the output is 13 multiplied by 1024.
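For illustration, the combined convolution modules 1-A, 1-B and 1-C can be written as the following Keras sketch; bias-free convolutions and the default Leaky ReLU slope are assumptions, while kernel sizes, strides and padding follow the layer descriptions above.

```python
from tensorflow.keras import layers

def combo_1a(x, filters, kernel=3, stride=2):
    """Combined convolution module 1-A: zero padding + conv + BN + Leaky ReLU."""
    x = layers.ZeroPadding2D(1)(x)
    x = layers.Conv2D(filters, kernel, strides=stride, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def combo_1b(x, filters):
    """Module 1-B: depth (depthwise) conv + BN + activation, then a 1x1
    pointwise conv + BN + activation (stride 1, 'same' padding)."""
    x = layers.DepthwiseConv2D(3, strides=1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(filters, 1, strides=1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def combo_1c(x, filters):
    """Module 1-C: zero padding, a stride-2 depthwise conv + BN + activation,
    then a 1x1 pointwise conv + BN + activation."""
    x = layers.ZeroPadding2D(1)(x)
    x = layers.DepthwiseConv2D(3, strides=2, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(filters, 1, strides=1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)
```

For a 416 × 416 × 3 input, combo_1a(x, 32) reproduces the 208 × 208 × 32 first-layer output, and chaining combo_1b and combo_1c with the filter counts listed above reproduces the remaining feature maps.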
b. Building a predictive network
And constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network.
b1 large-size target prediction network
The input is the tenth layer output of the feature extraction network, and the large-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.
The input image is 13 × 13 × 1024.
The large-size target prediction network has the following structure:
the first layer is the combined convolution module 1-D, as shown in FIG. 2 (D). The modules pass through five combined convolution modules 1-B in sequence, as shown in fig. 2 (B). In the first combined convolution module 1-B, the input first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (1,1), the step size is 1, padding is used to make the input and output size consistent, and the output is 13 × 13 × 1024. And performing convolution, batch normalization and activation layers, wherein the convolution kernel is (1,1), the step length is 1, the number of filters is 512, the input and output sizes are consistent by using filling, and the output is 13 multiplied by 512. In the second combined convolution module 1-B, the input is first passed through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, padding is used to make the input and output sizes consistent, and the output is 13 × 13 × 512. And then the convolution, batch normalization and activation layers are carried out, the convolution kernel is (1,1), the step length is 1, the number of filters is 1024, the input and output sizes are consistent by using filling, and the output is 13 multiplied by 1024. Then, after the two combined convolution modules 1-B with different parameters are alternately input, the output is 13 multiplied by 512.
The second layer is the combined convolution module 1-B, shown in fig. 2 (B). The module first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, the input and output sizes are consistent by using padding, and the output is 13 × 13 × 512. And then the convolution, batch normalization and activation layers are carried out, the convolution kernel is (1,1), the step length is 1, the number of filters is 1024, the input and output sizes are consistent by using filling, and the output is 13 multiplied by 1024.
The third layer is a convolutional layer. The convolution kernel is (1,1), the step size is 1, the number of filters is 256, and the output is 13 × 13 × 256.
b2 medium size target prediction network
The input is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, and the medium-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.
The input images are 26 × 26 × 512 and 13 × 13 × 512.
The medium-sized target prediction network structure is as follows:
the first layer is the input fusion module, as shown in fig. 2 (e). The input 13 x 512 first goes through the combined convolution module 1-B where the deep convolution, batch normalization layer and activation layer are first passed, the convolution kernel is (1,1) with a step size of 1, padding is used to make the input and output sizes consistent, and the output is 13 x 512. And performing convolution, batch normalization and activation layers, wherein the convolution kernel is (1,1), the step length is 1, the number of filters is 512, the input and output sizes are consistent by using filling, and the output is 13 multiplied by 512. And then passes through an up-sampling layer, the sampling factor is 2, and the output is 26 multiplied by 512. Finally, the output and input are 26 × 26 × 512 through a tensor splicing layer, and the output is 26 × 26 × 1024.
The second layer is the combined convolution module 1-D, as shown in FIG. 2 (D). The modules pass through five combined convolution modules 1-B in sequence, as shown in fig. 2 (B). In the first combined convolution module 1-B, the input first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (1,1), the step size is 1, padding is used to make the input and output size consistent, and the output is 26 × 26 × 1024. And then the data is subjected to convolution, batch normalization and activation layers, the convolution kernel is (1,1), the step size is 1, the number of filters is 256, the input and output sizes are consistent by using filling, and the output is 26 multiplied by 256. In the second combined convolution module 1-B, the input is first passed through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, padding is used to make the input and output size consistent, and the output is 26 × 26 × 256. And (3) performing convolution, batch normalization and activation, wherein the convolution kernel is (1,1), the step size is 1, the number of filters is 512, the input and output sizes are consistent by using filling, and the output is 26 multiplied by 512. After the two different parameters of the combined convolution modules 1-B are alternately input, the output is 26 × 26 × 256.
The third layer is a combined convolution module 1-B, as shown in fig. 2 (B). The module first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, the input and output sizes are consistent by using padding, and the output is 26 × 26 × 256. And (3) performing convolution, batch normalization and activation, wherein the convolution kernel is (1,1), the step size is 1, the number of filters is 512, the input and output sizes are consistent by using filling, and the output is 26 multiplied by 512.
The fourth layer is a convolutional layer. The convolution kernel is (1,1), the step size is 1, the number of filters is 256, and the output is 26 × 26 × 256.
b3 small-size target prediction network
The input is the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, and the small-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.
The input images are 52 × 52 × 256 and 26 × 26 × 256.
The small-size target prediction network structure is as follows:
the first layer is the input fusion module, as shown in fig. 2 (e). The input 26 × 26 × 256 first passes through the combined convolution module 1-B, where the deep convolution, batch normalization layer and activation layer are first passed, the convolution kernel is (1,1) and the step size is 1, padding is used to make the input and output size uniform, and the output is 26 × 26 × 256. And then the data is subjected to convolution, batch normalization and activation layers, the convolution kernel is (1,1), the step size is 1, the number of filters is 256, the input and output sizes are consistent by using filling, and the output is 26 multiplied by 256. And the sampling factor is 2 after passing through an up-sampling layer, and the output is 52 multiplied by 256. Finally, the output and input 52 × 52 × 256 go through a tensor concatenation layer, and the output is 52 × 52 × 512.
The second layer is the combined convolution module 1-D, as shown in FIG. 2 (D). The modules pass through five combined convolution modules 1-B in sequence, as shown in fig. 2 (B). In the first combined convolution module 1-B, the input first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (1,1), the step size is 1, padding is used to make the input and output size consistent, and the output is 52 × 52 × 512. And then the filter passes through a convolution layer, a batch normalization layer and an activation layer, the convolution kernel is (1,1), the step size is 1, the number of filters is 128, the input and output sizes are consistent by using filling, and the output is 52 multiplied by 128. In the second combined convolution module 1-B, the input is first passed through a deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, padding is used to make the input and output sizes consistent, and the output is 52 x 128. And then the filter passes through a convolution layer, a batch normalization layer and an activation layer, the convolution kernel is (1,1), the step size is 1, the number of filters is 256, the input and output sizes are consistent by using filling, and the output is 52 multiplied by 256. After the two different parameters of the combined convolution modules 1-B are alternately input, the output is 52 x 128.
The third layer is a combined convolution module 1-B, as shown in fig. 2 (B). The module first goes through deep convolution, batch normalization layer and activation layer, the convolution kernel is (3,3), the step size is 1, padding is used to make the input and output sizes consistent, and the output is 52 × 52 × 128. And then the filter passes through a convolution layer, a batch normalization layer and an activation layer, the convolution kernel is (1,1), the step size is 1, the number of filters is 256, the input and output sizes are consistent by using filling, and the output is 52 multiplied by 256.
The fourth layer is a convolutional layer. The convolution kernel is (1,1), the step size is 1, the number of filters is 256, and the output is 52 × 52 × 256.
Finally, the 13 × 13 × 256 output of the large-size target prediction network, the 26 × 26 × 256 output of the medium-size target prediction network and the 52 × 52 × 256 output of the small-size target prediction network pass through a non-maximum suppression layer to obtain the predicted target positions and classes.
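The fusion module 1-E and a size-specific prediction branch can be sketched as follows; _sep_conv mirrors the combined convolution module 1-B from the previous sketch, and the alternating reduce/expand filter counts inside module 1-D follow the description above, while bias-free convolutions and the activation slope remain assumptions.

```python
from tensorflow.keras import layers

def _sep_conv(x, filters, kernel=3):
    """Depthwise-separable block mirroring combined convolution module 1-B."""
    x = layers.DepthwiseConv2D(kernel, strides=1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(filters, 1, strides=1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def fusion_1e(deep_feat, shallow_feat, filters):
    """Input fusion module 1-E: a 1-B module, 2x upsampling, tensor splicing."""
    x = _sep_conv(deep_feat, filters, kernel=1)
    x = layers.UpSampling2D(size=2)(x)
    return layers.Concatenate(axis=-1)([x, shallow_feat])

def combo_1d(x, reduce_filters, expand_filters):
    """Module 1-D: five 1-B modules that alternately reduce and expand channels."""
    for i in range(5):
        if i % 2 == 0:
            x = _sep_conv(x, reduce_filters, kernel=1)
        else:
            x = _sep_conv(x, expand_filters, kernel=3)
    return x

def prediction_branch(x, reduce_filters, expand_filters, out_channels=256):
    """One prediction branch: module 1-D, one 1-B module, then a 1x1 convolution."""
    x = combo_1d(x, reduce_filters, expand_filters)
    x = _sep_conv(x, expand_filters, kernel=3)
    return layers.Conv2D(out_channels, 1, strides=1)(x)
```

Applied to the 13 × 13 × 1024 feature map with reduce_filters=512 and expand_filters=1024, prediction_branch yields the 13 × 13 × 256 large-size output; non-maximum suppression is then applied to the decoded boxes outside the network graph.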
c. Setting a loss function
The loss function is set as the mean of the sum of the center-coordinate loss, the width-height loss, the confidence loss and the category loss:
Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f
where Loss is the total loss, Loss_xy is the center-coordinate loss, Loss_wh is the width-height loss, Loss_confidence is the confidence loss, Loss_cls is the category loss and num_f is the total number of inputs as a floating-point number. The individual losses are formulated as follows:
Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)
Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2
Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore
Loss_cls = mark_object * Loss_log(cls_true, cls_predict)
where mark_object is the flag indicating whether the anchor box contains an object, w and h are the width and height of the anchor box, Loss_log is the binary cross-entropy loss, xy_true and xy_predict are the true and predicted center coordinates, wh_true and wh_predict are the true and predicted width-height values, c_predict is the confidence value of the prediction box, mark_ignore is the flag for anchor boxes whose IOU is below the threshold, and cls_true and cls_predict are the true and predicted categories.
Training an improved YOLO network, comprising the steps of:
d1 setting training parameters
The training optimizer is set to Adam, the initial learning rate to 0.001, the number of iterations to 500 and the batch size to 8, and K-means clustering on all labels generates the initial prior boxes (38, 29), (65, 52), (94, 87), (142, 134), (195, 69), (216, 206), (337, 320), (397, 145), (638, 569).
d2, Online data enhancement
Data enhancement is performed on the input images to expand the data set; the enhancement methods are random mirror flipping, random noise addition and random contrast adjustment.
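A simple sketch of the three online augmentations named above follows; the probabilities, noise level and contrast range are assumptions, and box labels must be mirrored together with the image when flipping.

```python
import numpy as np

def augment(img, rng=None):
    """Random mirror flip, random noise, random contrast adjustment."""
    rng = rng or np.random.default_rng()
    out = img.astype(np.float32)
    if rng.random() < 0.5:                       # random horizontal mirror
        out = out[:, ::-1, :]
    if rng.random() < 0.5:                       # random Gaussian noise
        out = out + rng.normal(0.0, 5.0, out.shape)
    if rng.random() < 0.5:                       # random contrast around the mean
        alpha = rng.uniform(0.7, 1.3)
        out = (out - out.mean()) * alpha + out.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```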
d3 setting training completion flag
The training accuracy is evaluated on a validation set at set intervals; the training completion flag is reaching the maximum of 500 iterations with the accuracy meeting the requirement, and the optimal network is saved after training is completed.
And inputting the test set into an optimal improved YOLO network to obtain the dial position and the dial image.
2) Marking the character positions in the dial images cropped in step 1) to construct a character detection data set, dividing the data set into a training set and a test set, then loading training parameters and training the improved EAST network with the training set, obtaining the optimal improved EAST network after training, inputting the test set into the optimal improved EAST network, outputting the character positions and cropping them into character images; the improvements to the EAST network are that the backbone network is changed to VGG to improve detection accuracy, and that the output-layer structure of the prediction module is modified so that only the head element is used to predict a vertex, which improves prediction for long text.
An improved EAST network is designed according to the specific application scene and the characteristics of the recognized objects. Unless otherwise stated, every activation layer below uses the ReLU activation function. The design comprises the following steps:
a. constructing a feature extraction network
The structure of the feature extraction network is as follows:
the input image is 256 × 256 × 3.
The first layer is a combined convolution module 2-B, which, as shown in fig. 3 (B), consists of two combined convolution modules 2-a and one max-pooling layer. The first combined convolution module 2-a passes through the zero-padding layer first, with an output of 258 x 3, the convolution kernel of (3,3), the step size of 1, the number of filters of 64, and the output of 256 x 64, through the convolutional layer and the active layer. The second convolution module 2-a, first passes through the zero-padding layer and outputs 258 × 258 × 64, then passes through the convolution layer and the active layer, the convolution kernel is (3,3), the step size is 1, the number of filters is 64, and the output is 256 × 256 × 64. After a maximum pooling level, the pooling kernel size is (2,2) step size is 2 and the output is 128 × 128 × 64.
The second layer is a combined convolution module 2-B, which consists of two combined convolution modules 2-A and one maximum pooling layer. The first combined convolution module 2-a passes through the zero-padding layer first, then through the convolutional layer and the active layer, with a convolution kernel of (3,3), a step size of 1, and a number of filters of 128. The second convolution module 2-a, first passes through the zero-crossing filler layer, then the convolution layer and the activation layer, with a convolution kernel of (3,3), a step size of 1, and a filter number of 128. After a maximum pooling level, the pooling kernel size is (2,2) with step size of 2 and the output is 64 × 64 × 128.
The third layer is a combined convolution module 2-C, which, as shown in fig. 3 (C), consists of three combined convolution modules 2-a and one max pooling layer. The first combined convolution module 2-a passes through the zero-padding layer first, then through the convolutional layer and the active layer, with a convolution kernel of (3,3), a step size of 1, and a filter number of 256. The second convolution module 2-a, first passes through the zero-crossing filler layer, then the convolution layer and the activation layer, with a convolution kernel of (3,3), a step size of 1, and a filter number of 256. The third convolution module 2-a passes through the zero-crossing filling layer, the convolution layer and the activation layer, the convolution kernel is (3,3), the step length is 1, and the number of filters is 256. After a maximum pooling layer, the pooling kernel size is (2,2) step size is 2 and the output is 32 × 32 × 256.
The fourth layer is a combined convolution module 2-C, which consists of three combined convolution modules 2-A and a maximum pooling layer. The first combined convolution module 2-a passes through the zero-padding layer first, then through the convolutional layer and the active layer, with a convolution kernel of (3,3), a step size of 1, and a filter number of 512. The second convolution module 2-a, first passes through the zero-crossing filler layer, then the convolution layer and the activation layer, the convolution kernel is (3,3), the step size is 1, and the number of filters is 512. The third convolution module 2-a passes through the zero-crossing filling layer, the convolution layer and the activation layer, the convolution kernel is (3,3), the step length is 1, and the number of filters is 512. And then passing through a maximum pooling layer, the pooling kernel size is (2,2) and the step size is 2, and the output is 16 × 16 × 512.
And the fifth layer is a combined convolution module 2-C which consists of three combined convolution modules 2-A and a maximum pooling layer. The first combined convolution module 2-a passes through the zero-padding layer first, then through the convolutional layer and the active layer, with a convolution kernel of (3,3), a step size of 1, and a filter number of 512. The second convolution module 2-a, first passes through the zero-crossing filler layer, then the convolution layer and the activation layer, the convolution kernel is (3,3), the step size is 1, and the number of filters is 512. The third convolution module 2-a passes through the zero-crossing filling layer, the convolution layer and the activation layer, the convolution kernel is (3,3), the step length is 1, and the number of filters is 512. And passing through a maximum pooling layer, the pooling kernel size is (2,2), the step size is 2, and the output is 8 × 8 × 512.
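The VGG-style feature extraction network can be sketched in Keras as follows; the returned feature-map names are an assumption, while kernel sizes, filter counts and pooling follow the layer descriptions above.

```python
from tensorflow.keras import layers

def combo_2a(x, filters):
    """Combined convolution module 2-A: zero padding + 3x3 conv (stride 1) + ReLU."""
    x = layers.ZeroPadding2D(1)(x)
    return layers.Conv2D(filters, 3, strides=1, activation="relu")(x)

def combo_2b(x, filters):
    """Module 2-B: two 2-A modules followed by 2x2 max pooling with stride 2."""
    x = combo_2a(x, filters)
    x = combo_2a(x, filters)
    return layers.MaxPooling2D(pool_size=2, strides=2)(x)

def combo_2c(x, filters):
    """Module 2-C: three 2-A modules followed by 2x2 max pooling with stride 2."""
    for _ in range(3):
        x = combo_2a(x, filters)
    return layers.MaxPooling2D(pool_size=2, strides=2)(x)

def east_backbone(inp):
    """Feature extractor for a 256x256x3 input; returns the maps the fusion
    network consumes (f2...f5 naming is an assumption)."""
    f2 = combo_2b(inp, 64)      # 128 x 128 x 64
    f2 = combo_2b(f2, 128)      # 64 x 64 x 128
    f3 = combo_2c(f2, 256)      # 32 x 32 x 256
    f4 = combo_2c(f3, 512)      # 16 x 16 x 512
    f5 = combo_2c(f4, 512)      # 8 x 8 x 512
    return f2, f3, f4, f5
```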
b. Building feature fusion networks
The first layer is an input fusion module 2-G, as shown in FIG. 3 (G). The last-layer output of the feature extraction network, 8 × 8 × 512, first passes through an upsampling layer with a sampling factor of 2, giving an output of 16 × 16 × 512. This output and the fourth-layer output of the feature extraction network, 16 × 16 × 512, pass through a tensor splicing layer, and the output is 16 × 16 × 1024.
The second layer is a combined convolution module 2-E which, as shown in fig. 3 (E), consists of two batch normalization layers, one combined convolution module 2-D and one combined convolution module 2-A; the combined convolution module 2-D consists of a zero-padding layer, a convolution layer and an activation layer. The input first passes through a batch normalization layer and then through a combined convolution module 2-D, which passes through the zero-padding layer, the convolution layer and the activation layer with a convolution kernel of (1,1), a step size of 1 and 128 filters; the output is 16 × 16 × 128. It then passes through a batch normalization layer and a combined convolution module 2-A, which passes through the zero-padding layer, the convolution layer and the activation layer with a convolution kernel of (3,3), a step size of 1 and 64 filters; the output is 16 × 16 × 64.
The third layer is an input fusion module 2-G. The second-layer output of the feature fusion network, 16 × 16 × 64, first passes through an upsampling layer with a sampling factor of 2, giving an output of 32 × 32 × 64. This output and the third-layer output of the feature extraction network, 32 × 32 × 256, pass through a tensor splicing layer, and the output is 32 × 32 × 320.
The fourth layer is a combined convolution module 2-E. The input first passes through a batch normalization layer and then through a combined convolution module 2-D, which passes through the zero-padding layer, the convolution layer and the activation layer with a convolution kernel of (1,1), a step size of 1 and 128 filters; the output is 32 × 32 × 128. It then passes through a batch normalization layer and a combined convolution module 2-A, which passes through the zero-padding layer, the convolution layer and the activation layer with a convolution kernel of (3,3), a step size of 1 and 64 filters; the output is 32 × 32 × 64.
The fifth layer is an input fusion module 2-G. The fourth-layer output of the feature fusion network, 32 × 32 × 64, first passes through an upsampling layer with a sampling factor of 2, giving an output of 64 × 64 × 64. This output and the second-layer output of the feature extraction network, 64 × 64 × 128, pass through a tensor splicing layer, and the output is 64 × 64 × 192.
The sixth layer is a combined convolution module 2-F which, as shown in fig. 3 (F), consists of three batch normalization layers, one combined convolution module 2-D and two combined convolution modules 2-A. The input first passes through a batch normalization layer and then through a combined convolution module 2-D, which passes through the zero-padding layer, the convolution layer and the activation layer with a convolution kernel of (1,1), a step size of 1 and 32 filters; the output is 64 × 64 × 32. It then passes through a batch normalization layer and a combined convolution module 2-A, with a convolution kernel of (3,3), a step size of 1 and 32 filters; the output is 64 × 64 × 32. Finally, it passes through a batch normalization layer and another combined convolution module 2-A, with a convolution kernel of (3,3), a step size of 1 and 32 filters; the output is 64 × 64 × 32.
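The fusion pattern above (upsample, splice, normalize, 1×1 convolution, then 3×3 convolution) can be sketched as follows. This is a minimal illustration using the Keras API, in which padding="same" stands in for the explicit zero-padding layers and ReLU is an assumed activation; the helper names fusion_2G and conv_block_2E are illustrative:

import tensorflow as tf
from tensorflow.keras import layers

def fusion_2G(deep, shallow):
    # upsampling layer (sampling factor 2) followed by a tensor splicing (concatenation) layer
    up = layers.UpSampling2D(size=(2, 2))(deep)
    return layers.Concatenate(axis=-1)([up, shallow])

def conv_block_2E(x, mid_filters, out_filters):
    # batch normalization -> 1x1 combined convolution 2-D -> batch normalization -> 3x3 combined convolution 2-A
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(mid_filters, (1, 1), strides=1, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(out_filters, (3, 3), strides=1, padding="same", activation="relu")(x)
    return x

deep = tf.keras.Input(shape=(8, 8, 512))     # last-layer output of the feature extraction network
skip = tf.keras.Input(shape=(16, 16, 512))   # fourth-layer output of the feature extraction network
fused = fusion_2G(deep, skip)                # first fusion layer: (16, 16, 1024)
out = conv_block_2E(fused, 128, 64)          # second fusion layer: (16, 16, 64)
model = tf.keras.Model([deep, skip], out)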
c. Building the prediction network
The first layer has three branches. The first branch consists of a combined convolution module 2-D: it passes through a zero-padding layer first, then a convolution layer and an activation layer, with a convolution kernel of (1,1), a step size of 1 and 1 filter; the output is 64 × 64 × 1. The second branch consists of a combined convolution module 2-D: it passes through a zero-padding layer first, then a convolution layer and an activation layer, with a convolution kernel of (1,1), a step size of 1 and 2 filters; the output is 64 × 64 × 2. The third branch consists of a combined convolution module 2-D: it passes through a zero-padding layer first, then a convolution layer and an activation layer, with a convolution kernel of (1,1), a step size of 1 and 4 filters; the output is 64 × 64 × 4.
The second layer is an input fusion module formed by splicing the three branches of the first layer of the prediction network; the output is 64 × 64 × 7.
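A minimal sketch of this prediction head is given below (Keras API). The sigmoid activations and the branch names are assumptions for illustration; the essential point is that three 1×1 convolutions with 1, 2 and 4 filters are applied to the 64 × 64 × 32 fusion feature map and spliced into a 64 × 64 × 7 output:

import tensorflow as tf
from tensorflow.keras import layers

features = tf.keras.Input(shape=(64, 64, 32))  # sixth-layer output of the feature fusion network
branch_1 = layers.Conv2D(1, (1, 1), strides=1, activation="sigmoid")(features)  # 64 x 64 x 1
branch_2 = layers.Conv2D(2, (1, 1), strides=1, activation="sigmoid")(features)  # 64 x 64 x 2
branch_3 = layers.Conv2D(4, (1, 1), strides=1, activation="sigmoid")(features)  # 64 x 64 x 4
prediction = layers.Concatenate(axis=-1)([branch_1, branch_2, branch_3])        # 64 x 64 x 7
head = tf.keras.Model(features, prediction)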
d. Setting a loss function
The loss function is set to the sum of the class loss, the geometry loss and the angle loss.
The class loss function is formulated as follows:
L_S = -β·Y*·log(Ŷ) - (1-β)·(1-Y*)·log(1-Ŷ)
wherein L_S represents the class loss, β represents the weight, Ŷ is the predicted category, and Y* is the real category.
The geometry loss function is formulated as follows:
L_AABB = -log IoU(R̂, R*)
wherein L_AABB represents the geometry loss function, R̂ represents the predicted AABB geometry, R* represents the real AABB geometry, and IoU represents the intersection-over-union ratio.
The angle loss function is formulated as follows:
L_θ(θ̂, θ*) = 1 - cos(θ̂ - θ*)
wherein L_θ(θ̂, θ*) is the angle loss function, θ̂ is the predicted rotation angle, and θ* is the true rotation angle.
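Taken together, the three terms above can be sketched per pixel as follows (NumPy). The weight β = 0.9 and the small constant eps are illustrative assumptions, and in practice the geometry and angle terms are typically evaluated only at text positions:

import numpy as np

def east_loss(y_hat, y_true, iou, theta_hat, theta_true, beta=0.9, eps=1e-6):
    # class loss L_S: weighted (balanced) binary cross-entropy between score maps
    l_s = (-beta * y_true * np.log(y_hat + eps)
           - (1.0 - beta) * (1.0 - y_true) * np.log(1.0 - y_hat + eps))
    # geometry loss L_AABB: negative log of the intersection-over-union of predicted and real boxes
    l_aabb = -np.log(iou + eps)
    # angle loss L_theta: 1 - cos(predicted angle - true angle)
    l_theta = 1.0 - np.cos(theta_hat - theta_true)
    # total loss: sum of the three terms, averaged over all positions
    return float(np.mean(l_s + l_aabb + l_theta))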
Training an improved EAST network, comprising the steps of:
e1. Setting training parameters
A training optimizer, an initial learning rate, the number of iterations, the batch size and the initial prior boxes are set.
e2. Online data enhancement
Data enhancement is performed on the input images to expand the data set; the main enhancement methods are randomly adding noise and randomly adjusting contrast.
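A minimal sketch of this online enhancement is shown below (NumPy); the noise level, the contrast range and the 50% application probabilities are illustrative assumptions:

import numpy as np

def augment(image, rng=np.random.default_rng()):
    img = image.astype(np.float32)
    if rng.random() < 0.5:                       # randomly add noise
        img += rng.normal(0.0, 10.0, img.shape)
    if rng.random() < 0.5:                       # randomly adjust contrast
        factor = rng.uniform(0.7, 1.3)
        img = (img - img.mean()) * factor + img.mean()
    return np.clip(img, 0, 255).astype(np.uint8)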
e3. Setting training completion flag
The training accuracy is checked on a validation set at intervals; the training completion flag is set when the maximum number of iterations is reached and the accuracy requirement is met, and the optimal network is saved after training is completed.
The test set is then input into the optimal improved EAST network to obtain the character images.
3) The character information in the character images cut out in step 2) is labeled to construct a character recognition data set, which is divided into a training set and a test set. Training parameters are then loaded to train the CRNN network with the training set; the optimal CRNN network is obtained after training is finished, the test set is input into the optimal CRNN network, and the character information is output.
Constructing a CRNN network, comprising the following steps:
a. constructing a feature extraction network
The input image is w × 32 × 1, where w is the width of the input image, which changes adaptively with the size of the input picture.
The first layer is a combined convolution module 3-A: it first passes through a zero-padding layer, then a convolution layer and an activation layer, with a convolution kernel of (3,3), a step size of 1 and 64 filters; the output is w × 32 × 64.
The second layer is a max pooling layer with a pooling kernel size of (2,2) and a step size of 2; the output is (w/2) × 16 × 64.
The third layer is a combined convolution module 3-A: it first passes through a zero-padding layer, then a convolution layer and an activation layer, with a convolution kernel of (3,3), a step size of 1 and 128 filters; the output is (w/2) × 16 × 128.
The fourth layer is a max pooling layer with a pooling kernel size of (2,2) and a step size of 2; the output is (w/4) × 8 × 128.
The fifth layer is a combined convolution module 3-B: it first passes through a zero-padding layer, then a convolution layer, a batch normalization layer and an activation layer, with a convolution kernel of (3,3), a step size of 1 and 256 filters; the output is (w/4) × 8 × 256.
The sixth layer is a combined convolution module 3-A: it first passes through a zero-padding layer, then a convolution layer and an activation layer, with a convolution kernel of (3,3), a step size of 1 and 256 filters; the output is (w/4) × 8 × 256.
The seventh layer is a max pooling layer with a pooling kernel size of (2,2) and a step size of 2; the output is (w/8) × 4 × 256.
The eighth layer is a combined convolution module 3-B: it first passes through a zero-padding layer, then a convolution layer, a batch normalization layer and an activation layer, with a convolution kernel of (3,3), a step size of 1 and 512 filters; the output is (w/8) × 4 × 512.
The ninth layer is a combined convolution module 3-A: it first passes through a zero-padding layer, then a convolution layer and an activation layer, with a convolution kernel of (3,3), a step size of 1 and 512 filters; the output is (w/8) × 4 × 512.
The tenth layer is a max pooling layer with a pooling kernel size of (2,2) and a step size of 2; the output is (w/16) × 2 × 512.
The eleventh layer is a combined convolution module 3-C: it passes through a convolution layer, a batch normalization layer and an activation layer, with a convolution kernel of (2,2), a step size of 1 and 512 filters; the output is (w/16 - 1) × 1 × 512.
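The eleven layers above can be sketched as follows (Keras API). Note that Keras orders tensors as height × width × channels, whereas the text writes width × height × channels; the ReLU activations are assumed:

import tensorflow as tf
from tensorflow.keras import layers

def crnn_backbone():
    inp = tf.keras.Input(shape=(32, None, 1))                         # height 32, variable width w
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(inp)   # module 3-A
    x = layers.MaxPooling2D((2, 2), strides=2)(x)                     # height 16, width w/2
    x = layers.Conv2D(128, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2), strides=2)(x)                     # height 8, width w/4
    x = layers.Conv2D(256, (3, 3), padding="same")(x)                 # module 3-B
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(256, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2), strides=2)(x)                     # height 4, width w/8
    x = layers.Conv2D(512, (3, 3), padding="same")(x)                 # module 3-B
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(512, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2), strides=2)(x)                     # height 2, width w/16
    x = layers.Conv2D(512, (2, 2), strides=1)(x)                      # module 3-C, valid convolution
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)                                  # height 1, width w/16 - 1
    return tf.keras.Model(inp, x)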
b. Building the prediction network
The first layer is a cyclic convolution module, which consists of a bidirectional LSTM.
The second layer is a fully connected layer.
The third layer is a cyclic convolution module, which consists of a bidirectional LSTM.
The fourth layer is a fully connected layer, whose output is a sequence containing one prediction vector for each character prediction block.
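A minimal sketch of this recurrent prediction head is given below (Keras API). The LSTM hidden size of 256, the intermediate fully connected size of 256 and the class count of 6736 (6735 characters plus the blank) are assumptions for illustration; the unit-height dimension of the backbone output is assumed to have been squeezed out so that the input is a (time, 512) sequence:

import tensorflow as tf
from tensorflow.keras import layers

seq = tf.keras.Input(shape=(None, 512))                                   # feature sequence from the backbone
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(seq)    # first cyclic convolution module
x = layers.Dense(256)(x)                                                  # first fully connected layer
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)      # second cyclic convolution module
probs = layers.Dense(6736, activation="softmax")(x)                       # per-block character probabilities
head = tf.keras.Model(seq, probs)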
c. Setting the decoder
The output of the prediction network is converted into a sequence in which each element ranges from 0 to 6735 and corresponds to an independent character (0 corresponds to the blank character); the sequence corresponds to dividing the line of text into character prediction blocks. The sequence is processed from left to right, and character information is output according to the character library whenever an element is not 0 and is not the same as the previous element.
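The decoding rule just described is ordinary greedy (best-path) CTC decoding; a minimal sketch follows (NumPy), where charset is an assumed mapping from the element values 1 to 6735 to their characters:

import numpy as np

def greedy_decode(probs, charset):
    # probs: (T, 6736) per-block class probabilities; element 0 is the blank character
    best = np.argmax(probs, axis=-1)
    out, prev = [], 0
    for idx in best:                      # process the sequence from left to right
        if idx != 0 and idx != prev:      # not blank and not a repeat of the previous element
            out.append(charset[idx])
        prev = idx
    return "".join(out)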
d. Setting a loss function
The loss function is set to the CTC (Connectionist Temporal Classification) loss function.
The CTC loss function is formulated as follows:
L_CTC = -ln ∏_{(x,z)∈S} p(z|x) = -∑_{(x,z)∈S} ln p(z|x)
wherein L_CTC represents the CTC loss function, p(z|x) represents the probability of outputting the sequence z given the input x, and S is the training set.
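In practice this formula is rarely coded by hand; a minimal sketch using the Keras backend helper is shown below (the shapes in the comments are illustrative):

import tensorflow as tf

def ctc_loss(y_true, y_pred, input_length, label_length):
    # y_true: (batch, max_label_len) label indices; y_pred: (batch, T, classes) softmax outputs;
    # input_length / label_length: (batch, 1) lengths of each prediction sequence and label
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)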
Training a CRNN network, comprising the steps of:
e1. Setting training parameters
A training optimizer, an initial learning rate, the number of iterations and the batch size are set.
e2. Setting training completion flag
The training accuracy is checked on a validation set at intervals; the training completion flag is set when the maximum number of iterations is reached and the accuracy requirement is met, and the optimal network is saved after training is completed.
The test set is then input into the optimal CRNN network to obtain the character information.
4) The character information output in step 3) is spliced into a text, and the instrument type corresponding to the text is labeled to construct a text classification data set, which is divided into a training set and a test set. Training parameters are loaded to train the TextCNN network with the training set; the optimal TextCNN network is obtained after training is finished, the test set is input into the optimal TextCNN network, and the instrument type corresponding to the text is output.
Constructing a TextCNN network, comprising the following steps:
a. constructing a network structure:
the first layer is an embedding layer: the input text of length m is word-vectorized into an input tensor of 600 × 64;
the second layer is a convolution module with a convolution kernel of (5, 5), a step size of 1 and 256 filters; the output is 596 × 256;
the third layer is a max pooling layer; the output is 1 × 256;
the fourth layer consists of a fully connected layer, a Dropout layer and an activation layer; the output is 1 × 128;
the fifth layer consists of a fully connected layer and an activation layer; the output is 1 × cls, where cls is the number of categories;
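The five layers above can be sketched as follows (Keras API). The vocabulary size, the dropout rate, the ReLU activation and the use of a 1-D convolution of width 5 over the 600 token positions (which reproduces the stated 596 × 256 output) are assumptions for illustration:

import tensorflow as tf
from tensorflow.keras import layers

def build_textcnn(vocab_size=6736, cls=10):
    inp = tf.keras.Input(shape=(600,))                    # token ids, padded to length 600
    x = layers.Embedding(vocab_size, 64)(inp)             # embedding layer: 600 x 64
    x = layers.Conv1D(256, kernel_size=5, strides=1)(x)   # convolution module: 596 x 256
    x = layers.GlobalMaxPooling1D()(x)                    # max pooling layer: 1 x 256
    x = layers.Dense(128)(x)                              # fully connected layer
    x = layers.Dropout(0.5)(x)                            # Dropout layer
    x = layers.Activation("relu")(x)                      # activation layer: 1 x 128
    out = layers.Dense(cls, activation="softmax")(x)      # fully connected + activation: 1 x cls
    return tf.keras.Model(inp, out)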
b. setting a loss function
Setting loss function as multi-class cross entropy
L_CrossEntropy = -∑_{i=1}^{n} y_i·log(ŷ_i)
wherein L_CrossEntropy represents the loss, n represents the number of classes, y_i represents the true probability of class i, and ŷ_i represents the predicted probability of class i.
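As a small worked example of this formula with n = 3 categories (the probability values are illustrative):

import numpy as np

y_true = np.array([0.0, 1.0, 0.0])          # one-hot true distribution: the real category is class 1
y_pred = np.array([0.1, 0.8, 0.1])          # predicted probabilities
loss = -np.sum(y_true * np.log(y_pred))     # = -ln(0.8) ≈ 0.223
print(round(float(loss), 3))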
Training a TextCNN network, comprising the steps of:
c1. Setting training parameters
A training optimizer, an initial learning rate, the number of iterations and the batch size are set.
c2. Setting training completion flag
The training accuracy is checked on a validation set at intervals; the training completion flag is set when the maximum number of iterations is reached and the accuracy requirement is met, and the optimal network is saved after training is completed.
The test set is input into the optimal TextCNN network, which outputs the instrument type corresponding to the text.
In conclusion, the invention provides a new method for detecting and classifying instrument images. By using neural networks as an effective means of instrument detection and classification, it effectively solves the problem that instrument types are difficult to read and promotes the development of automatic instrument identification technology, so the method has practical value and is worth popularizing.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (6)

1. A meter detection and classification method based on image texts is characterized by comprising the following steps:
1) marking dial plate positions on instrument images to construct an instrument positioning data set, dividing the instrument positioning data set into a training set and a testing set, loading training parameters and training the improved YOLO network by using the training set, obtaining an optimal improved YOLO network after training is finished, inputting the testing set into the optimal improved YOLO network, and outputting and cutting out the dial plate image; the improved YOLO network is characterized in that the backbone network is optimized into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;
2) marking the character positions in the dial plate image cut out in step 1) to construct a character detection data set, dividing the character detection data set into a training set and a testing set, then loading training parameters, training the improved EAST network by using the training set, obtaining an optimal improved EAST network after training is finished, inputting the testing set into the optimal improved EAST network, and outputting the character positions and cutting them into character images; the improved EAST network is characterized in that the backbone network is changed into VGG to improve network detection accuracy, and the output layer structure of its prediction module is modified so that the vertices are predicted using only the head element, so as to improve the prediction performance for long characters;
3) marking the character information in the character image cut in the step 2) to construct a character recognition data set, dividing the character recognition data set into a training set and a testing set, then loading training parameters to train the CRNN by using the training set, obtaining an optimal CRNN after training is finished, inputting the testing set into the optimal CRNN, and outputting the character information;
4) splicing the character information output in the step 3) into a text, labeling the instrument type corresponding to the text to construct a text classification data set, dividing the text classification data set into a training set and a test set, loading training parameters to train a TextCNN network by using the training set, obtaining an optimal TextCNN network after the training is finished, and inputting the test set into the optimal TextCNN network to output the instrument type corresponding to the text.
2. The method for detecting and classifying meters based on image texts as claimed in claim 1, wherein in step 1), first, various meter images under different environments are collected by a camera, preprocessing operations of filtering and image enhancement are performed on the meter images, then abnormal data in the meter images, including data with dirty surface, extreme illumination and incomplete shooting abnormality, are removed, and then the rest of data are labeled, wherein the labeled content is the dial position, and a meter positioning data set is constructed and divided into a training set and a testing set.
3. The method for instrument detection and classification based on image text as claimed in claim 1, wherein in step 1), the specific conditions of the improved YOLO network are as follows:
a. constructing a feature extraction network according to the requirements of real-time performance and high precision:
the first layer is a combined convolution module 1-A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the second layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the third layer is a combined convolution module 1-C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fifth layer is a combined convolution module 1-C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the sixth layer is a combined convolution module 1-B which consists of a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the seventh layer is a combined convolution module 1-C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the eighth layer is a combined convolution module 1-D which consists of five combined convolution modules 1-B;
the ninth layer is a combined convolution module 1-C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the tenth layer is a combined convolution module 1-B which consists of a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
b. constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network;
b1, inputting the data as the tenth layer output of the feature extraction network, wherein the large-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and has the following structure:
the first layer is a combined convolution module 1-D which consists of five combined convolution modules 1-B;
the second layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the third layer is a convolution layer;
b2, inputting the eighth-layer output of the feature extraction network and the first-layer output of the large-size target prediction network, wherein the medium-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and the structure of the medium-size target prediction network is as follows:
the first layer is an input fusion module 1-E, which consists of a combined convolution module 1-B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 1-D which consists of five combined convolution modules 1-B;
the third layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a convolution layer;
b3, inputting the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, wherein the small-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and the structure of the small-size target prediction network is as follows:
the first layer is an input fusion module 1-E, which consists of a combined convolution module 1-B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 1-D which consists of five combined convolution modules 1-B;
the third layer is a combined convolution module 1-B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a convolution layer;
finally, the output of the large-size target prediction network, the medium-size target prediction network and the small-size target prediction network passes through a non-maximum value inhibition layer to obtain the positions and the types of the predicted targets;
c. setting loss functions including a central coordinate loss function, a width and height loss function, a confidence coefficient loss function and a category loss function;
the center coordinate loss function is formulated as follows:
Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)
in the formula, Loss_xy represents the center coordinate loss, mark_object represents the flag bit indicating whether the anchor box contains an object, w represents the width of the anchor box, h represents the height of the anchor box, Loss_log represents the binary cross-entropy loss, xy_true represents the true center coordinate value, and xy_predict represents the predicted center coordinate value;
the width-height loss function is formulated as follows:
Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2
in the formula, Loss_wh represents the width-height loss, wh_true represents the true width-height value, and wh_predict represents the predicted width-height value;
the confidence loss function is formulated as follows:
Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore
in the formula, Loss_confidence represents the confidence loss, c_predict represents the confidence value of the prediction box, and mark_ignore represents the flag bit of an anchor box whose IOU is less than the threshold;
the class loss function is formulated as follows:
Loss_cls = mark_object * Loss_log(cls_true, cls_predict)
in the formula, Loss_cls represents the class loss, cls_true represents the true class, and cls_predict represents the predicted class;
the total loss function is formulated as follows:
Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f
in the formula, Loss represents the total loss, and num_f represents the total number of inputs as a floating-point number;
loading training parameters to train the improved YOLO network, wherein the training parameters are set as follows: setting the training optimizer to Adam, the initial learning rate to 0.001, the maximum training period to 500 and the batch size to 8; checking the training accuracy on a verification set at intervals, setting the training completion flag as reaching the maximum training period or meeting the requirement on the mean intersection-over-union ratio, and storing the optimal network after training is completed;
and inputting the test set into an optimal improved YOLO network to obtain the dial position and the dial image.
4. The method for detecting and classifying meters based on image texts as claimed in claim 1, wherein in step 2), the concrete conditions of the improved EAST network are as follows:
a. constructing a feature extraction network, wherein the structure is as follows:
the first layer is a combined convolution module 2-B which consists of two combined convolution modules 2-A and a maximum pooling layer, and the combined convolution module 2-A consists of a zero padding layer, a convolution layer and an active layer;
the second layer is a combined convolution module 2-B which consists of two combined convolution modules 2-A and a maximum pooling layer;
the third layer is a combined convolution module 2-C which consists of three combined convolution modules 2-A and a maximum pooling layer;
the fourth layer is a combined convolution module 2-C which consists of three combined convolution modules 2-A and a maximum pooling layer;
the fifth layer is a combined convolution module 2-C which consists of three combined convolution modules 2-A and a maximum pooling layer;
b. constructing a feature fusion network, wherein the structure is as follows:
the first layer is an input fusion module 2-G which consists of an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 2-E which consists of two batch normalization layers, a combined convolution module 2-D and a combined convolution module 2-A; the combined convolution module 2-D consists of a zero padding layer, a convolution layer and an active layer;
the third layer is an input fusion module 2-G which consists of an up-sampling layer and a tensor splicing layer;
the fourth layer is a combined convolution module 2-E which consists of two batch normalization layers, a combined convolution module 2-D and a combined convolution module 2-A;
the fifth layer is an input fusion module 2-G which consists of an up-sampling layer and a tensor splicing layer;
the sixth layer is a combined convolution module 2-F which consists of three batch normalization layers, a combined convolution module 2-D and two combined convolution modules 2-A;
c. constructing a prediction network, wherein the structure is as follows:
the first layer is divided into three branches, and the first branch consists of a combined convolution module 2-D; the second branch consists of a combined convolution module 2-D; the third branch consists of a combined convolution module 2-D;
the second layer is an input fusion module which is formed by splicing three branches of the first layer;
d. setting a loss function comprising a category loss function, a geometric shape loss function and an angle loss function;
the class loss function is formulated as follows:
L_S = -β·Y*·log(Ŷ) - (1-β)·(1-Y*)·log(1-Ŷ)
in the formula, L_S represents the class loss, β represents the weight, Ŷ is the predicted category, and Y* is the real category;
the geometry loss function is formulated as follows:
L_AABB = -log IoU(R̂, R*)
in the formula, L_AABB represents the geometry loss, R̂ represents the geometry of the predicted quadrilateral text box AABB, R* represents the geometry of the real quadrilateral text box AABB, and IoU represents the intersection-over-union ratio;
the angle loss function is formulated as follows:
L_θ(θ̂, θ*) = 1 - cos(θ̂ - θ*)
in the formula, L_θ(θ̂, θ*) is the angle loss, θ̂ is the predicted value of the rotation angle, and θ* is the actual value of the rotation angle;
loading training parameters to train the improved EAST network, wherein the training parameters are set as follows: setting the training optimizer to Adam, the initial learning rate to 0.001, the maximum training period to 500 and the batch size to 8; checking the training accuracy on a verification set at intervals, setting the training completion flag as reaching the maximum training period or meeting the requirement on the mean intersection-over-union ratio, and storing the optimal network after training is completed;
and inputting the test set into an optimal improved EAST network to obtain a text position, and cutting the text position into a character image.
5. The method as claimed in claim 1, wherein in step 3), the CRNN network is specifically as follows:
a. constructing a feature extraction network, wherein the structure is as follows:
the first layer is a combined convolution module 3-A which consists of a zero filling layer, a convolution layer and an active layer;
the second layer is a maximum pooling layer;
the third layer is a combined convolution module 3-A which consists of a zero filling layer, a convolution layer and an activation layer;
the fourth layer is a maximum pooling layer;
the fifth layer is a combined convolution module 3-B which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the sixth layer is a combined convolution module 3-A which consists of a zero filling layer, a convolution layer and an activation layer;
the seventh layer is a maximum pooling layer;
the eighth layer is a combined convolution module 3-B which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the ninth layer is a combined convolution module 3-A which consists of a zero filling layer, a convolution layer and an activation layer;
the tenth layer is a maximum pooling layer;
the eleventh layer is a combined convolution module 3-C, which consists of a zero-padding layer, a convolution layer, a batch normalization layer and an activation layer;
b. constructing a prediction network, wherein the structure is as follows:
the first layer is a cyclic convolution module, which consists of a bidirectional LSTM;
the second layer is a full connection layer;
the third layer is a cyclic convolution module which consists of a bidirectional LSTM;
the fourth layer is a full connection layer;
c. setting a decoder to convert the output sequence into character information;
d. setting the loss function as the CTC (Connectionist Temporal Classification) loss function;
the CTC loss function is formulated as follows:
L_CTC = -ln ∏_{(x,z)∈S} p(z|x) = -∑_{(x,z)∈S} ln p(z|x)
in the formula, L_CTC represents the CTC loss, p(z|x) represents the probability of outputting the sequence z given the input x, and S is the training set;
loading training parameters to train the CRNN network, wherein the training parameters are set as follows: setting the training optimizer to Adam, the initial learning rate to 0.0001, the maximum training period to 100, and the batch size to 32; checking the training accuracy on a verification set at intervals, setting the training completion flag as reaching the maximum training period or meeting the requirement on recognition accuracy, and storing the optimal network after training is completed;
and inputting the test set into the optimal CRNN network to obtain character information.
6. The method for classifying meters based on image texts according to claim 1, wherein in step 4), the TextCNN network is specified as follows:
a. a network structure is constructed as follows:
the first layer is an embedding layer;
the second layer is a convolution module;
the third layer is a maximum pooling layer;
the fourth layer consists of a full connection layer, a Dropout layer and an activation layer;
the fifth layer consists of a full connecting layer and an activation layer;
b. setting a loss function as a multi-class cross entropy, wherein the formula is as follows:
L_CrossEntropy = -∑_{i=1}^{n} y_i·log(ŷ_i)
in the formula, L_CrossEntropy represents the loss, n represents the number of classes, y_i represents the true probability of class i, and ŷ_i represents the predicted probability of class i;
loading training parameters to train the TextCNN network, wherein the training parameters are set as follows: setting the training optimizer to Adam, the initial learning rate to 0.001, the number of iterations to 1000 and the batch size to 64; checking the training accuracy on a verification set at intervals, setting the training completion flag as reaching the maximum number of iterations and meeting the accuracy requirement, and storing the optimal network after training is completed;
and inputting the test set into an optimal TextCNN network to obtain a corresponding instrument type.
CN202110855223.6A 2021-07-28 2021-07-28 Instrument detection classification method based on image text Active CN113673509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110855223.6A CN113673509B (en) 2021-07-28 2021-07-28 Instrument detection classification method based on image text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110855223.6A CN113673509B (en) 2021-07-28 2021-07-28 Instrument detection classification method based on image text

Publications (2)

Publication Number Publication Date
CN113673509A true CN113673509A (en) 2021-11-19
CN113673509B CN113673509B (en) 2023-06-09

Family

ID=78540390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110855223.6A Active CN113673509B (en) 2021-07-28 2021-07-28 Instrument detection classification method based on image text

Country Status (1)

Country Link
CN (1) CN113673509B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision
CN110543878A (en) * 2019-08-07 2019-12-06 华南理工大学 pointer instrument reading identification method based on neural network
CN111062282A (en) * 2019-12-05 2020-04-24 武汉科技大学 Transformer substation pointer type instrument identification method based on improved YOLOV3 model
CN111368825A (en) * 2020-02-25 2020-07-03 华南理工大学 Pointer positioning method based on semantic segmentation
CN111401358A (en) * 2020-02-25 2020-07-10 华南理工大学 Instrument dial plate correction method based on neural network
CN111639643A (en) * 2020-05-22 2020-09-08 深圳市赛为智能股份有限公司 Character recognition method, character recognition device, computer equipment and storage medium
CN111814919A (en) * 2020-08-31 2020-10-23 江西小马机器人有限公司 Instrument positioning and identifying system based on deep learning
CN112801094A (en) * 2021-02-02 2021-05-14 中国长江三峡集团有限公司 Pointer instrument image inclination correction method
CN112861867A (en) * 2021-02-01 2021-05-28 北京大学 Pointer type instrument panel identification method, system and storage medium


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936280A (en) * 2021-11-23 2022-01-14 河海大学 Embedded instrument code disc character automatic identification system and method
CN113936280B (en) * 2021-11-23 2024-04-05 河海大学 Automatic character recognition system and method for code disc of embedded instrument
CN114338346A (en) * 2021-12-29 2022-04-12 中国工商银行股份有限公司 Alarm message processing method and device and electronic equipment
CN115424121A (en) * 2022-07-30 2022-12-02 南京理工大学紫金学院 Power pressing plate switch inspection method based on computer vision
CN115424121B (en) * 2022-07-30 2023-10-13 南京理工大学紫金学院 Electric power pressing plate switch inspection method based on computer vision
CN116416626A (en) * 2023-06-12 2023-07-11 平安银行股份有限公司 Method, device, equipment and storage medium for acquiring circular seal data
CN116416626B (en) * 2023-06-12 2023-08-29 平安银行股份有限公司 Method, device, equipment and storage medium for acquiring circular seal data
CN116958998A (en) * 2023-09-20 2023-10-27 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning
CN116958998B (en) * 2023-09-20 2023-12-26 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning

Also Published As

Publication number Publication date
CN113673509B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN110543878B (en) Pointer instrument reading identification method based on neural network
CN113673509B (en) Instrument detection classification method based on image text
CN106127204B (en) A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks
CN111368825B (en) Pointer positioning method based on semantic segmentation
CN113239930A (en) Method, system and device for identifying defects of cellophane and storage medium
CN115223063B (en) Deep learning-based unmanned aerial vehicle remote sensing wheat new variety lodging area extraction method and system
CN105335760A (en) Image number character recognition method
CN113420619A (en) Remote sensing image building extraction method
CN111369526A (en) Multi-type old bridge crack identification method based on semi-supervised deep learning
CN116704137B (en) Reverse modeling method for point cloud deep learning of offshore oil drilling platform
CN105184225A (en) Multinational paper money image identification method and apparatus
CN113850799A (en) YOLOv 5-based trace DNA extraction workstation workpiece detection method
CN112926556A (en) Aerial photography power transmission line strand breaking identification method and system based on semantic segmentation
CN112464704A (en) Remote sensing image identification method based on feature fusion and rotating target detector
CN113837166B (en) Automatic pointer instrument reading method based on deep learning
CN116977747B (en) Small sample hyperspectral classification method based on multipath multi-scale feature twin network
CN117523394A (en) SAR vessel detection method based on aggregation characteristic enhancement network
CN113673508B (en) Pointer instrument image data synthesis method
CN116052110A (en) Intelligent positioning method and system for pavement marking defects
CN115953394A (en) Target segmentation-based detection method and system for mesoscale ocean vortexes
CN116310902A (en) Unmanned aerial vehicle target detection method and system based on lightweight neural network
CN116385364A (en) Multi-level ground lead defect identification method based on parallax auxiliary semantic segmentation
CN115830302A (en) Multi-scale feature extraction and fusion power distribution network equipment positioning identification method
CN110889418A (en) Gas contour identification method
CN114782322A (en) YOLOv5 model arc additive manufacturing molten pool defect detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant