CN113673509B - Instrument detection classification method based on image text - Google Patents

Instrument detection classification method based on image text

Info

Publication number
CN113673509B
CN113673509B, CN202110855223.6A, CN202110855223A
Authority
CN
China
Prior art keywords
layer
convolution
network
training
combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110855223.6A
Other languages
Chinese (zh)
Other versions
CN113673509A (en)
Inventor
田联房
王昭霖
杜启亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Original Assignee
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Zhuhai Institute of Modern Industrial Innovation of South China University of Technology filed Critical South China University of Technology SCUT
Priority to CN202110855223.6A priority Critical patent/CN113673509B/en
Publication of CN113673509A publication Critical patent/CN113673509A/en
Application granted granted Critical
Publication of CN113673509B publication Critical patent/CN113673509B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an instrument detection and classification method based on image text, which comprises the following steps: 1) constructing an instrument positioning data set, training an improved YOLO network, and using the network to output dial images; 2) constructing a character detection data set, training an improved EAST network, and using the network to output character images; 3) constructing a character recognition data set, training a CRNN network, and using the network to output character information; 4) constructing a text classification data set, training a TextCNN network, and using the network to output the instrument type. The invention uses neural networks to detect instruments and to detect and recognize the text information on them. It achieves high accuracy and good generalization under different backgrounds, accurately detects instruments of different sizes without restrictions on acquisition angle or distance, and, by exploiting the character information on the instrument, addresses the machine-vision problem that an instrument's position can be located but its type is hard to distinguish: the method both detects the instrument and identifies its type.

Description

Instrument detection classification method based on image text
Technical Field
The invention relates to the technical field of image processing and neural networks, in particular to an instrument detection classification method based on image texts.
Background
Instruments serve as monitoring devices, mainly including pressure instruments, temperature instruments, flow instruments, electrical instruments and electronic measuring instruments, and are widely applied in many aspects of industrial production and social life, providing great convenience for production and daily living. Compared with manual classification, machine-vision-based classification has a wider application range and higher efficiency, and with the development of image processing and neural network technology it is gradually becoming mainstream. The key links in such methods are the positioning and recognition of the text on the meter, and the accuracy of this text positioning and recognition information has an important influence on meter classification and scale reading.
At present, research on instrument classification methods mainly focuses on training neural networks to classify different types of instrument images directly. This approach has shortcomings: the visual differences between different types of instruments are often small, so the recognition and classification performance of deep networks on different instruments is not ideal. Research on character recognition on instruments currently focuses mainly on traditional image processing, acquiring character information through a series of operations including filtering, graying, thresholding, edge detection and template matching. With the rapid development of image processing and neural network technology in recent years, it has become feasible to use neural networks for character localization, recognition and classification: text detection locates character information with quadrilateral boxes through a neural network, a text recognition algorithm reads the text information on the instrument, and a text classification algorithm classifies the text information to obtain the instrument type.
In view of the above, an instrument detection and classification method that is both real-time and highly accurate has considerable practical application value.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing an instrument detection and classification method based on image text. The method uses neural networks to detect instruments and to detect and recognize the text information on them, achieves high accuracy and good generalization under different backgrounds, accurately detects instruments of different sizes without restrictions on acquisition angle or distance, and, by exploiting the character information on the instrument, addresses the machine-vision problem that an instrument's position can be located but its type is hard to distinguish; it both detects the instrument and identifies its type.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: an instrument detection classification method based on image text comprises the following steps:
1) Marking the dial positions in instrument images to construct an instrument positioning data set, dividing it into a training set and a test set, loading training parameters, training an improved YOLO network with the training set, obtaining the optimal improved YOLO network after training, inputting the test set into the optimal improved YOLO network, and outputting and cropping the dial images; the improved YOLO network replaces the backbone with a MobileNet lightweight network to reduce network parameters and computation and increase running speed;
2) Marking the character positions in the dial images cropped in step 1) to construct a character detection data set, dividing it into a training set and a test set, loading training parameters, training an improved EAST network with the training set, obtaining the optimal improved EAST network after training, inputting the test set into the optimal improved EAST network, outputting the character positions, and cropping them into character images; the improved EAST network replaces the backbone with VGG to improve detection accuracy and modifies the output-layer prediction module to predict vertices using only the head elements, improving prediction performance on long text;
3) Marking the character information in the character images cropped in step 2) to construct a character recognition data set, dividing it into a training set and a test set, loading training parameters, training a CRNN network with the training set, obtaining the optimal CRNN network after training, inputting the test set into the optimal CRNN network, and outputting the character information;
4) Splicing the character information output in step 3) into text, marking the instrument type corresponding to the text to construct a text classification data set, dividing it into a training set and a test set, loading training parameters, training a TextCNN network with the training set, obtaining the optimal TextCNN network after training, and inputting the test set into the optimal TextCNN network to output the instrument type corresponding to the text (a high-level sketch of this four-stage pipeline is given below).
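The following is a minimal orchestration sketch of the four stages above; the callables locate_dials, detect_text, recognize_text, classify_text and crop are hypothetical stand-ins for the trained networks and the cropping step, introduced only to illustrate the data flow and not part of the original disclosure.

```python
from typing import Callable, List, Tuple

def classify_meters(image,
                    locate_dials: Callable,    # step 1: image -> list of dial boxes
                    detect_text: Callable,     # step 2: dial image -> list of text boxes
                    recognize_text: Callable,  # step 3: text image -> character string
                    classify_text: Callable,   # step 4: spliced text -> meter type
                    crop: Callable) -> List[Tuple]:
    results = []
    for dial_box in locate_dials(image):
        dial_img = crop(image, dial_box)
        # detect the text regions on the dial and recognize each of them
        strings = [recognize_text(crop(dial_img, tb)) for tb in detect_text(dial_img)]
        # splice the recognized strings into one text and classify the meter type
        results.append((dial_box, classify_text(" ".join(strings))))
    return results
```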
In step 1), various instrument images are collected by camera in different environments, filtering and image enhancement preprocessing operations are performed on the instrument images, and abnormal data are removed, including data with surface dirt, extreme illumination, or incomplete capture; the remaining data are annotated with the dial positions to construct the instrument positioning data set, which is divided into a training set and a test set.
Further, in step 1), the specific case of the improved YOLO network is as follows:
a. constructing a feature extraction network according to the real-time and high-precision requirements:
the first layer is a combined convolution module 1-A, which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the second layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the third layer is a combined convolution module 1-C, which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a combined convolution module 1-B, which consists of a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fifth layer is a combined convolution module 1-C, which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the sixth layer is a combined convolution module 1-B, which consists of a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the seventh layer is a combined convolution module 1-C, which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the eighth layer is a combined convolution module 1-D, which consists of five combined convolution modules 1-B;
the ninth layer is a combined convolution module 1-C, which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the tenth layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
b. constructing and outputting prediction networks for predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network;
b1, inputting a tenth layer of output of a feature extraction network, wherein the large-size target prediction network consists of a plurality of combination convolution modules and convolution layers, and has the following structure:
the first layer is a combined convolution module 1-D, which consists of five combined convolution modules 1-B;
the second layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the third layer is a convolution layer;
b2, inputting an eighth layer output of a characteristic extraction network and a first layer output of a large-size target prediction network, wherein the medium-size target prediction network consists of a plurality of combination convolution modules and convolution layers, and the structure is as follows:
the first layer is an input fusion module 1-E, which consists of a combined convolution module 1-B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 1-D, which consists of five combined convolution modules 1-B;
the third layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a convolution layer;
b3, taking the input as the sixth layer output of the characteristic extraction network and the second layer output of the medium-size target prediction network, wherein the small-size target prediction network consists of a plurality of combination convolution modules and convolution layers and has the following structure:
the first layer is an input fusion module 1-E, which consists of a combined convolution module 1-B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 1-D, which consists of five combined convolution modules 1-B;
the third layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a convolution layer;
finally, the output of the large-size target prediction network, the medium-size target prediction network and the small-size target prediction network is processed through a non-maximum suppression layer to obtain the predicted target position and category;
c. the loss function is set to comprise a center-coordinate loss function, a width-height loss function, a confidence loss function and a category loss function;
the center-coordinate loss function formula is as follows:
Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)
where Loss_xy denotes the center-coordinate loss, mark_object is a flag indicating whether an object exists in the anchor box, w and h are the width and height of the anchor box, Loss_log denotes the binary cross-entropy loss, xy_true is the true center coordinate and xy_predict is the predicted center coordinate;
the width-height loss function formula is as follows:
Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2
where Loss_wh denotes the width-height loss, wh_true the true width and height and wh_predict the predicted width and height;
the confidence loss function formula is as follows:
Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore
where Loss_confidence denotes the confidence loss, c_predict the confidence of the prediction box and mark_ignore a flag for anchor boxes whose IoU is below the threshold;
the category loss function formula is as follows:
Loss_cls = mark_object * Loss_log(cls_true, cls_predict)
where Loss_cls denotes the category loss, cls_true the true category and cls_predict the predicted category;
the total loss function formula is as follows:
Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / numf
where Loss denotes the total loss and numf denotes the total number of inputs as a floating-point number;
loading training parameters to train the improved YOLO network, wherein the training parameters are set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the maximum number of training epochs is 500 and the batch size is 8; the accuracy on the validation set is checked at intervals during training, and the network is saved once training is complete and optimal, the completion criterion being that the maximum number of training epochs is reached or the mean intersection-over-union meets the requirement;
inputting the test set into an optimal improved YOLO network to obtain the dial position and the dial image.
Further, in step 2), the specific case of the improved EAST network is as follows:
a. the method comprises the following steps of constructing a feature extraction network:
the first layer is a combined convolution module 2-B, which consists of two combined convolution modules 2-A and a maximum pooling layer, wherein the combined convolution module 2-A consists of a zero filling layer, a convolution layer and an activation layer;
the second layer is a combined convolution module 2-B, which consists of two combined convolution modules 2-A and a maximum pooling layer;
The third layer is a combined convolution module 2-C, which consists of three combined convolution modules 2-A and a maximum pooling layer;
the fourth layer is a combined convolution module 2-C, which consists of three combined convolution modules 2-A and a maximum pooling layer;
the fifth layer is a combined convolution module 2-C, which consists of three combined convolution modules 2-A and a maximum pooling layer;
b. the feature fusion network is constructed, and the structure is as follows:
the first layer is an input fusion module 2-G, which consists of an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 2-E, which consists of two batch normalization layers, a combined convolution module 2-D and a combined convolution module 2-A; wherein the combined convolution module 2-D consists of a zero padding layer, a convolution layer and an activation layer;
the third layer is an input fusion module 2-G, which consists of an up-sampling layer and a tensor splicing layer;
the fourth layer is a combined convolution module 2-E, which consists of two batch normalization layers, a combined convolution module 2-D and a combined convolution module 2-A;
the fifth layer is an input fusion module 2-G, which consists of an up-sampling layer and a tensor splicing layer;
the sixth layer is a combined convolution module 2-F, which consists of three batch normalization layers, one combined convolution module 2-D and two combined convolution modules 2-A;
c. The prediction network is constructed, and the structure is as follows:
the first layer is divided into three branches, and the first branch consists of a combined convolution module 2-D; the second branch consists of a combined convolution module 2-D; the third branch consists of a combined convolution module 2-D;
the second layer is an input fusion module which is formed by splicing three branches of the first layer;
d. the loss function comprises a category loss function, a geometry loss function and an angle loss function;
the category loss function formula is as follows:
L_s = -β·Y*·log(Ŷ) - (1 - β)·(1 - Y*)·log(1 - Ŷ)
where L_s denotes the category loss, β denotes the weight, Ŷ is the predicted category and Y* is the true category;
the geometry loss function formula is as follows:
L_AABB = -log IoU(R̂, R*)
where L_AABB denotes the geometry loss, R̂ denotes the geometry of the predicted quadrilateral text box AABB, R* denotes the geometry of the real quadrilateral text box AABB, and IoU denotes the intersection-over-union;
the angle loss function formula is as follows:
L_θ(θ̂, θ*) = 1 - cos(θ̂ - θ*)
where L_θ(θ̂, θ*) is the angle loss, θ̂ is the predicted rotation angle and θ* is the true rotation angle;
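The following is a minimal PyTorch sketch of the three loss terms defined above, written for the standard EAST RBOX encoding in which the geometry is stored as per-pixel distances to the four box edges; the tensor shapes and this encoding are assumptions and may differ from the improved output layer described in step 2).

```python
import torch

def east_class_loss(y_pred, y_true, beta):
    """Balanced cross-entropy: L_s = -beta*Y*log(Y_hat) - (1-beta)*(1-Y*)*log(1-Y_hat)."""
    eps = 1e-7
    y_pred = y_pred.clamp(eps, 1.0 - eps)
    return torch.mean(-beta * y_true * torch.log(y_pred)
                      - (1.0 - beta) * (1.0 - y_true) * torch.log(1.0 - y_pred))

def east_geometry_loss(d_pred, d_true):
    """L_AABB = -log IoU(R_hat, R*) for boxes encoded as per-pixel distances
    (top, bottom, left, right) to the box edges, shape (N, 4, H, W)."""
    t_p, b_p, l_p, r_p = d_pred.unbind(dim=1)
    t_t, b_t, l_t, r_t = d_true.unbind(dim=1)
    area_p = (t_p + b_p) * (l_p + r_p)
    area_t = (t_t + b_t) * (l_t + r_t)
    inter = (torch.min(t_p, t_t) + torch.min(b_p, b_t)) * \
            (torch.min(l_p, l_t) + torch.min(r_p, r_t))
    union = area_p + area_t - inter
    return torch.mean(-torch.log((inter + 1.0) / (union + 1.0)))

def east_angle_loss(theta_pred, theta_true):
    """L_theta = 1 - cos(theta_hat - theta*)."""
    return torch.mean(1.0 - torch.cos(theta_pred - theta_true))
```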
loading training parameters to train the improved EAST network, wherein the training parameters are set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the maximum number of training epochs is 500 and the batch size is 8; the accuracy on the validation set is checked at intervals during training, and the network is saved once training is complete and optimal, the completion criterion being that the maximum number of training epochs is reached or the mean intersection-over-union meets the requirement;
Inputting the test set into the optimal improved EAST network to obtain text positions, and cutting the text positions into character images.
Further, in step 3), the specific cases of the CRNN network are as follows:
a. the method comprises the following steps of constructing a feature extraction network:
the first layer is a combined convolution module 3-A, which consists of a zero filling layer, a convolution layer and an activation layer;
the second layer is a maximum pooling layer;
the third layer is a combined convolution module 3-A, which consists of a zero filling layer, a convolution layer and an activation layer;
the fourth layer is the largest pooling layer;
the fifth layer is a combined convolution module 3-B, which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the sixth layer is a combined convolution module 3-A, which consists of a zero filling layer, a convolution layer and an activation layer;
the seventh layer is the largest pooling layer;
the eighth layer is a combined convolution module 3-B, which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the ninth layer is a combined convolution module 3-A, which consists of a zero filling layer, a convolution layer and an activation layer;
the tenth layer is the largest pooling layer;
the eleventh layer is a combined convolution module 3-C, which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
b. The prediction network is constructed, and the structure is as follows:
the first layer is a cyclic convolution module, which consists of a bidirectional LSTM;
the second layer is a full-connection layer;
the third layer is a circular convolution module, which consists of a bidirectional LSTM;
the fourth layer is a full-connection layer;
c. setting a decoder to convert the output sequence into character information;
d. setting a loss function as a CTC (Connectionist Temporal Classification) loss function;
the CTC loss function formula is as follows:
L_CTC = -ln ∏_{(x,z)∈S} p(z|x) = -∑_{(x,z)∈S} ln p(z|x)
where L_CTC denotes the CTC loss, p(z|x) denotes the probability of the output sequence z given the input x, and S is the training set;
training the CRNN network by loading training parameters, wherein the training parameters are set as follows: the training optimizer is Adam, the initial learning rate is 0.0001, the maximum number of training epochs is 100 and the batch size is 32; the accuracy on the validation set is checked at intervals during training, and the network is saved once training is complete and optimal, the completion criterion being that the maximum number of training epochs is reached or the recognition accuracy meets the requirement;
and inputting the test set into the optimal CRNN network to obtain character information.
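Below is a minimal PyTorch sketch of the CRNN prediction network described above (two bidirectional LSTMs, each followed by a fully connected layer), together with the CTC loss and a greedy best-path decoder; the feature dimension, hidden size, character set and the use of index 0 as the CTC blank are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CRNNHead(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=37):
        super().__init__()
        self.rnn1 = nn.LSTM(feat_dim, hidden, bidirectional=True)      # cyclic convolution module 1
        self.fc1 = nn.Linear(2 * hidden, 2 * hidden)                    # fully connected layer 1
        self.rnn2 = nn.LSTM(2 * hidden, hidden, bidirectional=True)     # cyclic convolution module 2
        self.fc2 = nn.Linear(2 * hidden, num_classes)                   # class 0 reserved for the CTC blank

    def forward(self, feats):          # feats: (T, N, feat_dim) sequence of column features
        x, _ = self.rnn1(feats)
        x = self.fc1(x)
        x, _ = self.rnn2(x)
        return self.fc2(x)             # (T, N, num_classes) unnormalised scores

# CTC loss as implemented in PyTorch (expects log-probabilities over time steps)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def ctc_greedy_decode(logits, charset):
    """Best-path decoding: take the arg-max per frame, collapse repeats, drop blanks."""
    best = logits.argmax(dim=-1)                       # (T, N)
    texts = []
    for n in range(best.shape[1]):
        prev, chars = -1, []
        for t in best[:, n].tolist():
            if t != prev and t != 0:                   # 0 is the blank label
                chars.append(charset[t - 1])
            prev = t
        texts.append("".join(chars))
    return texts
```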
Further, in step 4), the concrete case of the TextCNN network is as follows:
a. the network structure is constructed as follows:
the first layer is an embedded layer;
The second layer is a convolution module;
the third layer is a maximum pooling layer;
the fourth layer consists of a full connection layer, a Dropout layer and an activation layer;
the fifth layer consists of a full connection layer and an activation layer;
b. the loss function is set as the multi-class cross entropy, with the following formula:
L_CrossEntropy = -∑_{i=1}^{n} y_i · ln(ŷ_i)
where L_CrossEntropy denotes the loss, n denotes the number of categories, y_i denotes the true probability of category i, and ŷ_i denotes the predicted probability of category i;
loading training parameters to train the TextCNN network, wherein the training parameters are set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the number of iterations is 1000 and the batch size is 64; the accuracy on the validation set is checked at intervals during training, and the network is saved once training is complete and optimal, the completion criterion being that the maximum number of iterations is reached and the accuracy meets the requirement;
and inputting the test set into the optimal TextCNN network to obtain the corresponding instrument type.
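A minimal PyTorch sketch of the five-layer TextCNN described above is given below; the vocabulary size, embedding dimension, kernel size, class count and dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, num_filters=128,
                 kernel_size=3, num_classes=10, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)          # layer 1: embedding
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size)    # layer 2: convolution module
        self.pool = nn.AdaptiveMaxPool1d(1)                           # layer 3: (global) max pooling
        self.fc1 = nn.Sequential(nn.Linear(num_filters, 64),          # layer 4: FC + Dropout + activation
                                 nn.Dropout(dropout), nn.ReLU())
        self.fc2 = nn.Linear(64, num_classes)                         # layer 5: FC (returns logits)

    def forward(self, token_ids):                # token_ids: (batch, seq_len) integer indices
        x = self.embedding(token_ids)            # (batch, seq_len, embed_dim)
        x = self.conv(x.transpose(1, 2))         # (batch, num_filters, L)
        x = self.pool(x).squeeze(-1)             # (batch, num_filters)
        return self.fc2(self.fc1(x))

# multi-class cross entropy as in the formula above
criterion = nn.CrossEntropyLoss()
```

Because nn.CrossEntropyLoss combines the softmax and the negative log-likelihood, this sketch returns raw logits from the last fully connected layer rather than applying a separate softmax activation.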
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the invention uses the neural network to realize instrument positioning and character detection and recognition on the instrument, and has higher precision and better generalization capability under different backgrounds compared with the traditional image processing method.
2. According to the invention, targets with different sizes can be predicted by using the improved YOLO network, the prediction frames with the most suitable sizes are comprehensively selected, the instruments with different sizes can be accurately detected, and the limitations of acquisition angles and distances are avoided.
3. Compared with other target detection networks, the improved EAST network has better detection performance and faster detection speed on characters, and has better detection effect on characters shot at different angles.
4. By splicing the character information on the instrument into text, the invention solves the machine-vision problem that an instrument's position can be located but its type is difficult to distinguish; it both detects the instrument and identifies its type.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic block diagram of an improved YOLO network.
Fig. 3 is a schematic block diagram of an improved EAST network.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
As shown in fig. 1, the instrument detection and classification method based on image text according to this embodiment includes the following steps:
1) Collect instrument images shot in different real scenes, remove interference data whose recognition is affected by blur, extreme angles or missing instruments, and annotate the dial positions in the remaining data with the open-source labeling tool labelImg to construct an instrument positioning data set; divide it into a training set and a test set, load training parameters, train the improved YOLO network with the training set, obtain the optimal improved YOLO network after training, input the test set into the optimal improved YOLO network, and output and crop the dial images. The improved YOLO network replaces the backbone with a MobileNet lightweight network to reduce network parameters and computation and increase running speed.
The improved YOLO network is designed according to the specific application scenario and the characteristics of the objects to be recognized. Unless otherwise stated in this step, every activation layer is a Leaky ReLU activation function. The steps are as follows:
a. constructing a feature extraction network
And constructing a feature extraction network according to the real-time and high-precision requirements. The feature extraction network is mainly composed of a plurality of combined convolution modules.
The feature extraction network has the following structure:
the input image is 416×416×3.
The first layer is the combined convolution module 1-A, as shown in fig. 2 (a). The module first passes through the zero padding layer and outputs 418×418×3. Then the convolution layer, the batch normalization layer and the activation layer are applied, the convolution kernel is (3, 3), the step length is 2, the number of filters is 32, and the output is 208×208×32.
The second layer is the combined convolution module 1-B, as shown in fig. 2 (b). The module first goes through the deep convolution, batch normalization layer and activation layer, the convolution kernel is (3, 3), the step size is 1, the filling is used to make the input and output size consistent, and the output is 208×208×32. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 64, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 208×208×64.
The third layer is the combined convolution module 1-C, as shown in fig. 2 (c). The module first passes through the zero padding layer and outputs 210×210×64. And then the depth convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (3, 3), the step size is 2, and the output is 104×104×64. Finally, through convolution, a batch normalization layer and an activation layer, the convolution kernel is (1, 1), the step length is 1, the number of filters is 128, the input and output sizes are kept consistent by using filling, and the output is 104×104×128.
The fourth layer is the combined convolution module 1-B, as shown in fig. 2 (b). The module first goes through the deep convolution, batch normalization layer and activation layer, the convolution kernel is (3, 3), the step size is 1, the filling is used to make the input and output size consistent, and the output is 104×104×128. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 128, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 104×104×128.
The fifth layer is the combined convolution module 1-C, as shown in fig. 2 (c). The module first outputs 106×106×128 through the zero padding layer. And then the depth convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (3, 3), the step size is 2, and the output is 52×52×128. Finally, through convolution, a batch normalization layer and an activation layer, the convolution kernel is (1, 1), the step length is 1, the number of filters is 256, the input and output sizes are kept consistent by using filling, and the output is 52×52×256.
The sixth layer is the combined convolution module 1-B, as shown in fig. 2 (b). The module first goes through the deep convolution, batch normalization layer and activation layer, the convolution kernel is (3, 3), the step size is 1, the filling is used to make the input and output size consistent, and the output is 52×52×256. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 256, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 52×52×256.
The seventh layer is the combined convolution module 1-C, as shown in fig. 2 (c). The module first passes through the zero padding layer and outputs 54×54×256. And then the depth convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (3, 3), the step size is 2, and the output is 26×26×256. Finally, through convolution, a batch normalization layer and an activation layer, the convolution kernel is (1, 1), the step length is 1, the number of filters is 512, the input and output sizes are kept consistent by using filling, and the output is 26×26×512.
The eighth layer is the combined convolution module 1-D, as shown in fig. 2 (d). The module passes through five combined convolution modules 1-B in sequence, as shown in fig. 2 (b). In each combined convolution module 1-B, the input first goes through the deep convolution, the batch normalization layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, the padding is used to make the input and the output uniform in size, and the output is 26×26×512. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 512, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 26×26×512. After passing through the five identical combined convolution modules 1-B in turn, the output is 26×26×512.
The ninth layer is the combined convolution module 1-C, as shown in fig. 2 (c). The module first passes through the zero padding layer and outputs 28×28×512. Then through the depth convolution, the batch normalization layer and the activation layer, the convolution kernel is (3, 3), the step length is 2, and the output is 13×13×512. Finally, through convolution, a batch normalization layer and an activation layer, the convolution kernel is (1, 1), the step length is 1, the number of filters is 1024, the size of input and output is kept consistent by using filling, and the output is 13×13×1024.
The tenth layer is the combined convolution module 1-B, as shown in fig. 2 (b). The module first goes through the deep convolution, the batch normalization layer and the activation layer, the convolution kernel is (3, 3), the step length is 1, the filling is used to make the input and output size consistent, and the output is 13×13×1024. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 1024, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 13×13×1024.
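A minimal PyTorch sketch of the three combined convolution modules used above is given below; the channel counts are passed in as arguments, and the Leaky ReLU slope of 0.1 is an illustrative assumption.

```python
import torch.nn as nn

class CombinedConv1A(nn.Module):
    """Zero padding -> convolution (stride 2) -> batch normalization -> Leaky ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.ZeroPad2d(1),                                  # e.g. 416 -> 418
            nn.Conv2d(in_ch, out_ch, 3, stride=2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
        )
    def forward(self, x):
        return self.block(x)

class CombinedConv1B(nn.Module):
    """Depthwise conv + BN + activation, then 1x1 conv + BN + activation (stride 1)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),  # depthwise
            nn.BatchNorm2d(in_ch),
            nn.LeakyReLU(0.1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),                          # pointwise
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
        )
    def forward(self, x):
        return self.block(x)

class CombinedConv1C(nn.Module):
    """Zero padding, stride-2 depthwise conv, then 1x1 pointwise conv (downsampling block)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.ZeroPad2d(1),
            nn.Conv2d(in_ch, in_ch, 3, stride=2, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.LeakyReLU(0.1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
        )
    def forward(self, x):
        return self.block(x)
```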
b. Constructing a predictive network
And constructing and outputting prediction networks for predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network.
b1, large-size target prediction network
The input is the tenth layer output of the feature extraction network, and the large-size target prediction network mainly comprises a plurality of combination convolution modules and convolution layers.
The input image is 13×13×1024.
The large-size target prediction network has the following structure:
The first layer is the combined convolution module 1-D, as shown in fig. 2 (d). The module passes through five combined convolution modules 1-B in sequence, as shown in fig. 2 (b). In the first combined convolution module 1-B, the input is first subjected to a deep convolution, a batch normalization layer and an activation layer, the convolution kernel is (1, 1), the step size is 1, the padding is used to make the input and output uniform in size, and the output is 13×13×1024. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 512, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 13×13×512. In the second combined convolution module 1-B, the input is first subjected to a depth convolution, a batch normalization layer and an activation layer, the convolution kernel is (3, 3), the step size is 1, the padding is used to make the input and output uniform in size, and the output is 13×13×512. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 1024, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 13×13×1024. After the five combined convolution modules 1-B alternating between these two parameter settings, the output is 13×13×512.
The second layer is the combined convolution module 1-B, as shown in fig. 2 (b). The module first goes through the deep convolution, batch normalization layer and activation layer, the convolution kernel is (3, 3), the step size is 1, the filling is used to make the input and output size consistent, and the output is 13×13×512. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 1024, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 13×13×1024.
The third layer is a convolution layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 256, and the output is 13×13×256.
b2, medium-sized target prediction network
The input is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, and the medium-size target prediction network mainly comprises a plurality of combination convolution modules and convolution layers.
The input images are 26×26×512 and 13×13×512.
The medium-size target prediction network structure is as follows:
the first layer is the input fusion module, as shown in fig. 2 (e). The input 13×13×512 is first passed through a combined convolution module 1-B, where the depth convolution, batch normalization layer and activation layer are first passed, the convolution kernel is (1, 1), the step size is 1, the padding is used to make the input and output uniform in size, and the output is 13×13×512. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 512, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 13 multiplied by 512. And then through the up-sampling layer, the sampling factor is 2, and the output is 26 multiplied by 512. Finally, the output and input 26×26×512 pass through the tensor mosaic layer, and the output is 26×26×1024.
The second layer is the combined convolution module 1-D, as shown in fig. 2 (D). The modules pass through five combined convolution modules 1-B in sequence as shown in fig. 2 (B). In the first combined convolution module 1-B, the input is first subjected to a depth convolution, a batch normalization layer and an activation layer, the convolution kernel is (1, 1), the step size is 1, the padding is used to make the input and output uniform in size, and the output is 26×26×1024. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 256, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 26 multiplied by 256. In the second combined convolution module 1-B, the input is first subjected to a depth convolution, a batch normalization layer and an activation layer, the convolution kernel is (3, 3), the step size is 1, the padding is used to make the input and output uniform in size, and the output is 26×26×256. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 512, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 26 multiplied by 512. After the two different parameters of the combined convolution modules 1-B are alternately input, the output is 26 multiplied by 256.
The third layer is the combined convolution module 1-B, as shown in fig. 2 (B). The module first goes through the deep convolution, batch normalization layer and activation layer, the convolution kernel is (3, 3), the step size is 1, the filling is used to make the input and output size consistent, and the output is 26×26×256. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 512, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 26 multiplied by 512.
The fourth layer is a convolution layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 256, and the output is 26×26×256.
b3, small-sized target prediction network
The input is the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, and the small-size target prediction network mainly comprises a plurality of combination convolution modules and convolution layers.
The input images are 52×52×256 and 26×26×256.
The small-size target prediction network structure is as follows:
The first layer is the input fusion module, as shown in fig. 2 (e). The input 26×26×256 first passes through a combined convolution module 1-B, where the depth convolution, batch normalization layer and activation layer are applied first, the convolution kernel is (1, 1), the step size is 1, the padding is used to make the input and output uniform in size, and the output is 26×26×256. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 256, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 26×26×256. And then through the up-sampling layer, the sampling factor is 2, and the output is 52×52×256. Finally, this output and the input 52×52×256 pass through the tensor splicing layer, and the output is 52×52×512.
The second layer is the combined convolution module 1-D, as shown in fig. 2 (d). The module passes through five combined convolution modules 1-B in sequence, as shown in fig. 2 (b). In the first combined convolution module 1-B, the input is first subjected to a deep convolution, a batch normalization layer and an activation layer, the convolution kernel is (1, 1), the step size is 1, the padding is used to make the input and output uniform in size, and the output is 52×52×512. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 128, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 52×52×128. In the second combined convolution module 1-B, the input is first subjected to a depth convolution, a batch normalization layer and an activation layer, the convolution kernel is (3, 3), the step size is 1, the padding is used to make the input and output uniform in size, and the output is 52×52×128. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 256, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 52×52×256. After the five combined convolution modules 1-B alternating between these two parameter settings, the output is 52×52×128.
The third layer is the combined convolution module 1-B, as shown in fig. 2 (b). The module first goes through the deep convolution, batch normalization layer and activation layer, the convolution kernel is (3, 3), the step size is 1, the filling is used to make the input and output size consistent, and the output is 52×52×128. And then the convolution, the batch normalization layer and the activation layer are carried out, the convolution kernel is (1, 1), the step length is 1, the number of filters is 256, the filling is used to ensure that the sizes of the input and the output are consistent, and the output is 52×52×256.
The fourth layer is a convolution layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 256, and the output is 52×52×256.
Finally, the output 13×13×256 of the large-size target prediction network, the output 26×26×256 of the medium-size target prediction network and the output 52×52×256 of the small-size target prediction network are processed by a non-maximum suppression layer to obtain the predicted target position and category.
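The non-maximum suppression step can be sketched as follows; this is a plain class-agnostic NMS over (x1, y1, x2, y2) boxes, and the IoU threshold of 0.45 is an illustrative assumption.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.45):
    """Keep the highest-scoring boxes and drop candidates that overlap them too much."""
    order = scores.argsort()[::-1]              # process highest-confidence boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best box with the remaining candidates
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou < iou_threshold]  # drop candidates overlapping too much
    return keep
```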
c. Setting a loss function
The loss function is set as the averaged sum of the center-coordinate loss function, the width-height loss function, the confidence loss and the category loss function:
Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / numf
where Loss denotes the total loss, Loss_xy the center-coordinate loss, Loss_wh the width-height loss, Loss_confidence the confidence loss, Loss_cls the category loss, and numf the total number of inputs as a floating-point number. The individual loss functions are as follows:
Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)
Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2
Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore
Loss_cls = mark_object * Loss_log(cls_true, cls_predict)
where mark_object is a flag indicating whether an object exists in the anchor box, w and h are the width and height of the anchor box, Loss_log denotes the binary cross-entropy loss, xy_true and xy_predict are the true and predicted center coordinates, wh_true and wh_predict the true and predicted width and height, c_predict the confidence of the prediction box, mark_ignore a flag for anchor boxes whose IoU is below the threshold, and cls_true and cls_predict the true and predicted categories.
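A minimal PyTorch sketch of these loss terms is given below; it assumes the predicted center offsets, confidence and class scores have already been passed through a sigmoid and matched to the anchor grid, so the shapes and this preprocessing are assumptions rather than part of the disclosure.

```python
import torch

bce = torch.nn.BCELoss(reduction="none")   # element-wise binary cross-entropy (Loss_log)

def yolo_loss(mark_object, mark_ignore, w, h,
              xy_true, xy_pred, wh_true, wh_pred,
              conf_pred, cls_true, cls_pred, num_inputs):
    """Per-anchor losses as defined above; all score tensors are assumed to be in (0, 1)."""
    scale = mark_object * (2.0 - w * h)                               # small boxes weighted more
    loss_xy = scale * bce(xy_pred, xy_true).sum(dim=-1)
    loss_wh = 0.5 * scale * ((wh_true - wh_pred) ** 2).sum(dim=-1)
    loss_conf = mark_object * bce(conf_pred, mark_object) \
              + (1.0 - mark_object) * bce(conf_pred, mark_object) * mark_ignore
    loss_cls = mark_object * bce(cls_pred, cls_true).sum(dim=-1)
    return (loss_xy.sum() + loss_wh.sum()
            + loss_conf.sum() + loss_cls.sum()) / float(num_inputs)
```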
Training an improved YOLO network comprising the steps of:
d1, setting training parameters
Set the training optimizer to Adam, the initial learning rate to 0.001, the number of iterations to 500 and the batch size to 8, and apply K-means clustering to all labels to generate the initial prior boxes: (38, 29), (65, 52), (94, 87), (142, 134), (195, 69), (216, 206), (337, 320), (397, 145), (638, 569).
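A minimal sketch of the K-means clustering of the labelled box widths and heights used to generate the nine prior boxes is given below; using 1 − IoU as the distance and the median as the cluster centre follows common YOLO practice and is an assumption here.

```python
import numpy as np

def iou_wh(boxes, clusters):
    """IoU between (n, 2) width/height pairs and (k, 2) cluster centres, anchored at the origin."""
    inter = np.minimum(boxes[:, None, 0], clusters[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (clusters[:, 0] * clusters[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        nearest = np.argmin(1.0 - iou_wh(boxes, clusters), axis=1)    # assign by 1 - IoU
        new = np.array([np.median(boxes[nearest == i], axis=0) if np.any(nearest == i)
                        else clusters[i] for i in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]      # sort anchors by area
```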
d2, on-line data enhancement
Data enhancement is performed on the input images to expand the data set; the enhancement methods include random mirror flipping, random noise addition and random contrast adjustment.
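A minimal sketch of the three enhancement operations is given below; the probabilities, noise level and contrast range are illustrative assumptions, and boxes are assumed to be (x1, y1, x2, y2) pixel coordinates.

```python
import numpy as np

def augment(image, boxes, rng=None):
    """Random mirror flip, random Gaussian noise and random contrast adjustment."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    boxes = boxes.copy()
    if rng.random() < 0.5:                                        # random mirror flip
        image = image[:, ::-1].copy()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]
    if rng.random() < 0.5:                                        # random noise addition
        noise = rng.normal(0.0, 8.0, image.shape)
        image = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:                                        # random contrast adjustment
        alpha = rng.uniform(0.7, 1.3)
        image = np.clip(alpha * (image.astype(np.float32) - 127.5) + 127.5, 0, 255).astype(np.uint8)
    return image, boxes
```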
d3, setting training completion mark
The accuracy on the validation set is checked at intervals during training, and the network is saved once training is complete and optimal; the completion criterion is that the maximum number of iterations (500) is reached and the accuracy meets the requirement.
Inputting the test set into an optimal improved YOLO network to obtain the dial position and the dial image.
2) Mark the character positions in the dial images cropped in step 1) to construct a character detection data set, divide it into a training set and a test set, load training parameters, train the improved EAST network with the training set, obtain the optimal improved EAST network after training, input the test set into the optimal improved EAST network, output the character positions, and crop them into character images. The improved EAST network replaces the backbone with VGG to improve detection accuracy and modifies the output-layer prediction module to predict vertices using only the head elements, improving prediction performance on long text.
The improved EAST network is designed according to the specific application scenario and the characteristics of the objects to be recognized. Unless otherwise stated below, every activation layer is a ReLU activation function. The steps are as follows:
a. constructing a feature extraction network
The feature extraction network has the following structure:
The input image is 256×256×3.
The first layer is a combined convolution module 2-B, which, as shown in fig. 3 (b), consists of two combined convolution modules 2-A and one maximum pooling layer. The first combined convolution module 2-A outputs 258×258×3 through the zero padding layer, and outputs 256×256×64 through the convolution layer and activation layer, with convolution kernel (3, 3), step size 1 and 64 filters. The second combined convolution module 2-A outputs 258×258×64 through the zero padding layer, and outputs 256×256×64 through the convolution layer and activation layer, with convolution kernel (3, 3), step size 1 and 64 filters. It then passes through a maximum pooling layer, where the pooling kernel size is (2, 2), the step length is 2, and the output is 128×128×64.
The second layer is a combined convolution module 2-B, and consists of two combined convolution modules 2-A and a maximum pooling layer. The first combined convolution module 2-A passes through the zero padding layer first, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 128. The second combined convolution module 2-A passes through the zero padding layer first, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 128. It then passes through a maximum pooling layer, where the pooling kernel size is (2, 2), the step length is 2, and the output is 64×64×128.
The third layer is a combined convolution module 2-C, which, as shown in fig. 3 (c), consists of three combined convolution modules 2-A and one maximum pooling layer. The first combined convolution module 2-A passes through the zero padding layer first, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 256. The second combined convolution module 2-A passes through the zero padding layer, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 256. The third combined convolution module 2-A passes through the zero padding layer, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 256. It then passes through a maximum pooling layer, where the pooling kernel size is (2, 2), the step length is 2, and the output is 32×32×256.
The fourth layer is a combined convolution module 2-C, and consists of three combined convolution modules 2-A and a maximum pooling layer. The first combined convolution module 2-A passes through the zero padding layer first, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 512. The second combined convolution module 2-A passes through the zero padding layer, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 512. The third combined convolution module 2-A passes through the zero padding layer, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 512. It then passes through a maximum pooling layer, where the pooling kernel size is (2, 2), the step length is 2, and the output is 16×16×512.
The fifth layer is a combined convolution module 2-C, and consists of three combined convolution modules 2-A and a maximum pooling layer. The first combined convolution module 2-A passes through the zero padding layer first, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 512. The second combined convolution module 2-A passes through the zero padding layer, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 512. The third combined convolution module 2-A passes through the zero padding layer, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step size is 1, and the number of filters is 512. It then passes through a maximum pooling layer, where the pooling kernel size is (2, 2), the step size is 2, and the output is 8×8×512.
b. Constructing feature fusion networks
The first layer is the input fusion module 2-G, as shown in FIG. 3 (G). The last layer of the feature extraction network outputs 8×8×512, first goes through the upsampling layer, with a sampling factor of 2, and outputs 16×16×512. The output and feature extraction network fourth layer output is 16×16×512, and the output is 16×16×1024 via tensor stitching layer.
The second layer is a combined convolution module 2-E, which, as shown in fig. 3 (e), consists of two batch normalization layers, one combined convolution module 2-D and one combined convolution module 2-A. The combined convolution module 2-D consists of one zero padding layer, one convolution layer and one activation layer. The input first passes through a batch normalization layer and then through the combined convolution module 2-D. The combined convolution module 2-D passes through the zero padding layer, then through the convolution layer and the activation layer, the convolution kernel is (1, 1), the step length is 1, the number of filters is 128, and the output is 16×16×128. It then passes through a batch normalization layer and then through the combined convolution module 2-A. The combined convolution module 2-A passes through the zero padding layer, then through the convolution layer and the activation layer, the convolution kernel is (3, 3), the step length is 1, the number of filters is 64, and the output is 16×16×64.
The third layer is an input fusion module 2-G. The second-layer output of the feature fusion network, 16×16×64, first passes through the upsampling layer with a sampling factor of 2, giving 32×32×64. This result and the third-layer output of the feature extraction network (32×32×256) then pass through the tensor stitching layer, and the output is 32×32×320.
The fourth layer is a combined convolution module 2-E. The input first passes through a batch normalization layer and then through the combined convolution module 2-D: a zero-padding layer, then a convolution layer and an activation layer with a convolution kernel of (1, 1), a step size of 1 and 128 filters, giving an output of 32×32×128. It then passes through a batch normalization layer and the combined convolution module 2-A: a zero-padding layer, then a convolution layer and an activation layer with a convolution kernel of (3, 3), a step size of 1 and 64 filters, giving an output of 32×32×64.
The fifth layer is the input fusion module 2-G. The fourth-layer output of the feature fusion network, 32×32×64, first passes through the upsampling layer with a sampling factor of 2, giving 64×64×64. This result and the second-layer output of the feature extraction network (64×64×128) then pass through the tensor stitching layer, and the output is 64×64×192.
The sixth layer is a combined convolution module 2-F, which, as shown in FIG. 3 (F), consists of three batch normalization layers, one combined convolution module 2-D and two combined convolution modules 2-A. The input first passes through a batch normalization layer and then through the combined convolution module 2-D: a zero-padding layer, then a convolution layer and an activation layer with a convolution kernel of (1, 1), a step size of 1 and 32 filters, giving an output of 64×64×32. It then passes through a batch normalization layer and the first combined convolution module 2-A: a zero-padding layer, then a convolution layer and an activation layer with a convolution kernel of (3, 3), a step size of 1 and 32 filters, giving an output of 64×64×32. Finally, it passes through a batch normalization layer and the second combined convolution module 2-A with the same parameters, giving an output of 64×64×32.
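As a hedged illustration of how one fusion stage described above might be assembled, the sketch below combines an input fusion module 2-G (2× upsampling plus tensor concatenation) with a combined convolution module 2-E (batch normalization, 1×1 convolution, batch normalization, zero-padded 3×3 convolution). The class name, the ReLU activations and the nearest-neighbour upsampling mode are assumptions, not details from the patent.

```python
import torch
import torch.nn as nn

class FusionStage(nn.Module):
    """One fusion stage: input fusion module 2-G (2x upsampling + tensor
    concatenation) followed by combined convolution module 2-E
    (BN -> 1x1 conv -> ReLU -> BN -> zero-padded 3x3 conv -> ReLU)."""
    def __init__(self, in_ch: int, skip_ch: int, mid_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")   # upsampling layer
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_ch + skip_ch),
            nn.Conv2d(in_ch + skip_ch, mid_ch, kernel_size=1, stride=1),  # module 2-D
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(mid_ch),
            nn.ZeroPad2d(1),
            nn.Conv2d(mid_ch, out_ch, kernel_size=3, stride=1),           # module 2-A
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                    # e.g. 8x8x512 -> 16x16x512
        x = torch.cat([x, skip], dim=1)   # tensor stitching layer
        return self.conv(x)

# Example: the first and second fusion layers, 8x8x512 fused with the 16x16x512 skip map.
stage = FusionStage(in_ch=512, skip_ch=512, mid_ch=128, out_ch=64)
print(stage(torch.randn(1, 512, 8, 8), torch.randn(1, 512, 16, 16)).shape)  # [1, 64, 16, 16]
```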
c. Constructing a predictive network
The first layer has three branches. The first branch consists of a combined convolution module 2-D: a zero-padding layer, then a convolution layer and an activation layer with a convolution kernel of (1, 1), a step size of 1 and 1 filter, giving an output of 64×64×1. The second branch consists of a combined convolution module 2-D with a convolution kernel of (1, 1), a step size of 1 and 2 filters, giving an output of 64×64×2. The third branch consists of a combined convolution module 2-D with a convolution kernel of (1, 1), a step size of 1 and 4 filters, giving an output of 64×64×4.
The second layer is an input fusion module formed by splicing the three branches of the first layer of the prediction network, and the output is 64×64×7.
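A minimal sketch of this prediction network is given below: three parallel 1×1 convolution branches with 1, 2 and 4 filters whose outputs are concatenated into a 7-channel map. The sigmoid activation is an assumption; the 32-channel, 64×64 input follows the fusion-network output described above.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Three parallel 1x1 convolution branches (1, 2 and 4 filters) whose
    outputs are spliced into a single 7-channel prediction map."""
    def __init__(self, in_ch: int = 32):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, 1, kernel_size=1), nn.Sigmoid())
        self.branch2 = nn.Sequential(nn.Conv2d(in_ch, 2, kernel_size=1), nn.Sigmoid())
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, 4, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        # splice the three branch outputs along the channel axis
        return torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)

head = PredictionHead(in_ch=32)
print(head(torch.randn(1, 32, 64, 64)).shape)   # torch.Size([1, 7, 64, 64])
```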
d. Setting a loss function
The loss function is set to be the sum of category loss, geometry loss, and angle loss.
The class loss function formula is as follows:
L_S = -β·Y*·log(Ŷ) - (1 - β)·(1 - Y*)·log(1 - Ŷ)

wherein L_S represents the class loss, β represents the weight, Ŷ is the predicted category, and Y* is the true category.
The geometry loss function formula is as follows:
L_AABB = -log IoU(R̂, R*)

wherein L_AABB represents the geometry loss function, R̂ represents the predicted AABB geometry, R* represents the real AABB geometry, and IoU represents the intersection-over-union ratio.
The angle loss function formula is as follows:
L_θ(θ̂, θ*) = 1 - cos(θ̂ - θ*)

wherein L_θ(θ̂, θ*) is the angle loss function, θ̂ is the predicted rotation angle, and θ* is the true rotation angle.
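The sketch below illustrates how these three loss terms could be computed, assuming the balanced cross-entropy form reconstructed above for the class loss and a standard four-distance (top/bottom/left/right) AABB encoding for the geometry loss; the tensor layout, the value of β and the unweighted summation of the terms are assumptions, not values from the patent.

```python
import torch

def east_losses(score_pred, score_true, geo_pred, geo_true, theta_pred, theta_true, beta=0.9):
    """Sketch of the class, geometry and angle losses described above.
    score_*: [N, 1, H, W] probabilities; geo_*: [N, 4, H, W] distances
    (top, bottom, left, right); theta_*: [N, 1, H, W] angles in radians."""
    eps = 1e-6

    # class loss: balanced cross-entropy between predicted and true score maps
    l_cls = torch.mean(
        -beta * score_true * torch.log(score_pred + eps)
        - (1.0 - beta) * (1.0 - score_true) * torch.log(1.0 - score_pred + eps)
    )

    # geometry loss: -log IoU between predicted and true axis-aligned boxes
    t_p, b_p, l_p, r_p = geo_pred.unbind(dim=1)
    t_t, b_t, l_t, r_t = geo_true.unbind(dim=1)
    area_p = (t_p + b_p) * (l_p + r_p)
    area_t = (t_t + b_t) * (l_t + r_t)
    inter = (torch.min(t_p, t_t) + torch.min(b_p, b_t)) * \
            (torch.min(l_p, l_t) + torch.min(r_p, r_t))
    union = area_p + area_t - inter
    l_aabb = torch.mean(-torch.log((inter + eps) / (union + eps)))

    # angle loss: 1 - cos(predicted angle - true angle)
    l_theta = torch.mean(1.0 - torch.cos(theta_pred - theta_true))

    return l_cls + l_aabb + l_theta
```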
Training an improved EAST network comprising the steps of:
e1, setting training parameters
Setting a training optimizer, an initial learning rate, iteration times, batch sizes and an initial priori frame.
e2, on-line data enhancement
Data enhancement is performed on the input images to expand the data set. The main data enhancement methods are: randomly adding noise and randomly adjusting contrast.
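A simple sketch of such online enhancement is given below; the noise standard deviation and the contrast range are illustrative values, not parameters from the patent.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Online enhancement of an HxWxC uint8 image: random noise and random contrast."""
    img = image.astype(np.float32)
    if rng.random() < 0.5:                       # randomly add Gaussian noise
        img += rng.normal(0.0, 8.0, size=img.shape)
    if rng.random() < 0.5:                       # randomly adjust contrast around the mean
        factor = rng.uniform(0.7, 1.3)
        img = (img - img.mean()) * factor + img.mean()
    return np.clip(img, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
sample = augment(np.full((256, 256, 3), 128, dtype=np.uint8), rng)
```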
e3, setting training completion mark
The training accuracy is checked on the verification set at intervals, and the best network is saved after training is completed; the training completion mark is reaching the maximum number of iterations or meeting the accuracy requirement.
Inputting the test set into the optimal improved EAST network to obtain the character image.
3) Labeling the character information in the character images cut out in step 2) to construct a character recognition data set, dividing the character recognition data set into a training set and a test set, loading training parameters and training the CRNN network with the training set, obtaining the optimal CRNN network after training, inputting the test set into the optimal CRNN network, and outputting the character information.
The CRNN network is constructed, which comprises the following steps:
a. constructing a feature extraction network
The input image is w×32×1, where w is the width of the input image, and is adaptively changed according to the input picture size.
The first layer is a combined convolution module 3-A, which passes through a zero filling layer, a convolution layer and an activation layer, the convolution kernel is (3, 3), the step length is 1, the filter is 64, and the output is w multiplied by 32 multiplied by 64.
The second layer is a max pooling layer, the pooling kernel size is (2, 2), the step size is 2, and the output is (w/2)×16×64.
The third layer is a combined convolution module 3-A, which passes through a zero-filling layer, a convolution layer and an activation layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 128, and the output is (w/2)×16×128.
The fourth layer is a max pooling layer, the pooling kernel size is (2, 2), the step size is 2, and the output is (w/4)×8×128.
The fifth layer is a combined convolution module 3-B, which first passes through a zero-filling layer, then through a convolution layer, a batch normalization layer and an activation layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 256, and the output is (w/4)×8×256.
The sixth layer is a combined convolution module 3-A, which passes through a zero-filling layer, a convolution layer and an activation layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 256, and the output is (w/4)×8×256.
The seventh layer is a max pooling layer, the pooling kernel size is (2, 2), the step size is 2, and the output is (w/8)×4×256.
The eighth layer is a combined convolution module 3-B, which first passes through a zero-filling layer, then through a convolution layer, a batch normalization layer and an activation layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 512, and the output is (w/8)×4×512.
The ninth layer is a combined convolution module 3-A, which passes through a zero-filling layer, a convolution layer and an activation layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 512, and the output is (w/8)×4×512.
The tenth layer is a max pooling layer, the pooling kernel size is (2, 2), the step size is 2, and the output is (w/16)×2×512.
The eleventh layer is a combined convolution module 3-C, which passes through a convolution layer, a batch normalization layer and an activation layer; the convolution kernel is (2, 2), the step size is 1, the number of filters is 512, and the output is (w/16 − 1)×1×512.
b. Constructing a predictive network
The first layer is a recurrent module, which consists of a bidirectional LSTM.
The second layer is a fully connected layer.
The third layer is a recurrent module, which consists of a bidirectional LSTM.
The fourth layer is a fully connected layer, whose output at each time step is decoded against the character library described in the next section. (The per-layer output dimensions appear in the source only as formula images; a code sketch of this prediction network, with assumed hidden sizes, is given below.)
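A minimal PyTorch sketch of this two-layer bidirectional-LSTM prediction head follows; the hidden size of 256 and the 512-dimensional input features are assumptions, while the 6736-class output follows the 0–6735 character range used by the decoder.

```python
import torch
import torch.nn as nn

class CRNNHead(nn.Module):
    """Two bidirectional LSTM layers, each followed by a fully connected layer."""
    def __init__(self, feat_dim: int = 512, hidden: int = 256, num_classes: int = 6736):
        super().__init__()
        self.lstm1 = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.fc1 = nn.Linear(2 * hidden, hidden)
        self.lstm2 = nn.LSTM(hidden, hidden, bidirectional=True, batch_first=True)
        self.fc2 = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):          # x: [batch, time, feat_dim] feature sequence
        x, _ = self.lstm1(x)
        x = self.fc1(x)
        x, _ = self.lstm2(x)
        return self.fc2(x)         # [batch, time, num_classes] per-step class scores

head = CRNNHead()
print(head(torch.randn(2, 24, 512)).shape)   # torch.Size([2, 24, 6736])
```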
c. Setting up a decoder
The output of the prediction network is converted into a sequence of indices, each element ranging from 0 to 6735 and corresponding to an independent character (0 corresponds to the null/blank character); this corresponds to dividing a line of text into a sequence of blocks, each of which is predicted as one character. The sequence is processed from left to right, and whenever an element is not 0 and differs from the previous element, the character corresponding to its value in the character library is output.
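The following sketch shows this greedy decoding rule on a toy character library; the dictionary-based charset and the example index sequence are purely illustrative.

```python
def ctc_greedy_decode(indices, charset):
    """Scan left to right; emit a character whenever the element is non-zero
    (not the blank) and differs from the previous element."""
    out, prev = [], 0
    for idx in indices:
        if idx != 0 and idx != prev:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

# toy 3-character library; repeated indices collapse and blanks (0) are dropped
charset = {1: "A", 2: "B", 3: "C"}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 2, 0, 0, 3], charset))   # ABC
```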
d. Setting a loss function
The loss function was set to CTC (Connectionist Temporal Classification) loss function.
The CTC loss function formula is as follows:
L_CTC = -ln ∏_(x,z)∈S p(z|x) = -∑_(x,z)∈S ln p(z|x)
wherein L_CTC represents the CTC loss function, p(z|x) represents the probability of the output sequence z given the input x, and S is the training set.
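For illustration, the CTC loss can be evaluated with an off-the-shelf implementation such as torch.nn.CTCLoss; the tensor sizes below (24 time steps, batch of 2, label length 10) are arbitrary example values, not parameters from the patent.

```python
import torch
import torch.nn as nn

T, N, C = 24, 2, 6736                                       # time steps, batch, classes
log_probs = torch.randn(T, N, C).log_softmax(dim=2)         # network output as log-probabilities
targets = torch.randint(1, C, (N, 10), dtype=torch.long)    # ground-truth label indices (no blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)                                   # blank index 0, mean reduction
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```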
Training a CRNN network comprising the steps of:
e1, setting training parameters
Setting a training optimizer, an initial learning rate, iteration times and batch sizes.
e2, setting training completion mark
The training accuracy is checked on the verification set at intervals, and the best network is saved after training is completed; the training completion mark is reaching the maximum number of iterations or meeting the accuracy requirement.
And inputting the test set into the optimal CRNN network to obtain character information.
4) Splicing the character information output in step 3) into a text, labeling the instrument type corresponding to the text to construct a text classification data set, dividing the text classification data set into a training set and a test set, loading training parameters and training the TextCNN network with the training set, obtaining the optimal TextCNN network after training, and inputting the test set into the optimal TextCNN network to output the instrument type corresponding to the text.
The textCNN network is constructed, comprising the following steps:
a. constructing a network structure:
the first layer is an embedding layer: a text of length m is input and mapped to word vectors, giving an input tensor of 600×64;
the second layer is a convolution module, the convolution kernel is (5, 5), the step length is 1, the number of filters is 256, and the output is 596 multiplied by 256;
the third layer is the largest pooling layer, and the output is 1 multiplied by 256;
the fourth layer consists of a full connection layer, a Dropout layer and an activation layer, and the output is 1 multiplied by 128;
the fifth layer consists of a full-connection layer and an activation layer, and the output is 1×cls, where cls is the number of categories; a code sketch of this structure is given below.
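Below is a minimal PyTorch sketch of the structure listed above; the vocabulary size (5000), the dropout rate (0.5) and the use of a 1-D convolution of width 5 over the embedding dimension (rather than a literal 5×5 kernel) are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Embedding (600 tokens x 64 dims) -> 256 filters of width 5 -> global max
    pooling -> FC + Dropout -> FC to the number of categories."""
    def __init__(self, vocab_size: int = 5000, num_classes: int = 10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 64)
        self.conv = nn.Conv1d(64, 256, kernel_size=5, stride=1)        # 600 -> 596 positions
        self.fc1 = nn.Sequential(nn.Linear(256, 128), nn.Dropout(0.5), nn.ReLU())
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, tokens):                    # tokens: [batch, 600] integer ids
        x = self.embed(tokens).transpose(1, 2)    # [batch, 64, 600]
        x = torch.relu(self.conv(x))              # [batch, 256, 596]
        x = torch.amax(x, dim=2)                  # global max pooling -> [batch, 256]
        return self.fc2(self.fc1(x))              # [batch, num_classes]

model = TextCNN()
print(model(torch.randint(0, 5000, (4, 600))).shape)   # torch.Size([4, 10])
```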
b. setting a loss function
Setting the loss function as multi-class cross entropy
L_CrossEntropy = -∑_(i=1)^(n) y_i·log(ŷ_i)

wherein L_CrossEntropy represents the loss, n represents the number of categories, y_i represents the true probability of category i, and ŷ_i represents the predicted probability of category i.
Training TextCNN network, comprising the steps of:
c1, setting training parameters
Setting a training optimizer, an initial learning rate, the number of iterations and the batch size.
c2, setting training completion mark
The training accuracy is checked on the verification set at intervals, and the best network is saved after training is completed; the training completion mark is reaching the maximum number of iterations or meeting the accuracy requirement.
And inputting the test set into the optimal TextCNN network to output the instrument type corresponding to the text.
In summary, after the above scheme is adopted, the invention provides a new method for detecting and classifying instrument images. Using neural networks as an effective means for instrument detection and classification, it can effectively solve the problem that instrument types are difficult to read, effectively promotes the development of automatic instrument identification technology, and has practical value that makes it worth popularizing.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included in the protection scope of the present invention.

Claims (5)

1. The instrument detection classification method based on the image text is characterized by comprising the following steps of:
1) Marking the positions of the dial plates in the instrument images to construct an instrument positioning data set, dividing the instrument positioning data set into a training set and a test set, loading training parameters and training an improved YOLO network by using the training set, obtaining an optimal improved YOLO network after training, inputting the test set into the optimal improved YOLO network, outputting dial plate images and cutting out the dial plate images; wherein the improved YOLO network replaces the backbone network with a MobileNet lightweight network so as to reduce network parameters and the amount of calculation and improve the operation speed;
2) Marking the character positions in the dial images cut out in step 1) to construct a character detection data set, dividing the character detection data set into a training set and a test set, loading training parameters and training an improved EAST network by using the training set, obtaining an optimal improved EAST network after training, inputting the test set into the optimal improved EAST network, outputting the character positions, and cutting the corresponding regions into character images; wherein the improved EAST network changes the backbone network to VGG to improve detection accuracy, and the output layer structure is modified so that the prediction module only uses head elements to predict vertices, so as to improve the prediction performance on long text;
the specific cases of the improved EAST network are as follows:
a. a feature extraction network is constructed, and the structure is as follows:
the first layer is a combined convolution module 2-B, which consists of two combined convolution modules 2-A and a maximum pooling layer, wherein the combined convolution module 2-A consists of a zero filling layer, a convolution layer and an activation layer;
the second layer is a combined convolution module 2-B, which consists of two combined convolution modules 2-A and a maximum pooling layer;
the third layer is a combined convolution module 2-C, which consists of three combined convolution modules 2-A and a maximum pooling layer;
The fourth layer is a combined convolution module 2-C, which consists of three combined convolution modules 2-A and a maximum pooling layer;
the fifth layer is a combined convolution module 2-C, which consists of three combined convolution modules 2-A and a maximum pooling layer;
b. the feature fusion network is constructed, and the structure is as follows:
the first layer is an input fusion module 2-G, which consists of an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 2-E, which consists of two batch normalization layers, a combined convolution module 2-D and a combined convolution module 2-A; wherein the combined convolution module 2-D consists of a zero padding layer, a convolution layer and an activation layer;
the third layer is an input fusion module 2-G, which consists of an up-sampling layer and a tensor splicing layer;
the fourth layer is a combined convolution module 2-E, which consists of two batch normalization layers, a combined convolution module 2-D and a combined convolution module 2-A;
the fifth layer is an input fusion module 2-G, which consists of an up-sampling layer and a tensor splicing layer;
the sixth layer is a combined convolution module 2-F, which consists of three batch normalization layers, one combined convolution module 2-D and two combined convolution modules 2-A;
c. The prediction network is constructed, and the structure is as follows:
the first layer is divided into three branches, and the first branch consists of a combined convolution module 2-D; the second branch consists of a combined convolution module 2-D; the third branch consists of a combined convolution module 2-D;
the second layer is an input fusion module which is formed by splicing three branches of the first layer;
d. the set loss function comprises a category loss function, a geometric shape loss function and an angle loss function;
the class loss function formula is as follows:
L_S = -β·Y*·log(Ŷ) - (1 - β)·(1 - Y*)·log(1 - Ŷ)
wherein L_S represents the class loss, β represents the weight, Ŷ is the predicted category, and Y* is the true category;
The geometry loss function formula is as follows:
L_AABB = -log IoU(R̂, R*)
wherein L_AABB represents the geometry loss, R̂ represents the geometry of the predicted quadrilateral text box AABB, R* represents the geometry of the real quadrilateral text box AABB, and IoU represents the intersection-over-union ratio;
the angle loss function formula is as follows:
L_θ(θ̂, θ*) = 1 - cos(θ̂ - θ*)
wherein L_θ(θ̂, θ*) is the angle loss, θ̂ is the predicted value of the rotation angle, and θ* is the actual value of the rotation angle;
the improved EAST network is trained by loading training parameters, and the training parameters are set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the maximum training period is 500, and the batch size is 8; the training accuracy is checked on the verification set at intervals, and the best network is saved after training is completed; the training completion mark is reaching the maximum training period or the average intersection-over-union meeting the requirement;
Inputting the test set into an optimal improved EAST network to obtain a text position, and cutting the text position into character images;
3) Marking the character information in the character images cut out in step 2) to construct a character recognition data set, dividing the character recognition data set into a training set and a test set, loading training parameters and training the CRNN network with the training set, obtaining the optimal CRNN network after training, inputting the test set into the optimal CRNN network, and outputting the character information;
4) Splicing the character information output in step 3) into a text, labeling the instrument type corresponding to the text to construct a text classification data set, dividing the text classification data set into a training set and a test set, loading training parameters and training the TextCNN network with the training set, obtaining the optimal TextCNN network after training, and inputting the test set into the optimal TextCNN network to output the instrument type corresponding to the text.
2. The method for detecting and classifying instruments based on image text according to claim 1, characterized in that in step 1), various instrument images under different environments are collected through a camera, filtering and image-enhancement preprocessing operations are performed on the instrument images, abnormal data in the instrument images are removed, the abnormal data including data with surface dirt, extreme illumination and abnormal shooting, the remaining data are labeled, the labeled content being the dial position, an instrument positioning data set is constructed, and the instrument positioning data set is divided into a training set and a test set.
3. The method of image text based meter test classification of claim 1, wherein in step 1), said improved YOLO network is specified as follows:
a. constructing a feature extraction network according to the real-time and high-precision requirements:
the first layer is a combined convolution module 1-A, which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the second layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the third layer is a combined convolution module 1-C, which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a combined convolution module 1-B, which consists of a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fifth layer is a combined convolution module 1-C, which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the sixth layer is a combined convolution module 1-B, which consists of a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The seventh layer is a combined convolution module 1-C, which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the eighth layer is a combined convolution module 1-D, which consists of five combined convolution modules 1-B;
the ninth layer is a combined convolution module 1-C, which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the tenth layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
b. constructing and outputting prediction networks for predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network;
b1, inputting a tenth layer of output of a feature extraction network, wherein the large-size target prediction network consists of a plurality of combination convolution modules and convolution layers, and has the following structure:
the first layer is a combined convolution module 1-D, which consists of five combined convolution modules 1-B;
the second layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The third layer is a convolution layer;
b2, inputting an eighth layer output of a characteristic extraction network and a first layer output of a large-size target prediction network, wherein the medium-size target prediction network consists of a plurality of combination convolution modules and convolution layers, and the structure is as follows:
the first layer is an input fusion module 1-E, which consists of a combined convolution module 1-B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 1-D, which consists of five combined convolution modules 1-B;
the third layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
the fourth layer is a convolution layer;
b3, taking the input as the sixth layer output of the characteristic extraction network and the second layer output of the medium-size target prediction network, wherein the small-size target prediction network consists of a plurality of combination convolution modules and convolution layers and has the following structure:
the first layer is an input fusion module 1-E, which consists of a combined convolution module 1-B, an up-sampling layer and a tensor splicing layer;
the second layer is a combined convolution module 1-D, which consists of five combined convolution modules 1-B;
the third layer is a combined convolution module 1-B, which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;
The fourth layer is a convolution layer;
finally, the output of the large-size target prediction network, the medium-size target prediction network and the small-size target prediction network is processed through a non-maximum suppression layer to obtain the predicted target position and category;
c. the loss function is set to have a center coordinate loss function, a wide-high loss function, a confidence loss function and a category loss function;
the center coordinate loss function formula is as follows:
Loss_xy = mark_object · (2 − w·h) · Loss_log(xy_true, xy_predict)
wherein Loss_xy represents the center coordinate loss, mark_object is a flag bit indicating whether an object exists in the anchor frame, w represents the width of the anchor frame, h represents the height of the anchor frame, Loss_log represents the binary cross entropy loss, xy_true represents the true center coordinate value, and xy_predict represents the predicted center coordinate value;
the wide-high loss function formula is as follows:
Loss_wh = 0.5 · mark_object · (2 − w·h) · (wh_true − wh_predict)²
wherein Loss_wh represents the width-height loss, wh_true represents the true width-height value, and wh_predict represents the predicted width-height value;
the confidence loss function formula is as follows:
Loss_confidence = mark_object · Loss_log(mark_object, c_predict) + (1 − mark_object) · Loss_log(mark_object, c_predict) · mark_ignore
wherein Loss_confidence represents the confidence loss, c_predict represents the confidence value of the prediction frame, and mark_ignore is a flag bit for anchor frames whose IOU is less than the threshold;
the class loss function formula is as follows:
Loss_cls = mark_object · Loss_log(cls_true, cls_predict)
wherein Loss_cls represents the class loss, cls_true represents the true class, and cls_predict represents the predicted class;
the total loss function formula is as follows:
Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / numf
wherein Loss represents the total loss and numf represents the floating point number of the total input count;
loading training parameters to train the improved YOLO network, wherein the training parameters are set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the maximum training period is 500, and the batch size is 8; the training accuracy is checked on the verification set at intervals, and the best network is saved after training is completed; the training completion mark is reaching the maximum training period or the average intersection-over-union meeting the requirement;
inputting the test set into an optimal improved YOLO network to obtain the dial position and the dial image.
4. The method of claim 1, wherein in step 3), the CRNN network is as follows:
a. a feature extraction network is constructed, and the structure is as follows:
the first layer is a combined convolution module 3-A, which consists of a zero filling layer, a convolution layer and an activation layer;
the second layer is a maximum pooling layer;
the third layer is a combined convolution module 3-A, which consists of a zero filling layer, a convolution layer and an activation layer;
the fourth layer is the largest pooling layer;
the fifth layer is a combined convolution module 3-B, which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
The sixth layer is a combined convolution module 3-A, which consists of a zero filling layer, a convolution layer and an activation layer;
the seventh layer is the largest pooling layer;
the eighth layer is a combined convolution module 3-B, which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
the ninth layer is a combined convolution module 3-A, which consists of a zero filling layer, a convolution layer and an activation layer;
the tenth layer is the largest pooling layer;
the eleventh layer is a combined convolution module 3-C, which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;
b. the prediction network is constructed, and the structure is as follows:
the first layer is a recurrent module, which consists of a bidirectional LSTM;
the second layer is a full-connection layer;
the third layer is a recurrent module, which consists of a bidirectional LSTM;
the fourth layer is a full-connection layer;
c. setting a decoder to convert the output sequence into character information;
d. setting a loss function as a CTC (Connectionist Temporal Classification) loss function;
the CTC loss function formula is as follows:
L_CTC = -ln ∏_(x,z)∈S p(z|x) = -∑_(x,z)∈S ln p(z|x)
wherein L_CTC represents the CTC loss, p(z|x) represents the probability of the output sequence z given the input x, and S is the training set;
the CRNN network is trained by loading training parameters, and the training parameters are set as follows: the training optimizer is Adam, the initial learning rate is 0.0001, the maximum training period is 100, and the batch size is 32; the training accuracy is checked on the verification set at intervals, and the best network is saved after training is completed; the training completion mark is reaching the maximum training period or meeting the recognition-accuracy requirement;
and inputting the test set into the optimal CRNN network to obtain character information.
5. The method of claim 1, wherein in step 4), the TextCNN network is as follows:
a. the network structure is constructed as follows:
the first layer is an embedded layer;
the second layer is a convolution module;
the third layer is a maximum pooling layer;
the fourth layer consists of a full connection layer, a Dropout layer and an activation layer;
the fifth layer consists of a full connection layer and an activation layer;
b. the loss function is set as multi-classification cross entropy, and the formula is as follows:
L_CrossEntropy = -∑_(i=1)^(n) y_i·log(ŷ_i)
wherein L_CrossEntropy represents the loss, n represents the number of categories, y_i represents the true probability of category i, and ŷ_i represents the predicted probability of category i;
loading training parameters to train the TextCNN network, wherein the training parameters are set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the number of iterations is 1000, and the batch size is 64; the training accuracy is checked on the verification set at intervals, and the best network is saved after training is completed; the training completion mark is that the maximum number of iterations is reached or the accuracy meets the requirement;
And inputting the test set into an optimal textCNN network to obtain a corresponding instrument type.
CN202110855223.6A 2021-07-28 2021-07-28 Instrument detection classification method based on image text Active CN113673509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110855223.6A CN113673509B (en) 2021-07-28 2021-07-28 Instrument detection classification method based on image text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110855223.6A CN113673509B (en) 2021-07-28 2021-07-28 Instrument detection classification method based on image text

Publications (2)

Publication Number Publication Date
CN113673509A CN113673509A (en) 2021-11-19
CN113673509B true CN113673509B (en) 2023-06-09

Family

ID=78540390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110855223.6A Active CN113673509B (en) 2021-07-28 2021-07-28 Instrument detection classification method based on image text

Country Status (1)

Country Link
CN (1) CN113673509B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936280B (en) * 2021-11-23 2024-04-05 河海大学 Automatic character recognition system and method for code disc of embedded instrument
CN115424121B (en) * 2022-07-30 2023-10-13 南京理工大学紫金学院 Electric power pressing plate switch inspection method based on computer vision
CN116416626B (en) * 2023-06-12 2023-08-29 平安银行股份有限公司 Method, device, equipment and storage medium for acquiring circular seal data
CN116958998B (en) * 2023-09-20 2023-12-26 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision
CN110543878A (en) * 2019-08-07 2019-12-06 华南理工大学 pointer instrument reading identification method based on neural network
CN111062282A (en) * 2019-12-05 2020-04-24 武汉科技大学 Transformer substation pointer type instrument identification method based on improved YOLOV3 model
CN111368825A (en) * 2020-02-25 2020-07-03 华南理工大学 Pointer positioning method based on semantic segmentation
CN111401358A (en) * 2020-02-25 2020-07-10 华南理工大学 Instrument dial plate correction method based on neural network
CN111639643A (en) * 2020-05-22 2020-09-08 深圳市赛为智能股份有限公司 Character recognition method, character recognition device, computer equipment and storage medium
CN111814919A (en) * 2020-08-31 2020-10-23 江西小马机器人有限公司 Instrument positioning and identifying system based on deep learning
CN112801094A (en) * 2021-02-02 2021-05-14 中国长江三峡集团有限公司 Pointer instrument image inclination correction method
CN112861867A (en) * 2021-02-01 2021-05-28 北京大学 Pointer type instrument panel identification method, system and storage medium

Also Published As

Publication number Publication date
CN113673509A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN110543878B (en) Pointer instrument reading identification method based on neural network
CN113673509B (en) Instrument detection classification method based on image text
CN111368825B (en) Pointer positioning method based on semantic segmentation
CN109766873B (en) Pedestrian re-identification method based on hybrid deformable convolution
CN113239930A (en) Method, system and device for identifying defects of cellophane and storage medium
CN110490915B (en) Point cloud registration method based on convolution-limited Boltzmann machine
Roy et al. Script identification from handwritten document
CN111369526B (en) Multi-type old bridge crack identification method based on semi-supervised deep learning
CN111401358B (en) Instrument dial correction method based on neural network
CN105335760A (en) Image number character recognition method
CN105184225A (en) Multinational paper money image identification method and apparatus
CN105654042B (en) The proving temperature character identifying method of glass-stem thermometer
CN116052110B (en) Intelligent positioning method and system for pavement marking defects
Li et al. An efficient method for DPM code localization based on depthwise separable convolution
CN113837166B (en) Automatic pointer instrument reading method based on deep learning
Tan et al. An application of an improved FCOS algorithm in detection and recognition of industrial instruments
CN113705731A (en) End-to-end image template matching method based on twin network
CN113570542A (en) Coal gangue detection method based on machine vision under shielding condition
Li et al. Research on reading recognition of pointer meter based on improved U-net network
CN113673508B (en) Pointer instrument image data synthesis method
CN117593514B (en) Image target detection method and system based on deep principal component analysis assistance
CN117392440B (en) Textile fabric retrieval method and system based on tissue structure and color classification
CN111860519B (en) Method and system for segmenting pipeline image of aircraft engine
Zhao et al. A Practical Unified Network for Localization and Recognition of Arbitrary-oriented Container code and type
CN117576051A (en) Particle defect detection method and device for low-voltage sheath

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant