CN110147788B - Feature enhancement CRNN-based metal plate strip product label character recognition method - Google Patents


Info

Publication number
CN110147788B
Authority
CN
China
Prior art keywords
convolution
neural network
feature
training
layer
Prior art date
Legal status
Active
Application number
CN201910448218.6A
Other languages
Chinese (zh)
Other versions
CN110147788A (en)
Inventor
刘士新
郭文瑞
陈大力
赖峰
Current Assignee
Northeastern University China
Original Assignee
Northeastern University China
Priority date
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201910448218.6A priority Critical patent/CN110147788B/en
Publication of CN110147788A publication Critical patent/CN110147788A/en
Application granted granted Critical
Publication of CN110147788B publication Critical patent/CN110147788B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a feature-enhanced CRNN-based metal plate strip product label character recognition method, which comprises the following steps: preparing a picture database; preparing a recognition dictionary; preprocessing and expanding a training library; designing and establishing a feature-enhanced deep convolutional recurrent neural network for the label characters of metal plate strip products in steel-industry applications; training the network multiple times with the training example pictures in the training library; and recognizing the characters on a metal plate strip product label based on the output value of the last-stage neural network architecture in the trained model. Through analysis of a large number of metal plate strip product labels photographed on steel-industry sites, the invention enhances the features of the original CRNN character recognition network from a practical standpoint so that it learns more accurate features, and its recognition results in real scenes are highly reliable.

Description

Feature enhancement CRNN-based metal plate strip product label character recognition method
Technical Field
The invention relates to the technical field of image processing and deep learning, in particular to a feature-enhanced CRNN-based metal plate strip product label character recognition method.
Background
Compared with conventional applications, steel-industry applications suffer from a harsh industrial field environment, and character recognition is very sensitive to external conditions. Product label pictures shot in industrial scenes exhibit complex backgrounds, artistic fonts, low resolution, non-uniform illumination, image degradation, character deformation, mixed languages, and complex text layouts, under which recognition accuracy falls far short of application expectations. Deep-learning-based character recognition offers a new opportunity for industrial applications, but no recognition method targets the specific background of metal plate strip product labels. Because of severe label-quality problems, such as closely spaced letters, blurred characters, and poor distinguishability between the digit 1, the lowercase letter l, and the uppercase letter I, existing character recognition techniques struggle to predict and distinguish such characters, yielding low recognition precision and poor reliability. With existing methods performing poorly and remaining unadopted, a new technique is urgently needed to fill this gap.
Disclosure of Invention
To address the problems in the prior art, the invention discloses a feature-enhanced CRNN-based metal plate strip product label character recognition method. The technical solution adopted by the invention is as follows:
a feature enhancement CRNN-based metal plate strip product label character recognition method comprises the following steps:
S1, preparing a picture database, wherein pictures in the picture database are derived from metal plate strip product label pictures shot in an industrial field;
cutting the regions containing characters out of the photographed metal plate strip product label pictures to obtain a plurality of small pictures, with the character rows in each small picture oriented horizontally;
each small picture corresponds to a txt file of the same name that stores the character information in the small picture;
each small picture together with its corresponding txt file is called one item of training data, and all the training data form a database;
S2, preparing a recognition dictionary: traversing each character in each txt file in the database and adding it to the original recognition dictionary, so that every character in the training data can be recognized; the recognition dictionary is obtained after de-duplication processing;
the original recognition dictionary contains 1050 characters, mainly English letters, Chinese and English punctuation, and some common Chinese characters.
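As an illustration only, a minimal Python sketch of this dictionary-preparation step follows; the directory layout, the file encoding, and the base-dictionary file are assumptions of this sketch, not details given by the patent.

```python
# Sketch of step S2: traverse the txt label files, extend the original
# 1050-character dictionary with any unseen characters, and de-duplicate.
# `db_dir` and `base_dict_path` are hypothetical placeholders.
from pathlib import Path

def build_recognition_dictionary(db_dir: str, base_dict_path: str) -> list:
    chars = list(Path(base_dict_path).read_text(encoding="utf-8").strip())
    seen = set(chars)
    for txt_file in sorted(Path(db_dir).glob("*.txt")):
        for ch in txt_file.read_text(encoding="utf-8").strip():
            if ch not in seen:          # de-duplication
                seen.add(ch)
                chars.append(ch)
    return chars                        # the index doubles as the class id
```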
S3, preprocessing each small picture in the database to obtain training example pictures, which form a training library;
after preprocessing, each training example picture further undergoes contrast adjustment, brightness adjustment, length stretching, and similar transformations to expand the training library;
the preprocessing comprises the following steps:
processing each small picture in the database into a single-channel grayscale image;
forcibly rescaling the height of the single-channel grayscale image to 32 pixels, with the width scaled freely by the same ratio as the height;
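For concreteness, a hedged OpenCV sketch of the preprocessing and expansion steps is given below; the augmentation ranges are illustrative assumptions, since the patent does not specify them.

```python
# Sketch of step S3: grayscale conversion, forced height-32 rescaling with
# proportional width, plus contrast / brightness / length-stretch expansion.
import cv2
import numpy as np

def preprocess(img_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)   # single channel
    h, w = gray.shape
    new_w = max(1, int(round(w * 32.0 / h)))           # keep aspect ratio
    return cv2.resize(gray, (new_w, 32))               # height forced to 32

def expand(gray: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    alpha = rng.uniform(0.8, 1.2)                      # contrast (assumed range)
    beta = rng.uniform(-20.0, 20.0)                    # brightness (assumed range)
    out = cv2.convertScaleAbs(gray, alpha=alpha, beta=beta)
    stretch = rng.uniform(0.9, 1.3)                    # length stretch (assumed)
    new_w = max(1, int(round(out.shape[1] * stretch)))
    return cv2.resize(out, (new_w, out.shape[0]))
```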
S4, designing and establishing a feature-enhanced deep convolutional recurrent neural network (CRNN) for the label characters of metal plate strip products in steel-industry applications, wherein the feature-enhanced deep convolutional recurrent neural network comprises a multi-stage neural network architecture in which two special neural network architectures are arranged to realize multi-scale feature enhancement;
S5: training the feature-enhanced deep convolutional recurrent neural network multiple times with the training example pictures in the training library, and adjusting the parameters of the multi-stage neural network architecture according to a set learning rate during training, so as to obtain a deep convolutional recurrent neural network model for recognizing metal plate strip product label characters in steel-industry applications;
S6: recognizing the characters on a metal plate strip product label in a steel-industry application based on the output value of the final-stage neural network architecture of the trained deep convolutional recurrent neural network model obtained in step S5.
The multi-stage neural network architecture comprises 10 modules: modules 1 to 7 are conventional convolution modules, with a maximum pooling (MaxPool) operation added in modules 1, 2, 4, and 6 and a Batch Normalization (BN) operation added in modules 3, 5, and 7;
the two special neural network architectures are modules 8 and 9, which are regional feature enhancement convolution modules (EFEM), called the EFEM_a module and the EFEM_b module respectively;
and module 10 is the result output layer, consisting of a bidirectional recurrent neural network.
The EFEM_a module consists of a deformable convolution layer, a ReLU activation layer, and a maximum pooling layer; the convolution process applied inside the EFEM_a module to the features passed from the previous layer is as follows:
first, the features passed from the previous layer undergo feature extraction by a deformable convolution kernel of size 3 × 3, and the output values are then fed into 4 parallel branches and a residual branch for relearning;
the first of the 4 parallel branches consists of convolution layers with kernel sizes 1 × 1 and 3 × 3; the second branch consists of convolution layers with kernel sizes 1 × 1 and 3 × 1 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the third branch consists of convolution layers with kernel sizes 1 × 1 and 1 × 3 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the fourth branch consists of convolution layers with kernel sizes 1 × 1, 1 × 3, and 3 × 1 followed by a dilated convolution with dilation rate 5 and kernel size 3 × 3; the outputs of the 4 parallel branches are concatenated and fed into a convolution layer with kernel size 3 × 3 for feature refinement, then passed through a deformable convolution layer with kernel size 3 × 3 and a convolution layer with kernel size 1 × 1, and the output is x0;
the residual branch uses a convolution layer with kernel size 1 × 1, and its output is x1; finally, the output x0 of the 4 parallel branches and the output x1 of the residual branch are added according to the ratio scale1 to obtain x, which satisfies the following formula:
x = x0 · scale1 + x1
x then undergoes further feature extraction through a convolution layer with kernel size 1 × 1, a ReLU activation layer, and a maximum pooling layer.
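A PyTorch sketch of the EFEM_a topology follows. Only the branch structure is taken from the text; the channel width c, the paddings, and the 2 × 2 pooling are assumptions, and the deformable convolution comes from torchvision.ops, with its offsets predicted by a zero-initialized convolution consistent with the initialization described later.

```python
# Sketch of the EFEM_a module; channel count c and pooling size are assumed.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformConv3x3(nn.Module):
    """3x3 deformable conv; offsets come from a zero-initialized 3x3 conv."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.offset = nn.Conv2d(c_in, 18, 3, padding=1)  # 2*3*3 offset maps
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)
        self.conv = DeformConv2d(c_in, c_out, 3, padding=1)

    def forward(self, x):
        return self.conv(x, self.offset(x))

def conv(c_in, c_out, k, dilation=1):
    """'Same'-padded conv for square or 1xN/Nx1 kernels, optionally dilated."""
    kh, kw = (k, k) if isinstance(k, int) else k
    pad = (dilation * (kh - 1) // 2, dilation * (kw - 1) // 2)
    return nn.Conv2d(c_in, c_out, (kh, kw), padding=pad, dilation=dilation)

class EFEM_a(nn.Module):
    def __init__(self, c, scale1=0.3):
        super().__init__()
        self.scale1 = scale1
        self.entry = DeformConv3x3(c, c)                 # deformable 3x3
        self.b1 = nn.Sequential(conv(c, c, 1), conv(c, c, 3))
        self.b2 = nn.Sequential(conv(c, c, 1), conv(c, c, (3, 1)),
                                conv(c, c, 3, dilation=3))
        self.b3 = nn.Sequential(conv(c, c, 1), conv(c, c, (1, 3)),
                                conv(c, c, 3, dilation=3))
        self.b4 = nn.Sequential(conv(c, c, 1), conv(c, c, (1, 3)),
                                conv(c, c, (3, 1)), conv(c, c, 3, dilation=5))
        self.refine = nn.Sequential(conv(4 * c, c, 3),   # feature refinement
                                    DeformConv3x3(c, c), conv(c, c, 1))
        self.residual = conv(c, c, 1)
        self.out = nn.Sequential(conv(c, c, 1), nn.ReLU(inplace=True),
                                 nn.MaxPool2d(2))

    def forward(self, f):
        t = self.entry(f)
        x0 = self.refine(torch.cat([self.b1(t), self.b2(t),
                                    self.b3(t), self.b4(t)], dim=1))
        x1 = self.residual(t)
        return self.out(x0 * self.scale1 + x1)           # x = x0*scale1 + x1
```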
The EFEM_b module consists of a convolution layer, a ReLU activation layer, and a maximum pooling layer; the convolution process applied inside the EFEM_b module to the features passed from the previous layer is as follows:
the features passed from the previous layer serve as the input of the EFEM_b module and are fed into 3 parallel branches and a residual branch for relearning;
the first of the 3 parallel branches consists of convolution layers with kernel sizes 1 × 1 and 3 × 3; the second branch consists of convolution layers with kernel sizes 1 × 1 and 3 × 3 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the third branch consists of a convolution layer with kernel size 1 × 1, two convolution layers with kernel size 3 × 3, and a dilated convolution with dilation rate 3 and kernel size 3 × 3; the outputs of the 3 branches are concatenated and fed into a convolution layer with kernel size 1 × 1 for feature refinement, and the output is x2;
the residual branch uses a convolution layer with kernel size 1 × 1, and its output is x3; finally, the output x2 of the 3 parallel branches and the output x3 of the residual branch are added according to the ratio scale2 to obtain x, which satisfies the following formula:
x = x2 · scale2 + x3
x then undergoes feature extraction through a ReLU activation layer and a maximum pooling layer.
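A companion sketch of EFEM_b follows, reusing the conv helper from the EFEM_a sketch above; the same caveats about the assumed channel width and pooling size apply.

```python
# Sketch of the EFEM_b module (conv helper as defined in the EFEM_a sketch).
import torch
import torch.nn as nn

class EFEM_b(nn.Module):
    def __init__(self, c, scale2=0.3):
        super().__init__()
        self.scale2 = scale2
        self.b1 = nn.Sequential(conv(c, c, 1), conv(c, c, 3))
        self.b2 = nn.Sequential(conv(c, c, 1), conv(c, c, 3),
                                conv(c, c, 3, dilation=3))
        self.b3 = nn.Sequential(conv(c, c, 1), conv(c, c, 3), conv(c, c, 3),
                                conv(c, c, 3, dilation=3))
        self.refine = conv(3 * c, c, 1)                  # 1x1 refinement
        self.residual = conv(c, c, 1)
        self.out = nn.Sequential(nn.ReLU(inplace=True), nn.MaxPool2d(2))

    def forward(self, f):
        x2 = self.refine(torch.cat([self.b1(f), self.b2(f), self.b3(f)], 1))
        x3 = self.residual(f)
        return self.out(x2 * self.scale2 + x3)           # x = x2*scale2 + x3
```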
In step S5, when training the multi-stage neural network architecture of the feature-enhanced deep convolutional recurrent neural network, Connectionist Temporal Classification (CTC) is used as the loss function and the Adam algorithm is used as the learning algorithm.
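A minimal sketch of one CTC/Adam training step is shown below; it assumes a model that emits per-frame scores of shape (T, N, num_classes) with class 0 reserved as the CTC blank, which is a common convention, not something the patent states.

```python
# One training step with CTC loss and the Adam optimizer (PyTorch).
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, targets, target_lengths):
    optimizer.zero_grad()
    log_probs = model(images).log_softmax(2)       # (T, N, C) log-probabilities
    T, N, _ = log_probs.shape
    input_lengths = torch.full((N,), T, dtype=torch.long)
    loss = F.ctc_loss(log_probs, targets, input_lengths,
                      target_lengths, blank=0)
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g. optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
#                                   betas=(0.9, 0.99))
```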
In step S5, when training the multi-stage neural network architecture of the feature-enhanced deep convolutional recurrent neural network, the learning rate of each training round is less than or equal to that of the previous round.
In step S5, when training the multi-stage neural network architecture of the feature-enhanced deep convolutional recurrent neural network, the weights of the network are initialized with the "Xavier" method, and the offset layer values of all deformable convolution layers are initialized to 0.
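The initialization scheme can be sketched as follows; detecting the offset layers by an "offset" attribute name matches the EFEM_a sketch above and is an assumption of this illustration.

```python
# Xavier initialization for conv/linear weights; zero initialization for the
# offset-prediction layers of all deformable convolutions.
import torch.nn as nn

def init_weights(net: nn.Module):
    for name, m in net.named_modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            if "offset" in name:                 # deformable-conv offset layer
                nn.init.zeros_(m.weight)
            else:
                nn.init.xavier_uniform_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)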
In step S5, when training the multi-stage neural network architecture of the feature-enhanced deep convolutional recurrent neural network, the output of each convolution layer in the network is passed through the ReLU activation function before being transmitted to the next layer of neurons.
Each character in the recognition dictionary appears with equal probability in the training example library, which is split in a 9:1 ratio into a training set and a validation set for the feature-enhanced deep convolutional recurrent neural network.
Compared with the prior art, the invention has the beneficial effects that:
according to the method, through analysis of a large number of metal plate strip product labels shot in a steel industry field, more accurate feature learning is realized through feature enhancement of an original character recognition network CRNN from the practical standpoint, the method has very excellent prediction capability on adjacent characters, fuzzy characters and characters with high similarity and extremely poor recognition, and the recognition result in a real scene has very high reliability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a feature-enhanced CRNN-based metal plate strip product label character recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic view of an original metal plate strip product label before text-region cutting according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a cut thumbnail and text information in a txt file according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating pre-and post-processing comparison of small pictures in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an EFEM _ a module according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an EFEM _ b module in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of a feature enhanced deep convolutional recurrent neural network architecture used in an embodiment of the present invention;
FIG. 8 is a diagram illustrating the loss and accuracy of the training process according to an embodiment of the present invention;
fig. 9 is a schematic diagram illustrating comparison of recognition effects of characters with low recognition degree according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a feature-enhanced CRNN-based metal plate strip product label character recognition method is characterized in that: preparing a large number of product label training pictures according to actual conditions in an industrial environment; establishing a feature enhancement CRNN neural network; training a neural network based on a large number of prepared pictures; performing character recognition by using the trained feature enhancement CRNN model;
the identification method comprises the following steps:
S1, preparing a picture database, wherein pictures in the picture database are derived from metal plate strip product label pictures shot in an industrial field;
cutting the regions containing characters out of a photographed metal plate strip product label picture (shown in fig. 2) to obtain a plurality of small pictures, with the character rows in each small picture oriented horizontally;
each small picture corresponds to a txt file of the same name that stores the character information in the small picture;
each small picture together with its corresponding txt file is called one item of training data (as shown in fig. 3); all the training data form the database, which finally contains 17386 items;
S2, preparing a recognition dictionary: traversing each character in each txt file in the database and adding it to the original recognition dictionary; the recognition dictionary is obtained after de-duplication processing;
the original recognition dictionary contains 1050 characters, mainly English letters, Chinese and English punctuation, and some common Chinese characters;
S3, preprocessing each small picture in the database to obtain training example pictures, which form a training library;
after preprocessing, each training example picture undergoes contrast adjustment, brightness adjustment, and length stretching to expand the training library, which is then split in a 9:1 ratio into a training set and a validation set for the feature-enhanced deep convolutional recurrent neural network;
the pretreatment comprises the following steps:
processing each small picture in the database into a single-channel gray-scale image;
the height of the single-channel grayscale image is forcibly rescaled to 32 pixels, with the width scaled freely by the same ratio as the height, as shown in fig. 4;
S4, designing and establishing a feature-enhanced deep convolutional recurrent neural network (CRNN) for the label characters of metal plate strip products in steel-industry applications; the network comprises a multi-stage neural network architecture. In view of the closely spaced printed characters in steel-industry applications and the poor quality of pictures shot in the actual environment, special convolution modules are added to the general feature extraction operations for feature enhancement; that is, two special neural network architectures are arranged in the multi-stage architecture to realize multi-scale feature enhancement;
two special neural network architectures (EFEM _ a and EFEM _ b) feature enhancements:
EFEM_a module: the values input to the module first undergo feature extraction by a deformable convolution kernel of size 3 × 3, and the output values are then fed into 4 parallel branches and one residual branch for relearning. The first of the 4 parallel branches consists of convolution layers with kernel sizes 1 × 1 and 3 × 3; the second branch consists of convolution layers with kernel sizes 1 × 1 and 3 × 1 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the third branch consists of convolution layers with kernel sizes 1 × 1 and 1 × 3 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the fourth branch consists of convolution layers with kernel sizes 1 × 1, 1 × 3, and 3 × 1 followed by a dilated convolution with dilation rate 5 and kernel size 3 × 3. The outputs of the 4 parallel branches are concatenated, fed into a convolution layer with kernel size 3 × 3 for feature refinement, then passed through a deformable convolution layer with kernel size 3 × 3 and a convolution layer with kernel size 1 × 1 to give output x0. The residual branch uses a convolution layer with kernel size 1 × 1 and outputs x1. Finally, x0 and x1 are added according to the ratio scale1 of 0.3 to obtain x (x = x0 · scale1 + x1), which then passes through a convolution layer with kernel size 1 × 1, a ReLU activation layer, and a maximum pooling layer. FIG. 5 is a schematic structural diagram of the EFEM_a module.
EFEM_b module: the values input to the module are fed into 3 parallel branches and one residual branch for relearning. The first of the 3 parallel branches consists of convolution layers with kernel sizes 1 × 1 and 3 × 3; the second branch consists of convolution layers with kernel sizes 1 × 1 and 3 × 3 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the third branch consists of a convolution layer with kernel size 1 × 1, two convolution layers with kernel size 3 × 3, and a dilated convolution with dilation rate 3 and kernel size 3 × 3. The outputs of the 3 branches are concatenated, fed into a convolution layer with kernel size 1 × 1 for feature refinement, and the output is x2. The residual branch uses a convolution layer of size 1 × 1 and outputs x3. Finally, x2 and x3 are added according to the ratio scale2 of 0.3 to obtain x (x = x2 · scale2 + x3), which then undergoes feature extraction through a ReLU activation layer and maximum pooling. FIG. 6 is a schematic structural diagram of the EFEM_b module.
As shown in fig. 7, the network structure of the feature-enhanced CRNN used is as follows:
an input grayscale picture with a fixed height of 32 pixels first passes through a convolution kernel of size 3 × 3 with 64 output channels, then through ReLU activation into a maximum pooling layer. The result is split into two branches: the first branch is sent to the first regional feature enhancement module EFEM_a to obtain feature map x4, while the second branch enters a convolution layer with kernel size 3 × 3 to obtain a 128-channel feature map, which is activated and fed into a maximum pooling layer. The resulting feature map is again split into two branches for feature extraction: the first branch is sent to the second regional feature enhancement module EFEM_b to obtain feature map x5, while the second branch enters a convolution layer with kernel size 3 × 3 to obtain a 256-channel feature map, which after activation undergoes Batch Normalization (BN) to give feature map x6. The features are then added according to the ratio scale3, i.e. x = x4 · scale3 + x6. The summed features are sent into a convolution layer with kernel size 3 × 3 and 256 output channels, then through ReLU activation into a maximum pooling layer to obtain feature map x7, and the features are added according to the ratio scale4, i.e. x = x5 · scale4 + x7. The summed feature map then passes in turn through 3 convolution layers with 3 × 3 kernels and 512 channels to obtain a feature sequence, which passes through two bidirectional recurrent neural network layers to predict the character corresponding to each frame of the feature sequence.
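A condensed sketch of this trunk is given below, building on the EFEM_a and EFEM_b sketches above. The patent does not spell out how channel and spatial sizes are reconciled at the two fusion points, so the 1 × 1 adapter convolutions and the pooling placements here are assumptions made purely so the shapes line up; the bidirectional recurrent head is omitted.

```python
# Sketch of the feature-enhanced CRNN trunk with its two scaled fusions.
import torch
import torch.nn as nn

class FeatureEnhancedTrunk(nn.Module):
    def __init__(self, scale3=0.6, scale4=0.6):
        super().__init__()
        self.scale3, self.scale4 = scale3, scale4
        self.stem = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1),
                                  nn.ReLU(True), nn.MaxPool2d(2))
        self.efem_a = EFEM_a(64)                    # first enhancement branch
        self.adapt_a = nn.Conv2d(64, 256, 1)        # assumed channel adapter
        self.conv2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1),
                                   nn.ReLU(True), nn.MaxPool2d(2))
        self.efem_b = EFEM_b(128)                   # second enhancement branch
        self.adapt_b = nn.Conv2d(128, 256, 1)       # assumed channel adapter
        self.conv3 = nn.Sequential(nn.Conv2d(128, 256, 3, padding=1),
                                   nn.ReLU(True), nn.BatchNorm2d(256))
        self.conv4 = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1),
                                   nn.ReLU(True), nn.MaxPool2d(2))
        self.conv5 = nn.Sequential(                 # three 512-channel convs
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(True))

    def forward(self, img):                         # img: (N, 1, 32, W)
        s = self.stem(img)
        x4 = self.adapt_a(self.efem_a(s))           # enhanced branch 1
        t = self.conv2(s)
        x5 = self.adapt_b(self.efem_b(t))           # enhanced branch 2
        x6 = self.conv3(t)
        x = x4 * self.scale3 + x6                   # x = x4*scale3 + x6
        x7 = self.conv4(x)
        x = x5 * self.scale4 + x7                   # x = x5*scale4 + x7
        return self.conv5(x)                        # feature-sequence source
```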
S5: the network structure of the feature-enhanced CRNN is trained multiple times using the training example pictures in the training library. In this embodiment, the training is divided into three stages, with a different learning rate set in each stage, until the loss and accuracy remain unchanged. The following settings are adopted when training the neural network architectures of the feature-enhanced CRNN:
A. Connectionist Temporal Classification (CTC) is used as the loss function;
B. the Adam algorithm is used as the optimization algorithm of the model;
C. the first stage adjusts the parameters of the multi-stage neural network architecture with a learning rate of 0.001, the exponential-decay rates (betas) of the Adam optimizer set to [0.9, 0.99], and scale1, scale2, scale3, and scale4 all set to 0.1, training until the network's training loss and the validation-set accuracy hold steady; the second stage adjusts the parameters with a learning rate of 0.0001 and the Adam betas set to [0.89, 0.99], with scale1 and scale2 unchanged and scale3 and scale4 increased, training again until the training loss and validation accuracy hold steady, by which point the validation accuracy has improved greatly and the training loss has dropped greatly; the third stage fine-tunes the multi-stage architecture with a learning rate of 0.00001 and Adam betas in the range [0.88, 0.99], with scale1 and scale2 increased to 0.3 and scale3 and scale4 increased to 0.6, iterating to obtain the final model (the three stages are summarized in the sketch after this list);
D. after the 3rd, 5th, and 7th convolution layers, the values output by the convolution are normalized using Batch Normalization (BN), which reduces the risk of overfitting and speeds up network training;
E. the network weights are initialized with the "Xavier" method, which derives from the 2010 paper 'Understanding the difficulty of training deep feedforward neural networks'; this weight initialization helps accelerate the convergence of network training. Meanwhile, the offset layer values of all deformable convolutions are initialized to 0;
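The staged settings in item C can be captured as a small table of hyper-parameters; the stage-two value for scale3/scale4 is not stated in the text and is marked as an assumption.

```python
# Three-stage training schedule as described in item C above.
STAGES = [
    dict(lr=1e-3, betas=(0.90, 0.99), scale12=0.1, scale34=0.1),
    dict(lr=1e-4, betas=(0.89, 0.99), scale12=0.1, scale34=0.3),  # 0.3 assumed
    dict(lr=1e-5, betas=(0.88, 0.99), scale12=0.3, scale34=0.6),
]
```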
through the training process, a deep convolution cyclic neural network model for carrying out character recognition on the metal plate strip product label characters in the application of the steel industry is obtained;
S6: characters on the metal plate strip product label in the steel-industry application are recognized based on the output value of the final-stage neural network architecture of the trained deep convolutional recurrent neural network model obtained in step S5, as shown in fig. 8.
This scheme provides a data-driven, feature-enhanced CRNN recognition method for metal plate strip product label characters. A database obtained by processing field-photographed metal plate strip product label pictures serves as the training set, and the final model is trained with Xavier weight initialization, the CTC loss function, and the Adam optimization algorithm; accuracy can be further improved by acquiring more training data.
Through deep learning, the method automatically updates the network parameters at each iteration, autonomously learning the required features from the training data and completing character recognition; accuracy keeps improving as the amount of training data grows, greatly improving recognition accuracy and reliability in industrial application scenes. In tests, the model's recognition accuracy on 1500 pictures reaches about 95%, roughly 10% higher than the previous recognition network, with particularly good support for closely spaced characters, blurred characters, and highly similar, hard-to-distinguish characters (as shown in fig. 9). This deep-learning-based character recognition method can quickly and accurately recognize metal plate strip product label characters in the steel industry and can be applied in the steel-industry field and beyond.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A feature enhancement CRNN-based metal plate strip product label character recognition method is characterized by comprising the following steps:
S1, preparing a picture database, wherein pictures in the picture database are derived from metal plate strip product label pictures shot in an industrial field;
cutting the regions containing characters out of the photographed metal plate strip product label pictures to obtain a plurality of small pictures, with the character rows in each small picture oriented horizontally;
each small picture corresponds to a txt file of the same name that stores the character information in the small picture;
each small picture together with its corresponding txt file is called one item of training data, and all the training data form a database;
S2, preparing a recognition dictionary: traversing each character in each txt file in the database and adding it to the original recognition dictionary; the recognition dictionary is obtained after de-duplication processing;
S3, preprocessing each small picture in the database to obtain training example pictures, which form a training library;
after preprocessing, each training example picture undergoes contrast adjustment, brightness adjustment, and length stretching to expand the training library;
the preprocessing comprises the following steps:
processing each small picture in the database into a single-channel gray-scale image;
forcibly rescaling the height of the single-channel grayscale image to 32 pixels, with the width scaled freely by the same ratio as the height;
S4, designing and establishing a feature-enhanced deep convolutional recurrent neural network for the label characters of metal plate strip products in steel-industry applications, wherein the feature-enhanced deep convolutional recurrent neural network comprises a multi-stage neural network architecture in which two special neural network architectures are arranged to realize multi-scale feature enhancement;
S5: training the feature-enhanced deep convolutional recurrent neural network multiple times with the training example pictures in the training library, and adjusting the parameters of the multi-stage neural network architecture according to a set learning rate during training, so as to obtain a deep convolutional recurrent neural network model for recognizing metal plate strip product label characters in steel-industry applications;
S6: recognizing the characters on a metal plate strip product label in a steel-industry application based on the output value of the final-stage neural network architecture of the trained deep convolutional recurrent neural network model obtained in step S5;
the multi-stage neural network architecture comprises 10 modules: modules 1 to 7 are conventional convolution modules, with a maximum pooling operation added in modules 1, 2, 4, and 6 and a batch normalization operation added in modules 3, 5, and 7;
the two special neural network architectures are modules 8 and 9, which are regional feature enhancement convolution modules called the EFEM_a module and the EFEM_b module respectively;
and module 10 is the result output layer, consisting of a bidirectional recurrent neural network.
2. The feature-enhanced CRNN-based metal plate strip product label character recognition method of claim 1, wherein
the EFEM_a module consists of a deformable convolution layer, a ReLU activation layer, and a maximum pooling layer, and the convolution process applied inside the EFEM_a module to the features passed from the previous layer is as follows:
first, the features passed from the previous layer undergo feature extraction by a deformable convolution kernel of size 3 × 3, and the output values are then fed into 4 parallel branches and a residual branch for relearning;
the first of the 4 parallel branches consists of convolution layers with kernel sizes 1 × 1 and 3 × 3; the second branch consists of convolution layers with kernel sizes 1 × 1 and 3 × 1 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the third branch consists of convolution layers with kernel sizes 1 × 1 and 1 × 3 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the fourth branch consists of convolution layers with kernel sizes 1 × 1, 1 × 3, and 3 × 1 followed by a dilated convolution with dilation rate 5 and kernel size 3 × 3; the outputs of the 4 parallel branches are concatenated and fed into a convolution layer with kernel size 3 × 3 for feature refinement, then passed through a deformable convolution layer with kernel size 3 × 3 and a convolution layer with kernel size 1 × 1, and the output is x0;
the residual branch uses a convolution layer with kernel size 1 × 1, and its output is x1; finally, the output x0 of the 4 parallel branches and the output x1 of the residual branch are added according to the ratio scale1 to obtain x, which satisfies the following formula:
x = x0 · scale1 + x1
x then undergoes further feature extraction through a convolution layer with kernel size 1 × 1, a ReLU activation layer, and a maximum pooling layer.
3. The feature-enhanced CRNN-based metal plate strip product label character recognition method of claim 2, wherein
the EFEM_b module consists of a convolution layer, a ReLU activation layer, and a maximum pooling layer, and the convolution process applied inside the EFEM_b module to the features passed from the previous layer is as follows:
the features passed from the previous layer serve as the input of the EFEM_b module and are fed into 3 parallel branches and a residual branch for relearning;
the first of the 3 parallel branches consists of convolution layers with kernel sizes 1 × 1 and 3 × 3; the second branch consists of convolution layers with kernel sizes 1 × 1 and 3 × 3 followed by a dilated convolution with dilation rate 3 and kernel size 3 × 3; the third branch consists of a convolution layer with kernel size 1 × 1, two convolution layers with kernel size 3 × 3, and a dilated convolution with dilation rate 3 and kernel size 3 × 3; the outputs of the 3 branches are concatenated and fed into a convolution layer with kernel size 1 × 1 for feature refinement, and the output is x2;
the residual branch uses a convolution layer with kernel size 1 × 1, and its output is x3; finally, the output x2 of the 3 parallel branches and the output x3 of the residual branch are added according to the ratio scale2 to obtain x, which satisfies the following formula:
x = x2 · scale2 + x3
x then undergoes feature extraction through a ReLU activation layer and a maximum pooling layer.
4. The feature-enhanced CRNN-based metal plate strip product label character recognition method of claim 3, wherein,
in step S5, when training the multi-stage neural network architecture of the feature-enhanced deep convolutional recurrent neural network, Connectionist Temporal Classification (CTC) is used as the loss function and the Adam algorithm is used as the learning algorithm.
5. The feature-enhanced CRNN-based metal plate strip product label character recognition method of claim 3, wherein,
in step S5, when training the multi-stage neural network architecture of the feature-enhanced deep convolutional recurrent neural network, the learning rate of each training round is less than or equal to that of the previous round.
6. The feature-enhanced CRNN-based metal plate strip product label character recognition method of claim 3, wherein,
in step S5, when training the multi-stage neural network architecture of the feature-enhanced deep convolutional recurrent neural network, the weights of the network are initialized with the "Xavier" method, and the offset layer values of all deformable convolution layers are initialized to 0.
7. The feature-enhanced CRNN-based metal plate strip product label character recognition method of claim 3, wherein,
in step S5, when training the multi-stage neural network architecture of the feature-enhanced deep convolutional recurrent neural network, the output of each convolution layer in the network is passed through the ReLU activation function before being transmitted to the next layer of neurons.
8. The feature-enhanced CRNN-based metal plate strip product label character recognition method of claim 3, wherein,
each character in the recognition dictionary appears with equal probability in the training example library, which is split in a 9:1 ratio into a training set and a validation set for the feature-enhanced deep convolutional recurrent neural network.
CN201910448218.6A 2019-05-27 2019-05-27 Feature enhancement CRNN-based metal plate strip product label character recognition method Active CN110147788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910448218.6A CN110147788B (en) 2019-05-27 2019-05-27 Feature enhancement CRNN-based metal plate strip product label character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910448218.6A CN110147788B (en) 2019-05-27 2019-05-27 Feature enhancement CRNN-based metal plate strip product label character recognition method

Publications (2)

Publication Number Publication Date
CN110147788A CN110147788A (en) 2019-08-20
CN110147788B true CN110147788B (en) 2021-09-21

Family

ID=67593348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910448218.6A Active CN110147788B (en) 2019-05-27 2019-05-27 Feature enhancement CRNN-based metal plate strip product label character recognition method

Country Status (1)

Country Link
CN (1) CN110147788B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027562B (en) * 2019-12-06 2023-07-18 中电健康云科技有限公司 Optical character recognition method based on multiscale CNN and RNN combined with attention mechanism
CN111414906B (en) * 2020-03-05 2024-05-24 北京交通大学 Data synthesis and text recognition method for paper bill pictures
CN111414908B (en) * 2020-03-16 2023-08-29 湖南快乐阳光互动娱乐传媒有限公司 Method and device for recognizing caption characters in video
CN111652108B (en) * 2020-05-28 2020-12-29 中国人民解放军32802部队 Anti-interference signal identification method and device, computer equipment and storage medium
CN112464845B (en) * 2020-12-04 2022-09-16 山东产研鲲云人工智能研究院有限公司 Bill recognition method, equipment and computer storage medium
CN112744439A (en) * 2021-01-15 2021-05-04 湖南镭目科技有限公司 Remote scrap steel monitoring system based on deep learning technology
TWI786946B (en) * 2021-11-15 2022-12-11 國立雲林科技大學 Method for detection and recognition of characters on the surface of metal
CN115661828B (en) * 2022-12-08 2023-10-20 中化现代农业有限公司 Character direction recognition method based on dynamic hierarchical nested residual error network

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739829B (en) * 2009-12-03 2014-04-23 北京中星微电子有限公司 Video-based vehicle overspeed monitoring method and system
CN104952448A (en) * 2015-05-04 2015-09-30 张爱英 Method and system for enhancing features by aid of bidirectional long-term and short-term memory recurrent neural networks
CN104966517B (en) * 2015-06-02 2019-02-01 华为技术有限公司 A kind of audio signal Enhancement Method and device
CN105005764B (en) * 2015-06-29 2018-02-13 东南大学 The multi-direction Method for text detection of natural scene
CN105956469B (en) * 2016-04-27 2019-04-26 百度在线网络技术(北京)有限公司 File security recognition methods and device
WO2018067603A1 (en) * 2016-10-04 2018-04-12 Magic Leap, Inc. Efficient data layouts for convolutional neural networks
CN106709532B (en) * 2017-01-25 2020-03-10 京东方科技集团股份有限公司 Image processing method and device
US10163022B1 (en) * 2017-06-22 2018-12-25 StradVision, Inc. Method for learning text recognition, method for recognizing text using the same, and apparatus for learning text recognition, apparatus for recognizing text using the same
CN109214250A (en) * 2017-07-05 2019-01-15 中南大学 A kind of static gesture identification method based on multiple dimensioned convolutional neural networks
CN107292291B (en) * 2017-07-19 2020-04-03 北京智芯原动科技有限公司 Vehicle identification method and system
CN107886967B (en) * 2017-11-18 2018-11-13 中国人民解放军陆军工程大学 A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network
CN108510502B (en) * 2018-03-08 2020-09-22 华南理工大学 Melanoma image tissue segmentation method and system based on deep neural network
CN108648748B (en) * 2018-03-30 2021-07-13 沈阳工业大学 Acoustic event detection method under hospital noise environment
CN108549893B (en) * 2018-04-04 2020-03-31 华中科技大学 End-to-end identification method for scene text with any shape
CN108664996B (en) * 2018-04-19 2020-12-22 厦门大学 Ancient character recognition method and system based on deep learning
CN109086700B (en) * 2018-07-20 2021-08-13 杭州电子科技大学 Radar one-dimensional range profile target identification method based on deep convolutional neural network
CN109413411B (en) * 2018-09-06 2020-08-11 腾讯数码(天津)有限公司 Black screen identification method and device of monitoring line and server
CN109165697B (en) * 2018-10-12 2021-11-30 福州大学 Natural scene character detection method based on attention mechanism convolutional neural network
CN109460761A (en) * 2018-10-17 2019-03-12 福州大学 Bank card number detection and recognition methods based on dimension cluster and multi-scale prediction
CN109389091B (en) * 2018-10-22 2022-05-03 重庆邮电大学 Character recognition system and method based on combination of neural network and attention mechanism
CN109508655B (en) * 2018-10-28 2023-04-25 北京化工大学 SAR target recognition method based on incomplete training set of twin network

Also Published As

Publication number Publication date
CN110147788A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN110147788B (en) Feature enhancement CRNN-based metal plate strip product label character recognition method
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN109948714B (en) Chinese scene text line identification method based on residual convolution and recurrent neural network
CN112070768B (en) Anchor-Free based real-time instance segmentation method
CN109740679B (en) Target identification method based on convolutional neural network and naive Bayes
CN110674777A (en) Optical character recognition method in patent text scene
CN111860683B (en) Target detection method based on feature fusion
Cayamcela et al. Fine-tuning a pre-trained convolutional neural network model to translate American sign language in real-time
CN107564007B (en) Scene segmentation correction method and system fusing global information
CN113052775B (en) Image shadow removing method and device
CN111666937A (en) Method and system for recognizing text in image
CN114898472A (en) Signature identification method and system based on twin vision Transformer network
CN115205521A (en) Kitchen waste detection method based on neural network
CN110991515B (en) Image description method fusing visual context
Sethy et al. Off-line Odia handwritten numeral recognition using neural network: a comparative analysis
CN110826534B (en) Face key point detection method and system based on local principal component analysis
CN112818949A (en) Method and system for identifying delivery certificate characters
CN110503090B (en) Character detection network training method based on limited attention model, character detection method and character detector
CN112364883A (en) American license plate recognition method based on single-stage target detection and deptext recognition network
KR20200068073A (en) Improvement of Character Recognition for Parts Book Using Pre-processing of Deep Learning
CN113159071B (en) Cross-modal image-text association anomaly detection method
Goud et al. Text localization and recognition from natural scene images using ai
US11341758B1 (en) Image processing method and system
Ashiquzzaman et al. Applying data augmentation to handwritten arabic numeral recognition using deep learning neural networks
Chen et al. Design and Implementation of Second-generation ID Card Number Identification Model based on TensorFlow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant