CN111027562A - Optical character recognition method based on multi-scale CNN and RNN combined with attention mechanism


Info

Publication number
CN111027562A
CN111027562A
Authority
CN
China
Prior art keywords
layer
output end
convolution
convolutional
adder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911241447.7A
Other languages
Chinese (zh)
Other versions
CN111027562B (en)
Inventor
李得元 (Li Deyuan)
代超 (Dai Chao)
何帆 (He Fan)
周振 (Zhou Zhen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Power Health Cloud Technology Co ltd
Original Assignee
China Power Health Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Power Health Cloud Technology Co ltd filed Critical China Power Health Cloud Technology Co ltd
Priority to CN201911241447.7A
Publication of CN111027562A
Application granted
Publication of CN111027562B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/28Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an optical character recognition method based on a multi-scale CNN and an RNN combined with an attention mechanism, relating to the technical field of image optical character recognition. The method comprises: acquiring a plurality of pictures containing characters to construct a data set, and preprocessing the pictures in the data set to obtain image data and vector labels; inputting the image data and vector labels into a preset network model, and extracting features sequentially through a convolution module, a recurrent neural network and an attention mechanism module in the network model to obtain a feature matrix; inputting the feature matrix into a CTC module in the network model for decoding, calculating the CTC loss function, optimizing the parameters of the preset network model through back propagation of the loss until the network model converges, and outputting the trained network model. The method yields an accurate recognition result and a good recognition effect.

Description

Optical character recognition method based on multi-scale CNN and RNN combined with attention mechanism
Technical Field
The invention relates to the technical field of image optical character recognition, in particular to an optical character recognition method based on a multi-scale CNN and an RNN combined with an attention mechanism.
Background
Optical Character Recognition (OCR) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text with a character recognition method. That is, for printed characters, the text of a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software then converts the characters in the image into a text format for further editing by word-processing software. In image optical character recognition, image information is generally acquired by a scanner or a digital camera and stored in an image file; OCR software then reads and analyzes the image file and extracts the character strings through character recognition.
In earlier OCR systems, the recognition process was divided into two steps: single-character segmentation and classification. Typically, a text image containing a sequence of characters was first cut into individual characters by a projection method, and each character was then fed to a CNN for classification. This approach is now somewhat dated; end-to-end character recognition based on deep learning has become more popular. That is, no explicit character-segmentation step is added; character recognition is instead cast as a sequence-learning problem. Although input images vary in scale and text length, after the input image passes through the CNN and RNN, the whole line of text can be recognized at the output; in other words, character segmentation is absorbed into the deep learning model.
At present, end-to-end OCR based on deep learning has two mainstream technologies: CRNN OCR and attention OCR. The two methods differ mainly in the final output layer (the translation layer), namely in how the sequence features learned by the network are converted into the final recognition result. Both adopt a CNN + RNN network structure in the feature learning stage, where CNN denotes a convolutional neural network and RNN a recurrent neural network; however, CRNN OCR uses the CTC algorithm for alignment and decoding, while attention OCR uses an attention mechanism.
Although both technologies currently achieve good OCR recognition results, two problems arise in the feature learning stage of CRNN OCR. In the CNN stage, because only a convolutional neural network is used for information extraction, the extracted information is easily incomplete, causing recognition errors; in the RNN stage, the sequence features are extracted only by the recurrent neural network, which cannot guarantee that the sequence features are fully extracted, resulting in a poor recognition effect.
Disclosure of Invention
The invention aims to: solve the problems that incomplete information extraction and incomplete sequence-feature extraction easily occur in the feature learning stage of the conventional CRNN OCR, causing erroneous recognition results and a poor recognition effect. To this end, the invention provides an optical character recognition method based on a multi-scale CNN and an RNN combined with an attention mechanism.
To achieve the above purpose, the invention specifically adopts the following technical scheme:
the optical character recognition method based on the multi-scale CNN and the RNN combined with the attention mechanism comprises the following steps:
S1: acquiring a plurality of pictures containing characters to construct a data set, and preprocessing the pictures in the data set to obtain image data and vector labels;
S2: inputting the image data and vector labels into a preset network model, and extracting features sequentially through a convolution module, a recurrent neural network and an attention mechanism module in the network model to obtain a feature matrix;
S3: inputting the feature matrix into a CTC module in the network model for decoding, calculating the CTC loss function, optimizing the parameters of the preset network model through back propagation of the loss until the network model converges, and outputting the trained network model;
S4: performing optical character recognition on the picture to be recognized by using the trained network model to obtain the final recognition result.
Further, in S1, preprocessing the pictures in the data set to obtain the image data specifically comprises: reading the picture in RGB format, scaling it to (32,256,3), and normalizing the pixel values to obtain the image data.
Further, in S1, preprocessing the pictures in the data set to obtain the vector labels specifically comprises: transcoding the characters in the picture into binary vectors according to the dictionary to obtain the vector labels.
Further, the specific structure of the convolution module is as follows:
the input end of the convolution module is connected with the first convolutional layer; the first convolutional layer comprises 64 convolution kernels, each of size 3 x 3 with a stride of 2 and a ReLU activation function, and the output end of the first convolutional layer is connected with the input end of the second convolutional layer;
the second convolutional layer comprises 128 convolution kernels, each of size 3 x 3 with a stride of 2 and a ReLU activation function, and the output end of the second convolutional layer is connected with the input end of the third convolutional layer;
the third convolutional layer comprises four branches, and the output ends of the four branches are concatenated and then connected with the input end of the fourth convolutional layer;
the fourth convolutional layer comprises 256 convolution kernels, each of size 3 x 3 with a stride of (2,1) and a ReLU activation function, and the output end of the fourth convolutional layer is connected with the input end of the fifth convolutional layer;
the fifth convolutional layer comprises 512 convolution kernels, each of size 3 x 3 with a stride of (2,1) and a ReLU activation function, and the output end of the fifth convolutional layer is connected with the input end of the sixth convolutional layer;
the sixth convolutional layer comprises 512 convolution kernels, each of size 2 x 1 with a stride of 1 and a ReLU activation function, and the output end of the sixth convolutional layer is connected with the input end of the seventh convolutional layer;
the seventh convolutional layer comprises a Squeeze module, and the Squeeze module performs a squeeze operation on the input features to remove the first dimension, yielding the output features of the convolution module.
Further, the four branches of the third convolutional layer are respectively:
the first branch is a convolution branch comprising 128 convolution kernels, each of size 1 x 1, with a ReLU activation function;
the second branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 1, and a ReLU activation function;
the third branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 3, and a ReLU activation function;
the fourth branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 5, and a ReLU activation function.
Further, the specific structure of the recurrent neural network and attention mechanism module is as follows:
the first layer comprises two branches; the first branch comprises a Position Embedding layer, the output end of the Position Embedding layer is connected with adder A and with the first Multi-Head Attention layer respectively, the output end of the first Multi-Head Attention layer is connected with adder A, the output end of adder A is connected with the first Layer Normalization layer, the output end of the first Layer Normalization layer is connected with adder B and with the first Position-wise Feed-Forward layer respectively, the output end of the first Position-wise Feed-Forward layer is connected with adder B, the output end of adder B is connected with the second Layer Normalization layer, and the output end of the second Layer Normalization layer is connected with adder C;
the second branch comprises a first bidirectional LSTM layer, and the output end of the first bidirectional LSTM layer is connected with adder C;
the second layer comprises two branches; the first branch comprises a second Multi-Head Attention layer connected with the output end of the second Layer Normalization layer, the output end of the second Multi-Head Attention layer and the output end of the second Layer Normalization layer are both connected with adder D, the output end of adder D is connected with the third Layer Normalization layer, the output end of the third Layer Normalization layer is connected with the second Position-wise Feed-Forward layer and with adder E respectively, the output end of the second Position-wise Feed-Forward layer is connected with adder E, the output end of adder E is connected with the fourth Layer Normalization layer, and the output end of the fourth Layer Normalization layer is connected with adder F;
the second branch comprises a second bidirectional LSTM layer, the output end of adder C is connected with the second bidirectional LSTM layer, and the output end of the second bidirectional LSTM layer is connected with adder F;
the third layer comprises a fifth Layer Normalization layer; the output end of adder F is connected with the fifth Layer Normalization layer, the output end of the fifth Layer Normalization layer is connected with the third bidirectional LSTM layer, the output end of the third bidirectional LSTM layer is connected with the fully connected layer, the number of neurons of the fully connected layer is the number of characters plus 1, and finally the feature matrix is output.
Further, in S3, the CTC loss function is optimized using the Adam gradient descent algorithm.
Further, S4 specifically comprises:
S4.1: reading the picture to be recognized in RGB format, scaling it to (32,256,3), and normalizing its pixel values to obtain the image data to be recognized;
S4.2: inputting the image data to be recognized into the trained network model, and extracting features through the convolution module, the recurrent neural network and the attention mechanism module in the trained network model to obtain the feature matrix to be recognized;
S4.3: decoding the feature matrix to be recognized by the CTC module in the trained network model to obtain a decoding result;
S4.4: comparing the decoding result with the dictionary to obtain the final recognition result.
The invention has the following beneficial effects:
1. In the feature learning stage, the method adds multi-scale convolution in the CNN stage, so that information over a wider receptive field can be obtained; in the RNN stage, the RNN and the attention mechanism are combined to jointly extract the sequence features, which ensures that the sequence features are fully extracted. As a result, the recognition result is more accurate and the recognition effect is better.
Drawings
FIG. 1 is a schematic process flow diagram of an embodiment of the present invention.
FIG. 2 is a schematic diagram of a network model structure according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a convolution module structure according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a recurrent neural network and attention mechanism module structure according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a picture to be recognized according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention by those skilled in the art, the present invention will be described in further detail below with reference to the accompanying drawings and the following examples.
Example 1
As shown in FIG. 1, the present embodiment provides an optical character recognition method based on a multi-scale CNN and an RNN combined with an attention mechanism, comprising:
S1: acquiring a plurality of pictures containing characters to construct a data set, and preprocessing the pictures in the data set to obtain image data and vector labels;
the pictures in the data set are preprocessed to obtain the image data, specifically: the picture is read in RGB format and scaled to (32,256,3), and the pixel values are normalized to obtain the image data;
the pictures in the data set are preprocessed to obtain the vector labels, specifically: the characters in the picture are transcoded into binary vectors according to the dictionary to obtain the vector labels.
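The following is a minimal Python sketch of this preprocessing (Python is used for all illustrative snippets below), assuming OpenCV and NumPy. The file path and the toy dictionary are illustrative only, and encode_label returns integer indices, the form the CTC loss expects, while the binary (one-hot) expansion named above would be np.eye(N)[label].

```python
import cv2
import numpy as np

def preprocess_image(path):
    """Read a picture as RGB, scale it to (32, 256, 3), normalize pixels."""
    img = cv2.imread(path)                      # OpenCV loads BGR
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB format
    img = cv2.resize(img, (256, 32))            # dsize is (width, height)
    return img.astype(np.float32) / 255.0       # normalize pixel values

def encode_label(text, char_dict):
    """Transcode characters to indices via the dictionary; a binary
    (one-hot) expansion would be np.eye(len(char_dict))[label]."""
    return np.array([char_dict[c] for c in text], dtype=np.int32)

# Hypothetical usage with a toy dictionary (one index per character).
char_dict = {c: i for i, c in enumerate("健康体检结果")}
image = preprocess_image("sample.jpg")          # shape (32, 256, 3)
label = encode_label("体检结果", char_dict)
```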
S2: inputting the image data and vector labels into the preset network model shown in FIG. 2, and extracting features sequentially through the convolution module, the recurrent neural network and the attention mechanism module in the network model to obtain a feature matrix;
As shown in FIG. 3, the specific structure of the convolution module is as follows:
the input features of the convolution module are the image data and the vector labels, which are fed into the convolution module through its input end; the input end is connected with the first convolutional layer, the first convolutional layer comprises 64 convolution kernels, each of size 3 x 3 with a stride of 2 and a ReLU activation function, and the output end of the first convolutional layer is connected with the input end of the second convolutional layer;
the second convolutional layer comprises 128 convolution kernels, each of size 3 x 3 with a stride of 2 and a ReLU activation function, and the output end of the second convolutional layer is connected with the input end of the third convolutional layer;
the third convolutional layer comprises four branches, and the output ends of the four branches are concatenated and then connected with the input end of the fourth convolutional layer;
the first branch is a convolution branch comprising 128 convolution kernels, each of size 1 x 1, with a ReLU activation function;
the second branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 1, and a ReLU activation function;
the third branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 3, and a ReLU activation function;
the fourth branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 5, and a ReLU activation function;
the output features of the four branches are spliced and then input into the fourth convolutional layer;
the fourth convolutional layer comprises 256 convolution kernels, each of size 3 x 3 with a stride of (2,1) and a ReLU activation function, and the output end of the fourth convolutional layer is connected with the input end of the fifth convolutional layer;
the fifth convolutional layer comprises 512 convolution kernels, each of size 3 x 3 with a stride of (2,1) and a ReLU activation function, and the output end of the fifth convolutional layer is connected with the input end of the sixth convolutional layer;
the sixth convolutional layer comprises 512 convolution kernels, each of size 2 x 1 with a stride of 1 and a ReLU activation function, and the output end of the sixth convolutional layer is connected with the input end of the seventh convolutional layer;
the seventh convolutional layer comprises a Squeeze module; the Squeeze module performs a squeeze operation on the input features to remove the first dimension, changing the size of the output feature matrix from (1,32,512) to (32,512) and yielding the output features of the convolution module.
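Read against FIG. 3, the convolution module can be sketched in TensorFlow/Keras as below. This is an interpretation under stated assumptions, not the patent's reference implementation: `same` padding is assumed for all but the final 2 x 1 convolution, so the sequence length after the squeeze (64 here) depends on these padding choices and may differ from the (32,512) quoted above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_module(inputs):
    # First and second convolutional layers: 3 x 3 kernels, stride 2, ReLU.
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)

    # Third layer: four parallel branches with different receptive fields.
    b1 = layers.Conv2D(128, 1, activation="relu")(x)
    b2 = layers.SeparableConv2D(128, 3, padding="same", dilation_rate=1,
                                activation="relu")(x)
    b3 = layers.SeparableConv2D(128, 3, padding="same", dilation_rate=3,
                                activation="relu")(x)
    b4 = layers.SeparableConv2D(128, 3, padding="same", dilation_rate=5,
                                activation="relu")(x)
    x = layers.Concatenate()([b1, b2, b3, b4])   # splice the branch outputs

    # Fourth to sixth layers: downsample height only, keep the width axis.
    x = layers.Conv2D(256, 3, strides=(2, 1), padding="same", activation="relu")(x)
    x = layers.Conv2D(512, 3, strides=(2, 1), padding="same", activation="relu")(x)
    x = layers.Conv2D(512, (2, 1), strides=1, padding="valid", activation="relu")(x)

    # Seventh layer: squeeze away the singleton height dimension,
    # turning (batch, 1, W, 512) into a (batch, W, 512) sequence.
    return tf.squeeze(x, axis=1)

inputs = layers.Input(shape=(32, 256, 3))
features = conv_module(inputs)                   # (batch, 64, 512) here
```

The four parallel branches apply different effective receptive fields to the same feature map before concatenation, which is the multi-scale ingredient the summary credits with a wider field of view.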
As shown in FIG. 4, the specific structure of the recurrent neural network and attention mechanism module is as follows:
the output features of the convolution module serve as the input features of the recurrent neural network and attention mechanism module; the first layer comprises two branches, the first branch comprises a Position Embedding layer, the output end of the Position Embedding layer is connected with adder A and with the first Multi-Head Attention layer respectively, the output end of the first Multi-Head Attention layer is connected with adder A, the output end of adder A is connected with the first Layer Normalization layer, the output end of the first Layer Normalization layer is connected with adder B and with the first Position-wise Feed-Forward layer respectively, the output end of the first Position-wise Feed-Forward layer is connected with adder B, the output end of adder B is connected with the second Layer Normalization layer, and the output end of the second Layer Normalization layer is connected with adder C;
the second branch comprises a first bidirectional LSTM layer, and the output end of the first bidirectional LSTM layer is connected with adder C;
the second layer comprises two branches; the first branch comprises a second Multi-Head Attention layer connected with the output end of the second Layer Normalization layer, the output end of the second Multi-Head Attention layer and the output end of the second Layer Normalization layer are both connected with adder D, the output end of adder D is connected with the third Layer Normalization layer, the output end of the third Layer Normalization layer is connected with the second Position-wise Feed-Forward layer and with adder E respectively, the output end of the second Position-wise Feed-Forward layer is connected with adder E, the output end of adder E is connected with the fourth Layer Normalization layer, and the output end of the fourth Layer Normalization layer is connected with adder F;
the second branch comprises a second bidirectional LSTM layer, the output end of adder C is connected with the second bidirectional LSTM layer, and the output end of the second bidirectional LSTM layer is connected with adder F;
the third layer comprises a fifth Layer Normalization layer; the output end of adder F is connected with the fifth Layer Normalization layer, the output end of the fifth Layer Normalization layer is connected with the third bidirectional LSTM layer, the output end of the third bidirectional LSTM layer is connected with the fully connected layer Dense, the number of neurons of the fully connected layer is the number of characters plus 1, and finally the feature matrix is output.
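A corresponding TensorFlow/Keras sketch of FIG. 4 follows. The head count, feed-forward width, sequence length and class count are illustrative assumptions not fixed by the patent; each level sums a Transformer-style branch with a bidirectional LSTM branch, mirroring adders C and F above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def transformer_block(x, d_model=512, num_heads=8, d_ff=2048):
    # Multi-Head Attention -> adder -> Layer Normalization, then
    # Position-wise Feed-Forward -> adder -> Layer Normalization.
    a = layers.MultiHeadAttention(num_heads=num_heads,
                                  key_dim=d_model // num_heads)(x, x)
    x = layers.LayerNormalization()(layers.Add()([x, a]))
    f = layers.Dense(d_ff, activation="relu")(x)
    f = layers.Dense(d_model)(f)
    return layers.LayerNormalization()(layers.Add()([x, f]))

def sequence_module(features, seq_len=64, d_model=512, num_classes=5001):
    # First layer, branch 1: position embedding plus a Transformer block;
    # branch 2: a bidirectional LSTM. Adder C sums the two branches.
    pos = layers.Embedding(seq_len, d_model)(tf.range(seq_len))
    t1 = transformer_block(features + pos)
    l1 = layers.Bidirectional(
        layers.LSTM(d_model // 2, return_sequences=True))(features)
    c = layers.Add()([t1, l1])

    # Second layer mirrors the first on both branches; adder F fuses again.
    t2 = transformer_block(t1)
    l2 = layers.Bidirectional(
        layers.LSTM(d_model // 2, return_sequences=True))(c)
    f = layers.Add()([t2, l2])

    # Third layer: fifth Layer Normalization, third bidirectional LSTM,
    # then Dense with (number of characters + 1) units for the CTC blank;
    # num_classes = 5001 stands in for an unspecified dictionary size.
    x = layers.LayerNormalization()(f)
    x = layers.Bidirectional(
        layers.LSTM(d_model // 2, return_sequences=True))(x)
    return layers.Dense(num_classes, activation="softmax")(x)
```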
S3: inputting the feature matrix into the CTC module of the network model for decoding, wherein the CTC module in this embodiment is a CTC decoder; calculating the CTC loss function and optimizing it with the Adam gradient descent algorithm, adjusting the parameters of the preset network model through back propagation of the loss until the network model converges, and outputting the trained network model.
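A hedged sketch of this training step, assuming the Keras backend's ctc_batch_cost (dense labels plus the true input and label lengths) and an illustrative learning rate:

```python
import tensorflow as tf
from tensorflow.keras import backend as K

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # illustrative

@tf.function
def train_step(model, images, labels, input_len, label_len):
    # labels: dense index matrix; input_len/label_len: shape (batch, 1).
    with tf.GradientTape() as tape:
        y_pred = model(images, training=True)   # (batch, time, classes)
        loss = tf.reduce_mean(
            K.ctc_batch_cost(labels, y_pred, input_len, label_len))
    # Back-propagate the CTC loss and update the model parameters.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```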
S4: performing optical character recognition on the picture to be recognized shown in FIG. 5 by using the trained network model to obtain the final recognition result, which specifically comprises:
S4.1: reading the picture to be recognized in RGB format, scaling it to (32,256,3), and normalizing its pixel values to obtain the image data to be recognized;
S4.2: inputting the image data to be recognized into the trained network model, and extracting features through the convolution module, the recurrent neural network and the attention mechanism module in the trained network model to obtain the feature matrix to be recognized;
S4.3: decoding the feature matrix to be recognized by the CTC module in the trained network model to obtain a decoding result;
S4.4: comparing the decoding result with the dictionary to obtain the final recognition result: the character sequence [ '健', '康', '体', '检', '结', '果' ] ("health checkup result").
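Finally, a greedy CTC-decoding sketch for S4.3 and S4.4; inv_dict (index back to character) is the inverse of the hypothetical dictionary from the preprocessing sketch, and beam search is a drop-in alternative via greedy=False.

```python
import numpy as np
from tensorflow.keras import backend as K

def ctc_decode_to_text(y_pred, inv_dict):
    # y_pred: (batch, time, classes) softmax output of the trained model.
    input_len = np.full(y_pred.shape[0], y_pred.shape[1])
    decoded, _ = K.ctc_decode(y_pred, input_length=input_len, greedy=True)
    # -1 pads the decoded sequences; drop it before the dictionary lookup.
    return ["".join(inv_dict[int(i)] for i in seq if i != -1)
            for seq in decoded[0].numpy()]
```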
The above description is only a preferred embodiment of the present invention and is not intended to limit it; the scope of the present invention is defined by the appended claims, and all structural changes made using the contents of the description and drawings of the present invention are embraced therein.

Claims (8)

1. An optical character recognition method based on a multi-scale CNN and an RNN combined with an attention mechanism, characterized by comprising the following steps:
S1: acquiring a plurality of pictures containing characters to construct a data set, and preprocessing the pictures in the data set to obtain image data and vector labels;
S2: inputting the image data and vector labels into a preset network model, and extracting features sequentially through a convolution module, a recurrent neural network and an attention mechanism module in the network model to obtain a feature matrix;
S3: inputting the feature matrix into a CTC module in the network model for decoding, calculating the CTC loss function, optimizing the parameters of the preset network model through back propagation of the loss until the network model converges, and outputting the trained network model;
S4: performing optical character recognition on the picture to be recognized by using the trained network model to obtain the final recognition result.
2. The optical character recognition method according to claim 1, wherein in S1, the pictures in the data set are preprocessed to obtain the image data, specifically: the picture is read in RGB format and scaled to (32,256,3), and the pixel values of the picture are normalized to obtain the image data.
3. The optical character recognition method according to claim 1, wherein in S1, the pictures in the data set are preprocessed to obtain the vector labels, specifically: the characters in the picture are transcoded into binary vectors according to the dictionary to obtain the vector labels.
4. The optical character recognition method according to claim 1, wherein the specific structure of the convolution module is as follows:
the input end of the convolution module is connected with the first convolutional layer; the first convolutional layer comprises 64 convolution kernels, each of size 3 x 3 with a stride of 2 and a ReLU activation function, and the output end of the first convolutional layer is connected with the input end of the second convolutional layer;
the second convolutional layer comprises 128 convolution kernels, each of size 3 x 3 with a stride of 2 and a ReLU activation function, and the output end of the second convolutional layer is connected with the input end of the third convolutional layer;
the third convolutional layer comprises four branches, and the output ends of the four branches are concatenated and then connected with the input end of the fourth convolutional layer;
the fourth convolutional layer comprises 256 convolution kernels, each of size 3 x 3 with a stride of (2,1) and a ReLU activation function, and the output end of the fourth convolutional layer is connected with the input end of the fifth convolutional layer;
the fifth convolutional layer comprises 512 convolution kernels, each of size 3 x 3 with a stride of (2,1) and a ReLU activation function, and the output end of the fifth convolutional layer is connected with the input end of the sixth convolutional layer;
the sixth convolutional layer comprises 512 convolution kernels, each of size 2 x 1 with a stride of 1 and a ReLU activation function, and the output end of the sixth convolutional layer is connected with the input end of the seventh convolutional layer;
the seventh convolutional layer comprises a Squeeze module, and the Squeeze module performs a squeeze operation on the input features to remove the first dimension, yielding the output features of the convolution module.
5. The optical character recognition method according to claim 4, wherein the four branches of the third convolutional layer are respectively:
the first branch is a convolution branch comprising 128 convolution kernels, each of size 1 x 1, with a ReLU activation function;
the second branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 1, and a ReLU activation function;
the third branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 3, and a ReLU activation function;
the fourth branch is a depthwise separable convolution branch comprising 128 convolution kernels, each of size 3 x 3, with a stride of 1, a dilation rate of 5, and a ReLU activation function.
6. The optical character recognition method according to claim 1, wherein the specific structure of the recurrent neural network and attention mechanism module is as follows:
the first layer comprises two branches; the first branch comprises a Position Embedding layer, the output end of the Position Embedding layer is connected with adder A and with the first Multi-Head Attention layer respectively, the output end of the first Multi-Head Attention layer is connected with adder A, the output end of adder A is connected with the first Layer Normalization layer, the output end of the first Layer Normalization layer is connected with adder B and with the first Position-wise Feed-Forward layer respectively, the output end of the first Position-wise Feed-Forward layer is connected with adder B, the output end of adder B is connected with the second Layer Normalization layer, and the output end of the second Layer Normalization layer is connected with adder C;
the second branch comprises a first bidirectional LSTM layer, and the output end of the first bidirectional LSTM layer is connected with adder C;
the second layer comprises two branches; the first branch comprises a second Multi-Head Attention layer connected with the output end of the second Layer Normalization layer, the output end of the second Multi-Head Attention layer and the output end of the second Layer Normalization layer are both connected with adder D, the output end of adder D is connected with the third Layer Normalization layer, the output end of the third Layer Normalization layer is connected with the second Position-wise Feed-Forward layer and with adder E respectively, the output end of the second Position-wise Feed-Forward layer is connected with adder E, the output end of adder E is connected with the fourth Layer Normalization layer, and the output end of the fourth Layer Normalization layer is connected with adder F;
the second branch comprises a second bidirectional LSTM layer, the output end of adder C is connected with the second bidirectional LSTM layer, and the output end of the second bidirectional LSTM layer is connected with adder F;
the third layer comprises a fifth Layer Normalization layer; the output end of adder F is connected with the fifth Layer Normalization layer, the output end of the fifth Layer Normalization layer is connected with the third bidirectional LSTM layer, the output end of the third bidirectional LSTM layer is connected with the fully connected layer, the number of neurons of the fully connected layer is the number of characters plus 1, and finally the feature matrix is output.
7. The optical character recognition method according to claim 1, wherein in S3, the CTC loss function is optimized using the Adam gradient descent algorithm.
8. The optical character recognition method according to any one of claims 1-7, wherein S4 specifically comprises:
S4.1: reading the picture to be recognized in RGB format, scaling it to (32,256,3), and normalizing its pixel values to obtain the image data to be recognized;
S4.2: inputting the image data to be recognized into the trained network model, and extracting features through the convolution module, the recurrent neural network and the attention mechanism module in the trained network model to obtain the feature matrix to be recognized;
S4.3: decoding the feature matrix to be recognized by the CTC module in the trained network model to obtain a decoding result;
S4.4: comparing the decoding result with the dictionary to obtain the final recognition result.
CN201911241447.7A 2019-12-06 2019-12-06 Optical character recognition method based on multiscale CNN and RNN combined with attention mechanism Active CN111027562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911241447.7A CN111027562B (en) 2019-12-06 2019-12-06 Optical character recognition method based on multiscale CNN and RNN combined with attention mechanism


Publications (2)

Publication Number Publication Date
CN111027562A 2020-04-17
CN111027562B (en) 2023-07-18

Family

ID=70204520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911241447.7A Active CN111027562B (en) 2019-12-06 2019-12-06 Optical character recognition method based on multiscale CNN and RNN combined with attention mechanism

Country Status (1)

Country Link
CN (1) CN111027562B (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368831A (en) * 2017-07-19 2017-11-21 中国人民解放军国防科学技术大学 English words and digit recognition method in a kind of natural scene image
CN108875722A (en) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and identification model training method, device and system and storage medium
US20190251431A1 (en) * 2018-02-09 2019-08-15 Salesforce.Com, Inc. Multitask Learning As Question Answering
US20190318725A1 (en) * 2018-04-13 2019-10-17 Mitsubishi Electric Research Laboratories, Inc. Methods and Systems for Recognizing Simultaneous Speech by Multiple Speakers
CN109753954A (en) * 2018-11-14 2019-05-14 安徽艾睿思智能科技有限公司 The real-time positioning identifying method of text based on deep learning attention mechanism
CN109543681A (en) * 2018-11-20 2019-03-29 中国石油大学(华东) Character recognition method under a kind of natural scene based on attention mechanism
CN109992783A (en) * 2019-04-03 2019-07-09 同济大学 Chinese term vector modeling method
CN110147788A (en) * 2019-05-27 2019-08-20 东北大学 A kind of metal plate and belt Product labelling character recognition method based on feature enhancing CRNN
CN110400275A (en) * 2019-07-22 2019-11-01 中电健康云科技有限公司 One kind being based on full convolutional neural networks and the pyramidal color calibration method of feature

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Baoguang Shi: "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 2299 *
Hao Wei: "Biomedical Named Entity Recognition via A Hybrid Neural Network Model", 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering, Proceedings, p. 456 *
Vaswani, Ashish: "Attention Is All You Need", 31st Annual Conference on Neural Information Processing Systems, pp. 1-15 *
Zhang Dongmei (张冬梅): "Malicious domain name detection algorithm based on LSTM and multi-head attention mechanism", Software (《软件》), pp. 83-90 *
Xing Jiliang (邢吉亮): "Research on relation classification with a Bi-LSTM recurrent neural network combined with an attention mechanism", China Masters' Theses Full-text Database, pages 138-2026 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898606B (en) * 2020-05-19 2023-04-07 武汉东智科技股份有限公司 Night imaging identification method for superimposing transparent time characters in video image
CN111898606A (en) * 2020-05-19 2020-11-06 武汉东智科技股份有限公司 Night imaging identification method for superimposing transparent time characters in video image
CN112052889B (en) * 2020-08-28 2023-05-05 西安电子科技大学 Laryngoscope image recognition method based on double-gating recursion unit decoding
CN112052889A (en) * 2020-08-28 2020-12-08 西安电子科技大学 Laryngoscope image identification method based on double-gating recursive unit decoding
CN112508023A (en) * 2020-10-27 2021-03-16 重庆大学 Deep learning-based end-to-end identification method for code-spraying characters of parts
CN112183486A (en) * 2020-11-02 2021-01-05 中山大学 Method for rapidly identifying single-molecule nanopore sequencing base based on deep network
CN112183486B (en) * 2020-11-02 2023-08-01 中山大学 Method for rapidly identifying single-molecule nanopore sequencing base based on deep network
CN112836748A (en) * 2021-02-02 2021-05-25 太原科技大学 Casting identification character recognition method based on CRNN-CTC
CN112990181A (en) * 2021-04-30 2021-06-18 北京世纪好未来教育科技有限公司 Text recognition method, device, equipment and storage medium
CN113516124A (en) * 2021-05-29 2021-10-19 大连民族大学 Electric energy meter electricity consumption information identification algorithm based on computer vision technology
CN113516124B (en) * 2021-05-29 2023-08-11 大连民族大学 Electric energy meter electricity consumption identification algorithm based on computer vision technology
CN113537339B (en) * 2021-07-14 2023-06-02 中国地质大学(北京) Method and system for identifying symbiotic or associated minerals based on multi-label image classification
CN113537339A (en) * 2021-07-14 2021-10-22 中国地质大学(北京) Method and system for identifying symbiotic or associated minerals based on multi-label image classification
CN114724168A (en) * 2022-05-10 2022-07-08 北京百度网讯科技有限公司 Training method of deep learning model, text recognition method, text recognition device and text recognition equipment
CN116072274A (en) * 2023-03-06 2023-05-05 四川互慧软件有限公司 Automatic dispatch system for medical care of ambulance
CN116758544A (en) * 2023-08-17 2023-09-15 泓浒(苏州)半导体科技有限公司 Wafer code recognition system based on image processing
CN116758544B (en) * 2023-08-17 2023-10-20 泓浒(苏州)半导体科技有限公司 Wafer code recognition system based on image processing

Also Published As

Publication number Publication date
CN111027562B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN111027562A (en) Optical character recognition method based on multi-scale CNN and RNN combined with attention mechanism
CN107239801B (en) Video attribute representation learning method and video character description automatic generation method
CN112818951B (en) Ticket identification method
CN111738169A (en) Handwriting formula recognition method based on end-to-end network model
CN112686219B (en) Handwritten text recognition method and computer storage medium
US11915465B2 (en) Apparatus and methods for converting lineless tables into lined tables using generative adversarial networks
CN112836702B (en) Text recognition method based on multi-scale feature extraction
CN113901952A (en) Print form and handwritten form separated character recognition method based on deep learning
CN111539417B (en) Text recognition training optimization method based on deep neural network
CN111027553A (en) Character recognition method for circular seal
CN113961710B (en) Fine-grained thesis classification method and device based on multi-mode layered fusion network
He Research on text detection and recognition based on OCR recognition technology
CN112418225A (en) Offline character recognition method for address scene recognition
CN117251795A (en) Multi-mode false news detection method based on self-adaptive fusion
CN111242829A (en) Watermark extraction method, device, equipment and storage medium
CN115035531A (en) Retail terminal character recognition method and system
Chen et al. Scene text recognition based on deep learning: a brief survey
CN113901913A (en) Convolution network for ancient book document image binaryzation
CN116311275B (en) Text recognition method and system based on seq2seq language model
CN116994282B (en) Reinforcing steel bar quantity identification and collection method for bridge design drawing
CN115861663B (en) Document image content comparison method based on self-supervision learning model
CN114581906B (en) Text recognition method and system for natural scene image
Manzoor et al. A Novel System for Multi-Linguistic Text Identification and Recognition in Natural Scenes using Deep Learning
Sharma et al. Feature Extraction and Image Recognition of Cursive Handwritten English Words Using Neural Network and IAM Off‐Line Database
CN117079288B (en) Method and model for extracting key information for recognizing Chinese semantics in scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant