CN115497095A - OCR character recognition method and system based on attention mechanism - Google Patents

Publication number
CN115497095A
Authority
CN
China
Legal status
Pending
Application number
CN202211182141.0A
Other languages
Chinese (zh)
Inventor
张盛洪
张国慧
张志坚
罗瑞明
王硕君
英树祥
邓雄文
梁岸平
蒋秀
Current Assignee
Guangdong Power Grid Co Ltd
Jiangmen Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Jiangmen Power Supply Bureau of Guangdong Power Grid Co Ltd
Application filed by Guangdong Power Grid Co Ltd and Jiangmen Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202211182141.0A
Publication of CN115497095A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides an OCR character recognition method and system based on an attention mechanism. A multi-scale feature fusion method with an attention mechanism retains more text features, which reduces missed text. In addition, when the final feature map is obtained, coordinate attention is used to capture long-range feature correlations, which benefits the detection of long text. Meanwhile, simple post-processing improves the accuracy and inference speed of text detection, making the text recognition result more accurate.

Description

OCR character recognition method and system based on attention mechanism
Technical Field
The invention belongs to the technical field of text recognition, and particularly relates to an OCR character recognition method and system based on an attention mechanism.
Background
At present, uploading a business license is a common means of authentication, and the contents of the license generally have to be filled in. For text-heavy licenses, filling them in manually is time-consuming, labor-intensive and error-prone. In the prior art, text recognition of business licenses involves complicated steps and a large amount of computation, which reduces recognition efficiency.
OCR (optical character recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper and translates their shapes into computer text using character recognition methods. Most existing OCR-based recognition methods rely on conventional models, which tend to have long detection times, perform poorly on long text, frequently miss small-scale text among text of mixed scales, and have low accuracy against complex backgrounds, such as in blurred images.
When detecting text in an image, the prior art often misses small-scale text, often produces several bounding boxes for a single long text line, and, when the image is not sharp enough, suffers from poor detection, poor model robustness and excessive inference time.
Disclosure of Invention
In view of this, the present invention aims to solve the problems of missed text and inaccurate bounding boxes in conventional OCR character recognition technology.
In order to solve the technical problems, the invention provides the following technical scheme:
in a first aspect, the present invention provides an OCR character recognition method based on an attention mechanism, including the following steps:
preprocessing an image of an input picture to be recognized, and constructing a required word bank;
sending the processed picture into a text detection network to obtain text bounding box coordinates, the text detection network detecting text features in the processed picture based on an attention mechanism;
cropping the input image according to the text bounding box coordinates to obtain a series of pictures, each containing only one line of text;
and sending the cropped pictures into a text recognition network in sequence, and obtaining the final text recognition result after word bank comparison.
Further, image preprocessing is performed on the input picture to be recognized, and a required word bank is constructed, specifically including:
reading an input image and decoding the image into an image matrix with an RGB format;
keeping the width-to-height ratio of the image, and scaling the short edge in the image to 736 pixels;
normalizing the image matrix;
and constructing a corresponding word bank for the characters to be identified.
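The preprocessing steps above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the image is assumed to be already decoded into an RGB `uint8` array, nearest-neighbour resizing stands in for whatever interpolation the authors used, the ImageNet mean and standard deviation are assumed normalization constants (the text only says "normalizing"), and reserving index 0 for the CTC blank in the word bank is likewise an assumption. All function names are hypothetical.

```python
import numpy as np

# Assumed normalization constants (ImageNet statistics); the patent only
# says that the image matrix is normalized.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def scaled_size(h, w, short_edge=736):
    """(height, width) after scaling the short edge to `short_edge`,
    keeping the width-to-height ratio."""
    scale = short_edge / min(h, w)
    return round(h * scale), round(w * scale)


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (a stand-in for a library resize call)."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def preprocess(rgb):
    """rgb: uint8 array of shape (H, W, 3), already decoded in RGB order."""
    out_h, out_w = scaled_size(*rgb.shape[:2])
    scaled = resize_nearest(rgb, out_h, out_w).astype(np.float32) / 255.0
    return (scaled - MEAN) / STD  # broadcasts over the channel axis


def build_word_bank(charset):
    """Map each distinct character to an index; 0 is reserved for the CTC blank."""
    return {ch: i + 1 for i, ch in enumerate(sorted(set(charset)))}
```

For a 480x640 input, the short edge (480) becomes 736 and the long edge becomes 640 x 736 / 480, about 981.3, rounded to 981.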
Further, the processed picture is sent to a text detection network to obtain a text bounding box coordinate, and the text detection network performs text feature detection on the processed picture based on an attention mechanism, and specifically includes:
the processed image is sent to a residual backbone network for preliminary extraction of features;
the residual network has four residual modules; the feature map from the last layer of each residual module is taken to build a feature pyramid, whose levels are denoted layers 1, 2, 3 and 4 from top to bottom;
firstly, performing attention feature fusion on the layer 1 and layer 2 features, together with a convolution operation, to obtain corrected layer 1 and layer 2 feature maps;
performing the same attention feature fusion on the corrected layer 2 feature map and the layer 3 feature map, and then on the resulting corrected layer 3 feature map and the layer 4 feature map;
upsampling every layer of the corrected feature pyramid to the scale of the low-level feature map and concatenating them;
performing a second feature recalibration on the concatenated feature map through coordinate attention;
setting the pixel threshold to 0.2, setting values greater than 0.2 in the final feature map to 1 and values less than or equal to 0.2 to 0, to obtain a binary map;
in the binary map, 1 represents a text region and 0 a non-text region; text contours are obtained using an OpenCV function, and the text box with the highest confidence is selected as the final text contour, thereby obtaining the text bounding box coordinates.
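A sketch of the post-processing in the last two steps, assuming the detection head outputs a per-pixel text probability map in [0, 1]. The patent obtains contours with an OpenCV function; to keep the example dependent only on NumPy, a 4-connected flood fill stands in for contour extraction and returns axis-aligned boxes. How the patent defines a box's "confidence" is not stated; here it is assumed to be the mean probability inside the region, and boxes are returned sorted by it. Function names are hypothetical.

```python
from collections import deque

import numpy as np


def binarize(prob_map, thresh=0.2):
    """Pixels > thresh become 1 (text); pixels <= thresh become 0 (non-text)."""
    return (prob_map > thresh).astype(np.uint8)


def text_boxes(prob_map, thresh=0.2):
    """Return (x0, y0, x1, y1, score) for each 4-connected text region.

    x1 and y1 are exclusive. `score` is the mean probability inside the
    region (an assumed definition of confidence); boxes are sorted by it.
    """
    binary = binarize(prob_map, thresh)
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill collects one connected text region.
            seen[sy, sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            score = float(np.mean([prob_map[y, x] for y, x in pixels]))
            boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1, score))
    return sorted(boxes, key=lambda b: b[4], reverse=True)
```

A production system would more likely call OpenCV's contour functions directly; the flood fill only makes the threshold-then-group logic explicit.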
Further, cropping the input image according to the text bounding box coordinates to obtain a series of pictures, each containing only one line of text, specifically comprises the following steps:
cropping the input image according to the text bounding box coordinates;
arranging the cropped images from top to bottom and from left to right;
scaling these pictures to 32x100 pixels.
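The cropping, ordering and resizing steps might look like the sketch below. Assumptions: boxes are `(x0, y0, x1, y1)` in the coordinate frame of the image being cropped (boxes found on the 736-pixel detection input would first need rescaling to the original image), "top to bottom, left to right" is implemented as a plain sort on `(y0, x0)`, and 32x100 is read as 32 pixels high by 100 wide, the usual input size for CRNN-style recognizers. Function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def crop_text_lines(image, boxes, height=32, width=100):
    """Crop (x0, y0, x1, y1) boxes, order them top-to-bottom then
    left-to-right, and resize each crop to height x width pixels."""
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return [resize_nearest(image[y0:y1, x0:x1], height, width)
            for x0, y0, x1, y1 in ordered]
```

Sorting on the exact `y0` can misorder boxes on the same visual line whose tops differ by a pixel or two; grouping boxes into rows with a tolerance is a common refinement.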
Further, sending the cropped pictures to the text recognition network in sequence and obtaining the final text recognition result after word bank comparison specifically comprises the following steps:
sending the cropped pictures into the text recognition network in sequence;
extracting text features through a CNN and converting the feature map into a feature sequence;
feeding the feature sequence into an RNN (recurrent neural network) to predict and recognize the text;
and inputting the predicted result into the CTC (Connectionist Temporal Classification) algorithm, and obtaining the final recognition result after word bank comparison.
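Two parts of this recognition stage are easy to show in isolation: turning the CNN feature map into a sequence (one time step per column) and CTC decoding followed by a word bank lookup. The sketch assumes a `(C, H, W)` feature map, a `(T, K)` matrix of per-time-step class probabilities from the RNN, and index 0 as the CTC blank. The patent does not specify the decoding strategy, so greedy (best-path) decoding is an assumption. Function names are hypothetical.

```python
import numpy as np


def feature_map_to_sequence(feature_map):
    """(C, H, W) CNN output -> (W, C*H) sequence: one time step per column."""
    c, h, w = feature_map.shape
    return feature_map.transpose(2, 0, 1).reshape(w, c * h)


def ctc_greedy_decode(probs, word_bank, blank=0):
    """Best-path CTC decoding of (T, K) probabilities.

    Take the most likely class at each step, merge consecutive repeats,
    drop blanks, then map indices back to characters through the word
    bank (a dict from index to character).
    """
    best = probs.argmax(axis=1)
    chars, prev = [], blank
    for idx in best.tolist():
        if idx != blank and idx != prev:
            chars.append(word_bank[idx])
        prev = idx
    return "".join(chars)
```

The blank between the two `1` labels is what lets CTC emit a genuinely repeated character: the path `1, 1, 0, 1, 2, 2, 0` decodes to two `a`s followed by one `b`.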
In a second aspect, the present invention provides an OCR character recognition system based on attention mechanism, including:
the preprocessing unit is used for preprocessing the image of the input picture to be recognized and constructing a required word bank;
the first processing unit is used for sending the processed picture into a text detection network to obtain text bounding box coordinates, the text detection network detecting text features in the processed picture based on an attention mechanism;
the second processing unit is used for cropping the input image according to the text bounding box coordinates to obtain a series of pictures, each containing only one line of text;
and the recognition unit is used for sending the cropped pictures into a text recognition network in sequence and obtaining the final text recognition result after word bank comparison.
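One way to picture how the four units fit together is as a composition of callables. This hypothetical sketch only shows the data flow between the units; each field is a placeholder for the real preprocessing, detection, cropping and recognition logic. Here `crop` receives the original picture, so a real `detect` would need to map its boxes back from the resized detection input.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class OCRSystem:
    """Hypothetical wiring of the four units as interchangeable callables."""

    preprocess: Callable[[Any], Any]                 # preprocessing unit
    detect: Callable[[Any], Sequence[Any]]           # first processing unit
    crop: Callable[[Any, Sequence[Any]], List[Any]]  # second processing unit
    recognize: Callable[[Any], str]                  # recognition unit

    def __call__(self, picture: Any) -> List[str]:
        image = self.preprocess(picture)
        boxes = self.detect(image)          # text bounding box coordinates
        lines = self.crop(picture, boxes)   # one picture per text line
        return [self.recognize(line) for line in lines]
```

Keeping the units as separate callables mirrors the unit-by-unit structure of the system claim and lets each stage be swapped or tested on its own.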
Further, in the preprocessing unit, image preprocessing is performed on the input picture to be recognized, and a required word bank is constructed, which specifically includes:
reading an input image and decoding the input image into an image matrix with an RGB format;
keeping the width-to-height ratio of the image, and scaling the short edge in the image to 736 pixels;
normalizing the image matrix;
and constructing a corresponding word bank for the characters to be identified.
Further, in the first processing unit, the processed picture is sent to a text detection network to obtain coordinates of a text bounding box, and the text detection network performs text feature detection on the processed picture based on an attention mechanism, and specifically includes:
the processed image is sent to a residual backbone network for preliminary extraction of features;
the residual network has four residual modules; the feature map from the last layer of each residual module is taken to build a feature pyramid, whose levels are denoted layers 1, 2, 3 and 4 from top to bottom;
firstly, performing attention feature fusion on the layer 1 and layer 2 features, together with a convolution operation, to obtain corrected layer 1 and layer 2 feature maps;
performing the same attention feature fusion on the corrected layer 2 feature map and the layer 3 feature map, and then on the resulting corrected layer 3 feature map and the layer 4 feature map;
upsampling every layer of the corrected feature pyramid to the scale of the low-level feature map and concatenating them;
performing a second feature recalibration on the concatenated feature map through coordinate attention;
setting the pixel threshold to 0.2, setting values greater than 0.2 in the final feature map to 1 and values less than or equal to 0.2 to 0, to obtain a binary map;
in the binary map, 1 represents a text region and 0 a non-text region; text contours are obtained using an OpenCV function, and the text box with the highest confidence is selected as the final text contour, thereby obtaining the text bounding box coordinates.
Further, in the second processing unit, cropping the input image according to the text bounding box coordinates to obtain a series of pictures, each containing only one line of text, specifically includes:
cropping the input image according to the text bounding box coordinates;
arranging the cropped images from top to bottom and from left to right;
scaling these pictures to 32x100 pixels.
Further, in the recognition unit, sending the cropped pictures to the text recognition network in sequence and obtaining the final text recognition result after word bank comparison specifically comprises:
sending the cropped pictures into the text recognition network in sequence;
extracting text features through a CNN and converting the feature map into a feature sequence;
feeding the feature sequence into an RNN (recurrent neural network) for text prediction and recognition;
and inputting the predicted result into the CTC (Connectionist Temporal Classification) algorithm, and obtaining the final recognition result after word bank comparison.
In conclusion, the invention provides an OCR character recognition method and system based on an attention mechanism. A multi-scale feature fusion method with an attention mechanism retains more text features, which reduces missed text. In addition, when the final feature map is obtained, coordinate attention is used to capture long-range feature correlations, which benefits the detection of long text. Meanwhile, simple post-processing improves the accuracy and inference speed of text detection, making the text recognition result more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive labor.
Fig. 1 is a schematic flowchart of an OCR character recognition method based on an attention mechanism according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a text detection network according to an embodiment of the present invention;
Fig. 3 is a structural diagram of the attention feature fusion provided by an embodiment of the present invention;
Fig. 4 is a diagram illustrating the convolution operation according to an embodiment of the present invention;
Fig. 5 is a diagram of the coordinate attention structure provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, uploading a business license is a common means of authentication, and the contents of the license generally have to be filled in. For text-heavy licenses, filling them in manually is time-consuming, labor-intensive and error-prone. In the prior art, text recognition of business licenses involves complicated steps and a large amount of computation, which reduces recognition efficiency.
OCR (optical character recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper and translates their shapes into computer text using character recognition methods. Most conventional OCR-based recognition methods rely on conventional models, which tend to have long detection times, perform poorly on long text, frequently miss small-scale text among text of mixed scales, and have low accuracy against complex backgrounds, such as in blurred images.
When detecting text in a picture, the prior art often misses small-scale text, often produces several bounding boxes for a single long text line, and, when the picture is not sharp enough, suffers from poor detection, poor model robustness and excessive inference time.
To address these problems, the invention provides an OCR character recognition method and system based on an attention mechanism.
An embodiment of an OCR character recognition method based on the attention mechanism according to the present invention is described in detail below.
Referring to Fig. 1, the present embodiment provides an OCR character recognition method based on an attention mechanism, including:
Step 1: Preprocess the input picture to be recognized and construct the required word bank.
Step 2: Send the processed picture into the text detection network to obtain the text bounding box coordinates; the text detection network detects text features in the processed picture based on an attention mechanism.
Step 3: Crop the input image according to the text bounding box coordinates to obtain a series of pictures, each containing only one line of text.
Step 4: Send the cropped pictures into the text recognition network in sequence, and obtain the final text recognition result after word bank comparison.
In an alternative embodiment, the preprocessing and word bank construction in Step 1 include:
1.1: Decode the input image that has been read into an image matrix in RGB format.
1.2: Keep the width-to-height ratio of the image and scale its short edge to 736 pixels.
1.3: Normalize the image matrix.
1.4: Construct a word bank for the characters to be recognized.
In an alternative embodiment, the structure of the text detection network in Step 2 is shown in Fig. 2. The text detection network processes the image as follows:
2.1: Send the processed image into a residual backbone network for preliminary feature extraction.
2.2: The residual network has four residual modules. The feature map from the last layer of each residual module is taken to build a feature pyramid, whose levels are denoted layers 1, 2, 3 and 4 from top to bottom.
2.3: The layer 1 and layer 2 features are first fused by attention feature fusion, whose structure is shown in Fig. 3.
The layer 1 feature map is upsampled to the same spatial size as the layer 2 feature map, and the two are added pixel by pixel. The sum then passes through two branches, each containing the convolution module: one branch first compresses the spatial dimensions to 1x1 by global pooling and then applies the convolution module, while the other applies the convolution module directly.
The convolution module is shown in Fig. 4: a 1x1 convolution first compresses the channels to reduce memory consumption, normalization followed by an activation function adds nonlinearity between features, a second 1x1 convolution then expands the channels back to the original number, and normalization is applied again.
The outputs of the two branches are added pixel by pixel, a Sigmoid activation function computes the attention weights, and these weights are multiplied pixel by pixel with the layer 1 and layer 2 feature maps respectively to obtain the corrected layer 1 and layer 2 feature maps.
2.4: Perform the same attention feature fusion on the corrected layer 2 feature map and the layer 3 feature map, and then on the resulting corrected layer 3 feature map and the layer 4 feature map.
2.5: Upsample every layer of the corrected feature pyramid to the scale of the low-level feature map and concatenate them.
2.6: Perform a second feature recalibration on the concatenated feature map through coordinate attention, shown in Fig. 5:
the feature map is globally pooled along the X axis and along the Y axis separately; after a Reshape operation, the two results are concatenated along the spatial dimension and passed through a convolution module to increase nonlinearity. A split operation then divides the result into two paths, from which the X-axis and Y-axis attention weights are obtained by a convolution and a Sigmoid activation function. Multiplying these weights pixel by pixel onto the feature map in turn gives the recalibrated feature map.
2.7: Set the pixel threshold to 0.2: values greater than 0.2 in the final feature map are set to 1 and values less than or equal to 0.2 are set to 0, giving a binary map.
2.8: In the binary map, 1 represents a text region and 0 a non-text region. Text contours are obtained using an OpenCV function, and the text box with the highest confidence is selected as the final text contour, from which the text bounding box coordinates are determined.
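The data flow of Steps 2.3 to 2.6 can be sketched as a NumPy forward pass with random weights. This illustrates shapes and operations only, not the trained network, and rests on several assumptions: all pyramid levels have already been projected to the same channel count `C`, each level is exactly half the resolution of the next, upsampling is nearest-neighbour, the activation is ReLU, and batch normalization is shown in inference mode with default statistics. The two-branch fusion resembles the multi-scale channel attention of attentional feature fusion (AFF); note that published AFF blends its inputs as `w * X + (1 - w) * Y`, whereas this sketch follows the text above and multiplies the same weights onto both maps. Reusing the Fig. 4 convolution module inside coordinate attention is also an assumption. In `detection_neck`, the corrected layer 1 and 2 maps come from the first fusion and the corrected layer 3 and 4 maps from the finer-side outputs of the second and third fusions. All class and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # random weights: shapes and data flow only


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1(weight, x):
    """1x1 convolution = channel-mixing matrix product at every position."""
    return np.tensordot(weight, x, axes=([1], [0]))


def bn_inference(x, eps=1e-5):
    """Inference-mode BatchNorm with default statistics (mean 0, var 1,
    gamma 1, beta 0); a trained model would use learned per-channel values."""
    return x / np.sqrt(1.0 + eps)


def upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


class ConvModule:
    """Fig. 4: 1x1 conv (squeeze) -> BN -> ReLU -> 1x1 conv (expand) -> BN."""

    def __init__(self, channels, reduction=4):
        mid = max(1, channels // reduction)
        self.w_down = rng.standard_normal((mid, channels)) * 0.1
        self.w_up = rng.standard_normal((channels, mid)) * 0.1

    def __call__(self, x):
        y = np.maximum(bn_inference(conv1x1(self.w_down, x)), 0.0)
        return bn_inference(conv1x1(self.w_up, y))


class AttentionFeatureFusion:
    """Fig. 3: fuse a coarse level with the next finer (2x larger) level."""

    def __init__(self, channels):
        self.global_branch = ConvModule(channels)
        self.local_branch = ConvModule(channels)

    def __call__(self, coarse, fine):
        up = upsample(coarse, 2)                   # match the size of `fine`
        s = up + fine                              # pixel-wise addition
        g = self.global_branch(s.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
        w = sigmoid(g + self.local_branch(s))      # g broadcasts over H and W
        return w * up, w * fine                    # corrected coarse and fine maps


class CoordinateAttention:
    """Fig. 5: separate attention weights along the Y (rows) and X (columns) axes."""

    def __init__(self, channels):
        self.shared = ConvModule(channels)         # assumed to be the Fig. 4 module
        self.w_y = rng.standard_normal((channels, channels)) * 0.1
        self.w_x = rng.standard_normal((channels, channels)) * 0.1

    def __call__(self, x):
        _, h, _ = x.shape
        pooled_y = x.mean(axis=2)                  # (C, H): pooled along the X axis
        pooled_x = x.mean(axis=1)                  # (C, W): pooled along the Y axis
        y = self.shared(np.concatenate([pooled_y, pooled_x], axis=1))  # (C, H + W)
        a_y = sigmoid(conv1x1(self.w_y, y[:, :h]))  # split: one weight per row
        a_x = sigmoid(conv1x1(self.w_x, y[:, h:]))  # one weight per column
        return x * a_y[:, :, None] * a_x[:, None, :]


def detection_neck(p1, p2, p3, p4):
    """Steps 2.3-2.6 for levels p1 (coarsest) to p4 (finest), all with C channels."""
    c = p1.shape[0]
    c1, c2 = AttentionFeatureFusion(c)(p1, p2)
    _, c3 = AttentionFeatureFusion(c)(c2, p3)
    _, c4 = AttentionFeatureFusion(c)(c3, p4)
    size = c4.shape[1]
    maps = [upsample(m, size // m.shape[1]) for m in (c1, c2, c3, c4)]
    fused = np.concatenate(maps, axis=0)           # channel-wise concatenation
    return CoordinateAttention(fused.shape[0])(fused)
```

With four 8-channel levels of sizes 5, 10, 20 and 40, `detection_neck` returns a `(32, 40, 40)` map. A prediction head producing per-pixel text probabilities is assumed to follow before the 0.2 threshold of Step 2.7 is applied.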
In an alternative embodiment, the specific process of step 3 is as follows:
3.1: Crop the input image according to the coordinates obtained in Step 2.
3.2: Arrange the cropped images from top to bottom and from left to right.
3.3: Scale these pictures to 32x100 pixels.
In an alternative embodiment, the specific process of step 4 is as follows:
4.1: Send the pictures from Step 3 to the text recognition network in sequence.
4.2: Extract text features through a CNN and convert the feature map into a feature sequence.
4.3: Feed the feature sequence into an RNN (recurrent neural network) for text prediction and recognition.
This embodiment provides an OCR character recognition method based on an attention mechanism. A multi-scale feature fusion method with an attention mechanism retains more text features, which reduces missed text. In addition, when the final feature map is obtained, coordinate attention is used to capture long-range feature correlations, which benefits the detection of long text. Meanwhile, simple post-processing improves the accuracy and inference speed of text detection, making the text recognition result more accurate.
Compared with the prior art, the character recognition method provided by the embodiment has the following advantages:
1. Compared with conventional optical character recognition methods, the deep learning model used here is more efficient, takes less time, requires less training and achieves higher text recognition accuracy.
2. Attention fusion is embedded into the multi-scale feature pyramid, so that inconsistencies between scales are corrected by the attention mechanism during feature fusion and more scale information is retained, which improves text detection across different scales.
3. Finally, coordinate attention is used to obtain the final feature map. This attention captures correlations between features over longer distances, which reduces boundary detection errors, especially for long text, and thus improves detection for text of different lengths.
4. A simple binarization post-processing operation is used, which reduces the model's inference time.
The foregoing is a detailed description of an embodiment of an OCR character recognition method based on attention mechanism according to the present invention, and the following is a detailed description of an embodiment of an OCR character recognition system based on attention mechanism according to the present invention.
This embodiment provides an OCR character recognition system based on an attention mechanism, which comprises a preprocessing unit, a first processing unit, a second processing unit and a recognition unit.
In this embodiment, the preprocessing unit is configured to perform image preprocessing on an input picture to be recognized, and construct a required lexicon.
Specifically, in the preprocessing unit, image preprocessing is performed on an input picture to be recognized, and a required word bank is constructed, which specifically includes:
reading an input image and decoding the image into an image matrix with an RGB format;
keeping the width-to-height ratio of the image, and scaling the short edge in the image to 736 pixels;
normalizing the image matrix;
and constructing a corresponding word bank for the characters to be recognized.
In this embodiment, the first processing unit is configured to send the processed picture to a text detection network to obtain a text bounding box coordinate, and the text detection network performs text feature detection on the processed picture based on an attention mechanism.
Specifically, in the first processing unit, the processed picture is sent to a text detection network to obtain a text bounding box coordinate, and the method specifically includes:
the processed image is sent to a residual backbone network for preliminary feature extraction;
the residual network has four residual modules; the feature map from the last layer of each residual module is taken to build a feature pyramid, whose levels are denoted layers 1, 2, 3 and 4 from top to bottom;
firstly, performing attention feature fusion on the layer 1 and layer 2 features, together with a convolution operation, to obtain corrected layer 1 and layer 2 feature maps;
performing the same attention feature fusion on the corrected layer 2 feature map and the layer 3 feature map, and then on the resulting corrected layer 3 feature map and the layer 4 feature map;
upsampling every layer of the corrected feature pyramid to the scale of the low-level feature map and concatenating them;
performing a second feature recalibration on the concatenated feature map through coordinate attention;
setting the pixel threshold to 0.2, setting values greater than 0.2 in the final feature map to 1 and values less than or equal to 0.2 to 0, to obtain a binary map;
in the binary map, 1 represents a text region and 0 a non-text region; text contours are obtained using an OpenCV function, and the text box with the highest confidence is selected as the final text contour, thereby obtaining the text bounding box coordinates.
In this embodiment, the second processing unit is configured to crop the input image according to the coordinates of the text bounding box, so as to obtain a series of pictures including only one line of text.
Specifically, in the second processing unit, cropping the input image according to the text bounding box coordinates to obtain a series of pictures, each containing only one line of text, includes:
cropping the input image according to the text bounding box coordinates;
arranging the cropped images from top to bottom and from left to right;
scaling these pictures to 32x100 pixels.
In this embodiment, the recognition unit is configured to sequentially send the cut pictures to a text recognition network, and obtain a final text recognition result after word library comparison.
Specifically, in the recognition unit, the cut pictures are sequentially sent to a text recognition network, and a final text recognition result is obtained after word bank comparison, which specifically comprises:
sending the cropped pictures into the text recognition network in sequence;
extracting text features through a CNN and converting the feature map into a feature sequence;
feeding the feature sequence into an RNN (recurrent neural network) for text prediction and recognition;
and inputting the predicted result into the CTC (Connectionist Temporal Classification) algorithm, and obtaining the final recognition result after word bank comparison.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An OCR character recognition method based on an attention mechanism is characterized by comprising the following steps:
preprocessing an image of an input picture to be recognized, and constructing a required word bank;
sending the processed picture into a text detection network to obtain a text bounding box coordinate, wherein the text detection network performs text feature detection on the processed picture based on an attention mechanism;
cropping the input image according to the text bounding box coordinates to obtain a series of pictures, each containing only one line of text;
and sending the cropped pictures into a text recognition network in sequence, and obtaining the final text recognition result after word bank comparison.
2. The OCR character recognition method based on an attention mechanism according to claim 1, wherein preprocessing the input image to be recognized and constructing the required lexicon specifically comprises:
reading the input image and decoding it into an image matrix in RGB format;
scaling the image so that its shorter side is 736 pixels while keeping its aspect ratio;
normalizing the image matrix;
and constructing a lexicon corresponding to the characters to be recognized.
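A minimal sketch of the resize and normalization steps of claim 2, assuming an H×W×3 uint8 RGB matrix. The patent does not state the normalization constants, so the ImageNet mean/std used here are an assumption; the function names are hypothetical. If the image is decoded with OpenCV, note that `cv2.imread` returns BGR, so a `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` call would be needed to obtain the RGB matrix the claim describes.

```python
import numpy as np

def resize_short_side(h: int, w: int, short: int = 736) -> tuple:
    """Return (new_h, new_w) so that the shorter side equals `short`,
    keeping the aspect ratio (the 736-pixel value is from claim 2)."""
    scale = short / min(h, w)
    return int(round(h * scale)), int(round(w * scale))

def normalize(img: np.ndarray,
              mean=(0.485, 0.456, 0.406),
              std=(0.229, 0.224, 0.225)) -> np.ndarray:
    """Scale an HxWx3 uint8 RGB matrix to [0, 1], then standardize per channel.
    Assumption: ImageNet mean/std; the patent gives no values."""
    x = img.astype(np.float32) / 255.0
    return (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)

# Usage: a 1080x1920 frame is resized so its short side becomes 736.
print(resize_short_side(1080, 1920))  # (736, 1308)
```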
3. The OCR character recognition method based on an attention mechanism according to claim 2, wherein feeding the preprocessed image into the text detection network to obtain the text bounding box coordinates, the text detection network detecting text features in the preprocessed image based on the attention mechanism, specifically comprises:
feeding the preprocessed image into a residual backbone network for preliminary feature extraction;
the residual network having four residual modules, taking the last-layer feature map of each residual module to construct a feature pyramid whose levels are denoted, from top to bottom, layers 1, 2, 3 and 4;
first performing attention feature fusion on the layer 1 and layer 2 features, followed by a convolution operation, to obtain corrected layer 1 and layer 2 feature maps;
performing the same attention feature fusion on the corrected layer 2 feature map and the layer 3 feature map, and then on the resulting corrected layer 3 feature map and the layer 4 feature map;
upsampling all levels of the corrected feature pyramid to the scale of the low-level feature map and concatenating them;
correcting the features of the concatenated feature map a second time by means of coordinate attention;
setting a pixel threshold of 0.2, setting values greater than 0.2 in the final feature map to 1 and values less than or equal to 0.2 to 0, to obtain a binary map;
and, with 1 in the binary map representing text regions and 0 representing non-text regions, obtaining the text contours with an OpenCV function and selecting the text box with the highest confidence as the final text contour, thereby obtaining the text bounding box coordinates.
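Claim 3 names "attention feature fusion" and "coordinate attention" without defining them. The NumPy sketch below shows one plausible reading of the fusion order, the upsample-and-concatenate step and the coordinate-attention re-correction. Several assumptions are made: layer 1 is the coarsest level, every level has already been projected to the same channel count, each fusion uses a simple per-channel sigmoid gate (a simplified stand-in for attentional feature fusion, not the patent's module), learned convolutions are omitted, and the coordinate-attention weights are plain matrices with ReLU (the published coordinate-attention block additionally uses batch normalization and a hard-swish activation). All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample_nearest(x: np.ndarray, out_hw) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map by integer factors."""
    c, h, w = x.shape
    fh, fw = out_hw[0] // h, out_hw[1] // w
    return x.repeat(fh, axis=1).repeat(fw, axis=2)

def gated_fusion(coarse: np.ndarray, fine: np.ndarray) -> np.ndarray:
    """Simplified attentional fusion (assumption): a per-channel gate computed
    from the global average of coarse + fine mixes the two inputs."""
    up = upsample_nearest(coarse, fine.shape[1:])
    gate = sigmoid((up + fine).mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    return gate * up + (1.0 - gate) * fine

def fuse_pyramid(levels):
    """levels[0] is layer 1 (assumed coarsest) ... levels[3] is layer 4 (finest)."""
    corrected = [levels[0]]
    for nxt in levels[1:]:
        # Fuse 1+2, then corrected 2 + 3, then corrected 3 + 4, top-down.
        corrected.append(gated_fusion(corrected[-1], nxt))
    target_hw = corrected[-1].shape[1:]
    # Upsample every corrected level to the finest scale; concatenate on channels.
    return np.concatenate([upsample_nearest(f, target_hw) for f in corrected], axis=0)

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Coordinate attention on a (C, H, W) map with hypothetical weights:
    w_reduce is (C_r, C); w_h and w_w are (C, C_r)."""
    c, h, w = x.shape
    # Pool along width -> (C, H) and along height -> (C, W), then join: (C, H+W)
    y = np.concatenate([x.mean(axis=2), x.mean(axis=1)], axis=1)
    y = np.maximum(w_reduce @ y, 0.0)   # shared 1x1 transform + ReLU: (C_r, H+W)
    a_h = sigmoid(w_h @ y[:, :h])       # (C, H) attention along height
    a_w = sigmoid(w_w @ y[:, h:])       # (C, W) attention along width
    return x * a_h[:, :, None] * a_w[:, None, :]
```

With random weights this only exercises the shapes; in the claimed network the fusion, convolution and attention parameters would all be learned during training.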
4. The OCR character recognition method based on an attention mechanism according to claim 3, wherein cropping the input image according to the text bounding box coordinates to obtain a series of images, each containing only a single line of text, specifically comprises:
cropping the input image according to the text bounding box coordinates;
arranging the cropped images from top to bottom and from left to right;
and scaling these images to a size of 32x100 pixels.
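A sketch of claim 4 under stated assumptions: boxes are `(x_min, y_min, x_max, y_max)` with exclusive upper bounds, "32x100" is read as height 32 and width 100, and resizing uses nearest-neighbour sampling in NumPy so the example has no OpenCV dependency (with OpenCV one would typically call `cv2.resize(patch, (100, 32))`, whose size argument is width first). The name `crop_lines` is hypothetical.

```python
import numpy as np

def crop_lines(image: np.ndarray, boxes, out_hw=(32, 100)):
    """Crop each box, order crops top-to-bottom then left-to-right, and resize
    every crop to out_hw (assumed height x width) by nearest-neighbour sampling."""
    # Reading order: sort by the top edge, then by the left edge.
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    oh, ow = out_hw
    crops = []
    for x0, y0, x1, y1 in ordered:
        patch = image[y0:y1, x0:x1]
        # Source row/column index for every output pixel.
        rows = np.arange(oh) * patch.shape[0] // oh
        cols = np.arange(ow) * patch.shape[1] // ow
        crops.append(patch[rows][:, cols])
    return crops
```

Sorting on the exact top coordinate is the simplest reading of "top to bottom, left to right"; boxes on the same visual line whose top edges differ by a few pixels would need a tolerance-based grouping instead.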
5. The OCR character recognition method based on an attention mechanism according to claim 4, wherein feeding the cropped images into the text recognition network one at a time and obtaining the final text recognition result after comparison against the lexicon specifically comprises:
feeding the cropped images into the text recognition network one at a time;
extracting text features with a CNN and converting the feature map into a feature sequence;
feeding the feature sequence into a recurrent neural network (RNN) to predict and recognize the text;
and inputting the predicted recognition result into a CTC module, and obtaining the final recognition result after comparison against the lexicon.
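The last step of claim 5 can be sketched as best-path (greedy) CTC decoding followed by a lexicon lookup. The patent does not say how the "lexicon comparison" is done; snapping the decoded string to the lexicon entry with the smallest Levenshtein distance is an assumption, as is reserving class 0 for the CTC blank. All names are hypothetical.

```python
from typing import Sequence

BLANK = 0  # assumption: class 0 is the CTC blank; class i maps to charset[i - 1]

def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], charset: str) -> str:
    """Best-path decoding: argmax per frame, merge consecutive repeats, drop blanks."""
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    out, prev = [], None
    for k in best:
        if k != prev and k != BLANK:
            out.append(charset[k - 1])
        prev = k
    return "".join(out)

def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def match_lexicon(text: str, lexicon: Sequence[str]) -> str:
    """Return the lexicon entry closest to the decoded text (ties: first entry)."""
    return min(lexicon, key=lambda word: levenshtein(text, word))

# Usage: a misrecognized word is snapped to the nearest lexicon entry.
print(match_lexicon("tranformer", ["transformer", "breaker"]))  # transformer
```

A blank between two identical classes is what lets CTC emit doubled letters: the frame sequence a, a, blank, a decodes to "aa", not "a".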
6. An OCR character recognition system based on an attention mechanism, comprising:
a preprocessing unit, configured to preprocess the input image to be recognized and construct the required lexicon;
a first processing unit, configured to feed the preprocessed image into a text detection network to obtain text bounding box coordinates, wherein the text detection network detects text features in the preprocessed image based on an attention mechanism;
a second processing unit, configured to crop the input image according to the text bounding box coordinates to obtain a series of images, each containing only a single line of text;
and a recognition unit, configured to feed the cropped images into a text recognition network one at a time and obtain the final text recognition result after comparison against the lexicon.
7. The OCR character recognition system based on an attention mechanism according to claim 6, wherein, in the preprocessing unit, preprocessing the input image to be recognized and constructing the required lexicon specifically comprises:
reading the input image and decoding it into an image matrix in RGB format;
scaling the image so that its shorter side is 736 pixels while keeping its aspect ratio;
normalizing the image matrix;
and constructing a lexicon corresponding to the characters to be recognized.
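The text does not say what the constructed lexicon contains. Assuming it includes a character dictionary for the CTC output layer, a minimal sketch is to collect the distinct characters of the target vocabulary and reserve class 0 for the CTC blank. The function names and the sample vocabulary are hypothetical.

```python
def build_charset(texts):
    """Collect the distinct characters of the target vocabulary in sorted order.
    Assumption: class 0 is reserved for the CTC blank, so the i-th character
    (0-based) maps to class i + 1."""
    chars = sorted(set("".join(texts)))
    char_to_class = {ch: i + 1 for i, ch in enumerate(chars)}
    return "".join(chars), char_to_class

def encode(text, char_to_class):
    """Turn a label string into class indices for CTC training;
    a character missing from the dictionary raises KeyError."""
    return [char_to_class[ch] for ch in text]

# Usage with a toy vocabulary.
charset, mapping = build_charset(["ab", "ba", "c"])
print(charset, encode("cab", mapping))  # abc [3, 1, 2]
```

The output layer of the recognition network then needs `len(charset) + 1` classes, one per character plus the blank.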
8. The OCR character recognition system based on an attention mechanism according to claim 7, wherein, in the first processing unit, feeding the preprocessed image into the text detection network, which detects text features in the preprocessed image based on the attention mechanism, to obtain the text bounding box coordinates specifically comprises:
feeding the preprocessed image into a residual backbone network for preliminary feature extraction;
the residual network having four residual modules, taking the last-layer feature map of each residual module to construct a feature pyramid whose levels are denoted, from top to bottom, layers 1, 2, 3 and 4;
first performing attention feature fusion on the layer 1 and layer 2 features, followed by a convolution operation, to obtain corrected layer 1 and layer 2 feature maps;
performing the same attention feature fusion on the corrected layer 2 feature map and the layer 3 feature map, and then on the resulting corrected layer 3 feature map and the layer 4 feature map;
upsampling all levels of the corrected feature pyramid to the scale of the low-level feature map and concatenating them;
correcting the features of the concatenated feature map a second time by means of coordinate attention;
setting a pixel threshold of 0.2, setting values greater than 0.2 in the final feature map to 1 and values less than or equal to 0.2 to 0, to obtain a binary map;
and, with 1 in the binary map representing text regions and 0 representing non-text regions, obtaining the text contours with an OpenCV function and selecting the text box with the highest confidence as the final text contour, thereby obtaining the text bounding box coordinates.
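The final two detection steps can be illustrated without OpenCV. In practice one would likely use `cv2.findContours` followed by `cv2.boundingRect`; the sketch below swaps in a plain 4-connected component search over the binary map so it runs with NumPy alone. The 0.2 threshold is from the claim; scoring each region by its mean probability is an assumption. Because the claim's wording ("the text box with the highest confidence") leaves open whether one box is kept per image or per region, the sketch returns every scored box and leaves the selection to the caller. Function names are hypothetical.

```python
from collections import deque
import numpy as np

def binarize(prob_map: np.ndarray, thresh: float = 0.2) -> np.ndarray:
    """Values > thresh become 1 (text); values <= thresh become 0 (non-text)."""
    return (prob_map > thresh).astype(np.uint8)

def text_boxes(prob_map: np.ndarray, thresh: float = 0.2):
    """Return (score, (x_min, y_min, x_max, y_max)) for each 4-connected text region.
    Upper bounds are exclusive; score is the mean probability (assumption)."""
    binary = binarize(prob_map, thresh)
    h, w = binary.shape
    seen = np.zeros_like(binary, dtype=bool)
    results = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill of one connected text region.
            queue, pixels = deque([(sy, sx)]), []
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = zip(*pixels)
            score = float(np.mean([prob_map[p] for p in pixels]))
            results.append((score, (min(xs), min(ys), max(xs) + 1, max(ys) + 1)))
    return results
```

For example, `max(text_boxes(prob_map))[1]` would pick the single highest-scoring box if that is the intended reading.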
9. The OCR character recognition system based on an attention mechanism according to claim 8, wherein, in the second processing unit, cropping the input image according to the text bounding box coordinates to obtain a series of images, each containing only a single line of text, specifically comprises:
cropping the input image according to the text bounding box coordinates;
arranging the cropped images from top to bottom and from left to right;
and scaling these images to a size of 32x100 pixels.
10. The OCR character recognition system based on an attention mechanism according to claim 9, wherein, in the recognition unit, feeding the cropped images into the text recognition network one at a time and obtaining the final text recognition result after comparison against the lexicon specifically comprises:
feeding the cropped images into the text recognition network one at a time;
extracting text features with a CNN and converting the feature map into a feature sequence;
feeding the feature sequence into a recurrent neural network (RNN) to predict and recognize the text;
and inputting the predicted recognition result into a CTC module, and obtaining the final recognition result after comparison against the lexicon.
CN202211182141.0A 2022-09-27 2022-09-27 OCR character recognition method and system based on attention mechanism Pending CN115497095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211182141.0A CN115497095A (en) 2022-09-27 2022-09-27 OCR character recognition method and system based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211182141.0A CN115497095A (en) 2022-09-27 2022-09-27 OCR character recognition method and system based on attention mechanism

Publications (1)

Publication Number Publication Date
CN115497095A true CN115497095A (en) 2022-12-20

Family

ID=84471522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211182141.0A Pending CN115497095A (en) 2022-09-27 2022-09-27 OCR character recognition method and system based on attention mechanism

Country Status (1)

Country Link
CN (1) CN115497095A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination