CN112364873A

CN112364873A - Character recognition method and device for curved text image and computer equipment

Info

Publication number: CN112364873A
Application number: CN202011312589.0A
Authority: CN
Inventors: 朱锦祥; 臧磊
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Smart Technology Co Ltd; OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-02-12
Also published as: WO2022105521A1

Abstract

The invention discloses a character recognition method and device for a curved text image and computer equipment. The method includes: if a text image input by a user is received, preprocessing the text image to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on the feature map of each layer in the feature pyramid to obtain a plurality of segmentation mask maps of the text image; expanding, according to a breadth-first search algorithm, the segmentation mask map with the smallest text area to obtain a first text box that frames all characters in the text image; performing affine transformation on the first text box to obtain a second text box; and classifying and recognizing the text in the second text box to obtain the text in the text image. The invention is based on OCR technology; the method can both frame the text in the image accurately and improve the accuracy of text recognition.

Description

Character recognition method and device for curved text image and computer equipment

Technical Field

The invention belongs to the technical field of artificial intelligence character recognition, and particularly relates to a character recognition method and device for a curved text image and computer equipment.

Background

OCR technology refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using a character recognition method. In the prior art, when character recognition is performed on an image of a natural scene, the text in the image is often laid out along a curve, so existing OCR technology cannot frame the text with a rectangular bounding box, and the accuracy of character recognition is seriously affected.

Disclosure of Invention

The embodiment of the invention provides a character recognition method and device of a curved text image and computer equipment, and aims to solve the problem that the accuracy of character recognition in the text image is low because a curved text in the existing text image cannot be framed through a rectangular bounding box.

In a first aspect, an embodiment of the present invention provides a method for recognizing characters in a curved text image, including:

if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image;

performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image;

performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image;

expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image;

carrying out affine transformation on the first text box to obtain a second text box after affine transformation;

and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.

In a second aspect, an embodiment of the present invention provides a character recognition apparatus for a curved text image, including:

the preprocessing unit is used for preprocessing the text image according to a preset first processing rule if the text image input by a user is received, so as to obtain a preprocessed text image;

the feature extraction unit is used for extracting features of the preprocessed text image to obtain a feature pyramid of the text image;

the segmentation unit is used for carrying out image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image;

the first expansion unit is used for expanding the segmentation mask image with the smallest text area in the segmentation mask images according to the breadth-first search algorithm to obtain a first text box which frames all characters in the text image;

the affine transformation unit is used for carrying out affine transformation on the first text box to obtain a second text box after affine transformation;

and the identification unit is used for classifying and identifying the characters in the second text box after affine transformation to obtain the characters in the text image.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method for recognizing words in a curved text image according to the first aspect when executing the computer program.

In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the method for recognizing words in a curved text image according to the first aspect.

The embodiment of the invention provides a character recognition method, a character recognition device and computer equipment for a curved text image, wherein the method comprises the following steps: if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image. By the method, not only can the text in the image be accurately framed, but also the accuracy of character recognition in the text image is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a schematic flow chart of a text recognition method for a curved text image according to an embodiment of the present invention;

FIG. 2 is a sub-flow diagram of a text recognition method for a curved text image according to an embodiment of the present invention;

FIG. 3 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;

FIG. 4 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;

FIG. 5 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;

FIG. 6 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;

FIG. 7 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;

FIG. 8 is a schematic block diagram of a text recognition apparatus for a curved text image according to an embodiment of the present invention;

FIG. 9 is a schematic block diagram of a sub-unit of a text recognition apparatus for a curved text image according to an embodiment of the present invention;

FIG. 10 is a schematic block diagram of another sub-unit of a text recognition apparatus for a curved text image according to an embodiment of the present invention;

FIG. 11 is a schematic block diagram of another sub-unit of a text recognition apparatus for a curved text image according to an embodiment of the present invention;

FIG. 12 is a schematic block diagram of another sub-unit of a text recognition apparatus for a curved text image according to an embodiment of the present invention;

FIG. 13 is a schematic block diagram of another sub-unit of a text recognition apparatus for a curved text image according to an embodiment of the present invention;

FIG. 14 is a schematic block diagram of another sub-unit of a text recognition apparatus for a curved text image according to an embodiment of the present invention;

FIG. 15 is a schematic block diagram of a computer device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to FIG. 1, FIG. 1 is a schematic flow chart illustrating a character recognition method for a curved text image according to an embodiment of the present invention. The method is deployed and run on a server. After the server receives a curved text image sent by an intelligent terminal device such as a laptop or a tablet computer, it preprocesses the text image and then performs feature extraction to obtain a feature pyramid of the text image. It then performs image segmentation processing on each layer of feature map in the feature pyramid to obtain a plurality of segmentation mask maps of the text image, and expands the segmentation mask map with the smallest text area according to a breadth-first search algorithm to obtain a text box that frames the text in the text image. Finally, the text box is affine transformed and text recognition is performed on the result, so that the characters in the curved text image are recognized. The method is described in detail below.

As shown in fig. 1, the method includes the following steps S110 to S160.

S110, if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image.

Specifically, the text in the text image is curved and deformed, and the first processing rule is rule information for preprocessing the text image into a square text image. Preprocessing the text image according to the first processing rule makes the text in the image easier to detect and recognize. In the embodiment of the invention, after a text image input by a user is received, the text image is read in RGB mode and then stretched according to the first processing rule to obtain a square text image.

In another embodiment, as shown in fig. 2, step S110 includes sub-steps S111, S112 and S113.

S111, obtaining a scaling factor for scaling the text image according to the long edge of the text image.

Specifically, after receiving the text image input by the user, the terminal device reads the text image in RGB mode, obtains the width and height of the text image, takes the larger of the two as the long edge, and then scales the long edge by a certain proportion to obtain the scaling factor.

S112, scaling the text image according to the scaling factor to obtain a scaled text image.

Specifically, after the scaling factor is obtained from the long edge of the text image, the width and the height of the text image are each divided by the scaling factor, so that the text image is scaled and the scaled text image is obtained.

S113, filling the short edge of the scaled text image to obtain a square text image.

Specifically, after the text image is scaled according to the scaling factor, the scaled text image has the same shape as the original text image and only its size has changed. Although this makes text detection and recognition easier, subsequent text detection is still affected to some extent, so the short edge of the scaled text image needs to be filled. Since the terminal device reads the text image in RGB mode, it is only necessary to fill the short edge of the scaled text image with the RGB color (0,0,0) to obtain the square text image.
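As a non-limiting illustration, sub-steps S111 to S113 may be sketched as follows. The sketch assumes that the image is a nested list of RGB tuples, that the "certain proportion" is the ratio of the long edge to a hypothetical target side length `target`, that nearest-neighbour resampling is used, and that the padding is added on the right and bottom; none of these choices is stated in the embodiment.

```python
def preprocess_to_square(image, target=4):
    """Scale so the long edge equals `target`, then pad the short edge with black.

    `target`, the resampling method and the padding side are assumptions.
    """
    height, width = len(image), len(image[0])
    factor = max(height, width) / target       # scaling factor from the long edge
    new_h = max(1, round(height / factor))     # height divided by the factor
    new_w = max(1, round(width / factor))      # width divided by the factor
    # Nearest-neighbour resampling: take the source pixel under each output pixel.
    scaled = [
        [image[min(height - 1, int(r * factor))][min(width - 1, int(c * factor))]
         for c in range(new_w)]
        for r in range(new_h)
    ]
    black = (0, 0, 0)
    # Fill the short edge with RGB (0, 0, 0) until the image is target x target.
    square = [row + [black] * (target - new_w) for row in scaled]
    square += [[black] * target for _ in range(target - new_h)]
    return square
```

Padding rather than stretching keeps the aspect ratio of the characters unchanged, which is one plausible reason for filling the short edge.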

S120, performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image.

Specifically, in the process of extracting features from the preprocessed text image, different convolution kernels are applied in sequence from bottom to top to obtain a multilayer feature map of the text image. The top-layer feature map of the multilayer feature map is then sampled to construct the top layer of the feature pyramid. Starting from the top layer of the feature pyramid, upsampling proceeds from top to bottom while the corresponding feature maps in the multilayer feature map are laterally connected, which yields the feature pyramid of the text image.

In another embodiment, as shown in fig. 3, step S120 includes sub-steps S121 and S122.

S121, performing convolution processing on the preprocessed text image to obtain a multilayer feature map of the text image.

Specifically, from bottom to top, the number of channels of each layer of the multilayer feature map gradually increases and its size gradually decreases, and the features extracted at each layer are fed to the next layer as input. That is, the multilayer feature map consists of the feature maps produced at the different convolution stages after the preprocessed text image is input into the convolutional neural network; from bottom to top, the semantic information becomes richer and the resolution becomes lower. The bottommost feature map has the least semantic information and the highest resolution, and is not suitable for detecting small targets; the topmost feature map has the richest semantic information and the lowest resolution, and is not suitable for detecting large targets. For example, when the convolution process of the convolutional neural network comprises four stages, conv1, conv2, conv3 and conv4, the feature map of the last layer of each of the four stages is extracted to obtain the multilayer feature map of the text image.

S122, constructing a feature pyramid of the text image according to the multilayer feature map.

Specifically, the top-layer feature map of the multilayer feature map of the text image is first sampled to construct the top layer of the feature pyramid. Then, in a top-down process starting from the top layer of the feature pyramid, the more abstract, semantically stronger features are upsampled and laterally connected to the features of the previous layer, so that the high-level features are enhanced. The feature map used for prediction at each layer thus fuses features of different resolutions and different semantic strengths and can be used to detect objects of the corresponding resolution, and each layer is ensured to have a suitable resolution and strong semantic features.
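The top-down construction with lateral connections may be illustrated by the following simplified sketch. It assumes single-channel feature maps stored as nested lists, a resolution ratio of exactly two between adjacent layers, nearest-neighbour upsampling, and an identity lateral connection merged by element-wise addition; a practical network would typically apply learned convolutions at these points, and the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour upsampling by a factor of two in both directions."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(bottom_up):
    """Build pyramid levels from feature maps ordered bottom (large) to top (small)."""
    pyramid = [bottom_up[-1]]                  # top level comes from the top map
    for lateral in reversed(bottom_up[:-1]):   # walk from top to bottom
        up = upsample2x(pyramid[0])            # upsample the level above
        merged = [[u + l for u, l in zip(up_row, lat_row)]
                  for up_row, lat_row in zip(up, lateral)]
        pyramid.insert(0, merged)              # lateral connection by addition
    return pyramid


# Three toy maps of size 4x4, 2x2 and 1x1 (bottom to top).
c1 = [[1.0] * 4 for _ in range(4)]
c2 = [[2.0] * 2 for _ in range(2)]
c3 = [[3.0]]
levels = build_pyramid([c1, c2, c3])
```

Each returned level keeps the resolution of its bottom-up counterpart while accumulating the values passed down from the coarser levels.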

S130, performing image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image.

Specifically, the second processing rule is rule information used for performing image segmentation processing on each layer of feature map in the feature pyramid to obtain a segmentation mask map corresponding to that layer; that is, each layer of feature map in the feature pyramid corresponds to one segmentation mask map of the text image.

In another embodiment, as shown in fig. 4, step S130 includes sub-steps S131 and S132.

S131, inputting each layer of feature graph in the feature pyramid into a preset full convolution neural network model to obtain a semantic segmentation graph of each layer of feature graph in the feature pyramid.

Specifically, the full convolution neural network model (a fully convolutional network) is trained in advance and performs pixel-level classification on each layer of feature map in the feature pyramid, so as to distinguish the background region from the text region in each layer of feature map. During training, the model uses the online hard example mining (OHEM) strategy. The core idea of the OHEM algorithm is to select hard examples according to the loss of the input samples and then use the selected samples in stochastic gradient descent training, which completes the training of the full convolution neural network model.
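The selection of hard examples by loss may be sketched as follows. The sketch assumes a per-pixel binary cross-entropy loss and a fixed number `keep` of retained samples; both are illustrative assumptions, since the embodiment does not specify the loss function or the selection ratio.

```python
import math


def pixel_loss(prob, label):
    """Binary cross-entropy of one pixel; `prob` is the predicted text probability."""
    eps = 1e-7
    p = min(max(prob, eps), 1 - eps)           # clamp to avoid log(0)
    return -math.log(p) if label == 1 else -math.log(1 - p)


def select_hard_examples(probs, labels, keep):
    """Return the indices of the `keep` samples with the largest loss."""
    losses = [pixel_loss(p, y) for p, y in zip(probs, labels)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(order[:keep])


# Only the selected indices would contribute to the gradient step.
hard = select_hard_examples([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0], keep=2)
```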

S132, performing binarization processing on the semantic segmentation maps of each layer of feature map in the feature pyramid respectively to obtain a plurality of segmentation mask maps of the text image.

Specifically, the binarization processing operates on the pixels of the semantic segmentation map so that only a background region and a text region remain in the text image. The binarization process is as follows: sigmoid processing is applied to the semantic segmentation map of each layer of feature map in the feature pyramid so that the value of every pixel lies between 0 and 1, and the pixels are then binarized against a preset threshold parameter; that is, a pixel whose value is lower than 0.5 is set to 0, and a pixel whose value is higher than 0.5 is set to 1.

In another embodiment, as shown in fig. 5, step S132 includes sub-steps S1321, S1322, and S1323.

S1321, normalizing the semantic segmentation graph of each layer of feature graph in the feature pyramid to obtain the normalized semantic segmentation graph.

Specifically, a sigmoid function is used to normalize the pixels of the semantic segmentation map of each layer of feature map in the feature pyramid, so that the value of every pixel lies between 0 and 1.

S1322, performing binarization processing on the normalized semantic segmentation map according to a preset threshold to obtain a segmentation mask map of the normalized semantic segmentation map.

Specifically, after the semantic segmentation map is normalized, the value of each of its pixels lies between 0 and 1. Since the pixels in the background region of the text image may take many different values, a preset threshold is needed to separate the background region from the text region by binarization. In the embodiment of the present invention, the preset threshold is set to 0.5: a region of the normalized semantic segmentation map whose pixel values are higher than 0.5 is taken as a text region and assigned the value 1, and a region whose pixel values are lower than 0.5 is taken as a background region and assigned the value 0.
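The normalization and thresholding of sub-steps S1321 and S1322 may be sketched as follows, assuming the semantic segmentation map is a nested list of raw scores. The treatment of a pixel whose value equals the threshold exactly is not specified in the embodiment; this sketch assigns it to the background as an assumption.

```python
import math


def binarize(scores, threshold=0.5):
    """Sigmoid-normalise a 2D score map, then threshold it into a 0/1 mask."""
    mask = []
    for row in scores:
        probs = [1.0 / (1.0 + math.exp(-v)) for v in row]    # values in (0, 1)
        mask.append([1 if p > threshold else 0 for p in probs])
    return mask


# Positive scores map above 0.5 (text), negative scores below 0.5 (background).
mask = binarize([[2.0, -1.0], [0.0, 3.5]])
```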

S1323, performing size scaling processing on the segmentation mask map of the normalized semantic segmentation map to obtain the segmentation mask map of the text image.

Specifically, after a segmentation mask map has been obtained from each layer of feature map in the feature pyramid, each segmentation mask map of the text image is smaller than the preprocessed text image. Each segmentation mask map therefore needs to be scaled so that its size equals that of the preprocessed text image.
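The size scaling of a segmentation mask map may be sketched as follows. Nearest-neighbour interpolation is assumed here because it keeps the mask strictly binary; the embodiment does not name an interpolation method.

```python
def resize_mask(mask, out_h, out_w):
    """Nearest-neighbour resize of a 0/1 mask to out_h x out_w."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# A 2x2 mask enlarged to the (hypothetical) 4x4 size of the preprocessed image.
full = resize_mask([[1, 0], [0, 1]], 4, 4)
```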

S140, expanding the segmentation mask map with the smallest text area among the plurality of segmentation mask maps according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image.

Specifically, the breadth-first search algorithm is a blind search method that systematically expands and examines all nodes in a graph to find the result; that is, it does not consider where the result is likely to be and searches the whole graph exhaustively until the result is found. The segmentation mask map with the smallest text area is the one obtained from the top-layer feature map of the feature pyramid after binarization. Expansion starts from this mask map and uses the breadth-first search algorithm: the text regions are enlarged by gradually adding more pixels to them until all text in the text image is covered. Conflicting pixels may appear during expansion; the rule applied in practice is that a contested pixel can only be merged into a single kernel, on a first-come, first-served basis.

In another embodiment, as shown in fig. 6, step S140 includes sub-steps S141 and S142.

S141, obtaining, from the plurality of segmentation mask maps, the segmentation mask map with the smallest text region and the segmentation mask map with the largest text region according to the top-layer feature map and the bottom-layer feature map in the feature pyramid.

Specifically, the top-layer feature map in the feature pyramid has the smallest size, the richest semantic information and the lowest resolution, while the bottom-layer feature map has the largest size, the least semantic information and the highest resolution. After every segmentation mask map has been scaled to the same size, the text box of the segmentation mask map corresponding to the top-layer feature map is the smallest and the text box of the segmentation mask map corresponding to the bottom-layer feature map is the largest. The segmentation mask map with the smallest text region can therefore be obtained from the top-layer feature map, and the segmentation mask map with the largest text region from the bottom-layer feature map.

S142, expanding the segmentation mask map with the smallest text region by a breadth-first search algorithm, based on the segmentation mask map with the largest text region, to obtain the first text box.

The text box of the segmentation mask map corresponding to the bottom-layer feature map is the largest, but that mask map carries the least semantic information, so character recognition based on it alone is not very accurate. The text box in the segmentation mask map corresponding to the bottom-layer feature map is therefore used as the reference box, and the text box in the segmentation mask map corresponding to the top-layer feature map is expanded by the breadth-first search algorithm until it matches the size of the reference box, which yields the first text box of the text image.
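The breadth-first expansion may be sketched as follows. The sketch assumes that the text regions of the smallest mask map have already been given distinct positive integer labels (`kernel_labels`, a hypothetical input), that expansion uses 4-connected neighbours, and that the largest mask map bounds the expansion; a contested pixel is kept by whichever region reaches it first.

```python
from collections import deque


def expand_kernels(kernel_labels, largest_mask):
    """Grow labelled regions outward by BFS, staying inside `largest_mask`."""
    h, w = len(largest_mask), len(largest_mask[0])
    labels = [row[:] for row in kernel_labels]           # do not modify the input
    queue = deque((r, c) for r in range(h) for c in range(w) if labels[r][c] > 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and largest_mask[nr][nc] == 1 and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]             # first region to arrive wins
                queue.append((nr, nc))
    return labels


# Two seed regions (labels 1 and 2) grow until they meet inside the largest mask.
seeds = [[1, 0, 0, 0, 0, 2, 0]]
bound = [[1, 1, 1, 1, 1, 1, 0]]
expanded = expand_kernels(seeds, bound)
```

Because every seed pixel enters the queue before any expansion begins, the regions grow one ring at a time, so each region claims the pixels closest to it and adjacent text lines stay separated.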

S150, carrying out affine transformation on the first text box to obtain a second text box after affine transformation.

Specifically, to perform the affine transformation on the first text box, the bounding rectangle coordinates and the minimum-area bounding rectangle coordinates are extracted from the character connected domain using the boundingRect function and the minAreaRect function of OpenCV, respectively. For a rectangle whose width is greater than or equal to 3 times its height, the corresponding minimum-area bounding rectangle is converted into the rectangle by the affine transformation module of OpenCV; the text region after affine transformation is easier to recognize.
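The underlying computation may be illustrated without OpenCV as follows. The sketch solves for the 2x3 affine matrix that maps three corners of a rotated rectangle onto three corners of an upright rectangle, which is the kind of matrix OpenCV's getAffineTransform returns from three point pairs; the embodiment does not state which OpenCV routine is used, so this correspondence and the function names below are assumptions.

```python
def affine_from_points(src, dst):
    """Solve the 2x3 affine matrix mapping three src points onto three dst points."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")
    rows = []
    for k in (0, 1):                     # k = 0 solves for x', k = 1 for y'
        u0, u1, u2 = dst[0][k], dst[1][k], dst[2][k]
        # Cramer's rule on a * x + b * y + c = u for the three point pairs.
        a = (u0 * (y1 - y2) - y0 * (u1 - u2) + (u1 * y2 - u2 * y1)) / det
        b = (x0 * (u1 - u2) - u0 * (x1 - x2) + (x1 * u2 - x2 * u1)) / det
        c = (x0 * (y1 * u2 - y2 * u1) - y0 * (x1 * u2 - x2 * u1)
             + u0 * (x1 * y2 - x2 * y1)) / det
        rows.append((a, b, c))
    return rows


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return tuple(a * x + b * y + c for a, b, c in matrix)


# Corners of a rotated 4x2 rectangle mapped onto an upright 4x2 rectangle.
m = affine_from_points([(1, 1), (1, 5), (-1, 1)], [(0, 0), (4, 0), (0, 2)])
fourth_corner = apply_affine(m, (-1, 5))
```

Three non-collinear point pairs determine an affine map uniquely, so the fourth corner of the rectangle follows from the other three.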

S160, classifying and recognizing the characters in the second text box after affine transformation to obtain the characters in the text image.

In the embodiment of the invention, the characters in the text image are obtained by inputting the affine-transformed text box into a pre-trained recurrent neural network for classification and recognition.

In another embodiment, as shown in fig. 7, step S160 includes sub-steps S161 and S162.

S161, inputting the affine-transformed second text box into a preset bidirectional recurrent neural network model to obtain a feature vector sequence of the text image.

Specifically, the bidirectional recurrent neural network model is a pre-trained model composed of two recurrent neural networks that process the sequence in opposite directions. Inputting the affine-transformed second text box into the bidirectional recurrent neural network model yields a feature vector sequence carrying context information; this feature vector sequence is then classified and recognized to obtain the characters in the text image.
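How two recurrent passes in opposite directions produce features with context from both sides may be illustrated by the following toy sketch. It assumes scalar inputs, a single tanh recurrent unit per direction and hand-picked weights; the actual model architecture, dimensions and trained weights are not given in the embodiment.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a scalar tanh recurrent unit over `inputs`; return every hidden state."""
    hidden, states = 0.0, []
    for x in inputs:
        hidden = math.tanh(w_in * x + w_rec * hidden)
        states.append(hidden)
    return states


def bidirectional_features(inputs, fwd=(0.5, 0.3), bwd=(0.4, 0.2)):
    """Pair each position's forward state with its backward state."""
    forward = rnn_pass(inputs, *fwd)
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]   # read right to left, realign
    return list(zip(forward, backward))


# Each position receives one feature from the left context and one from the right.
features = bidirectional_features([1.0, 0.0, -1.0])
```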

S162, inputting the feature vector sequence of the text image into a preset classifier for classification and recognition to obtain the characters in the text image.

Specifically, the classifier classifies and recognizes the feature vector sequence of the text image and thereby predicts the characters in the text image. In the embodiment of the invention, a three-layer BP neural network is used as the classifier, with the hyperbolic tangent function as the activation function; the calculation process is as follows:

$$h\_id_{ij} = \tanh(w_{11} \times h_j + w_{12} \times s_{t-1} + b)$$

$$e_{ij} = h\_id_{ij} \times w_{21}$$

where $h\_id_{ij}$ is the hidden state of the recurrent neural unit when $h_j$ is scored at time $i$, $e_{ij}$ is the score, $w_{11}$ and $w_{12}$ are the second-layer weights of the three-layer BP neural network, $b$ is the bias term, and $w_{21}$ is the third-layer weight of the three-layer BP neural network.
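As a non-limiting illustration, the two formulas above may be evaluated as in the following sketch. Treating the weights as small matrices and vectors, the function name `score`, and the numeric values are assumptions made for illustration only.

```python
import math

def score(h_j, s_prev, w11, w12, b, w21):
    """Compute e_ij = tanh(w11*h_j + w12*s_prev + b) * w21 for one feature vector."""
    hidden = [
        math.tanh(
            sum(w * v for w, v in zip(w11[k], h_j))
            + sum(w * v for w, v in zip(w12[k], s_prev))
            + b[k]
        )
        for k in range(len(b))
    ]
    # The third layer reduces the hidden state to a single score.
    return sum(h * w for h, w in zip(hidden, w21))

# Hypothetical scalar case: one hidden unit, all inputs one-dimensional.
e = score(h_j=[1.0], s_prev=[0.0], w11=[[1.0]], w12=[[1.0]], b=[0.0], w21=[2.0])
e_zero = score(h_j=[0.0], s_prev=[0.0], w11=[[1.0]], w12=[[1.0]], b=[0.0], w21=[2.0])
```

In the scalar case, `e` equals `2 * tanh(1)`, and an all-zero input gives a score of zero because the hyperbolic tangent of zero is zero.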

In the character recognition method for a curved text image provided by the embodiment of the invention, if a text image input by a user is received, the text image is preprocessed according to a preset first processing rule to obtain a preprocessed text image; feature extraction is performed on the preprocessed text image to obtain a feature pyramid of the text image; image segmentation processing is performed on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image; the segmentation mask map with the smallest text area among the plurality of segmentation mask maps is expanded according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; affine transformation is performed on the first text box to obtain an affine-transformed second text box; and the characters in the affine-transformed second text box are classified and recognized to obtain the characters in the text image. With this method, the text in the image can be framed accurately, and the accuracy of character recognition in the text image is also improved.

The embodiment of the invention also provides a character recognition device 100 for a curved text image, which is used for executing any embodiment of the character recognition method for a curved text image. Specifically, referring to fig. 8, fig. 8 is a schematic block diagram of the character recognition device 100 for a curved text image according to an embodiment of the present invention.

As shown in fig. 8, the character recognition device 100 for a curved text image includes a preprocessing unit 110, a feature extraction unit 120, a segmentation unit 130, a first expansion unit 140, an affine transformation unit 150, and a recognition unit 160.

The preprocessing unit 110 is configured to, if a text image input by a user is received, preprocess the text image according to a preset first processing rule to obtain a preprocessed text image.

In another embodiment of the present invention, as shown in fig. 9, the preprocessing unit 110 includes: a first obtaining unit 111, a first scaling unit 112 and a filling unit 113.

A first obtaining unit 111, configured to obtain a scaling factor for scaling the text image according to a long edge of the text image.

A first scaling unit 112, configured to scale the text image according to the scaling factor to obtain a scaled text image.

A filling unit 113, configured to fill a short edge of the scaled text image to obtain a square text image.
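By way of non-limiting illustration, the cooperation of the first obtaining unit 111, the first scaling unit 112 and the filling unit 113 may be sketched as follows. The target side length, nearest-neighbour resampling, zero filling, and padding at the bottom and right are assumptions made for illustration; the image is represented as a plain list of pixel rows.

```python
def scale_factor(height, width, target):
    """Scaling factor derived from the long edge of the image."""
    return target / max(height, width)

def resize_nearest(image, factor):
    """Scale a list-of-rows image with nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [image[min(src_h - 1, int(r / factor))][min(src_w - 1, int(c / factor))]
         for c in range(dst_w)]
        for r in range(dst_h)
    ]

def pad_to_square(image, fill=0):
    """Fill the short edge (right and bottom) so the image becomes square."""
    height, width = len(image), len(image[0])
    side = max(height, width)
    rows = [row + [fill] * (side - width) for row in image]
    rows += [[fill] * side for _ in range(side - height)]
    return rows

def preprocess(image, target=4):
    factor = scale_factor(len(image), len(image[0]), target)
    return pad_to_square(resize_nearest(image, factor))

# Hypothetical 2 x 8 image scaled so that its long edge becomes 4 pixels.
image = [list(range(8)), list(range(10, 18))]
square = preprocess(image, target=4)
```

Scaling by the long edge preserves the aspect ratio of the text, and the filling step only adds blank pixels, so the characters are not distorted before feature extraction.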

A feature extraction unit 120, configured to perform feature extraction on the preprocessed text image to obtain a feature pyramid of the text image.

In another embodiment of the present invention, as shown in fig. 10, the feature extraction unit 120 includes: a convolution unit 121 and a construction unit 122.

A convolution unit 121, configured to perform convolution processing on the preprocessed text image to obtain a multilayer feature map of the text image.

A constructing unit 122, configured to construct a feature pyramid of the text image according to the multilayer feature map.
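As a rough, non-limiting sketch, a feature pyramid may be built from a multilayer feature map in the following manner. Here 2 x 2 average pooling is used as a stand-in for the learned convolution stages of the convolution unit 121, and the top-down merge by upsampling and addition is an assumed construction; neither is stated by the embodiment. The input side length is assumed to be divisible by 2 for each additional level.

```python
def avg_pool2(fmap):
    """Halve the resolution with 2 x 2 average pooling (stand-in for a conv stage)."""
    return [
        [(fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]

def upsample2(fmap):
    """Double the resolution by repeating every value twice in each direction."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def build_pyramid(image, levels=3):
    """Return feature maps ordered from the bottom (largest) to the top (smallest)."""
    bottom_up = [image]
    for _ in range(levels - 1):
        bottom_up.append(avg_pool2(bottom_up[-1]))
    pyramid = [bottom_up[-1]]
    for fmap in reversed(bottom_up[:-1]):
        up = upsample2(pyramid[0])
        merged = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(fmap, up)]
        pyramid.insert(0, merged)
    return pyramid

# Hypothetical 4 x 4 single-channel input filled with ones.
pyramid = build_pyramid([[1.0] * 4 for _ in range(4)], levels=3)
sizes = [len(level) for level in pyramid]
```

Each level of `pyramid` mixes its own resolution with information passed down from the coarser levels, which is the property the later segmentation step relies on.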

A segmentation unit 130, configured to perform image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image.

In another embodiment of the present invention, as shown in fig. 11, the segmentation unit 130 includes: a classification unit 131 and a processing unit 132.

The classification unit 131 is configured to input each layer of feature map in the feature pyramid into a preset fully convolutional neural network model to obtain a semantic segmentation map of each layer of feature map in the feature pyramid.

The processing unit 132 is configured to perform binarization processing on the semantic segmentation maps of each layer of feature map in the feature pyramid, respectively, to obtain a plurality of segmentation mask maps of the text image.

In another embodiment of the present invention, as shown in fig. 12, the processing unit 132 includes: a normalization processing unit 1321, a binarization processing unit 1322, and a scaling unit 1323.

The normalization processing unit 1321 is configured to perform normalization processing on the semantic segmentation map of each layer of the feature map in the feature pyramid to obtain a semantic segmentation map after the normalization processing.

A binarization processing unit 1322 is configured to perform binarization processing on the semantic segmentation map after the normalization processing according to a preset threshold value, so as to obtain a segmentation mask map of the semantic segmentation map after the normalization processing.

A scaling unit 1323, configured to scale the segmentation mask map of the semantic segmentation map after the normalization processing to obtain the segmentation mask map of the text image.
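By way of non-limiting illustration, the sequence performed by units 1321, 1322 and 1323 may be sketched as follows. The use of a sigmoid for normalization, the threshold of 0.5, and nearest-neighbour enlargement by an integer factor are assumptions made for illustration; the embodiment only states that a preset threshold is used.

```python
import math

def normalize(seg_map):
    """Map raw segmentation scores into (0, 1) with a sigmoid (assumed choice)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in seg_map]

def binarize(prob_map, threshold=0.5):
    """Mark a pixel as text (1) when its probability exceeds the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]

def upscale(mask, factor):
    """Enlarge the mask by repeating each pixel factor times in both directions."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def segmentation_mask(seg_map, threshold=0.5, factor=2):
    return upscale(binarize(normalize(seg_map), threshold), factor)

# Hypothetical 1 x 2 semantic segmentation map with raw scores.
mask = segmentation_mask([[-2.0, 3.0]], threshold=0.5, factor=2)
```

Scaling after binarization brings every mask back to a common size, so that masks derived from different pyramid layers can be compared pixel by pixel.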

The first expanding unit 140 is configured to expand the segmentation mask map with the smallest text area in the multiple segmentation mask maps according to a breadth-first search algorithm, so as to obtain a first text box in which all characters in the text image are framed.

In another embodiment of the present invention, as shown in fig. 13, the first expansion unit 140 includes: a second obtaining unit 141 and a second expansion unit 142.

The second obtaining unit 141 is configured to obtain a segmentation mask map with a smallest text region and a segmentation mask map with a largest text region from the plurality of segmentation mask maps according to the top-level feature map and the bottom-level feature map in the feature pyramid.

A second expanding unit 142, configured to expand the segmentation mask map with the minimum text region by using a breadth-first search algorithm based on the segmentation mask map with the maximum text region, so as to obtain the first text box.
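As a non-limiting illustration, the expansion of the smallest-text-area mask within the largest-text-area mask by breadth-first search may be sketched as follows. The 4-neighbour connectivity, the labelling of each connected kernel with its own integer, and the toy masks are assumptions made for illustration.

```python
from collections import deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def label_kernels(mask):
    """Give every 4-connected region of the smallest mask its own label."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not labels[r][c]:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels

def expand(smallest, largest):
    """Grow each labelled kernel outwards, staying inside the largest mask."""
    labels = label_kernels(smallest)
    height, width = len(labels), len(labels[0])
    queue = deque((r, c) for r in range(height) for c in range(width) if labels[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if (0 <= ny < height and 0 <= nx < width
                    and largest[ny][nx] and not labels[ny][nx]):
                labels[ny][nx] = labels[y][x]  # first kernel to arrive keeps the pixel
                queue.append((ny, nx))
    return labels

smallest = [[0, 0, 0, 0, 0],
            [0, 1, 0, 1, 0],
            [0, 0, 0, 0, 0]]
largest = [[0, 0, 0, 0, 0],
           [1, 1, 0, 1, 1],
           [0, 1, 0, 1, 0]]
regions = expand(smallest, largest)
```

Because all kernels are queued together, they grow at the same rate, and two adjacent text regions remain separate even where the largest mask would merge them.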

An affine transformation unit 150, configured to perform affine transformation on the first text box to obtain an affine-transformed second text box.
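By way of non-limiting illustration, an affine transformation that maps three corners of a tilted text box onto an upright rectangle may be computed as in the following sketch. Determining the transformation from three point pairs by Cramer's rule, the function names, and the corner coordinates are assumptions made for illustration; the embodiment does not state how the transformation parameters are obtained.

```python
def solve_affine(src, dst):
    """Return rows (a, b, c) such that u = a*x + b*y + c for each output axis."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")
    rows = []
    for k in range(2):  # k = 0 solves for the x output, k = 1 for the y output
        u1, u2, u3 = dst[0][k], dst[1][k], dst[2][k]
        a = (u1 * (y2 - y3) - y1 * (u2 - u3) + (u2 * y3 - u3 * y2)) / det
        b = (x1 * (u2 - u3) - u1 * (x2 - x3) + (x2 * u3 - x3 * u2)) / det
        c = (x1 * (y2 * u3 - y3 * u2) - y1 * (x2 * u3 - x3 * u2)
             + u1 * (x2 * y3 - x3 * y2)) / det
        rows.append((a, b, c))
    return rows

def apply_affine(matrix, point):
    """Map one (x, y) point through the 2 x 3 affine matrix."""
    x, y = point
    return tuple(a * x + b * y + c for a, b, c in matrix)

# Hypothetical corners of a tilted first text box and of the upright target box.
box_corners = [(1.0, 1.0), (3.0, 3.0), (0.0, 2.0)]
target_corners = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
matrix = solve_affine(box_corners, target_corners)
mapped = [apply_affine(matrix, p) for p in box_corners]
```

Applying the same matrix to every pixel coordinate inside the first text box yields the straightened second text box that is passed on for recognition.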

A recognition unit 160, configured to classify and recognize the characters in the affine-transformed second text box to obtain the characters in the text image.

In another embodiment of the present invention, as shown in fig. 14, the recognition unit 160 includes: an input unit 161 and a classification and recognition unit 162.

The input unit 161 is configured to input the affine-transformed second text box into a preset bidirectional recurrent neural network model, so as to obtain a feature vector sequence of the text image.

The classification and recognition unit 162 is configured to input the feature vector sequence of the text image into a preset classifier for classification and recognition to obtain the characters in the text image.

The character recognition device 100 for a curved text image according to the embodiment of the present invention is configured to perform preprocessing on a text image according to a preset first processing rule if the text image input by a user is received, so as to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.
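As a non-limiting illustration, the order in which units 110 to 160 hand data to one another may be sketched as a chain of callables. The class name, the field names, and the placeholder stages that merely record their reference numerals are hypothetical; in a real device each stage would pass along richer data than a single value (for example, both the image and the first text box).

```python
from dataclasses import dataclass
from typing import Any, Callable

Stage = Callable[[Any], Any]

@dataclass
class CurvedTextRecognizer:
    """Chains six stages in the order of units 110 to 160."""
    preprocess: Stage        # preprocessing unit 110
    extract_features: Stage  # feature extraction unit 120
    segment: Stage           # segmentation unit 130
    expand: Stage            # first expansion unit 140
    rectify: Stage           # affine transformation unit 150
    recognize: Stage         # recognition unit 160

    def __call__(self, text_image: Any) -> Any:
        result = text_image
        for stage in (self.preprocess, self.extract_features, self.segment,
                      self.expand, self.rectify, self.recognize):
            result = stage(result)
        return result

def tag(name):
    """Placeholder stage that only appends its unit number to a trace list."""
    return lambda trace: trace + [name]

recognizer = CurvedTextRecognizer(
    *(tag(n) for n in ("110", "120", "130", "140", "150", "160"))
)
trace = recognizer([])
```

Keeping each unit behind a plain callable reflects the statement that the division into units is logical: any stage can be replaced or merged with another without changing the overall flow.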

Referring to fig. 15, fig. 15 is a schematic block diagram of a computer device according to an embodiment of the present invention.

Referring to fig. 15, the device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform the character recognition method for a curved text image.

The processor 502 is used to provide computing and control capabilities that support the operation of the overall device 500.

The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 may be caused to execute the character recognition method for a curved text image.

The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 15 is a block diagram of only the portion of the configuration associated with aspects of the present invention and does not limit the device 500 to which aspects of the present invention may be applied; a particular device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.

Those skilled in the art will appreciate that the embodiment of the device 500 shown in fig. 15 does not constitute a limitation on the specific construction of the device 500; in other embodiments, the device 500 may include more or fewer components than shown, may combine some components, or may have a different arrangement of components. For example, in some embodiments, the device 500 may include only the memory and the processor 502; in such embodiments, the structure and function of the memory and the processor 502 are the same as those of the embodiment shown in fig. 15 and are not repeated herein.

It should be understood that, in this embodiment, the processor 502 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

In another embodiment of the present invention, a computer storage medium is provided. The storage medium may be a non-volatile computer-readable storage medium. The storage medium stores a computer program 5032, wherein the computer program 5032 when executed by the processor 502 performs the steps of: if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in actual implementation. Units having the same function may be grouped into one unit, a plurality of units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a device 500 (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A character recognition method for a curved text image, characterized by comprising the following steps:

if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image;

performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image;

performing image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image;

expanding the segmentation mask map with the smallest text area among the plurality of segmentation mask maps according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image;

performing affine transformation on the first text box to obtain an affine-transformed second text box; and

classifying and recognizing the characters in the affine-transformed second text box to obtain the characters in the text image.

2. The method for character recognition of curved text images according to claim 1, wherein the preprocessing is performed on the text images according to a preset first processing rule to obtain a preprocessed text image, comprising:

Obtain a scaling factor for scaling the text image according to the long side of the text image;

scaling the text image according to the scaling factor to obtain a scaled text image;

Filling the short side of the scaled text image to obtain a square text image.

3. The character recognition method of the curved text image according to claim 1, wherein the feature extraction is performed on the preprocessed text image to obtain a feature pyramid of the text image, comprising:

Performing convolution processing on the preprocessed text image to obtain a multi-layer feature map of the text image;

A feature pyramid of the text image is constructed according to the multi-layer feature map.

4. The character recognition method for a curved text image according to claim 1, wherein performing image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image comprises:

Input the feature map of each layer in the feature pyramid into a preset fully convolutional neural network model, and obtain a semantic segmentation map of the feature map of each layer in the feature pyramid;

Binarization is performed on the semantic segmentation maps of the feature maps of each layer in the feature pyramid to obtain a plurality of segmentation mask maps of the text image.

5. The character recognition method for a curved text image according to claim 4, wherein binarizing the semantic segmentation map of each layer of feature map in the feature pyramid respectively to obtain a plurality of segmentation mask maps of the text image comprises:

Normalizing the semantic segmentation map of each layer of feature maps in the feature pyramid to obtain a normalized semantic segmentation map;

Perform binarization processing on the normalized semantic segmentation map according to a preset threshold to obtain a segmentation mask map of the normalized semantic segmentation map;

Scaling the segmentation mask map of the normalized semantic segmentation map to obtain the segmentation mask map of the text image.

6. The character recognition method for a curved text image according to claim 5, wherein expanding the segmentation mask map with the smallest text area among the plurality of segmentation mask maps according to the breadth-first search algorithm to obtain a first text box framing all characters in the text image comprises:

According to the top-level feature map and the bottom-level feature map in the feature pyramid, respectively obtain the segmentation mask map with the smallest text area and the segmentation mask map with the largest text area from the multiple segmentation mask maps;

Based on the segmentation mask image with the largest text area, the segmentation mask image with the smallest text area is expanded by using a breadth-first search algorithm to obtain the first text box.

7. The character recognition method for a curved text image according to claim 1, wherein classifying and recognizing the characters in the affine-transformed second text box to obtain the characters in the text image comprises:

Inputting the second text box after the affine transformation into a preset bidirectional recurrent neural network model to obtain the feature vector sequence of the text image;

The feature vector sequence of the text image is input into a preset classifier for classification and recognition, and the text in the text image is obtained.

8. A character recognition device for curved text images, comprising:

a preprocessing unit, configured to preprocess the text image according to a preset first processing rule if a text image input by the user is received, to obtain a preprocessed text image;

a feature extraction unit, configured to perform feature extraction on the preprocessed text image to obtain a feature pyramid of the text image;

A segmentation unit, configured to perform image segmentation processing on each layer of feature maps in the feature pyramid according to a preset second processing rule, to obtain multiple segmentation mask maps of the text image;

The first expansion unit is used to expand the segmentation mask image with the smallest text area in the multiple segmentation mask images according to the breadth-first search algorithm, so as to obtain a first text box that has framed all characters in the text image;

an affine transformation unit, configured to perform affine transformation on the first text box to obtain an affine-transformed second text box;

The recognition unit is used for classifying and recognizing the text in the second text box after the affine transformation to obtain the text in the text image.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the character recognition method for a curved text image according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the character recognition method for a curved text image according to any one of claims 1 to 7.