CN112364873A - Character recognition method and device for curved text image and computer equipment - Google Patents


Info

Publication number
CN112364873A
CN112364873A (application CN202011312589.0A)
Authority
CN
China
Prior art keywords
text
text image
image
feature
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011312589.0A
Other languages
Chinese (zh)
Inventor
朱锦祥 (Zhu Jinxiang)
臧磊 (Zang Lei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011312589.0A priority Critical patent/CN112364873A/en
Publication of CN112364873A publication Critical patent/CN112364873A/en
Priority to PCT/CN2021/125006 priority patent/WO2022105521A1/en
Pending legal-status Critical Current


Classifications

    • G06V30/10 Character recognition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T3/02 Affine transformations
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a character recognition method and device for a curved text image, and computer equipment. The method comprises the following steps: if a text image input by a user is received, preprocessing the text image to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature map in the feature pyramid to obtain a plurality of segmentation mask maps of the text image; expanding the segmentation mask map with the smallest text region according to a breadth-first search algorithm to obtain a first text box that frames all characters in the text image; performing affine transformation on the first text box to obtain a second text box; and classifying and recognizing the characters in the second text box to obtain the characters in the text image. The method is based on OCR recognition technology; it not only accurately frames the text in the image but also improves the accuracy of character recognition.

Description

Character recognition method and device for curved text image and computer equipment
Technical Field
The invention belongs to the technical field of artificial intelligence character recognition, and in particular relates to a character recognition method and device for a curved text image, and computer equipment.
Background
OCR technology refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer-encoded characters using a character recognition method. In the prior art, when character recognition is performed on an image captured in a natural scene, the text in the image is often rendered in a curved shape; such text cannot be framed by the rectangular bounding boxes used in existing OCR technology, which seriously affects the accuracy of character recognition.
Disclosure of Invention
The embodiments of the invention provide a character recognition method and device for a curved text image, and computer equipment, aiming to solve the problem that the accuracy of character recognition in a text image is low because curved text in existing text images cannot be framed by a rectangular bounding box.
In a first aspect, an embodiment of the present invention provides a method for recognizing characters in a curved text image, including:
if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image;
performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image;
performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image;
expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image;
carrying out affine transformation on the first text box to obtain a second text box after affine transformation;
and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.
In a second aspect, an embodiment of the present invention provides a character recognition apparatus for a curved text image, including:
the preprocessing unit is used for preprocessing the text image according to a preset first processing rule if the text image input by a user is received, so as to obtain a preprocessed text image;
the feature extraction unit is used for extracting features of the preprocessed text image to obtain a feature pyramid of the text image;
the segmentation unit is used for carrying out image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image;
the first expansion unit is used for expanding the segmentation mask image with the smallest text area in the segmentation mask images according to the breadth-first search algorithm to obtain a first text box which frames all characters in the text image;
the affine transformation unit is used for carrying out affine transformation on the first text box to obtain a second text box after affine transformation;
and the identification unit is used for classifying and identifying the characters in the second text box after affine transformation to obtain the characters in the text image.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the character recognition method for a curved text image according to the first aspect when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the character recognition method for a curved text image according to the first aspect.
The embodiment of the invention provides a character recognition method, a character recognition device and computer equipment for a curved text image, wherein the method comprises the following steps: if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image. By the method, not only can the text in the image be accurately framed, but also the accuracy of character recognition in the text image is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a text recognition method for a curved text image according to an embodiment of the present invention;
FIG. 2 is a sub-flow diagram of a text recognition method for a curved text image according to an embodiment of the present invention;
FIG. 3 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;
FIG. 4 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;
FIG. 5 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;
FIG. 6 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;
FIG. 7 is a schematic view of another sub-flow of a text recognition method for a curved text image according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of a character recognition apparatus for a curved text image according to an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a sub-unit of a character recognition apparatus for a curved text image according to an embodiment of the present invention;
FIG. 10 is a schematic block diagram of another sub-unit of a character recognition apparatus for a curved text image according to an embodiment of the present invention;
FIG. 11 is a schematic block diagram of another sub-unit of a character recognition apparatus for a curved text image according to an embodiment of the present invention;
FIG. 12 is a schematic block diagram of another sub-unit of a character recognition apparatus for a curved text image according to an embodiment of the present invention;
FIG. 13 is a schematic block diagram of another sub-unit of a character recognition apparatus for a curved text image according to an embodiment of the present invention;
FIG. 14 is a schematic block diagram of another sub-unit of a character recognition apparatus for a curved text image according to an embodiment of the present invention;
FIG. 15 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a character recognition method for a curved text image according to an embodiment of the present invention. The method is built and run on a server. After the server receives a curved text image sent by an intelligent terminal device such as a laptop or a tablet computer, the text image is preprocessed and then subjected to feature extraction to obtain a feature pyramid of the text image; image segmentation processing is then performed on each layer of feature map in the feature pyramid to obtain a plurality of segmentation mask maps of the text image; the segmentation mask map with the smallest text region is expanded according to a breadth-first search algorithm to obtain a text box that frames the text in the text image; and after affine transformation of the text box, text recognition is performed, so that the characters in the curved text image can be recognized. The character recognition method for a curved text image is described in detail below.
As shown in fig. 1, the method includes the following steps S110 to S160.
S110, if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image.
If a text image input by a user is received, the text image is preprocessed according to a preset first processing rule to obtain a preprocessed text image. Specifically, the text in the text image is curved, deformed text, and the first processing rule is rule information for preprocessing the text image so that it becomes a square text image. Preprocessing the text image according to the first processing rule makes the text in the image more amenable to subsequent text detection and recognition. In the embodiment of the invention, after a text image input by a user is received, the text image is read in RGB mode and then stretched according to the first processing rule to obtain a square text image.
In another embodiment, as shown in fig. 2, step S110 includes sub-steps S111, S112 and S113.
S111, obtaining a scaling factor for scaling the text image according to the long side of the text image.
A scaling factor for scaling the text image is obtained according to the long side of the text image. Specifically, after receiving the text image input by the user, the terminal device reads the text image in RGB mode, obtains the width and height of the text image, determines the long side from the width and height, and then scales the long side in a certain proportion to obtain the scaling factor.
S112, scaling the text image according to the scaling factor to obtain a scaled text image.
The text image is scaled according to the scaling factor to obtain a scaled text image. Specifically, after the scaling factor is obtained from the long side of the text image, the width and the height of the text image are each divided by the scaling factor, so that the text image is scaled and the scaled text image is obtained.
S113, filling the short side of the scaled text image to obtain a square text image.
The short side of the scaled text image is filled to obtain a square text image. Specifically, after the text image is scaled according to the scaling factor, the scaled text image retains the shape of the original image; although only the size has changed and text detection and recognition are easier to perform, the aspect ratio still affects subsequent text detection to some extent, so the short side of the scaled text image needs to be filled correspondingly. Since the terminal device reads the text image in RGB mode, when filling the scaled text image it is only necessary to pad the short side with the RGB color (0, 0, 0) to obtain the square text image.
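The scaling and padding of steps S111 to S113 can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function name `preprocess`, the target size of 640, and the nearest-neighbour resize are illustrative choices, not details specified by this embodiment.

```python
import numpy as np

def preprocess(image: np.ndarray, target: int = 640) -> np.ndarray:
    """Scale the long side of an RGB image to `target` pixels, then pad the
    short side with the RGB color (0, 0, 0) to obtain a square image."""
    h, w = image.shape[:2]
    scale = max(h, w) / target                      # scaling factor from the long side
    new_h, new_w = int(round(h / scale)), int(round(w / scale))
    # Nearest-neighbour resize via index sampling (stand-in for a library resize)
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    resized = image[rows][:, cols]
    # Pad the short side with black so the result is target x target
    square = np.zeros((target, target, 3), dtype=image.dtype)
    square[:new_h, :new_w] = resized
    return square
```

In practice a library resize (e.g. bilinear interpolation) would replace the index sampling, but the long-side scaling factor and the zero padding of the short side follow the steps above.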
And S120, performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image.
Feature extraction is performed on the preprocessed text image to obtain a feature pyramid of the text image. Specifically, during feature extraction, different convolution kernels are applied in turn from bottom to top to obtain a multi-layer feature map of the text image; the top-layer feature map of the multi-layer feature map is then sampled to construct the top layer of the feature pyramid; sampling proceeds from top to bottom starting from the top layer of the feature pyramid while the corresponding feature maps in the multi-layer feature map are laterally connected, thereby constructing the feature pyramid of the text image.
In another embodiment, as shown in fig. 3, step S120 includes sub-steps S121 and S122.
S121, performing convolution processing on the preprocessed text image to obtain a multi-layer feature map of the text image.
Convolution processing is performed on the preprocessed text image to obtain a multi-layer feature map of the text image. Specifically, from bottom to top the number of channels of each layer of the multi-layer feature map gradually increases and its size gradually decreases, and the features extracted at each layer are fed to the next layer as input; that is, the multi-layer feature map consists of the feature maps of the different convolution stages after the preprocessed text image is input into the convolutional neural network, with the richness of semantic information gradually increasing and the resolution gradually decreasing from bottom to top. The bottom-most feature map has the least semantic information but the highest resolution, while the top-most feature map has the richest semantics but the lowest resolution; neither alone is suitable for detecting targets at every scale. For example, when the convolution process of the convolutional neural network comprises the four stages conv1, conv2, conv3 and conv4, the feature map of the last layer of each stage is extracted to obtain the multi-layer feature map of the text image.
S122, constructing a feature pyramid of the text image according to the multi-layer feature map.
A feature pyramid of the text image is constructed from the multi-layer feature map. Specifically, the top-layer feature map of the multi-layer feature map is first sampled to construct the top layer of the feature pyramid; then, in a top-down process starting from the top layer, the more abstract features with stronger semantics are upsampled and laterally connected to the features of the layer below, so that the higher-level features are enhanced. The feature map used for prediction at each layer thus fuses features of different resolutions and different semantic strengths and can detect objects at the corresponding resolution, ensuring that each layer has both appropriate resolution and strong semantic features.
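The bottom-up and top-down construction of steps S121 and S122 can be sketched with plain NumPy as follows. This is a simplified sketch under stated assumptions: average pooling stands in for the downsampling of the convolution stages, nearest-neighbour upsampling for the top-down sampling, and the 1x1 lateral convolutions of a real feature pyramid network are omitted; the function names are illustrative.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling (stride 2) - stands in for a conv stage's downsampling."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling for the top-down pathway."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(bottom_up):
    """Top-down pathway: start from the top feature map, upsample it, and merge
    it with the next lateral map by element-wise addition, down to the bottom."""
    pyramid = [bottom_up[-1]]                       # top of the pyramid
    for lateral in reversed(bottom_up[:-1]):
        up = upsample2(pyramid[0])[:lateral.shape[0], :lateral.shape[1]]
        pyramid.insert(0, up + lateral)             # lateral connection by addition
    return pyramid
```

Each level of the returned pyramid thus combines the resolution of its bottom-up stage with semantics propagated down from the top, which is the property the embodiment relies on for detecting text at different scales.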
S130, performing image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image.
Image segmentation processing is performed on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image. Specifically, the second processing rule is rule information for performing image segmentation processing on each layer of feature map in the feature pyramid to obtain the segmentation mask map corresponding to that layer; that is, each layer of feature map in the feature pyramid corresponds to one segmentation mask map of the text image.
In another embodiment, as shown in fig. 4, step S130 includes sub-steps S131 and S132.
S131, inputting each layer of feature graph in the feature pyramid into a preset full convolution neural network model to obtain a semantic segmentation graph of each layer of feature graph in the feature pyramid.
Each layer of feature map in the feature pyramid is input into a preset full convolution neural network model to obtain a semantic segmentation map of each layer of feature map in the feature pyramid. Specifically, the full convolution neural network model is trained in advance and is used to perform pixel-level classification on each layer of feature map in the feature pyramid, so as to distinguish the background region from the text region in each layer. The full convolution neural network model adopts an online hard example mining (OHEM) sampling strategy during training. The core idea of the OHEM algorithm is to screen hard examples according to the loss of the input samples and then feed the screened samples into stochastic gradient descent training to complete the training of the full convolution neural network model.
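The hard-example screening described above can be sketched as follows. The function name `ohem_mask` and the 3:1 negative-to-positive ratio are illustrative assumptions (the ratio is a common choice in OHEM implementations, not a value given by this embodiment).

```python
import numpy as np

def ohem_mask(losses, text_mask, neg_ratio=3):
    """Select the pixels that contribute to the gradient step: keep all
    positive (text) pixels and only the hardest negatives, at `neg_ratio`
    negatives per positive. Returns a boolean mask over the pixels."""
    pos = text_mask.astype(bool)
    n_neg = min(int(pos.sum()) * neg_ratio, int((~pos).sum()))
    # Positives get -inf so they are never selected as negatives
    neg_losses = np.where(~pos, losses, -np.inf).ravel()
    hardest = np.argsort(neg_losses)[::-1][:n_neg]  # largest-loss negatives
    keep = pos.copy().ravel()
    keep[hardest] = True
    return keep.reshape(text_mask.shape)
```

The resulting mask would multiply the per-pixel loss before averaging, so easy background pixels (which vastly outnumber text pixels) do not dominate training.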
S132, performing binarization processing on the semantic segmentation maps of each layer of feature map in the feature pyramid respectively to obtain a plurality of segmentation mask maps of the text image.
Binarization processing is performed on the semantic segmentation map of each layer of feature map in the feature pyramid to obtain a plurality of segmentation mask maps of the text image. Specifically, the binarization processing operates on the pixels of the semantic segmentation map so that the text image contains only a background region and a text region. The specific binarization process is as follows: sigmoid processing is applied to the semantic segmentation map of each layer of feature map in the feature pyramid so that its pixel values lie between 0 and 1; binarization of each pixel is then achieved with a preset threshold parameter, i.e., a pixel is set to 0 or 1 according to whether its value is below or above 0.5.
In another embodiment, as shown in fig. 5, step S132 includes sub-steps S1321, S1322, and S1323.
S1321, normalizing the semantic segmentation graph of each layer of feature graph in the feature pyramid to obtain the normalized semantic segmentation graph.
And normalizing the semantic segmentation graph of each layer of feature graph in the feature pyramid to obtain the normalized semantic segmentation graph. Specifically, pixel points of the semantic segmentation maps of each layer of the feature map in the feature pyramid are normalized by adopting a sigmoid function, so that the value of the pixel points of the semantic segmentation maps of each layer of the feature map in the feature pyramid is between 0 and 1.
And S1322, performing binarization processing on the semantic segmentation image after the normalization processing according to a preset threshold value to obtain a segmentation mask image of the semantic segmentation image after the normalization processing.
Binarization processing is performed on the normalized semantic segmentation map according to a preset threshold to obtain a segmentation mask map of the normalized semantic segmentation map. Specifically, after normalization, the pixel values of the semantic segmentation map lie between 0 and 1. Since the background region of the text image may contain pixels of many different values, a preset threshold needs to be set to distinguish the background region from the text region during binarization. In the embodiment of the present invention, the preset threshold is set to 0.5; regions of the normalized semantic segmentation map whose pixel values are above 0.5 are taken as text regions with value 1, and regions below 0.5 are taken as background regions with value 0.
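Steps S1321 and S1322 amount to a sigmoid followed by thresholding, which can be sketched as follows (the function name `binarize` is an illustrative assumption):

```python
import numpy as np

def binarize(logits, threshold=0.5):
    """Normalize segmentation logits to (0, 1) with a sigmoid, then binarize:
    pixels above `threshold` become text (1), the rest background (0)."""
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid squashes values into (0, 1)
    return (probs > threshold).astype(np.uint8)
```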
S1323, performing scaling processing on the segmentation mask map of the normalized semantic segmentation map to obtain the segmentation mask map of the text image.
Scaling processing is performed on the segmentation mask map of the normalized semantic segmentation map to obtain the segmentation mask map of the text image. Specifically, after the corresponding segmentation mask map is obtained from each layer of feature map in the feature pyramid, the size of each segmentation mask map is smaller than that of the preprocessed text image; therefore, each segmentation mask map needs to be scaled so that its size equals that of the preprocessed text image.
And S140, expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image.
The segmentation mask map with the smallest text region among the segmentation mask maps is expanded according to a breadth-first search algorithm to obtain a first text box that frames all characters in the text image. Specifically, the breadth-first search algorithm is a blind search method that systematically expands and checks all nodes in a graph to find a result; that is, it does not consider where the result might be located but searches the entire graph thoroughly until the result is found. The segmentation mask map with the smallest text region is the binarized top-layer feature map of the feature pyramid; starting from it, expansion is performed with the breadth-first search algorithm, i.e., the text regions are grown by gradually adding more pixels until all text in the text image is covered. Conflicting pixels may arise during expansion; the principle for handling conflicts in this embodiment is that a contested pixel can be merged into only a single kernel, on a first-come, first-served basis.
In another embodiment, as shown in fig. 6, step S140 includes sub-steps S141 and S142.
And S141, respectively acquiring a segmentation mask map with the minimum text region and a segmentation mask map with the maximum text region from the plurality of segmentation mask maps according to the top-layer feature map and the bottom-layer feature map in the feature pyramid.
And respectively acquiring a segmentation mask map with the minimum text region and a segmentation mask map with the maximum text region from the plurality of segmentation mask maps according to the top-layer feature map and the bottom-layer feature map in the feature pyramid. Specifically, the top-layer feature map in the feature pyramid has the smallest size, the richest semantic information and the lowest resolution, while the bottom-layer feature map has the largest size, the least semantic information and the highest resolution. After each segmentation mask map is scaled to the same size, the text box of the segmentation mask map corresponding to the top-layer feature map is the smallest, and the text box of the segmentation mask map corresponding to the bottom-layer feature map is the largest. Therefore, the segmentation mask map with the smallest text region can be obtained from the top-layer feature map of the feature pyramid, and the segmentation mask map with the largest text region from the bottom-layer feature map.
And S142, expanding the segmentation mask image with the minimum text region by adopting a breadth-first search algorithm based on the segmentation mask image with the maximum text region to obtain the first text box.
And expanding the segmentation mask map with the minimum text region by adopting a breadth-first search algorithm based on the segmentation mask map with the maximum text region to obtain the first text box. Because the text box of the segmentation mask map corresponding to the bottom-layer feature map is the largest but carries the least semantic information, character recognition based on it alone is not accurate. Therefore, the text box in that segmentation mask map is used as a reference box, and the text box in the segmentation mask map corresponding to the top-layer feature map is expanded by the breadth-first search algorithm until it matches the size of the reference box, thereby obtaining the first text box of the text image.
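The expansion of the smallest kernel within the bounds of the largest mask can be sketched as a plain breadth-first search, in the spirit of the progressive scale expansion used by segmentation-based detectors. The connected-component labeling scheme below is an illustrative assumption; conflicting pixels are resolved first-come, first-served, as described above.

```python
from collections import deque
import numpy as np

def expand_kernel(min_mask, max_mask):
    """Grow the connected components of the smallest-kernel mask outwards
    by breadth-first search, but only into pixels allowed by the
    largest-text-region mask. Contested pixels go to whichever kernel
    reaches them first."""
    h, w = min_mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    next_label = 0
    queue = deque()
    # label each connected component of the smallest kernel by flood fill
    for y in range(h):
        for x in range(w):
            if min_mask[y, x] and labels[y, x] == 0:
                next_label += 1
                labels[y, x] = next_label
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    queue.append((cy, cx))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w and
                                min_mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            stack.append((ny, nx))
    # breadth-first expansion into the region of the largest mask
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < h and 0 <= nx < w and
                    max_mask[ny, nx] and labels[ny, nx] == 0):
                labels[ny, nx] = labels[cy, cx]  # first come, first served
                queue.append((ny, nx))
    return labels
```

The labeled output covers the same area as the largest mask; each label's bounding region gives one expanded text box.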
S150, carrying out affine transformation on the first text box to obtain a second text box after affine transformation.
And carrying out affine transformation on the first text box to obtain a second text box after affine transformation. Specifically, based on the character connected domain, the upright rectangle coordinates and the minimum enclosing rectangle coordinates are extracted with the boundingRect and minAreaRect functions of opencv, respectively. For a rectangle whose width is at least 3 times its height, the corresponding minimum enclosing rectangle is converted into an upright rectangle by the affine transformation module of opencv; text recognition is easier to perform on the affine-transformed text region.
And S160, classifying and identifying the characters in the second text box after affine transformation to obtain the characters in the text image.
And classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image. In the implementation of the invention, the characters in the text image can be obtained by inputting the text box after affine transformation into a pre-trained recurrent neural network for classification and recognition.
In another embodiment, as shown in fig. 7, step S160 includes sub-steps S161 and S162.
S161, inputting the affine-transformed second text box into a preset bidirectional recurrent neural network model to obtain a feature vector sequence of the text image.
And inputting the affine-transformed second text box into a preset bidirectional recurrent neural network model to obtain a feature vector sequence of the text image. Specifically, the bidirectional recurrent neural network model is a pre-trained model composed of two reverse recurrent neural networks, a feature vector sequence with context information is obtained by inputting the affine-transformed second text box into the bidirectional recurrent neural network model, and then the feature vector sequence is classified and identified, so that characters in the text image can be obtained.
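The bidirectional pass can be illustrated with a minimal numpy sketch: two tanh recurrent networks read the column-feature sequence of the rectified text box forwards and backwards, and their hidden states are concatenated so every time step carries context from both directions. The weights here are illustrative stand-ins, not the pre-trained model.

```python
import numpy as np

def birnn_features(x, wf, uf, wb, ub):
    """Bidirectional tanh RNN over a feature sequence.

    x:       (T, d) sequence of column features from the warped text box.
    wf, uf:  input / recurrent weights of the forward RNN.
    wb, ub:  input / recurrent weights of the backward RNN.
    Returns a (T, 2*hdim) feature vector sequence with context information.
    """
    T, _ = x.shape
    hdim = wf.shape[1]
    hf = np.zeros((T, hdim))
    hb = np.zeros((T, hdim))
    state = np.zeros(hdim)
    for t in range(T):                         # forward direction
        state = np.tanh(x[t] @ wf + state @ uf)
        hf[t] = state
    state = np.zeros(hdim)
    for t in reversed(range(T)):               # backward direction
        state = np.tanh(x[t] @ wb + state @ ub)
        hb[t] = state
    return np.concatenate([hf, hb], axis=1)
```

The concatenated sequence is what the subsequent classifier consumes.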
And S162, inputting the characteristic vector sequence of the text image into a preset classifier for classification and identification to obtain characters in the text image.
And inputting the characteristic vector sequence of the text image into a preset classifier for classification and identification to obtain characters in the text image. Specifically, the classifier is used for classifying and identifying the feature vector sequence of the text image, and then predicting and identifying the characters in the text image. In the embodiment of the invention, a BP neural network with three layers is used as a classifier and a hyperbolic tangent function is used as an activation function, and the calculation process is as follows:
hid_ij = tanh(w11 × h_j + w12 × s_(t-1) + b)
e_ij = hid_ij × w21
wherein hid_ij is the hidden state of the recurrent neural unit when h_j is evaluated at time i, e_ij is the score, w11 and w12 are the second-layer weights of the three-layer BP neural network, b is the bias term, and w21 is the third-layer weight of the three-layer BP neural network.
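The scoring equations above translate directly into a numpy sketch; the weight shapes below are illustrative stand-ins for the trained three-layer BP network.

```python
import numpy as np

def bp_score(h_j, s_prev, w11, w12, b, w21):
    """Score one feature vector with the three-layer BP classifier:
    hid_ij = tanh(w11·h_j + w12·s_prev + b),  e_ij = hid_ij·w21.

    h_j:    feature vector being evaluated.
    s_prev: previous state s_(t-1).
    w11, w12: second-layer weights; b: bias term; w21: third-layer weight.
    """
    hid = np.tanh(w11 @ h_j + w12 @ s_prev + b)  # second-layer hidden state
    return hid @ w21                             # third-layer score e_ij
```

Each feature vector in the sequence is scored this way, and the scores drive the character prediction.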
In the method for recognizing characters of a curved text image provided by the embodiment of the invention, if a text image input by a user is received, the text image is preprocessed according to a preset first processing rule to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image. By the method, not only can the text in the image be accurately framed, but also the accuracy of character recognition in the text image is improved.
The embodiment of the invention also provides a character recognition apparatus 100 for a curved text image, which is used for executing any embodiment of the foregoing character recognition method for a curved text image. Specifically, referring to fig. 8, fig. 8 is a schematic block diagram of the character recognition apparatus 100 for a curved text image according to an embodiment of the present invention.
As shown in fig. 8, the character recognition apparatus 100 includes a preprocessing unit 110, a feature extraction unit 120, a segmentation unit 130, a first expansion unit 140, an affine transformation unit 150, and a recognition unit 160.
The preprocessing unit 110 is configured to, if a text image input by a user is received, preprocess the text image according to a preset first processing rule to obtain a preprocessed text image.
In other inventive embodiments, as shown in fig. 9, the preprocessing unit 110 includes: a first obtaining unit 111, a first scaling unit 112 and a filling unit 113.
A first obtaining unit 111, configured to obtain a scaling factor for scaling the text image according to a long edge of the text image.
The scaling unit 112 is configured to scale the text image according to the scaling factor to obtain a scaled text image.
A filling unit 113, configured to fill a short edge of the scaled text image to obtain a square text image.
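The three preprocessing units above (scale factor from the long edge, scaling, short-edge padding) can be sketched together as follows. The target long-edge length of 640 and the nearest-neighbour resize are assumptions for illustration, not values given by the invention.

```python
import numpy as np

def preprocess(img, target_long=640):
    """Scale the image by a factor computed from its long edge, then pad
    the short edge with zeros so the result is a square text image."""
    h, w = img.shape[:2]
    scale = target_long / max(h, w)              # scaling factor from long edge
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize with plain numpy indexing
    ys = np.minimum((np.arange(nh) / scale).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / scale).astype(int), w - 1)
    resized = img[ys][:, xs]
    # pad the short edge with zeros to make the image square
    square = np.zeros((target_long, target_long) + img.shape[2:], dtype=img.dtype)
    square[:nh, :nw] = resized
    return square
```

The squared, fixed-size image is then fed into feature extraction.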
A feature extraction unit 120, configured to perform feature extraction on the preprocessed text image to obtain a feature pyramid of the text image.
In another embodiment of the present invention, as shown in fig. 10, the feature extraction unit 120 includes: a convolution unit 121 and a construction unit 122.
And a convolution unit 121, configured to perform convolution processing on the preprocessed text image to obtain a multilayer feature map of the text image.
And the constructing unit 122 is configured to construct a feature pyramid of the text image according to the multilayer feature map.
And the segmentation unit 130 is configured to perform image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule, so as to obtain multiple segmentation mask maps of the text image.
In another embodiment of the present invention, as shown in fig. 11, the dividing unit 130 includes: a classification unit 131 and a processing unit 132.
The classifying unit 131 is configured to input each layer of feature map in the feature pyramid into a preset full convolution neural network model, so as to obtain a semantic segmentation map of each layer of feature map in the feature pyramid.
The processing unit 132 is configured to perform binarization processing on the semantic segmentation maps of each layer of feature map in the feature pyramid, respectively, to obtain a plurality of segmentation mask maps of the text image.
In another embodiment of the present invention, as shown in fig. 12, the processing unit 132 includes: a normalization processing unit 1321, a binarization processing unit 1322, and a scaling unit 1323.
The normalization processing unit 1321 is configured to perform normalization processing on the semantic segmentation map of each layer of the feature map in the feature pyramid to obtain a semantic segmentation map after the normalization processing.
A binarization processing unit 1322 is configured to perform binarization processing on the semantic segmentation map after the normalization processing according to a preset threshold value, so as to obtain a segmentation mask map of the semantic segmentation map after the normalization processing.
And a scaling unit 1323, configured to scale the segmentation mask map of the semantic segmentation map after the normalization processing to obtain the segmentation mask map of the text image.
The first expanding unit 140 is configured to expand the segmentation mask map with the smallest text area in the multiple segmentation mask maps according to a breadth-first search algorithm, so as to obtain a first text box in which all characters in the text image are framed.
In another embodiment of the present invention, as shown in fig. 13, the first extension unit 140 includes: a second acquisition unit 141 and a second expansion unit 142.
The second obtaining unit 141 is configured to obtain a segmentation mask map with a smallest text region and a segmentation mask map with a largest text region from the plurality of segmentation mask maps according to the top-level feature map and the bottom-level feature map in the feature pyramid.
A second expanding unit 142, configured to expand the segmentation mask map with the minimum text region by using a breadth-first search algorithm based on the segmentation mask map with the maximum text region, so as to obtain the first text box.
And the affine transformation unit 150 is configured to perform affine transformation on the first text box to obtain a second text box after affine transformation.
And the identifying unit 160 is configured to perform classification and identification on the characters in the second text box after affine transformation, so as to obtain the characters in the text image.
In another embodiment of the present invention, as shown in fig. 14, the identifying unit 160 includes: an input unit 161 and a classification recognition unit 162.
The input unit 161 is configured to input the affine-transformed second text box into a preset bidirectional recurrent neural network model, so as to obtain a feature vector sequence of the text image.
And the classification and identification unit 162 is configured to input the feature vector sequence of the text image into a preset classifier for classification and identification, so as to obtain characters in the text image.
The character recognition device 100 for a curved text image according to the embodiment of the present invention is configured to perform preprocessing on a text image according to a preset first processing rule if the text image input by a user is received, so as to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.
Referring to fig. 15, fig. 15 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Referring to fig. 15, the device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a character recognition method for a curved text image.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall device 500.
The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 may be caused to execute a character recognition method for a curved text image.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 15 is a block diagram of only a portion of the configuration associated with aspects of the present invention and is not intended to limit the apparatus 500 to which aspects of the present invention may be applied, and that a particular apparatus 500 may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.
Those skilled in the art will appreciate that the embodiment of the apparatus 500 shown in fig. 15 does not constitute a limitation on the specific construction of the apparatus 500, and in other embodiments, the apparatus 500 may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the apparatus 500 may only include the memory and the processor 502, and in such embodiments, the structure and function of the memory and the processor 502 are the same as those of the embodiment shown in fig. 15, and are not repeated herein.
It should be understood that in the present embodiment, the processor 502 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In another embodiment of the present invention, a computer storage medium is provided. The storage medium may be a non-volatile computer-readable storage medium. The storage medium stores a computer program 5032, wherein the computer program 5032 when executed by the processor 502 performs the steps of: if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image; performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image; performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image; expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image; carrying out affine transformation on the first text box to obtain a second text box after affine transformation; and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only a logical division, and there may be other divisions when the actual implementation is performed, or units having the same function may be grouped into one unit, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a device 500 (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A character recognition method of a curved text image is characterized by comprising the following steps:
if a text image input by a user is received, preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image;
performing feature extraction on the preprocessed text image to obtain a feature pyramid of the text image;
performing image segmentation processing on each layer of feature images in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask images of the text image;
expanding the segmentation mask image with the smallest text area in the segmentation mask images according to a breadth-first search algorithm to obtain a first text box framing all characters in the text image;
carrying out affine transformation on the first text box to obtain a second text box after affine transformation;
and classifying and identifying the characters in the second text box after the affine transformation to obtain the characters in the text image.
2. The method for recognizing words in a curved text image according to claim 1, wherein the preprocessing the text image according to a preset first processing rule to obtain a preprocessed text image comprises:
obtaining a zooming factor for zooming the text image according to the long edge of the text image;
zooming the text image according to the zooming factor to obtain a zoomed text image;
and filling the short side of the zoomed text image to obtain a square text image.
3. The method of claim 1, wherein the extracting features of the preprocessed text image to obtain a feature pyramid of the text image comprises:
performing convolution processing on the preprocessed text image to obtain a multilayer characteristic diagram of the text image;
and constructing a feature pyramid of the text image according to the multilayer characteristic diagram.
4. The method for recognizing words in a curved text image according to claim 1, wherein the step of performing image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image comprises:
inputting each layer of feature map in the feature pyramid into a preset full convolution neural network model to obtain a semantic segmentation map of each layer of feature map in the feature pyramid;
and respectively carrying out binarization processing on the semantic segmentation maps of each layer of feature map in the feature pyramid to obtain a plurality of segmentation mask maps of the text image.
5. The method for recognizing words in a curved text image according to claim 4, wherein the obtaining of the plurality of segmentation mask maps of the text image by respectively performing binarization processing on the semantic segmentation maps of each layer of the feature map in the feature pyramid comprises:
normalizing the semantic segmentation graph of each layer of feature graph in the feature pyramid to obtain a normalized semantic segmentation graph;
carrying out binarization processing on the semantic segmentation image after the normalization processing according to a preset threshold value to obtain a segmentation mask image of the semantic segmentation image after the normalization processing;
and performing expansion and contraction processing on the segmentation mask image of the semantic segmentation image after the normalization processing to obtain the segmentation mask image of the text image.
6. The method for recognizing words in a curved text image according to claim 5, wherein the expanding the segmentation mask map with the smallest text area in the plurality of segmentation mask maps according to the breadth-first search algorithm to obtain a first text box in which all characters in the text image are framed comprises:
respectively acquiring a segmentation mask map with the smallest text region and a segmentation mask map with the largest text region from the plurality of segmentation mask maps according to the top-layer feature map and the bottom-layer feature map in the feature pyramid;
and expanding the segmentation mask map with the minimum text region by adopting a breadth-first search algorithm based on the segmentation mask map with the maximum text region to obtain the first text box.
7. The method for recognizing words in a curved text image according to claim 1, wherein the classifying and recognizing the words in the affine-transformed second text box to obtain the words in the text image comprises:
inputting the second text box after affine transformation into a preset bidirectional recurrent neural network model to obtain a feature vector sequence of the text image;
and inputting the characteristic vector sequence of the text image into a preset classifier for classification and identification to obtain characters in the text image.
8. A character recognition apparatus for a curved text image, comprising:
the preprocessing unit is used for preprocessing the text image according to a preset first processing rule if the text image input by a user is received, so as to obtain a preprocessed text image;
the feature extraction unit is used for extracting features of the preprocessed text image to obtain a feature pyramid of the text image;
the segmentation unit is used for carrying out image segmentation processing on each layer of feature map in the feature pyramid according to a preset second processing rule to obtain a plurality of segmentation mask maps of the text image;
the first expansion unit is used for expanding the segmentation mask image with the smallest text area in the segmentation mask images according to the breadth-first search algorithm to obtain a first text box which frames all characters in the text image;
the affine transformation unit is used for carrying out affine transformation on the first text box to obtain a second text box after affine transformation;
and the identification unit is used for classifying and identifying the characters in the second text box after affine transformation to obtain the characters in the text image.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements a method of word recognition of a curved text image according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the method of character recognition of a curved text image as claimed in any one of claims 1 to 7.
CN202011312589.0A 2020-11-20 2020-11-20 Character recognition method and device for curved text image and computer equipment Pending CN112364873A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011312589.0A CN112364873A (en) 2020-11-20 2020-11-20 Character recognition method and device for curved text image and computer equipment
PCT/CN2021/125006 WO2022105521A1 (en) 2020-11-20 2021-10-20 Character recognition method and apparatus for curved text image, and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011312589.0A CN112364873A (en) 2020-11-20 2020-11-20 Character recognition method and device for curved text image and computer equipment

Publications (1)

Publication Number Publication Date
CN112364873A true CN112364873A (en) 2021-02-12

Family

ID=74533088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011312589.0A Pending CN112364873A (en) 2020-11-20 2020-11-20 Character recognition method and device for curved text image and computer equipment

Country Status (2)

Country Link
CN (1) CN112364873A (en)
WO (1) WO2022105521A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733822A (en) * 2021-03-31 2021-04-30 上海旻浦科技有限公司 End-to-end text detection and identification method
CN113033543A (en) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Curved text recognition method, device, equipment and medium
CN114187603A (en) * 2021-11-09 2022-03-15 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and storage medium
CN114418001A (en) * 2022-01-20 2022-04-29 北方工业大学 Character recognition method and system based on parameter reconstruction network
WO2022105521A1 (en) * 2020-11-20 2022-05-27 深圳壹账通智能科技有限公司 Character recognition method and apparatus for curved text image, and computer device
WO2023109086A1 (en) * 2021-12-15 2023-06-22 深圳前海微众银行股份有限公司 Character recognition method, apparatus and device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681962A (en) * 2023-05-05 2023-09-01 江苏宏源电气有限责任公司 Power equipment thermal image detection method and system based on improved YOLOv5

Citations (3)

Publication number Priority date Publication date Assignee Title
CN111160352A (en) * 2019-12-27 2020-05-15 创新奇智(北京)科技有限公司 Workpiece metal surface character recognition method and system based on image segmentation
US20200250459A1 (en) * 2019-01-11 2020-08-06 Capital One Services, Llc Systems and methods for text localization and recognition in an image of a document
CN111553351A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Semantic segmentation based text detection method for arbitrary scene shape

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN111476067B (en) * 2019-01-23 2023-04-07 腾讯科技(深圳)有限公司 Character recognition method and device for image, electronic equipment and readable storage medium
CN110598690B (en) * 2019-08-01 2023-04-28 达而观信息科技(上海)有限公司 End-to-end optical character detection and recognition method and system
CN111062389A (en) * 2019-12-10 2020-04-24 腾讯科技(深圳)有限公司 Character recognition method and device, computer readable medium and electronic equipment
CN111598055A (en) * 2020-06-19 2020-08-28 上海眼控科技股份有限公司 Text detection method and device, computer equipment and storage medium
CN112364873A (en) * 2020-11-20 2021-02-12 深圳壹账通智能科技有限公司 Character recognition method and device for curved text image and computer equipment

Non-Patent Citations (1)

Title
Wang Tao; Jiang Jiahe: "Text recognition in arbitrary orientations based on semantic segmentation technology", Applied Science and Technology, no. 03, 4 July 2017 (2017-07-04) *

Cited By (8)

Publication number Priority date Publication date Assignee Title
WO2022105521A1 (en) * 2020-11-20 2022-05-27 深圳壹账通智能科技有限公司 Character recognition method and apparatus for curved text image, and computer device
CN112733822A (en) * 2021-03-31 2021-04-30 上海旻浦科技有限公司 End-to-end text detection and identification method
CN112733822B (en) * 2021-03-31 2021-07-27 上海旻浦科技有限公司 End-to-end text detection and identification method
CN113033543A (en) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Curved text recognition method, device, equipment and medium
CN113033543B (en) * 2021-04-27 2024-04-05 中国平安人寿保险股份有限公司 Curved text recognition method, device, equipment and medium
CN114187603A (en) * 2021-11-09 2022-03-15 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and storage medium
WO2023109086A1 (en) * 2021-12-15 2023-06-22 深圳前海微众银行股份有限公司 Character recognition method, apparatus and device, and storage medium
CN114418001A (en) * 2022-01-20 2022-04-29 北方工业大学 Character recognition method and system based on parameter reconstruction network

Also Published As

Publication number Publication date
WO2022105521A1 (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN112364873A (en) Character recognition method and device for curved text image and computer equipment
CN110097051B (en) Image classification method, apparatus and computer readable storage medium
TWI744283B (en) Method and device for word segmentation
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
US20080193020A1 (en) Method for Facial Features Detection
US11915500B2 (en) Neural network based scene text recognition
KR20160143494A (en) Saliency information acquisition apparatus and saliency information acquisition method
CN111860309A (en) Face recognition method and system
JP2009211178A (en) Image processing apparatus, image processing method, program and storage medium
CN106372624B (en) Face recognition method and system
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
CN111401293A (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111414913B (en) Character recognition method, recognition device and electronic equipment
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112926652A (en) Fish fine-grained image identification method based on deep learning
CN114299383A (en) Remote sensing image target detection method based on integration of density map and attention mechanism
CN112733942A (en) Variable-scale target detection method based on multi-stage feature adaptive fusion
CN111626134A (en) Dense crowd counting method, system and terminal based on hidden density distribution
CN116843971A (en) Method and system for detecting hemerocallis disease target based on self-attention mechanism
CN117994573A (en) Infrared dim target detection method based on superpixel and deformable convolution
CN115424293A (en) Living body detection method, and training method and device of living body detection model
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
CN114445916A (en) Living body detection method, terminal device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination