CN109753971B

CN109753971B - Correction method and device for distorted text lines, character recognition method and device

Info

Publication number: CN109753971B
Application number: CN201711078947.4A
Authority: CN
Inventors: 程孟力; 施兴
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2017-11-06
Filing date: 2017-11-06
Publication date: 2023-04-28
Anticipated expiration: 2037-11-06
Also published as: CN109753971A

Abstract

The invention discloses a method and device for correcting distorted text lines, and a character recognition method and device. The method for correcting distorted text lines comprises: receiving a document image to be recognized; determining a deformation curve of a distorted text line in the document image; and straightening the distorted text line using the deformation curve, so that the straightened document image can be recognized. The method and device can at least effectively improve the recognition of document images that contain distorted text lines.

Description

Correction method and device for distorted text lines, character recognition method and device

Technical Field

The present invention relates to the field of character recognition technologies, and in particular, to a method and apparatus for correcting distorted text lines, and a method and apparatus for character recognition.

Background

In the related art, an optical character recognition (OCR) engine achieves good recognition results on text lines that are close to straight. Recognition is usually performed with LSTM+CTC or LSTM+Seq2Seq methods. These methods are robust and cope well with image quality problems such as over-exposure, blur and degradation. However, for distorted text lines, such as text in ring-shaped marks or on curved documents, related-art OCR engines generally have low recognition accuracy and cannot achieve good results.

Disclosure of Invention

The present application aims to solve at least one of the technical problems in the related art.

The application provides a correction method and device for distorted text lines, and a character recognition method and device, which can at least effectively improve the recognition effect of document images with distorted text lines.

The application adopts the following technical scheme:

A method for correcting distorted text lines, comprising:

receiving a document image to be recognized;

determining a deformation curve of a distorted text line in the document image; and

straightening the distorted text line in the document image using the deformation curve, so that the straightened document image can be recognized.

Determining the deformation curve of the distorted text line in the document image comprises: estimating the parameters of the deformation curve of the distorted text line using a convolutional neural network (CNN) and a bidirectional long short-term memory network (BLSTM).

Before the deformation curve of the distorted text line in the document image is determined, the method further comprises: establishing a calculation model comprising a Visual Geometry Group (VGG) convolutional neural network, a BLSTM layer and a long short-term memory (LSTM) layer, and training the calculation model. Determining the deformation curve of the distorted text line then comprises: estimating the parameters of the deformation curve using the calculation model.

Estimating the parameters of the deformation curve of the distorted text line using the calculation model comprises: scaling the document image, inputting it into the calculation model for forward calculation, and obtaining the position parameters of the control points of the deformation curve of the distorted text line.

Training the calculation model comprises: inputting sample data into the calculation model and training it with stochastic gradient descent (SGD). The sample data comprise a document image and the target values of the control point position parameters of the deformation curve of its distorted text line.

Training the calculation model comprises minimizing the difference between the predicted values and the target values of the control point position parameters of the deformation curve. This is done by repeating the following procedure:

- scale the document image to a preset size and input it into the calculation model, whose VGG convolutional neural network extracts a feature map and outputs it to the BLSTM layer;
- after the BLSTM layer has processed the feature map, pass the result to the LSTM layer for decoding, which yields predicted values of the control point position parameters of the deformation curve of the distorted text line and outputs them to a SmoothL1Loss layer;
- the SmoothL1Loss layer calculates the difference between these predicted values and the target values in the sample data.

Determining the deformation curve of the distorted text line in the document image comprises: estimating the position parameters of the control points of the deformation curve; and deriving the position parameters of the points on the deformation curve from the control point position parameters.

Alternatively, determining the deformation curve of the distorted text line in the document image comprises: estimating the position parameters of points on the deformation curve by binarizing the document image.
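The binarization route can be illustrated with a minimal pure-Python sketch. Several details are our own assumptions: the fixed threshold, the 3×3 structuring element, and the rule that takes the midpoint between each column's top and bottom edges. The function names are hypothetical. The source states only that the image is binarized; FIG. 5 additionally shows a closing operation, a gradient operation and edge-line fitting.

```python
def binarize(gray, threshold=128):
    """Return a 0/1 mask: dark (text) pixels below `threshold` become 1."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def _morph(mask, op):
    """3x3 dilation (op=max) or erosion (op=min); neighbours outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    return [[op(mask[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]


def close(mask):
    """Morphological closing (dilate, then erode) to merge characters into one text-line blob."""
    return _morph(_morph(mask, max), min)


def centerline_points(mask):
    """For each column containing foreground, return (x, y) midway between its top and bottom edge."""
    points = []
    for x in range(len(mask[0])):
        rows = [y for y in range(len(mask)) if mask[y][x]]
        if rows:
            points.append((x, (rows[0] + rows[-1]) / 2))
    return points
```

In use, the three steps would be chained as `centerline_points(close(binarize(gray)))`. The resulting points would then be fitted with a deformation curve such as a B-spline (the fitting step is not shown).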

Straightening the distorted text line in the document image using the deformation curve comprises: discretizing the deformation curve into line segments at a preset pixel scale, forming the line segments into polygons, and mapping the polygons onto quadrilaterals of a horizontal text line using an affine transformation.
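The straightening step maps each piece of the curved text line onto a piece of a horizontal line. The sketch below makes several assumptions of our own: the curve has already been sampled into centerline points, each piece is a strip of assumed half-height around one segment, and each strip gets its own affine map estimated from three corner correspondences. The helper names are hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, rhs):
    """Solve m @ s = rhs for a 3x3 system with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate (collinear) points")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        sol.append(_det3(mc) / d)
    return sol


def affine_from_3_points(src, dst):
    """Affine map (x, y) -> (a*x + b*y + c, d*x + e*y + f) sending three src points to dst."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = _solve3(m, [u for u, _ in dst])
    d, e, f = _solve3(m, [v for _, v in dst])
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)


def straighten_maps(centerline, half_height):
    """Build a quadrilateral around each curve segment and pair it with a rectangle on a
    horizontal line placed at the segment's cumulative arc length."""
    pieces, s = [], 0.0
    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Unit normal of the segment; in image coordinates it points down for a left-to-right segment.
        nx, ny = -(y1 - y0) / length, (x1 - x0) / length
        tl = (x0 - half_height * nx, y0 - half_height * ny)
        tr = (x1 - half_height * nx, y1 - half_height * ny)
        br = (x1 + half_height * nx, y1 + half_height * ny)
        bl = (x0 + half_height * nx, y0 + half_height * ny)
        rect = [(s, 0.0), (s + length, 0.0), (s + length, 2 * half_height), (s, 2 * half_height)]
        transform = affine_from_3_points([tl, tr, bl], [rect[0], rect[1], rect[3]])
        pieces.append(([tl, tr, br, bl], rect, transform))
        s += length
    return pieces
```

Because both ends of a segment share that segment's normal, each strip here is an exact rectangle and its affine map is exact. At bends, however, adjacent strips overlap on the inner side and leave gaps on the outer side. Averaging the normals at shared vertices removes the gaps and overlaps, but then each strip becomes a general quadrilateral, which an affine map can only approximate.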

Wherein the deformation curve is a B-spline curve.

A character recognition method, comprising:

receiving a document image to be recognized;

straightening the distorted text lines in the document image using the deformation curves of those distorted text lines; and

recognizing the straightened document image.

The method further comprises: establishing a calculation model comprising a Visual Geometry Group (VGG) convolutional neural network, a bidirectional long short-term memory (BLSTM) layer and a long short-term memory (LSTM) layer, and training the calculation model; and estimating the parameters of the deformation curves of the distorted text lines in the document image using the calculation model.

Straightening the distorted text lines in the document image using their deformation curves comprises: discretizing each deformation curve into line segments at a preset pixel scale, forming the line segments into polygons, and mapping the polygons onto quadrilaterals of a horizontal text line using an affine transformation.

Wherein the deformation curve is a B-spline curve.

A device for correcting distorted text lines, comprising:

a determining module, configured to receive a document image to be recognized and to determine a deformation curve of a distorted text line in the document image; and

a straightening module, configured to straighten the distorted text line in the document image using the deformation curve, so that the straightened document image can be recognized.

The device further comprises a training module, configured to establish a calculation model comprising a VGG convolutional neural network, a bidirectional long short-term memory (BLSTM) layer and a long short-term memory (LSTM) layer, and to train the calculation model. The determining module is specifically configured to estimate the parameters of the deformation curve of the distorted text line in the document image using the calculation model.

A device for correcting distorted text lines, comprising: a memory storing a distorted text line correction program; and a processor configured to read the distorted text line correction program and perform the operations of the method for correcting distorted text lines described above.

A character recognition device, comprising: a memory storing a character recognition program; and a processor configured to read the character recognition program and perform the operations of the character recognition method described above.

A computer storage medium storing a computer program which, when executed by a processor, performs the steps of the method for correcting distorted text lines described above.

A computer storage medium storing a computer program which, when executed by a processor, performs the steps of the character recognition method described above.

The application comprises the following advantages:

In the present application, the deformation curve of a distorted text line in the document image is determined, and the distorted text line is straightened using this curve. Distorted text lines in the document image are thereby corrected effectively, and the recognition accuracy for the document image is improved.

Of course, a product implementing the present application does not necessarily need to achieve all of the above advantages at the same time.

Drawings

FIG. 1 is a schematic diagram of an exemplary architecture of an OCR system applicable to the present application;

FIG. 2 is a flow chart of a method for correcting distorted text lines according to the first embodiment;

FIG. 3a is an exemplary graph of a B-spline curve in accordance with the first embodiment;

FIG. 3b is an example of the B-spline curve formed when a B-spline is fitted to the center line of a text line in a document image according to the first embodiment;

FIG. 4 is an example of the neural network structure of the calculation model in the first embodiment;

FIG. 5a is an exemplary original graph prior to binarization;

FIG. 5b is a schematic diagram of the result of binarizing FIG. 5a;

FIG. 5c is a schematic diagram of the result of applying a closing operation to FIG. 5b;

FIG. 5d is a schematic diagram of the result of applying a gradient operation to FIG. 5c;

FIG. 5e is a schematic diagram of the result of fitting edge lines to FIG. 5d;

FIG. 6 is an exemplary diagram of a straightening process;

FIG. 7 is a schematic diagram of a device for correcting distorted text lines according to the first embodiment;

FIG. 8 is a schematic diagram of another device for correcting distorted text lines according to the first embodiment;

FIG. 9 is a diagram showing an example of the processing effect of the first embodiment;

FIG. 10 is a flowchart of a character recognition method according to the second embodiment;

FIG. 11 is a schematic diagram of a character recognition device according to the second embodiment;

FIG. 12 is an exemplary diagram of an application scenario of Example 1;

FIG. 13 is an exemplary diagram of an application scenario of Example 2.

Detailed Description

The technical scheme of the present application will be described in more detail with reference to the accompanying drawings and examples.

It should be noted that, provided there is no conflict, the embodiments of the present application and their features may be combined with each other, and all such combinations fall within the protection scope of the present application. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

In a typical configuration, a client or server computing device may include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory, random access memory (RAM) and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. The memory may include module 1, module 2, …, module N (N is an integer greater than 2).

Computer-readable media include persistent and non-persistent, removable and non-removable storage media. Storage media may store information by any method or technology, and the information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to:

- phase-change memory (PRAM);
- static random access memory (SRAM), dynamic random access memory (DRAM) and other types of random access memory (RAM);
- read-only memory (ROM) and electrically erasable programmable read-only memory (EEPROM);
- flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage;
- magnetic cassettes, magnetic tape or magnetic disk storage, or other magnetic storage devices;
- any other non-transmission medium that can store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

In the related art, an OCR engine generally consists of two parts: text detection and text recognition. Because deep learning has made major breakthroughs in image detection and recognition, both parts are implemented with deep learning. Text detection is typically implemented with a framework similar to Faster R-CNN (faster region-based convolutional neural network). Text recognition is typically implemented with a long short-term memory recurrent neural network combined with connectionist temporal classification (LSTM+CTC). Deep learning enables the OCR engine to cope well with a wide range of image quality problems and has greatly improved recognition accuracy. However, related-art OCR engines are only suitable for approximately horizontal text lines. Text lines that are not horizontal must be corrected before being sent to the OCR engine for recognition.

Most photographs introduce changes in rotation angle and viewing angle when they are taken. Under the deformation caused by rotation or a change of viewpoint, points that lie on a straight line usually still lie on a straight line. Such a picture can therefore be sent to the OCR engine after tilt correction or perspective correction. Tilt correction typically first generates a binary image of the document, then estimates the document's angle with a Hough transform, and finally corrects the image with a rotation. Perspective correction usually applies a Hough transform to different regions of the binary image to find straight lines, estimates a transformation matrix from these lines, and finally applies the inverse transformation to the image.
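The angle-estimation step of tilt correction can be sketched as a Hough transform restricted to near-horizontal lines. This pure-Python version is only an illustration under assumed settings. It takes the foreground pixel coordinates of an already binarized image, searches whole-degree angles in an assumed range of ±30°, and uses 1-pixel ρ bins. The function name is hypothetical, and the rotation that follows is not shown.

```python
import math
from collections import Counter


def estimate_skew_deg(points, angle_range=range(-30, 31)):
    """Estimate the skew (degrees) of text whose foreground pixels are `points` as (x, y).

    A line tilted by angle a has normal angle theta = a + 90 degrees. Each point votes for
    rho = x*cos(theta) + y*sin(theta), rounded to 1-pixel bins; the angle whose fullest
    (theta, rho) accumulator cell collects the most votes is returned.
    """
    best_angle, best_votes = 0, -1
    for a in angle_range:
        theta = math.radians(a + 90)
        bins = Counter(round(x * math.cos(theta) + y * math.sin(theta)) for x, y in points)
        votes = max(bins.values())
        if votes > best_votes:
            best_angle, best_votes = a, votes
    return best_angle
```

The image would then be rotated back by the returned angle. As the following paragraphs explain, this only works because a rotated text line is still a straight line.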

Distorted text lines, such as text in ring-shaped marks or on curved documents, are different. Their distortion is usually not introduced by the shooting angle but comes from the distortion of the original picture itself. The characters of such a text line do not lie on one straight line, so the distortion can neither be described nor undone by a perspective transformation. In other words, distorted text lines cannot be corrected by tilt correction or perspective transformation.

Because tilt correction and perspective transformation cannot correct distorted text lines, related-art OCR engines recognize distorted text lines with very low accuracy and cannot achieve good results.

To address these technical problems, the present application provides the following technical solution, which is described in detail below.

FIG. 1 shows an exemplary architecture of an OCR system to which the technical solution of the present application is applicable. The OCR system includes an image processing unit and an OCR engine. A document image is first input to the image processing unit for image processing (for example, correction) and is then input to the OCR engine for character recognition.

Embodiment 1

A method for correcting distorted text lines, as shown in fig. 2, may include:

Step 201: receive a document image to be recognized;

Step 202: determine a deformation curve of a distorted text line in the document image;

Step 203: straighten the distorted text line in the document image using the deformation curve, so that the straightened document image can be recognized.

In this method, the deformation curve describes the deformation of the text lines in the document image, and the text lines are straightened using that curve. Distorted text lines can thus be corrected effectively, which improves the recognition of document images containing distorted text lines.

In this embodiment, the deformation curve may be of various types. In one implementation, the deformation curve may be a B-spline curve. Other curve types may also be used; the specific type of deformation curve is not limited here.

In this embodiment, a B-spline curve is used to fit the center line of the text line; that is, the distortion of a distorted text line is described by a B-spline. A B-spline curve is a curve composed piecewise of polynomial functions. It can approximate a curve of any shape with few parameters and has good smoothness. FIG. 3a and FIG. 3b are examples of B-spline curves. FIG. 3a shows the control points of a B-spline curve, where P0 to P8 are the control points. FIG. 3b shows the B-spline curve formed when a B-spline is fitted to the center line of a text line in a document image.

For example, the B-spline curve may be formed from n-degree B-spline basis functions, as shown in formula (1):

$$S(t) = \sum_{i=0}^{m-1} P_i \, N_{i,n}(t) \tag{1}$$

where S(t) is the position parameter of a point on the B-spline curve, m is the number of control points of the B-spline curve, P_i is a control point, and N_{i,n} is an n-degree B-spline basis function.

N_{i,n} can be calculated with the Cox-de Boor recursion formula, shown in formulas (2) and (3):

$$N_{i,0}(t) = \begin{cases} 1, & u_i \le t < u_{i+1} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$N_{i,j}(t) = \frac{t - u_i}{u_{i+j} - u_i}\, N_{i,j-1}(t) + \frac{u_{i+j+1} - t}{u_{i+j+1} - u_{i+1}}\, N_{i+1,j-1}(t) \tag{3}$$

In these formulas:

- the u_i are the nodes (knots), distributed in non-decreasing order within [0, 1], i.e. 0 ≤ u_0 ≤ u_1 ≤ u_2 ≤ … ≤ 1;
- each half-open interval [u_i, u_{i+1}) is a node interval;
- any term of formula (3) whose denominator is zero is taken as 0;
- j (0 ≤ j ≤ n) is the degree at the current level of the recursion, which ends at j = n.

In this embodiment, there may be various ways to determine the deformation curve of the distorted text line in the document image.

In one implementation, determining the deformation curve of the distorted text line in the document image may include: estimating the parameters of the deformation curve using a convolutional neural network (CNN) and a bidirectional long short-term memory network (BLSTM). In other words, the deformation curve can be determined with CNN+BLSTM. The combination of CNN and BLSTM is robust, and the parameters of the deformation curve can be estimated conveniently through end-to-end training.

In this embodiment, determining the deformation curve through CNN+BLSTM may include: establishing a calculation model comprising a Visual Geometry Group (VGG) convolutional neural network, a BLSTM layer and a long short-term memory (LSTM) layer, and training the calculation model. Determining the deformation curve of the distorted text line may then include: estimating the parameters of the deformation curve using the calculation model.

In this embodiment, determining the deformation curve of the distorted text line in the document image may include: estimating the position parameters of the control points of the deformation curve; and deriving the position parameters of the points on the deformation curve from the control point position parameters.

Here, estimating the parameters of the deformation curve using the calculation model may include: scaling the document image and inputting it into the calculation model for forward calculation, which yields the position parameters of the control points of the deformation curve of the distorted text line. The position parameters of the points on the deformation curve can then be derived from these control point position parameters, which determines the deformation curve of the distorted text line in the document image.

In one implementation, the calculation model yields the control point positions of the B-spline curve of the distorted text line. The position parameters of each point on the B-spline curve are then calculated with formulas (1) to (3) above, giving the B-spline curve of the distorted text line. A uniform B-spline curve is used here; the number of nodes m and the degree n of the curve need to be determined experimentally on the specific dataset.

It should be noted that the calculation model of this embodiment is mainly used to calculate the position parameters P_i of the control points of the deformation curve of the distorted text line in the document image. In other words, the calculation model takes the document image as input and outputs the control point position parameters of the deformation curve of the distorted text line.

In one implementation, the calculation model may output normalized values of the control point position parameters. The control point position parameters are recovered from these normalized values, and the position parameters of the points on the deformation curve are then obtained from the control point position parameters. In other words, estimating the parameters of the deformation curve using the calculation model may include:

- scaling the document image and inputting it into the calculation model for forward calculation, which yields normalized values of the control point position parameters of the deformation curve;
- obtaining the control point position parameters from these normalized values;
- calculating the position parameters of each point on the deformation curve from the control point position parameters.

In this embodiment, the backbone of the calculation model is a VGG convolutional neural network. Because text lines are usually long, a BLSTM layer is added to capture more context in the horizontal direction, followed by an LSTM layer for decoding. Specifically, the calculation model may include a VGG convolutional neural network, a BLSTM layer and an LSTM layer, connected in that order:

- the output of the VGG convolutional neural network is the input of the BLSTM layer;
- the output of the BLSTM layer is the input of the LSTM layer;
- the LSTM layer outputs the control point position parameters of the deformation curve (for example, a B-spline curve) of the distorted text line in the document image.

Here, the VGG convolutional neural network includes multiple convolution layers followed by corresponding pooling layers.

Here, a BLSTM contains both a forward LSTM and a backward LSTM. It therefore captures more feature information than a unidirectional LSTM and usually performs better.
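The bidirectional wiring can be shown with a toy recurrence. This sketch is illustrative only: a simple additive cell stands in for a real LSTM cell, and the helper names are hypothetical. It shows only how each position receives context from both directions.

```python
def run_rnn(seq, step, h0=0.0):
    """Run a recurrent cell `step(h, x)` over `seq` and return the hidden state after each element."""
    h, states = h0, []
    for x in seq:
        h = step(h, x)
        states.append(h)
    return states


def bidirectional(seq, fwd_step, bwd_step):
    """Pair, for each position t, the forward state (context from seq[:t+1])
    with the backward state (context from seq[t:])."""
    fwd = run_rnn(seq, fwd_step)
    bwd = run_rnn(seq[::-1], bwd_step)[::-1]  # run on the reversed sequence, then realign
    return list(zip(fwd, bwd))
```

For example, with a running-sum cell on `[1, 2, 3]`, the middle position pairs 3 (from the left) with 5 (from the right). A unidirectional cell would only see the left-hand context.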

In one implementation, estimating the parameters of the deformation curve using the calculation model may include the following. The document image is scaled to a preset size and input into the calculation model, whose VGG network extracts a feature map. The BLSTM layer processes the feature map and passes the result to the LSTM layer for decoding, which yields the position parameters of the control points of the deformation curve of the distorted text line.

In this embodiment, estimating the parameters of the deformation curve with the calculation model may be called the prediction process. Before prediction, the calculation model must be trained so that it has the functions described above.

Training the calculation model may include: inputting sample data into the calculation model and training it with stochastic gradient descent (SGD). The sample data comprise document images and the target values of the control point position parameters of the deformation curves of their distorted text lines, and may also be called training data.

In other words, training may include preparing a batch of training data in advance. Each piece of training data includes a picture containing a document image and the corresponding parameter values, namely the control point position parameters P_i of the deformation curve of the distorted text line in the picture, or their normalized values P_i' (0 ≤ i ≤ n). A model is trained on this batch through CNN+BLSTM, and that model is the calculation model.

Specifically, for a calculation model with the above structure, training may include minimizing the difference between the predicted values and the target values of the control point position parameters of the deformation curve. This is done by repeating the following procedure:

- scale the document image to a preset size and input it into the calculation model, whose VGG convolutional neural network extracts a feature map and outputs it to the BLSTM layer;
- after the BLSTM layer has processed the feature map, pass the result to the LSTM layer for decoding, which yields predicted values of the control point position parameters of the deformation curve of the distorted text line and outputs them to a SmoothL1Loss layer;
- the SmoothL1Loss layer calculates the difference between these predicted values and the target values in the sample data.

In practice, the calculation model may be trained on a large amount of sample data to minimize the gap between the predicted values and the target values.
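The difference computed by the SmoothL1Loss layer can be sketched in a few lines. We assume the common form of the loss, with threshold `beta = 1` and the mean taken over all predicted coordinates. The reduction and threshold are not specified in the text, and the function name is hypothetical.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss: 0.5*d^2/beta when |d| < beta, |d| - 0.5*beta otherwise."""
    total = 0.0
    for p, q in zip(pred, target):
        d = abs(p - q)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)
```

The quadratic region gives small, smooth gradients near the target. The linear region keeps a single badly predicted control point from dominating the update, which is a common reason to prefer Smooth L1 over squared error for coordinate regression.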

The process by which this embodiment determines the B-spline curve of a document image through CNN+BLSTM is described in detail below with a specific example.

In this example, the process of estimating B-spline parameters using a computational model may be referred to as a prediction process, and the process of training the computational model prior to prediction may be referred to as a training process.

The implementation procedure in this example may include:

First, establishing the calculation model.

The architecture of the calculation model established in this example is shown in FIG. 4, where each box represents a group of neurons of the same type. The VGG convolutional neural network includes five convolution layer groups:

- Conv1_1 & Conv1_2, followed by Pool1;
- Conv2_1 & Conv2_2, followed by Pool2;
- Conv3_1 & Conv3_2 & Conv3_3, followed by Pool3;
- Conv4_1 & Conv4_2 & Conv4_3, followed by Pool4;
- Conv5_1 & Conv5_2, with no pooling layer.

A BLSTM layer follows the fifth group (Conv5_1 & Conv5_2), and an LSTM layer follows the BLSTM layer. The LSTM layer contains a decoder for decoding.
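To make the data flow of FIG. 4 concrete, the following sketch tracks tensor sizes only. It relies on several assumptions:

- size-preserving 3×3 convolutions and 2×2 stride-2 pooling (floor rounding) in Pool1 to Pool4;
- aspect-preserving resizing to a height of 64, the example value used later in this description;
- a 512-channel Conv5 output, as in standard VGG-16;
- the BLSTM reading the feature map column by column.

None of these details is fixed by the source, and the function names are hypothetical.

```python
def resize_to_height(orig_h, orig_w, target_h=64):
    """Scale an image to height target_h while keeping its aspect ratio."""
    return target_h, max(1, round(orig_w * target_h / orig_h))


def vgg_feature_shape(h, w, num_pools=4, stride=2):
    """Spatial size after Pool1..Pool4; Conv5_1 & Conv5_2 add no further pooling."""
    for _ in range(num_pools):
        h, w = h // stride, w // stride
    return h, w


def blstm_sequence_shape(h, w, channels=512):
    """Read the feature map column by column: w time steps of h*channels features each."""
    return w, h * channels
```

Under these assumptions, a 100×800 document image becomes 64×512 after resizing and a 4×32 feature map after Pool4. The BLSTM then sees 32 horizontal time steps, which is how the long, horizontal extent of a text line turns into sequence context.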

In the network architecture of FIG. 4, a SmoothL1Loss layer follows the LSTM layer. It calculates the difference between the predicted values and the target values in order to train the calculation model. The architecture also includes a training data layer (B-Spline Parameter), which supplies training data to the calculation model and target values to the SmoothL1Loss layer.

In the network structure of FIG. 4, during prediction the LSTM layer outputs, after decoding, the control point position parameters (or their normalized values) of the B-spline curves of the text lines in the document image. During training, the LSTM layer instead passes its predicted values (the predicted control point position parameters of those B-spline curves) to the SmoothL1Loss layer. That layer calculates the difference between the predicted values and the target values through SmoothL1Loss and SoftmaxLoss.

Secondly, preparing training data;

A large amount of training data must be prepared before the calculation model is trained. However, training a CNN requires a large amount of labeled data, which is difficult to obtain and costly to label. In this example, training data can therefore be generated by running the straightening process in reverse. The data generation process may be:

- discretize a horizontal text line into a rectangle and a B-spline curve into a polygon, at a predetermined pixel scale (for example, a step of 1 pixel);
- transform the rectangle onto the polygon using a perspective transformation;
- for an output pixel that is mapped to several times, take the average of the mapped pixel values.
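One rectangle-to-quadrilateral piece of this data generation can be sketched in pure Python. The following choices are assumptions and are not given in the text:

- the perspective transform is estimated from four corner correspondences with the homography element H[2][2] fixed to 1;
- pixels are forward-mapped to the nearest output pixel;
- output pixels that receive no source pixel are left empty.

The helper names are hypothetical.

```python
def _solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def homography(src, dst):
    """3x3 perspective transform H (H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h0..h7
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = _solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(H, x, y):
    """Map one point through H, including the homogeneous divide."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def forward_warp(image, H, out_h, out_w):
    """Send each source pixel to its nearest output pixel. Output pixels hit several times get
    the mean value; pixels never hit stay None (holes that a real generator would fill)."""
    sums = [[0.0] * out_w for _ in range(out_h)]
    hits = [[0] * out_w for _ in range(out_h)]
    for y, row in enumerate(image):
        for x, val in enumerate(row):
            u, v = apply_h(H, x, y)
            ui, vi = round(u), round(v)
            if 0 <= vi < out_h and 0 <= ui < out_w:
                sums[vi][ui] += val
                hits[vi][ui] += 1
    return [[s / n if n else None for s, n in zip(srow, nrow)]
            for srow, nrow in zip(sums, hits)]
```

Repeating this for every 1-pixel piece of a horizontal text line, with the target quadrilaterals taken from a randomly generated B-spline, would produce a distorted image whose control points are known exactly. That is what makes the labels free.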

Thirdly, training the calculation model so that it can estimate the control point position parameters of the B-spline curves of text lines in an input document image.

Here, the model may be trained with stochastic gradient descent (SGD), with the learning rate set to 1e-6 and the weight decay set to 1e-5.
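For reference, the sketch below shows a plain SGD update with L2 weight decay using the hyperparameters above. The text does not say whether momentum or a learning-rate schedule was also used, so both are omitted. `sgd_step` is a hypothetical helper that operates on plain lists of floats.

```python
LEARNING_RATE = 1e-6  # value given in the text
WEIGHT_DECAY = 1e-5   # value given in the text


def sgd_step(params, grads, lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY):
    """One SGD update with L2 weight decay: w <- w - lr * (grad + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(params, grads)]
```

With `lr=0.1` and no weight decay, one step from `w = 1.0` with gradient `0.5` moves the weight to about `0.95`. The very small default learning rate produces correspondingly tiny steps.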

Fourthly, using the calculation model to estimate the control point position parameters of the B-spline curve of a text line in a document image, and then determining the position parameters of every point on the B-spline curve from those control point position parameters.
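Once the control points are known, points on the curve follow from standard B-spline evaluation. The sketch below uses de Boor's algorithm on a clamped, uniform knot vector over [0, 1] with cubic degree. The degree and knot vector are our assumptions, since this text does not specify them, and the helper names are hypothetical.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] for n_ctrl control points (assumed)."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def de_boor(ctrl, t, degree=3):
    """Evaluate the B-spline with control points ctrl at parameter t in [0, 1]."""
    if len(ctrl) <= degree:
        raise ValueError("need more control points than the degree")
    knots = clamped_knots(len(ctrl), degree)
    # Knot span k with knots[k] <= t < knots[k + 1]; t = 1 uses the last span.
    if t >= 1.0:
        k = len(ctrl) - 1
    else:
        k = max(i for i in range(degree, len(ctrl)) if knots[i] <= t)
    d = [list(ctrl[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            alpha = (t - knots[i]) / (knots[i + degree + 1 - r] - knots[i])
            d[j] = [(1 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j])]
    return tuple(d[degree])


def sample_curve(ctrl, num, degree=3):
    """Return num points evenly spaced in the curve parameter, endpoints included."""
    return [de_boor(ctrl, i / (num - 1), degree) for i in range(num)]
```

With four control points the curve reduces to a cubic Bezier curve. For example, `de_boor([(0, 0), (1, 2), (3, 2), (4, 0)], 0.5)` evaluates to approximately `(2.0, 1.5)`.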

In this example, the original document image is scaled proportionally to a preset size. It is then passed forward through the network shown in Fig. 4 to obtain the normalized control point position parameters $P_i'$ ($0 \le i \le n$) of the B-spline curve, from which the control point position parameters $P_i$ are obtained.

In this example, during both training and testing, document images are uniformly resized to a height of 64, with the width scaled by the same proportion. In practice this value can be adjusted as needed; 64 is merely an example.
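This sketch reads the step as fixing the image height to 64 and scaling the width by the same factor. Rounding to the nearest integer and the minimum width of 1 are our assumptions. The actual resampling would be done by an image library.

```python
def scaled_size(width, height, target_height=64):
    """Return (new_width, new_height) after scaling so the height equals target_height."""
    scale = target_height / height
    # Keep the aspect ratio; never let the width collapse to zero.
    return max(1, round(width * scale)), target_height
```

For instance, an 800x200 image becomes 256x64, and a 100x32 image becomes 200x64.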

In this example, to make the CNN easier to optimize, the B-spline control point position parameters $P_i(x_i, y_i)$ in the training data may first be normalized to $[0, 1]$ before training, and training is then performed on the normalized values. Here, normalization can be performed using the following formula (4):

where $tx_i$ and $ty_i$ denote the normalized values of $x_i$ and $y_i$ in $P_i(x_i, y_i)$, respectively.

In this example, after the calculation model predicts the normalized control point position parameters $P_i'$ ($0 \le i \le n$) of the B-spline curve, the control point position parameters $P_i$ can be recovered by inverting formula (4).
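Formula (4) itself is not reproduced in this text. Purely as an illustration, the sketch below assumes it divides each coordinate by the width and height of the resized image, which maps points inside the image into [0, 1]. Under that assumption, the inverse step simply multiplies back. Both function names are hypothetical.

```python
def normalize(points, width, height):
    """Map control points P_i = (x_i, y_i) to (tx_i, ty_i) in [0, 1] (assumed formula (4))."""
    return [(x / width, y / height) for x, y in points]


def denormalize(norm_points, width, height):
    """Inverse of normalize(): recover P_i from predicted normalized values P_i'."""
    return [(tx * width, ty * height) for tx, ty in norm_points]
```

For a 256x64 image, the point `(128, 32)` normalizes to `(0.5, 0.5)`, and `denormalize` maps it back.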

In this example, the objective equation of the calculation model may employ the following formulas (5) to (7):

$dx_i = px_i - tx_i, \quad dy_i = py_i - ty_i \qquad (5)$

where $px_i$ and $py_i$ are the values that the LSTM layer outputs to the SmoothL1Loss layer. Formulas (5) to (7) define what the CNN must predict and the objective used when training the network.
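Formulas (6) and (7) are not reproduced in this text. The sketch below assumes they apply the standard Smooth L1 function to the residuals of formula (5) and sum the results over all control points. That function is quadratic below a threshold of 1 and linear above it, as in common detection losses. The threshold, the summation, and the function names are all assumptions.

```python
def smooth_l1(d, beta=1.0):
    """Standard Smooth L1: 0.5 * d**2 / beta if |d| < beta, else |d| - 0.5 * beta."""
    a = abs(d)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def control_point_loss(pred, target):
    """Sum Smooth L1 over dx_i = px_i - tx_i and dy_i = py_i - ty_i (formula (5))."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        dx, dy = px - tx, py - ty
        total += smooth_l1(dx) + smooth_l1(dy)
    return total
```

The quadratic branch keeps gradients small for nearly correct control points, and the linear branch limits the influence of large errors.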

In another implementation of this embodiment, determining the deformation curve of a distorted text line in the document image may include estimating the position parameters of points on that deformation curve by binarizing the document image. Binarization separates the foreground from the background. When the color difference between the characters and the background is relatively large, it can separate the text lines from the background so that their deformation curves can be estimated.

Specifically, the original document image is binarized to obtain a binary image. A morphological closing operation is applied to the binary image so that the characters of each line merge into a text-line strip. A gradient operation then finds the edge lines of the strip, and the edge lines are fitted to obtain the deformation curve of the text line. This yields the position parameters of the points on the deformation curve of the distorted text line in the document image.
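The binarize, close, gradient and edge steps can be illustrated on a tiny grayscale grid in plain Python. This sketch rests on several assumptions not stated in the text:

- binarization uses a fixed threshold of 128 (the text also allows an FCN to produce the binary map);
- the structuring element is a 3x3 square;
- the gradient operation is a morphological gradient (dilation minus erosion);
- the top edge line is taken as the first edge pixel in each column.

The final curve fitting is left out.

```python
def binarize(gray, threshold=128):
    """1 for dark (text) pixels, 0 for light background (fixed threshold assumed)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def _morph(img, k, op):
    """Apply op (max or min) over a k x k window clipped to the image bounds."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = op(window)
    return out


def dilate(img, k=3):
    return _morph(img, k, max)


def erode(img, k=3):
    return _morph(img, k, min)


def close(img, k=3):
    """Morphological closing (dilate, then erode) merges characters into a strip."""
    return erode(dilate(img, k), k)


def gradient(img, k=3):
    """Morphological gradient (dilation minus erosion) keeps only strip borders."""
    d, e = dilate(img, k), erode(img, k)
    return [[a - b for a, b in zip(rd, re)] for rd, re in zip(d, e)]


def top_edge(edges):
    """For each column containing an edge, the (x, y) of its first edge pixel."""
    pts = []
    for x in range(len(edges[0])):
        for y in range(len(edges)):
            if edges[y][x]:
                pts.append((x, y))
                break
    return pts
```

On a 7x11 grid holding two dark 3x3 "characters" separated by a one-pixel gap, closing fills the gap and produces a single strip. The top edge of that strip then lies on row 1.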

Because plain binarization is not robust enough and easily introduces noise, in this embodiment "binarizing the document image" may specifically mean generating a binary map of the document image with a fully convolutional network (FCN).

The binarization process is illustrated in Figs. 5a to 5e. Fig. 5a is the original image, Fig. 5b the binarization result, Fig. 5c the closing result, Fig. 5d the gradient result, and Fig. 5e the fitted edge lines.

In this embodiment, straightening the distorted text line in the document image using the deformation curve may include the following: discretizing the B-spline curve into line segments at a preset pixel scale, forming the line segments into polygons, and mapping the polygons onto quadrilaterals of a horizontal text line using an affine transformation. The straightening may also be performed in other ways, which are not limited here.

Fig. 6 shows an example of the straightening process for a distorted text line. The B-spline curve is first discretized into line segments at a preset pixel scale (e.g., step = 1 pixel). The corresponding line segments then form polygons, and each polygon is mapped onto a rectangle of a horizontal text line using an affine transformation. Where a pixel is mapped multiple times in this process, the average of those pixel values may be taken.
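A minimal pure-Python sketch of this straightening follows. It assumes the center curve has already been sampled at roughly 1-pixel steps and that the half-height of the text line is known. Each segment's polygon is bounded by points offset along the curve normal. Our simplification is to fit the affine map from three of the polygon's four corners (upper-left, upper-right, lower-left) to the matching rectangle corners. Source pixels are then forward-mapped, and any output pixel hit more than once receives the average value. All names are hypothetical.

```python
import math


def affine_from_triangle(src, dst):
    """Coefficients (a, b, c, d, e, f) of u = a*x + b*y + c, v = d*x + e*y + f
    that map the three src points onto the three dst points (Cramer's rule)."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)

    def coeffs(t0, t1, t2):
        a = (t0 * (y1 - y2) - y0 * (t1 - t2) + (t1 * y2 - t2 * y1)) / det
        b = (x0 * (t1 - t2) - t0 * (x1 - x2) + (x1 * t2 - x2 * t1)) / det
        c = (x0 * (y1 * t2 - y2 * t1) - y0 * (x1 * t2 - x2 * t1)
             + t0 * (x1 * y2 - x2 * y1)) / det
        return a, b, c

    (u0, v0), (u1, v1), (u2, v2) = dst
    return coeffs(u0, u1, u2) + coeffs(v0, v1, v2)


def offsets(curve, half_height):
    """Upper and lower boundary points, half_height away along the curve normal."""
    upper, lower = [], []
    n = len(curve)
    for j in range(n):
        (xa, ya), (xb, yb) = curve[max(j - 1, 0)], curve[min(j + 1, n - 1)]
        tx, ty = xb - xa, yb - ya
        norm = math.hypot(tx, ty)
        nx, ny = -ty / norm, tx / norm  # unit normal (points down in image coords)
        x, y = curve[j]
        upper.append((x - nx * half_height, y - ny * half_height))
        lower.append((x + nx * half_height, y + ny * half_height))
    return upper, lower


def straighten(image, curve, half_height):
    """Forward-map source pixels into a horizontal strip, averaging collisions."""
    out_h, out_w = int(2 * half_height), len(curve) - 1
    acc = [[0.0] * out_w for _ in range(out_h)]
    cnt = [[0] * out_w for _ in range(out_h)]
    upper, lower = offsets(curve, half_height)
    for j in range(out_w):
        a, b, c, d, e, f = affine_from_triangle(
            [upper[j], upper[j + 1], lower[j]], [(j, 0), (j + 1, 0), (j, out_h)])
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                u, v = a * x + b * y + c, d * x + e * y + f
                if j <= u < j + 1 and 0 <= v < out_h:  # pixel lies in this polygon
                    acc[int(v)][j] += value
                    cnt[int(v)][j] += 1
    return [[acc[r][k] / cnt[r][k] if cnt[r][k] else 0.0 for k in range(out_w)]
            for r in range(out_h)]
```

For a straight horizontal curve at y = 5 with a half-height of 2, each segment's affine map reduces to a shift of 3 rows, so output row r copies source row r + 3. Real curves would use a finer sampling or a dedicated image-warping routine.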

The methods of this embodiment may be performed by any computing device capable of implementing the above functions. In practice, the computing device may be a mobile terminal (e.g., a mobile phone), a computer (e.g., a PC), a server (e.g., a virtual or physical server), and so on. In one implementation, the method of this embodiment may be implemented by a computing device with graphics processing unit (GPU) hardware. In another implementation, it may be implemented by a computing device that supports C++ and has GPU hardware.

The embodiment also provides an apparatus for correcting distorted text lines, as shown in fig. 7, which may include:

a determining module 71, configured to receive a document image to be identified, and determine a deformation curve of a distorted text line in the document image;

and the straightening module 72 is configured to perform a straightening process on the distorted text line in the document image by using the deformation curve, so as to identify the document image after the straightening process.

The correction device for distorted text lines may further include a training module 73, configured to build a calculation model comprising a VGG convolutional neural network, a BLSTM layer and an LSTM layer and to train the calculation model. The determining module 71 may then be specifically configured to estimate the parameters of the deformation curve of the distorted text line in the document image using the calculation model.

For further technical details of the correction device for distorted text lines according to this embodiment, reference is made to the method section above.

The correction device for distorted text lines of this embodiment may be implemented in software, hardware, or a combination of both. In other words, it may be provided in a computing device, or it may be implemented by any computing device capable of the above functions. Likewise, the determining module 71, the straightening module 72 and the training module 73 may each be software, hardware, or a combination of both. For example, the correction device may be implemented by combining the C++ programming language with GPU hardware.

The embodiment also provides an apparatus for correcting distorted text lines, as shown in fig. 8, which may include:

a memory 81 storing a distorted text line correction program;

the processor 82 is configured to read the distorted text line correction program to perform the operations of the distorted text line correction method shown in fig. 2.

In this embodiment, reference is made to the method section above for additional technical details of the correction device for distorted text lines shown in fig. 8.

In practice, the correction device for distorted text lines shown in Fig. 8 may be implemented by any computing device capable of the above functions. The processor 82 in the correction device may be a GPU or a CPU with graphics processing capability. The distorted text line correction program may be written in C++ or take other forms.

It should be noted that the correction device for distorted text lines shown in Fig. 8 may include other components. Examples include a data memory for storing user data (such as original document images), a communication circuit for communicating with external devices, a bus coupling the components, and a display for showing the corrected document image. Further components may be configured according to the needs of practical applications, which is not limited here.

The present embodiment also provides a computer readable storage medium having a computer program stored thereon, the computer program when executed by a processor implementing the steps of the above-described method for correcting distorted text lines shown in fig. 2. For further technical details reference is made here to the method section above.

Fig. 9 shows an example of the processing effect of this embodiment.

In this embodiment, a deformation curve is fitted to the center line of a text line, a deep network estimates the B-spline parameters, and the text line is then straightened along the deformation curve. Distortion of arbitrary shape can therefore be described with few parameters, and arbitrarily distorted text lines can be corrected. In addition, the deformation curve estimated by the deep network has higher than first-order continuity, which makes the method more robust.

Example two

A character recognition method, as shown in fig. 10, may include:

step 1001, receiving a document image to be identified;

step 1002, straightening the distorted text line in the document image by using a deformation curve of the distorted text line in the document image;

and step 1003, recognizing the document image after the straightening processing.
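Steps 1001 to 1003 can be read as a simple composition of three stages. In the sketch below, the curve estimator, the straightening routine and the recognizer are passed in as callables. Their names and signatures are hypothetical placeholders for the calculation model, the straightening process and the OCR engine described in this document.

```python
from typing import Callable, List


def recognize_document(image,
                       estimate_curves: Callable,
                       straighten: Callable,
                       recognize: Callable) -> List[str]:
    """Compose steps 1001-1003 with caller-supplied (hypothetical) stages."""
    # Step 1001: the document image has already been received by the caller.
    curves = estimate_curves(image)  # e.g. one deformation curve per text line
    # Step 1002: straighten each distorted text line along its deformation curve.
    lines = [straighten(image, curve) for curve in curves]
    # Step 1003: recognize the straightened text lines.
    return [recognize(line) for line in lines]
```

Keeping the stages injectable lets the same pipeline run with a deep-network curve estimator or with the binarization-based estimator described in embodiment one.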

In the embodiment, deformation of the distorted text lines in the document image is described by using the deformation curve, and the distorted text lines in the document image are straightened by using the deformation curve, so that text recognition is performed after the distorted text lines are corrected, and the recognition effect of the document image containing the distorted text lines can be effectively improved.

In practical applications, the character recognition method of the present embodiment may be implemented by the OCR system shown in fig. 1.

In this embodiment, before the straightening process is performed on the distorted text line in the document image by using the deformation curve of the distorted text line in the document image, the method may further include: establishing a calculation model comprising a VGG convolutional neural network, a BLSTM layer and an LSTM layer, and training the calculation model; and estimating parameters of deformation curves of distorted text lines in the document image by using the calculation model.

In one implementation, straightening the distorted text line in the document image using its deformation curve may include the following: discretizing the deformation curve into line segments at a preset pixel scale, forming the line segments into polygons, and mapping the polygons onto quadrilaterals of a horizontal text line using an affine transformation.

In this embodiment, the deformation curve of the distorted text line may be a B-spline curve. Other types of curves are possible in addition to this, and are not limited in this regard.

As shown in fig. 11, the present embodiment further provides a character recognition apparatus, which may include:

a memory 111 storing a character recognition program;

the processor 112 is configured to read the character recognition program to perform the operations of the character recognition method shown in fig. 10.

The character recognition apparatus of this embodiment may be implemented by any computing device capable of the above functions. In one implementation, it may be implemented by a computing device including the OCR system shown in Fig. 1, where the image processing unit of the OCR system includes the correction device for distorted text lines of embodiment one. In practice, the computing device may be a mobile terminal (e.g., a mobile phone), a computer (e.g., a PC), a server (e.g., a virtual or physical server), and so on. In one implementation, the character recognition apparatus may be implemented by a computing device with graphics processing unit (GPU) hardware. In another implementation, it may be implemented by a computing device that supports C++ and has GPU hardware.

The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the character recognition method shown in fig. 10 described above. For further technical details, reference is made here to the method section above and to the method section of the first embodiment. Here, the computer program may be implemented as the OCR system shown in fig. 1.

For further details of this embodiment reference is made to embodiment one.

The character recognition method and apparatus of this embodiment are applicable to scenarios such as ID card, shop sign, driving license and business card recognition, as well as other certificate text recognition scenarios. They may also be applied in other scenarios.

Exemplary implementations of the above embodiments are described in detail below. It should be noted that the following examples may be combined with each other. In addition, each flow, execution process, etc. in the following examples may also be adjusted according to the needs of practical applications. In addition, in practical applications, other implementations of the embodiments described above are also possible.

Example 1

In this example, the character recognition apparatus above may be provided to the user as a cloud service.

Fig. 12 shows an exemplary application scenario in which the character recognition apparatus is provided to users as a cloud service. A user calls the corresponding API to submit the document to be recognized to the character recognition apparatus. The apparatus performs image processing and recognition on the document and returns the final recognition result to the user. Here, the character recognition apparatus may include a correction device for distorted text lines and an OCR engine.

As shown in fig. 12, the process of character recognition may include:

step 1, receiving a request from user equipment, wherein the request carries a document with an image to be identified;

step 2, inputting the document into a distorted text line correction device, performing image processing (for example, straightening the distorted text line in the document) on the document, and sending the processed document image into an OCR engine;

step 3, the OCR engine carries out character recognition based on the document image processed by the image processing part to obtain character information of the document;

step 4, the character information obtained by the OCR engine is returned to the user equipment;

and step 5, the user equipment displays the character information.

In practical applications, the user equipment may be any computing device that needs to call the cloud service for character recognition, such as a terminal or an application server. For example, an individual user may submit a request to the OCR system through an electronic device such as a computer or a mobile phone. As another example, for an application provider, the user equipment may be the application server that provides the application. After receiving a character recognition request from a user of the application, the application server sends a request to the OCR system and has the document recognized by calling the character recognition cloud service provided by the OCR system. It then returns the recognition result to the application user for viewing.

Example 2

In this example, the character recognition apparatus may be used as software.

As shown in fig. 13, a user may install software of the character recognition device on an electronic device, and after the electronic device runs the character recognition device, input a document to be recognized with an image into the character recognition device, and the character recognition device performs image processing (e.g., straightening processing on distorted text lines in the document) and character recognition on the document, and displays the recognized text for the user to view.

It should be noted that fig. 12 to 13 are only examples, and are not intended to limit the present application. In other application scenarios, it may also be implemented in other ways.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods described above may be implemented by a program that instructs associated hardware, and the program may be stored on a computer readable storage medium such as a read-only memory, a magnetic or optical disk, etc. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiment may be implemented in the form of hardware, or may be implemented in the form of a software functional module. The present application is not limited to any specific form of combination of hardware and software.

Of course, various other embodiments of the present application are possible, and those skilled in the art will recognize that various changes and modifications can be made in light of the application without departing from the spirit and substance of the application, but that such changes and modifications are intended to be within the scope of the claims of the application.

Claims

1. A method of correcting distorted text lines, comprising:

receiving a document image to be identified;

determining a deformation curve of a distorted text line in the document image;

straightening the distorted text line in the document image by utilizing the deformation curve so as to identify the document image after straightening;

before determining the deformation curve of the distorted text line in the document image, the method further comprises: establishing a calculation model comprising a visual geometry group VGG convolutional neural network, a bidirectional long short-term memory neural network BLSTM layer and a long short-term memory neural network LSTM layer, and training the calculation model;

the determining the deformation curve of the distorted text line in the document image comprises the following steps: and estimating parameters of deformation curves of distorted text lines in the document image by using the calculation model.

2. The method for correcting distorted text lines according to claim 1, wherein determining a deformation curve of distorted text lines in the document image includes:

and estimating parameters of deformation curves of distorted text lines in the document image by using a convolutional neural network CNN and a bidirectional long short-term memory neural network BLSTM.

3. The method for correcting distorted text lines according to claim 1, wherein estimating parameters of deformation curves of distorted text lines in the document image using the calculation model includes:

and after the document image is scaled, inputting the document image into the calculation model for forward calculation, and obtaining the position parameters of the control points of the deformation curve of the distorted text line in the document image.

4. A method of correcting a distorted text line according to claim 3, wherein the training the computational model comprises:

inputting sample data into the calculation model, and training by using stochastic gradient descent (SGD);

the sample data comprises a document image and a target value of a control point position parameter of a deformation curve of a distorted text line of the document image.

5. The method for correcting distorted text lines according to claim 4, wherein

the training of the computing model comprises: minimizing the difference between the predicted value and the target value of the control point position parameter of the deformation curve by performing the following procedure a plurality of times:

After scaling the document image to a preset size, inputting the calculation model, extracting a feature map by a VGG convolutional neural network of the calculation model, and outputting the feature map to the BLSTM layer;

after the BLSTM layer processes the feature map, outputting the feature map to the LSTM layer for decoding to obtain a predicted value of a control point position parameter of a deformation curve of a distorted text line in the document image, and outputting the predicted value to a SmoothL1 Loss layer;

and the SmoothL1 Loss layer calculates the difference between the predicted value of the control point position parameter of the deformation curve and the target value in the sample data.

6. The method for correcting distorted text lines according to claim 1, wherein determining a deformation curve of distorted text lines in the document image includes:

estimating the position parameters of control points of deformation curves of distorted text lines in the document image;

and obtaining the position parameters of points on the deformation curve of the distorted text line in the document image based on the position parameters of the control points.

7. The method for correcting distorted text lines according to claim 1, wherein determining a deformation curve of distorted text lines in the document image includes:

and estimating the position parameters of points on the deformation curve of the distorted text line in the document image by binarizing the document image.

8. The method for correcting distorted text lines according to claim 1, wherein the straightening the distorted text lines in the document image by using the deformation curve includes:

and discretizing the deformation curve into line segments according to a preset pixel scale, forming the line segments into polygons, and mapping the polygons into quadrangles of horizontal text lines by affine transformation.

9. The method for correcting distorted text lines according to any one of claims 1 to 8, wherein the deformation curve is a B-spline curve.

10. A character recognition method, comprising:

receiving a document image to be identified;

establishing a calculation model comprising a visual geometry group VGG convolutional neural network, a bidirectional long short-term memory neural network BLSTM layer and a long short-term memory neural network LSTM layer, and training the calculation model;

estimating parameters of deformation curves of distorted text lines in the document image by using the calculation model;

and recognizing the document image after the straightening processing.

11. The character recognition method according to claim 10, wherein the straightening process of the distorted text line in the document image using the deformation curve of the distorted text line in the document image comprises:

12. The character recognition method according to claim 10 or 11, wherein the deformation curve is a B-spline curve.

13. A correction device for distorted text lines, comprising:

the straightening module is used for straightening distorted text lines in the document image by utilizing the deformation curve so as to identify the document image after the straightening process;

the training module is used for establishing a calculation model comprising a visual geometry group VGG convolutional neural network, a bidirectional long short-term memory neural network BLSTM layer and a long short-term memory neural network LSTM layer, and training the calculation model;

the determining module is specifically configured to estimate parameters of a deformation curve of a distorted text line in the document image by using the computing model.

14. A correction device for distorted text lines, comprising:

a memory storing a distorted text line correction program;

A processor configured to read the distorted text line correction program to perform the operations of the distorted text line correction method of any one of claims 1 to 9.

15. A character recognition apparatus comprising:

a memory storing a character recognition program;

a processor configured to read the character recognition program to perform the operations of the character recognition method of any one of claims 10 to 12.