CN114187437B - Text recognition method, image correction method, electronic device, and storage medium - Google Patents


Info

Publication number
CN114187437B
CN114187437B (application number CN202210128043.2A)
Authority
CN
China
Prior art keywords
image
information
text
pixel point
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210128043.2A
Other languages
Chinese (zh)
Other versions
CN114187437A (en)
Inventor
龙如蛟
姜祥威
杨志博
姚聪
夏桂松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202210128043.2A
Publication of CN114187437A
Application granted
Publication of CN114187437B

Landscapes

  • Character Input (AREA)

Abstract

The embodiment of the present application provides a text recognition method, an image correction method, an electronic device, and a storage medium. The text recognition method includes the following steps: acquiring an image to be recognized, wherein the image to be recognized includes a document image, a bill image, or a card image; acquiring boundary information of the text in the image to be recognized, wherein the boundary information is used for indicating the boundary of the text in the image to be recognized; acquiring text line information of the text in the image to be recognized, wherein the text line information is used for indicating the position of each text line in the image to be recognized; determining pixel point mapping information according to the boundary information and the text line information, wherein the pixel point mapping information is used for indicating the correspondence between pixel points in the image to be recognized and pixel points in a template image; filling the pixel values of pixel points in the image to be recognized into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain a corrected image; and performing text recognition on the corrected image to obtain text information. The scheme can improve the effect of text recognition.

Description

Text recognition method, image correction method, electronic device, and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a text recognition method, an image correction method, electronic equipment and a storage medium.
Background
Optical character recognition (OCR) is a technique for recognizing characters in an image and is widely used for text recognition of paper documents, extraction of bill information, and the like. An image used for text recognition may contain noise such as rotation, folds, and wrinkles, which affects the accuracy of text recognition. The image therefore needs to be corrected before text recognition to reduce the noise it contains and thereby improve recognition accuracy.
At present, after the features of an image are extracted by a convolutional neural network, the offset between each pixel and its corrected position is regressed directly at each pixel, and the image is corrected accordingly.
However, when an image is corrected by a convolutional neural network in this way, the local features of the image do not change obviously, so there is no strict one-to-one correspondence between pixel points in the image before correction and pixel points in the image after correction. Many pixel points in the corrected image consequently suffer from pixel overlapping, holes, and similar problems; the correction effect for images containing text information is therefore poor, and in turn the text recognition effect is poor.
Disclosure of Invention
In view of the above, embodiments of the present application provide a text recognition method, an image rectification method, an electronic device, and a storage medium to at least solve or alleviate the above problems.
According to a first aspect of the embodiments of the present application, there is provided a text recognition method, including: acquiring an image to be recognized, wherein the image to be recognized includes a document image, a bill image, or a card image; acquiring boundary information of the text in the image to be recognized, wherein the boundary information is used for indicating the boundary of the text in the image to be recognized; acquiring text line information of the text in the image to be recognized, wherein the text line information is used for indicating the position of each text line in the image to be recognized; determining pixel point mapping information according to the boundary information and the text line information, wherein the pixel point mapping information is used for indicating the correspondence between pixel points in the image to be recognized and pixel points in a template image; filling the pixel values of pixel points in the image to be recognized into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain a corrected image; and performing text recognition on the corrected image to obtain text information.
According to a second aspect of the embodiments of the present application, there is provided an image correction method, including: acquiring boundary information of the text in an image to be corrected, wherein the boundary information is used for indicating the boundary of the text in the image to be corrected; acquiring text line information of the text in the image to be corrected, wherein the text line information is used for indicating the position of each text line in the image to be corrected; determining pixel point mapping information according to the boundary information and the text line information, wherein the pixel point mapping information is used for indicating the correspondence between pixel points in the image to be corrected and pixel points in a template image; and filling the pixel values of pixel points in the image to be corrected into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain a corrected image.
According to a third aspect of the embodiments of the present application, there is provided a text recognition apparatus, including: a first acquisition module, configured to acquire an image to be recognized, wherein the image to be recognized includes a document image, a bill image, or a card image; a second acquisition module, configured to acquire boundary information of the text in the image to be recognized, wherein the boundary information is used for indicating the boundary of the text in the image to be recognized; a third acquisition module, configured to acquire text line information of the text in the image to be recognized, wherein the text line information is used for indicating the position of each text line in the image to be recognized; an optimization module, configured to determine pixel point mapping information according to the boundary information and the text line information, wherein the pixel point mapping information is used for indicating the correspondence between pixel points in the image to be recognized and pixel points in a template image; a filling module, configured to fill the pixel values of pixel points in the image to be recognized into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain a corrected image; and a recognition module, configured to perform text recognition on the corrected image to obtain text information.
According to a fourth aspect of the embodiments of the present application, there is provided an electronic device, including a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the text recognition method according to the first aspect or the image correction method according to the second aspect.
According to a fifth aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a text recognition method as described in the first aspect above or an image rectification method as described in the second aspect above.
According to a sixth aspect of embodiments herein, there is provided a computer program product comprising computer instructions for instructing a computing device to perform a text recognition method as described in the first aspect above or an image rectification method as described in the second aspect above.
With the above technical solution, the boundary information can indicate the boundary of the text in the image to be recognized, and the text line information can indicate the position of each text line in the image to be recognized. Because of noise such as folds and wrinkles in the image to be recognized, the boundary indicated by the boundary information and the text lines indicated by the text line information may be warped, while the boundary and text lines in the corrected image are straight. Pixel point mapping information can therefore be determined according to the boundary information and the text line information; it indicates the correspondence between pixel points in the image to be recognized and pixel points in the template image. The pixel values of pixel points in the image to be recognized are then filled into the corresponding pixel points in the template image according to the pixel point mapping information to obtain a corrected image, and text recognition is performed on the corrected image to obtain text information. Correcting the image to be recognized according to the boundary information and the text line information constrains the correspondence between pixel points in the image before and after correction, avoiding pixel overlapping and holes in the corrected image; this improves the correction effect and, in turn, the text recognition effect.
Drawings
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below cover only some of the embodiments of the present application; other drawings can be derived from them by those skilled in the art.
FIG. 1 is a schematic diagram of an exemplary system in which one embodiment of the present application may be implemented;
FIG. 2 is a flow diagram of a text recognition method according to one embodiment of the present application;
FIG. 3 is a flow chart of an image rectification method according to an embodiment of the present application;
FIG. 4 is a schematic illustration of an image rectification process according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a text recognition apparatus according to one embodiment of the present application;
FIG. 6 is a schematic view of an electronic device of an embodiment of the application.
Detailed Description
To help those skilled in the art better understand the technical solutions in the embodiments of the present application, these solutions are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.
First, some terms appearing in the description of the embodiments of the present application are explained as follows:
Image to be corrected: an image that contains text information and carries noise affecting text recognition, such as rotation, folds, or wrinkles. In a text recognition scene, it is the image to be recognized that carries such noise.
Corrected image: the higher-quality image obtained after the image to be corrected, with noise such as rotation, folds, or wrinkles, is processed by an image correction algorithm. Compared with performing text recognition directly on the image to be corrected, recognizing text on the corrected image improves accuracy.
Image to be recognized: the processing target of text recognition; an image that carries text information and noise such as rotation, folds, or wrinkles.
Exemplary System
Fig. 1 illustrates an exemplary system suitable for use in the image rectification method of the present application. As shown in fig. 1, the system 100 may include a server 102, a communication network 104, and/or one or more user devices 106, illustrated in fig. 1 as a plurality of user devices.
Server 102 may be any suitable server for storing information, data, programs, and/or any other suitable type of content. In some embodiments, server 102 may perform any suitable functions. For example, in some embodiments, the server 102 may be used for image rectification. As an alternative example, in some embodiments, the server 102 may be used for image rectification and text recognition. As another example, in some embodiments, server 102 may be used to send rectified image or text recognition results to a user device.
In some embodiments, the communication network 104 may be any suitable combination of one or more wired and/or wireless networks. For example, the communication network 104 may include, but is not limited to, the internet, an intranet, a wide area network (WAN), a local area network (LAN), a wireless network, a digital subscriber line (DSL) network, a frame relay network, an asynchronous transfer mode (ATM) network, a virtual private network (VPN), and/or any other suitable communication network. The user devices 106 can be connected to the communication network 104 by one or more communication links (e.g., communication link 112), and the communication network 104 can be linked to the server 102 via one or more communication links (e.g., communication link 114). A communication link may be any link suitable for communicating data between the user devices 106 and the server 102, such as a network link, a dial-up link, a wireless link, a hardwired link, any other suitable communication link, or any suitable combination of such links.
The user devices 106 may include any one or more user devices adapted to receive or capture image data. In some embodiments, the user devices 106 may comprise any suitable type of device. For example, in some embodiments, a user device 106 may be a mobile device, a tablet computer, a laptop computer, a desktop computer, a wearable computer, a game console, a media player, and/or any other suitable type of user device.
Although server 102 is illustrated as one device, in some embodiments, any suitable number of devices may be used to perform the functions performed by server 102. For example, in some embodiments, multiple devices may be used to implement the functions performed by the server 102. Alternatively, the functionality of the server 102 may be implemented using a cloud service.
Text recognition method
Based on the above system, an embodiment of the present application provides a text recognition method, as shown in fig. 2, the text recognition method includes the following steps:
Step 201, acquiring an image to be recognized, wherein the image to be recognized includes a document image, a bill image, or a card image.
In education or intelligent office scenarios, document images, bill images, and card images contain text information. Performing text recognition on such images automatically and quickly extracts the text information they contain, improving teaching or working efficiency.
The image to be recognized, including a document image, a bill image, or a card image, may be generated by scanning with a scanner, or captured by an intelligent device with an image acquisition function, such as a mobile phone, a tablet computer, or a digital camera.
Step 202, obtaining boundary information of a text in the image to be recognized, wherein the boundary information is used for indicating the boundary of the text in the image to be recognized.
Step 203, obtaining text line information of the text in the image to be recognized, wherein the text line information is used for indicating the position of each text line in the image to be recognized.
Step 204, determining pixel point mapping information according to the boundary information and the text line information, wherein the pixel point mapping information is used for indicating the correspondence between pixel points in the image to be recognized and pixel points in the template image.
Step 205, filling the pixel values of pixel points in the image to be recognized into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain a corrected image.
Step 206, performing text recognition on the corrected image to obtain text information.
After the corrected image is obtained, the text information in it may be recognized by any of various text recognition algorithms; this is not limited in the present application.
In the embodiment of the present application, the boundary information can indicate the boundary of the text in the image to be recognized, and the text line information can indicate the position of each text line in the image to be recognized. Because of noise such as folds and wrinkles in the image to be recognized, the boundary indicated by the boundary information and the text lines indicated by the text line information may be warped, while the boundary and text lines in the corrected image are straight. Pixel point mapping information can therefore be determined according to the boundary information and the text line information, indicating the correspondence between pixel points in the image to be recognized and pixel points in the template image. The pixel values of pixel points in the image to be recognized are then filled into the corresponding pixel points in the template image according to the pixel point mapping information to obtain a corrected image, and text recognition is performed on the corrected image to obtain text information. Correcting the image to be recognized according to the boundary information and the text line information constrains the correspondence between pixel points in the image before and after correction, avoiding pixel overlapping and holes in the corrected image; this improves both the correction effect and the text recognition effect.
In a possible implementation, after text recognition is performed on the corrected image to obtain text information, the corrected image and the corresponding text information can be output together. This allows the accuracy of the text recognition result to be checked, and the text information can serve as a reference for the text content of the image to be recognized, which is convenient for the user to read or process.
It should be noted that, for the process of correcting the image to be recognized into a corrected image in steps 202 to 205, reference may be made to the following description of the image correction method embodiments.
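As an illustration only, the flow of steps 201 to 206 can be sketched as follows. Every helper here is a hypothetical toy stand-in (identity mapping, the whole image treated as the text region), not the disclosed algorithms; the sketch only shows how the steps chain together.

```python
# Hypothetical end-to-end sketch of steps 201-206. The helpers are toy
# stand-ins (identity mapping, whole image treated as the text region),
# not the algorithms of the disclosure.

def extract_boundary(image):
    # Step 202 (stub): treat the whole image as the text region.
    h, w = len(image), len(image[0])
    return {"top": 0, "bottom": h - 1, "left": 0, "right": w - 1}

def extract_text_lines(image):
    # Step 203 (stub): treat every row as one text line.
    return list(range(len(image)))

def build_pixel_mapping(boundary, lines, shape):
    # Step 204 (stub): identity correspondence between source and template.
    h, w = shape
    return {(x, y): (x, y) for y in range(h) for x in range(w)}

def rectify(image, mapping, shape):
    # Step 205: fill each template pixel from its mapped source pixel.
    h, w = shape
    template = [[0] * w for _ in range(h)]
    for (x, y), (tx, ty) in mapping.items():
        template[ty][tx] = image[y][x]
    return template

def recognize(image):
    # Steps 202-205 chained; step 206 would run OCR on the result.
    boundary = extract_boundary(image)
    lines = extract_text_lines(image)
    shape = (len(image), len(image[0]))
    mapping = build_pixel_mapping(boundary, lines, shape)
    return rectify(image, mapping, shape)
```

With the identity stubs, the "rectified" output equals the input; a real implementation would replace each stub with the corresponding step's algorithm.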
Image correction method
Based on the above system, the present application provides an image rectification method, which is described in detail below through a plurality of embodiments.
Fig. 3 is a schematic flowchart of an image rectification method according to an embodiment of the present application. As shown in fig. 3, the image rectification method includes the steps of:
Step 301, obtaining boundary information of the text in the image to be corrected.
The image to be corrected carries text information, which is usually located in the middle region of the image. The area where the text lies can be delimited by boundary information, which indicates the boundary of the text in the image to be corrected. The boundary information may indicate a rectangular region in which the text is located: it includes an upper boundary, a lower boundary, a left boundary, and a right boundary, and the text lies within the rectangular region enclosed by these four boundaries.
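As a sketch of one possible in-memory form for such boundary information (the class and field names are illustrative assumptions, not part of the disclosure), each of the four boundaries can be stored as a polyline of pixel coordinates:

```python
# Hypothetical representation of boundary information: four polylines
# (top, bottom, left, right) enclosing the text region. In a warped
# image these polylines are curves rather than straight lines.
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates

@dataclass
class BoundaryInfo:
    top: List[Point] = field(default_factory=list)
    bottom: List[Point] = field(default_factory=list)
    left: List[Point] = field(default_factory=list)
    right: List[Point] = field(default_factory=list)

    def bounding_box(self) -> Tuple[int, int, int, int]:
        # Axis-aligned box (x_min, y_min, x_max, y_max) around the polylines.
        pts = self.top + self.bottom + self.left + self.right
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        return min(xs), min(ys), max(xs), max(ys)
```
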
It should be understood that obtaining the boundary information of the text in the image to be corrected may be implemented by any feasible image feature extraction algorithm; for example, the boundary information may be extracted by the DocUNet algorithm. The specific method for obtaining the boundary information is not limited in the embodiments of the present application.
Step 302, obtaining text line information of a text in the image to be corrected.
The text in the image to be corrected is distributed in lines; one line of text is a text line, and the text line information is used for indicating the position of each text line in the image to be corrected. In the image to be corrected, each text line lies within the area enclosed by the boundary indicated by the boundary information.
It should be understood that obtaining the text line information of the text in the image to be corrected may be implemented by any feasible image feature extraction algorithm; for example, the text line information may be extracted by the UNet algorithm. The specific method for obtaining the text line information is not limited in the embodiments of the present application.
Step 303, determining pixel point mapping information according to the boundary information and the text line information.
Because the image to be corrected contains noise such as folds and wrinkles, that noise needs to be removed through a stretching transformation to obtain a higher-quality corrected image. When image correction is performed, the correspondence between pixel points in the image to be corrected and pixel points in the corrected image must first be determined; the corrected image is then obtained by filling pixels according to this correspondence.
In the corrected image, the upper boundary, the lower boundary, and the text lines are horizontal straight lines, and the left and right boundaries of the text are vertical lines. In the image to be corrected, because of noise such as folds and wrinkles, the boundary indicated by the boundary information and the text lines indicated by the text line information may be warped. Image correction must straighten them, so the image to be corrected can be corrected with the boundary information and the text line information as constraints: pixel point mapping information is determined from the boundary information and the text line information, indicating the correspondence between pixel points in the image to be corrected and pixel points in the template image; the pixel values of all pixel points in the image to be corrected are then filled into the corresponding pixel points in the template image according to this correspondence, yielding the corrected image.
The template image is a preset image with the same size and resolution as the corrected image. The pixel value of each pixel point in the template image can be arbitrary, because it will be overwritten by the pixel values of the image to be corrected during filling. The resolution of the template image and that of the image to be corrected may be the same or different. When they are the same, each pixel point in the image to be corrected corresponds to one pixel point in the template image. When they differ, several adjacent pixel points in the image to be corrected may correspond to one pixel point in the template image, or one pixel point in the image to be corrected may correspond to several adjacent pixel points in the template image.
Step 304, filling the pixel values of pixel points in the image to be corrected into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain the corrected image.
Because the pixel point mapping information indicates the correspondence between pixel points in the image to be corrected and pixel points in the template image, the pixel values of the pixel points in the image to be corrected can be filled into the corresponding pixel points in the template image accordingly; once all pixel points in the template image have been filled, the corrected image is obtained.
In other words, the corrected image is the template image after its pixel points have been filled with pixel values according to the pixel point mapping information.
If the pixel point mapping information indicates that each pixel point in the image to be corrected corresponds to one pixel point in the template image, the pixel value of each pixel point in the image to be corrected is filled directly into the corresponding pixel point in the template image. If it indicates that several adjacent pixel points in the image to be corrected correspond to one pixel point in the template image, the average of their pixel values is filled into that pixel point. If it indicates that one pixel point in the image to be corrected corresponds to several adjacent pixel points in the template image, its pixel value is filled into each of those pixel points.
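The three filling cases above can be sketched in one routine: the mapping sends each source pixel point to a list of template pixel points, and a template pixel point that receives several sources takes their average. The function name and data layout are assumptions for illustration, not the disclosed implementation.

```python
# Illustrative sketch of step 304's filling, covering the three cases:
# one-to-one, many-to-one (average the sources), one-to-many (copy into each).

def fill_template(src, mapping, template_h, template_w):
    """src: 2-D list of pixel values; mapping: {(x, y): [(tx, ty), ...]}."""
    sums = [[0.0] * template_w for _ in range(template_h)]
    counts = [[0] * template_w for _ in range(template_h)]
    for (x, y), targets in mapping.items():
        for tx, ty in targets:
            sums[ty][tx] += src[y][x]
            counts[ty][tx] += 1
    # Average wherever several source pixel points landed on one template pixel.
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0
             for c in range(template_w)] for r in range(template_h)]
```

For example, two adjacent source pixels mapped to one template pixel yield the mean of their values, while one source pixel mapped to two template pixels is copied into both.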
In the embodiment of the present application, the boundary information can indicate the boundary of the text in the image to be corrected, and the text line information can indicate the position of each text line. Because of noise such as folds and wrinkles, the boundary and text lines indicated by this information may be warped, while the boundary and text lines in the corrected image are straight. Pixel point mapping information can therefore be determined from the boundary information and the text line information, indicating the correspondence between pixel points in the image to be corrected and pixel points in the template image; the pixel values of the pixel points in the image to be corrected are then filled into the corresponding pixel points in the template image to obtain the corrected image. Correcting the image according to the boundary information and the text line information constrains the correspondence between pixel points in the image before and after correction, avoiding pixel overlapping and holes in the corrected image, and thus improves the correction effect for images containing text information.
In a possible implementation, the pixel point mapping information includes a U map and a V map. The U map gives, for each pixel point in the image to be corrected, the u value of its corresponding pixel point in the corrected image, and the V map gives the v value. Both u and v lie between 0 and 1: u indicates the position of the pixel point along the width direction of the corrected image, and v indicates its position along the height direction. For example, for a pixel point (x, y) in the image to be corrected, the values at (x, y) in the U map and the V map are (u, v), the normalized coordinates of the corresponding pixel point in the corrected image. Multiplying u by the width of the corrected image gives the coordinate x′ of the corresponding pixel point along the width direction, and multiplying v by the height of the corrected image gives the coordinate y′ along the height direction. The pixel point (x′, y′) in the corrected image is the pixel point corresponding to (x, y) in the image to be corrected; when the pixel points correspond one to one, (x′, y′) and (x, y) have the same pixel value.
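A minimal sketch of turning the (u, v) values into integer coordinates in the corrected image. Mapping u = 1 to the last column (and v = 1 to the last row) via width − 1 and height − 1 is one plausible discretization, an assumption for illustration rather than the patent's exact formula:

```python
# Hypothetical discretization of (u, v) into corrected-image coordinates:
# u = 0 maps to the first column, u = 1 to the last (width - 1), and
# likewise for v along the height.

def uv_to_coords(u, v, rect_w, rect_h):
    x = min(int(round(u * (rect_w - 1))), rect_w - 1)
    y = min(int(round(v * (rect_h - 1))), rect_h - 1)
    return x, y
```
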
Fig. 4 is a schematic diagram of an image rectification process according to an embodiment of the present application. As shown in fig. 4, according to the boundary information and the text line information of the text in the image to be corrected, pixel point mapping information including the U-map and the V-map is determined, and then the pixel values of the pixel points in the image to be corrected are filled into the template image according to the U-map and the V-map, so as to obtain the corrected image.
In the U map, pixel points located in the same column have the same U value; the U value of the pixel points on the left boundary of the U map is 0, the U value of the pixel points on the right boundary is 1, and for two adjacent columns of pixel points, the U value of the right column is greater than or equal to that of the left column. The two thick curves on the left and right sides of the U map are the left and right boundaries of the text in the image to be corrected indicated by the boundary information, and the longitudinal curves between them represent dividing lines of the text in the image to be corrected, which can be determined according to the text line information.

In the V map, pixel points located on the same line have the same V value; the V value of the pixel points on the upper boundary of the V map is 0, the V value of the pixel points on the lower boundary is 1, and for two adjacent rows of pixel points, the V value of the lower row is greater than or equal to that of the upper row. The two thick curves on the upper and lower sides of the V map are the upper and lower boundaries of the text in the image to be corrected indicated by the boundary information, and the transverse curves between them are the text lines in the image to be corrected indicated by the text line information.

As can be seen from the U map and the V map in fig. 4, the boundaries and text lines of the text in the image to be corrected are distorted, and pixel points located on the same boundary or the same text line have different U values or V values. Therefore, when the pixel values are filled into the template image according to the U map and the V map, the boundaries and text lines of the text are straightened while other areas of the image to be corrected are stretched or compressed, thereby obtaining a corrected image of higher quality.
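The filling step described above can be sketched as a naive forward mapping (illustrative only; the names are not from the patent, and a production implementation would handle collisions and holes, e.g. by interpolation):

```python
import numpy as np

def fill_template(image, u_map, v_map, out_h, out_w):
    """Fill pixel values of the image to be corrected into the template image
    at the positions indicated by the U map and V map (grayscale sketch)."""
    template = np.zeros((out_h, out_w), dtype=image.dtype)
    h, w = image.shape[:2]
    for y in range(h):
        for x in range(w):
            x_out = int(round(u_map[y, x] * (out_w - 1)))  # width-direction target
            y_out = int(round(v_map[y, x] * (out_h - 1)))  # height-direction target
            template[y_out, x_out] = image[y, x]           # copy the pixel value over
    return template
```

With identity-like U/V maps the template reproduces the input; real U/V maps straighten curved boundaries and text lines while stretching or compressing other regions.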
In a possible implementation manner, when determining the pixel mapping information according to the boundary information and the text line information in step 303, a preset optimized object may be optimized according to a first loss and a second loss to obtain the pixel mapping information, where the first loss is a loss when the optimized object is constrained by the boundary information, and the second loss is a loss when the optimized object is constrained by the text line information.
The optimization object is initialized pixel point mapping information, first loss and second loss are determined according to boundary information and text line information, the optimization object is optimized according to the first loss and the second loss, the optimization object meeting the requirements is solved, and the optimization object meeting the requirements is the pixel point mapping information. For example, when the pixel mapping information includes a U diagram and a V diagram, the optimization objects are initialized U diagrams and V diagrams, the U value of each pixel in the initialized U diagram is 0, and the V value of each pixel in the initialized V diagram is 0.
Because the boundary information is used for indicating the boundary of the text in the image to be corrected, and the text line information is used for indicating the position of the text line in the image to be corrected, the boundary information and the text line information can reflect the noise of folding, wrinkling and the like in the image to be corrected, the optimized object can be optimized by using the boundary information and the text line information to obtain pixel point mapping information, then the pixel values of the pixel points in the image to be corrected are filled into the corresponding pixel points in the template image according to the pixel point mapping information, the noise of folding, wrinkling and the like in the image to be corrected is corrected, and the corrected image with higher quality is obtained.
Because the boundary indicated by the boundary information and the text lines indicated by the text line information are lines on the image to be corrected, in order to optimize the optimized object using this information, the boundary information and the text line information can be discretized into sample points, and the optimized object is then optimized using the discretized points to obtain the pixel point mapping information.
In the embodiment of the application, the process of solving the pixel point mapping information is converted into an optimization process driven by the boundary information and the text line information: the optimized object is constrained by both, the constraint from the boundary information produces the first loss, and the constraint from the text line information produces the second loss. The optimized object can therefore be optimized according to the first loss and the second loss, and the optimized result is determined as the pixel point mapping information. In this way the pixel point mapping information can be obtained more quickly and accurately, which in turn ensures both the precision and the speed of correcting the image.
In one example, the process of optimizing the optimization object according to the first loss and the second loss can be modeled as the following formula (1):

    φ* = argmin_φ [ L_B(φ) + α · L_L(φ) ]        (1)

where φ characterizes the optimization object, L_B characterizes the first loss, L_L characterizes the second loss, and α characterizes a preset coefficient; for example, α may be equal to 0.8. The meaning of formula (1) is to solve for the optimization object that minimizes L_B + α · L_L.
In a possible implementation manner, when the optimized object is optimized according to the first loss and the second loss to obtain the pixel point mapping information, it may instead be optimized according to the first loss, the second loss, and a regularization term. The regularization term indicates the distance between pairs of pixel points in the template image whose two corresponding pixel points, as indicated by the optimization object, are adjacent in the image to be corrected.
When image correction is performed, although the image to be corrected includes noise such as folding and wrinkling, the two pixel points in the corrected image that correspond to two adjacent pixel points in the image to be corrected should themselves be adjacent or close to each other. When the optimized object is optimized only according to the first loss and the second loss, the distance between such corresponding pixel points may become large. Therefore, the optimized object is optimized according to the first loss, the second loss, and the regularization term to obtain the pixel point mapping information: for the two pixel points in the template image to which the optimization object maps two adjacent pixel points of the image to be corrected, the regularization term indicates the distance between them. Adding the regularization term to the optimization process thus controls the distance, in the corrected image, between the pixel points corresponding to adjacent pixel points in the image to be corrected, and improves the accuracy of the corrected image.
When the optimized object includes a U map and a V map, the U values and V values indicate the positions, in the corrected image, of the pixel points corresponding to the pixel points in the image to be corrected. Optimizing the U map and the V map according to the first loss, the second loss, and the regularization term makes the changes of the U values and V values smoother; that is, the distance in the corrected image between the pixel points corresponding to adjacent pixel points in the image to be corrected is kept from becoming too large, thereby ensuring the accuracy of image correction.
In the embodiment of the application, the optimized object is optimized according to the first loss, the second loss, and the regularization term to obtain the pixel point mapping information, so that the process of solving the pixel point mapping relationship is converted into an optimization process, which improves the speed of correcting the image. Moreover, owing to the regularization term, the distance in the corrected image between the pixel points corresponding to adjacent pixel points in the image to be corrected is kept from becoming too large, which further improves the accuracy of correcting an image that includes text.
In a possible implementation manner, when the optimized object is optimized according to the first loss, the second loss and the regularization term to obtain the pixel point mapping information, the optimized object may be optimized according to a weighted sum of the first loss, the second loss and the regularization term to obtain the pixel point mapping information.
In the embodiment of the application, the optimization object is optimized according to the weighted sum of the first loss, the second loss and the regularization term, and the influence strength of the first loss, the second loss and the regularization term on the optimization object can be flexibly adjusted, so that the proportion of the first loss, the second loss and the regularization term is balanced, the accuracy of the obtained pixel point mapping information is ensured, and the accuracy of correcting the image comprising the text is ensured.
In a possible implementation manner, when the optimized object is optimized, the weighted sum of the first loss, the second loss and the regularization term may be minimized, and the optimized object after the optimization is determined as the pixel point mapping information.
In the embodiment of the application, because the first loss is the loss when the optimized object is constrained by the boundary information, the second loss is the loss when the optimized object is constrained by the text line information, the regularization term indicates the distance between the corresponding pixel points of the adjacent pixel points in the image to be corrected in the template image, and the optimized object is optimized by taking the weighted sum of the first loss, the second loss and the regularization term as the constraint condition, the accuracy of the generated pixel point mapping information can be further improved on the basis of balancing the first loss, the second loss and the regularization term, and the accuracy of correcting the image including the text is further improved.
In one example, the process of optimizing the optimization object with the constraint that the weighted sum of the first loss, the second loss, and the regularization term is minimized can be modeled as the following formula (2):

    φ* = argmin_φ [ L_B(φ) + α · L_L(φ) + λ · R(φ) ]        (2)

where φ characterizes the optimization object, L_B characterizes the first loss, L_L characterizes the second loss, R characterizes the regularization term, and α and λ characterize preset coefficients; for example, α may be equal to 0.6 and λ may be equal to 0.8. The meaning of formula (2) is to solve for the optimization object that minimizes L_B + α · L_L + λ · R.
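A toy numerical sketch of this weighted-sum optimization follows. The patent does not specify a solver, so the gradient-descent loop, the example loss functions, and all names here are assumptions; a real implementation would use the boundary, text-line, and smoothness losses over a full U/V map.

```python
import numpy as np

def optimize_phi(phi, l_b, l_l, reg, alpha=0.6, lam=0.8, lr=0.1, steps=300, eps=1e-5):
    """Minimize l_b(phi) + alpha * l_l(phi) + lam * reg(phi) by plain gradient
    descent with central-difference numerical gradients (illustrative only)."""
    phi = phi.astype(float).copy()
    total = lambda p: l_b(p) + alpha * l_l(p) + lam * reg(p)
    for _ in range(steps):
        grad = np.zeros_like(phi)
        for i in range(phi.size):               # numerical gradient, element by element
            d = np.zeros_like(phi)
            d.flat[i] = eps
            grad.flat[i] = (total(phi + d) - total(phi - d)) / (2 * eps)
        phi -= lr * grad
    return phi

# Toy 1-D "map": endpoints pinned to 0 and 1 (stand-in for the boundary loss),
# a trivial text-line loss, and a smoothness regularizer on second differences.
l_b = lambda p: p[0] ** 2 + (p[-1] - 1.0) ** 2
l_l = lambda p: 0.0
reg = lambda p: float(np.sum(np.diff(p, 2) ** 2))
phi = optimize_phi(np.zeros(3), l_b, l_l, reg)
```

For this toy objective the unique minimizer is the linear ramp [0, 0.5, 1], and the descent converges to it; this mirrors how the real optimization pulls the map toward boundary targets while keeping it smooth.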
In one possible implementation, in the process of optimizing the optimization object, the first loss may be calculated by the following formula (3):

    L_B = Σ_{i=0}^{1} Σ_{j=0}^{1} Σ_{k=1}^{K} ‖ φ(p_k^{ij}) − j ‖²        (3)

where L_B characterizes the first loss, K characterizes the number of sample pixel points determined on each boundary indicated by the boundary information, and φ(p_k^{ij}) characterizes the position, in the width direction or the height direction in the template image, of the pixel point corresponding to the k-th sample pixel point p_k^{ij} indicated by the optimization object, with 0 < φ(p_k^{ij}) < 1. When i = 0 and j = 0, the sample pixel points are located on the upper boundary indicated by the boundary information; when i = 1 and j = 0, on the left boundary; when i = 1 and j = 1, on the right boundary; and when i = 0 and j = 1, on the lower boundary. In each case j is also the target normalized coordinate of the boundary, since the upper and left boundaries of the corrected image correspond to the value 0 and the lower and right boundaries to the value 1.
It should be understood that K characterizes the number of sample pixel points determined on each of the upper, lower, left, and right boundaries indicated by the boundary information. The greater the number of sample pixel points, the more accurate the calculated first loss, but the greater the amount of data to be processed; therefore, the value of K may be set according to the accuracy and speed requirements of image rectification. For example, K may be 17, 50, or 100.
When the optimization object includes a U map and a V map, the left boundary on the U map is 0 and the right boundary is 1, and the upper boundary on the V map is 0 and the lower boundary is 1. Formula (3) above therefore represents a constraint on the upper boundary on the V map when i = 0 and j = 0, on the left boundary on the U map when i = 1 and j = 0, on the right boundary on the U map when i = 1 and j = 1, and on the lower boundary on the V map when i = 0 and j = 1.
In the embodiment of the application, the upper, lower, left, and right boundaries indicated by the boundary information are used to constrain the optimized object, and the losses generated by these four constraints are aggregated into the first loss calculated according to formula (3). The first loss thus comprehensively embodies the constraints of the four boundaries of the text in the image to be corrected, so that when the optimized object is optimized with the first loss, the accuracy of the obtained pixel point mapping information is ensured, and the accuracy of correcting an image that includes text is ensured in turn.
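A discrete sketch of this boundary constraint follows (the function name and the dict-based sample layout are assumptions; the patent's formula sums squared deviations over K sample points on each of the four boundaries):

```python
import numpy as np

def boundary_loss(u_map, v_map, samples):
    """First loss: squared deviation of the mapped positions of boundary sample
    points from their targets (0 for left/upper boundaries, 1 for right/lower).
    `samples` maps a boundary name to a list of (x, y) sample pixel points."""
    targets = {"left": (u_map, 0.0), "right": (u_map, 1.0),
               "upper": (v_map, 0.0), "lower": (v_map, 1.0)}
    loss = 0.0
    for name, pts in samples.items():
        phi_map, target = targets[name]
        for x, y in pts:
            loss += (phi_map[y, x] - target) ** 2
    return loss
```

A map that already sends each boundary to its target incurs zero loss; any deviation of a boundary sample from 0 or 1 contributes its squared error.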
In one possible implementation, during the optimization of the optimized object, the second loss may be calculated by the following formula (4):

    L_L = Σ_{i=0}^{1} Σ_{k=1}^{J−1} ‖ φ(p_k^i) − φ(p_{k+1}^i) ‖²        (4)

where L_L characterizes the second loss and J characterizes the number of pixel points sampled on a horizontal line or a vertical line indicated by the text line information; the summation runs over every such line. A horizontal line indicates the position of a text line in the image to be corrected, and a vertical line indicates the position of a text column in the image to be corrected. When i = 0, p_k and p_{k+1} are two adjacent pixel points on the same horizontal line; when i = 1, they are two adjacent pixel points on the same vertical line. φ(p_k^i) characterizes the position, in the width direction or the height direction in the template image, of the pixel point corresponding to the k-th sample pixel point indicated by the optimization object, with 0 < φ(p_k^i) < 1; φ(p_{k+1}^i) likewise characterizes the position of the pixel point corresponding to the (k+1)-th sample pixel point, with 0 < φ(p_{k+1}^i) < 1.

The horizontal line on which the adjacent pixel points p_k and p_{k+1} lie when i = 0 is used to indicate a text line and may be the median line of the text line. The vertical line on which they lie when i = 1 is a column line of the text in the image to be corrected, determined according to the text line information. ‖ · ‖² characterizes the squared 2-norm of φ(p_k^i) − φ(p_{k+1}^i).

When the optimization object includes a U map and a V map, as shown in fig. 3, φ(p_k^i) − φ(p_{k+1}^i) when i = 0 represents the difference between the V values of two adjacent pixel points on the median line of a text line, and when i = 1 it represents the difference between the U values of two adjacent pixel points on a dividing line.
In the embodiment of the application, in the process of optimizing the optimized object, the constraint on a text line is that pixel points on the same text line have the same coordinate in the height direction of the image, and pixel points in the same column have the same coordinate in the width direction of the image. The second loss calculated by formula (4) is based on this property, so it comprehensively embodies the constraints of the text lines and the dividing lines in the image to be corrected. Therefore, when the optimized object is optimized with the second loss, the accuracy of the pixel point mapping information obtained by the optimization is improved, and the accuracy of correcting an image that includes text is improved in turn.
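A discrete sketch of this text-line constraint follows, under the reading that points on one text line should share a height coordinate (V value) and points on one dividing line a width coordinate (U value); the function name and the list-of-lines layout are assumptions:

```python
import numpy as np

def text_line_loss(u_map, v_map, h_lines, v_lines):
    """Second loss: squared differences between adjacent sample points' mapped
    coordinates along each horizontal text line (their V values) and each
    vertical dividing line (their U values)."""
    loss = 0.0
    for pts in h_lines:                               # pts: [(x, y), ...] on one text line
        vs = np.array([v_map[y, x] for x, y in pts])
        loss += float(np.sum((vs[1:] - vs[:-1]) ** 2))
    for pts in v_lines:                               # pts on one dividing line
        us = np.array([u_map[y, x] for x, y in pts])
        loss += float(np.sum((us[1:] - us[:-1]) ** 2))
    return loss
```

A map that already assigns equal V values along a text line and equal U values along a dividing line incurs zero loss.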
In one possible implementation, during the optimization of the optimized object, the regularization term may be calculated by the following formula (5):

    R = ∫∫_Ω [ (∂²φ/∂x²)² + β · (∂²φ/∂y²)² ] dx dy        (5)

where R characterizes the regularization term, ∂²φ/∂x² characterizes the second derivative of the optimization object with respect to the horizontal coordinates of the pixel points in the image to be rectified, ∂²φ/∂y² characterizes the second derivative of the optimization object with respect to the vertical coordinates of the pixel points in the image to be rectified, β characterizes a preset weight coefficient, and Ω characterizes the image domain over which the integral is taken.

When the optimization object includes a U map and a V map, the regularization term is realized by integrating the squares of the second derivatives of the U map and the V map. In the process of optimizing the optimized object, although the boundary constraint and the text line constraint are added, the U values on the U map and the V values on the V map may still not be smooth. The regularization term is therefore added to balance smoothness against a good fitting solution for the U values and V values, so that the image to be corrected can be accurately corrected by the pixel point mapping information obtained through the optimization, and the accuracy of correcting an image that includes text is improved.
In the embodiment of the application, the integral of the squared second derivatives of the optimized object is calculated as the regularization term, so that the regularization term accurately reflects the distance, in the template image, between the pixel points corresponding to adjacent pixel points in the image to be corrected. When the optimized object is optimized with the regularization term to obtain the pixel point mapping information, the distance between these corresponding pixel points in the corrected image can be controlled, thereby ensuring the accuracy of correcting an image that includes text.
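A discrete sketch of the regularization term follows (the finite-difference discretization and the placement of β on the vertical term are assumptions about formula (5), not the patent's exact definition):

```python
import numpy as np

def regularization(phi_map, beta=1.0):
    """Regularization term: sum of squared second differences of the map along
    the horizontal and vertical directions (a discrete stand-in for the
    integral of squared second derivatives)."""
    d2x = phi_map[:, 2:] - 2.0 * phi_map[:, 1:-1] + phi_map[:, :-2]  # ~ d2(phi)/dx2
    d2y = phi_map[2:, :] - 2.0 * phi_map[1:-1, :] + phi_map[:-2, :]  # ~ d2(phi)/dy2
    return float(np.sum(d2x ** 2) + beta * np.sum(d2y ** 2))
```

A linearly varying map has zero second differences and hence zero penalty, while local bumps in the U or V values are penalized, which is exactly the smoothness the term is meant to enforce.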
Text recognition device
Fig. 5 is a schematic view showing a text recognition apparatus corresponding to the text recognition method in the foregoing embodiment, and as shown in fig. 5, the text recognition apparatus includes:
the first obtaining module 501 is configured to obtain an image to be identified, where the image to be identified includes a document image, a ticket image, or a card image;
a second obtaining module 502, configured to obtain boundary information of a text in an image to be recognized, where the boundary information is used to indicate a boundary of the text in the image to be recognized;
a third obtaining module 503, configured to obtain text line information of a text in the image to be recognized, where the text line information is used to indicate positions of text lines in the image to be recognized;
the optimization module 504 is configured to determine pixel mapping information according to the boundary information and the text line information, where the pixel mapping information is used to indicate a correspondence between a pixel in the image to be identified and a pixel in the template image;
a filling module 505, configured to fill pixel values of pixels in the image to be identified into corresponding pixels in the template image according to the pixel mapping information, so as to obtain a corrected image;
and the recognition module 506 is configured to perform text recognition on the corrected image to obtain text information.
It should be noted that the text recognition apparatus in the embodiment of the present application is used to implement the corresponding text recognition method in the foregoing method embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Electronic device
Fig. 6 is a schematic block diagram of an electronic device according to an embodiment of the present application, and a specific embodiment of the present application does not limit a specific implementation of the electronic device. As shown in fig. 6, the electronic device may include: a processor (processor)602, a communication Interface 604, a memory 606, and a communication bus 608. Wherein:
the processor 602, communication interface 604, and memory 606 communicate with one another via a communication bus 608.
A communication interface 604 for communicating with other electronic devices or servers.
The processor 602 is configured to execute the program 610, and may specifically execute the relevant steps in any of the foregoing image rectification method embodiments or any of the foregoing text recognition method embodiments.
In particular, program 610 may include program code comprising computer operating instructions.
The processor 602 may be a CPU, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The electronic device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 606 stores a program 610. Memory 606 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 610 may specifically be configured to cause the processor 602 to execute an image rectification method or a text recognition method in any of the embodiments described above.
For specific implementation of each step in the program 610, reference may be made to corresponding descriptions in corresponding steps and units in any of the foregoing embodiments of the image rectification method or the text recognition method, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
With the electronic device of the embodiment of the application, the boundary information can indicate the boundary of the text in the image to be recognized, and the text line information can indicate the position of each text line in the image to be recognized. Due to noise such as folding and wrinkling in the image to be recognized, the boundary indicated by the boundary information and the text lines indicated by the text line information may be distorted, whereas the boundary of the text and the text lines in the corrected image should be straight. Therefore, pixel point mapping information can be determined according to the boundary information and the text line information, where the pixel point mapping information indicates the correspondence between pixel points in the image to be recognized and pixel points in the template image; the pixel values of the pixel points in the image to be recognized can then be filled into the corresponding pixel points in the template image according to the pixel point mapping information to obtain a corrected image corresponding to the image to be recognized, and text recognition is performed on the corrected image to obtain text information. Because the image to be recognized is corrected according to the boundary information and the text line information, the correspondence between pixel points in the image to be recognized and pixel points in the corrected image is constrained, and the problems of pixel overlapping and holes in the corrected image are avoided; thus the effect of correcting the image to be recognized can be improved, and the effect of text recognition is improved in turn.
Computer storage medium
The present application also provides a computer-readable storage medium storing instructions for causing a machine to perform an image rectification method or a text recognition method as described herein. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present application.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Computer program product
Embodiments of the present application further provide a computer program product, which includes computer instructions for instructing a computing device to perform operations corresponding to any of the above method embodiments.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to embodiments of the present application may be implemented in hardware, firmware, or as software or computer code storable in a recording medium such as a CD ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium downloaded through a network and to be stored in a local recording medium, so that the methods described herein may be stored in such software processes on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by a computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (13)

1. A text recognition method, comprising:
acquiring an image to be identified, wherein the image to be identified comprises a document image, a bill image or a card image;
acquiring boundary information of a text in the image to be recognized, wherein the boundary information is used for indicating the boundary of the text in the image to be recognized;
acquiring text line information of a text in the image to be recognized, wherein the text line information is used for indicating the position of each text line in the image to be recognized;
determining pixel point mapping information according to the boundary information and the text line information, wherein the pixel point mapping information is used for indicating the correspondence between pixel points in the image to be recognized and pixel points in a template image;
filling pixel values of pixel points in the image to be recognized into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain a corrected image;
performing text recognition on the corrected image to obtain text information;
determining pixel point mapping information according to the boundary information and the text line information, including:
optimizing a preset optimized object according to a first loss and a second loss to obtain the pixel point mapping information, wherein the optimized object is initialized pixel point mapping information, the first loss is the loss incurred when the optimized object is constrained by the boundary information, and the second loss is the loss incurred when the optimized object is constrained by the text line information.
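As a rough illustration of this optimization step, the sketch below treats one coordinate channel of the initialized pixel point mapping as the optimized object and performs gradient descent on the two losses. The point encoding, weights, and learning rate are all hypothetical, not taken from the patent.

```python
import numpy as np

def optimize_mapping(n_points, boundary, lines, steps=500, lr=0.2,
                     w_boundary=1.0, w_line=1.0):
    """Optimize one coordinate channel of the pixel point mapping.

    n_points : number of sample pixel points.
    boundary : (index, target) pairs -- sample points that the boundary
        information pins to a template edge (target 0.0 or 1.0).
    lines    : lists of indices -- sample points on one text line, whose
        mapped coordinates should coincide in the template.
    """
    u = np.full(n_points, 0.5)            # initialized mapping (optimized object)
    for _ in range(steps):
        grad = np.zeros(n_points)
        for k, t in boundary:             # first loss: boundary constraint
            grad[k] += 2.0 * w_boundary * (u[k] - t)
        for idx in lines:                 # second loss: text line constraint
            for a, b in zip(idx[:-1], idx[1:]):
                d = u[a] - u[b]
                grad[a] += 2.0 * w_line * d
                grad[b] -= 2.0 * w_line * d
        u -= lr * grad                    # gradient descent step
    return u

# Toy run: point 0 lies on the near template edge, point 3 on the far edge;
# points 0-1 share a text line, as do points 2-3.
mapping = optimize_mapping(4, boundary=[(0, 0.0), (3, 1.0)],
                           lines=[[0, 1], [2, 3]])
```

Under these toy constraints the mapped coordinates settle at [0, 0, 1, 1]: each boundary point reaches its edge, and each text line collapses to a single template coordinate.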
2. The text recognition method of claim 1, further comprising: outputting the corrected image and the text information.
3. An image rectification method comprising:
acquiring boundary information of a text in an image to be corrected, wherein the boundary information is used for indicating the boundary of the text in the image to be corrected;
acquiring text line information of a text in the image to be corrected, wherein the text line information is used for indicating the position of each text line in the image to be corrected;
determining pixel point mapping information according to the boundary information and the text line information, wherein the pixel point mapping information is used for indicating the correspondence between pixel points in the image to be corrected and pixel points in a template image;
filling pixel values of pixel points in the image to be corrected into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain a corrected image;
determining pixel point mapping information according to the boundary information and the text line information, including:
optimizing a preset optimized object according to a first loss and a second loss to obtain the pixel point mapping information, wherein the optimized object is initialized pixel point mapping information, the first loss is the loss incurred when the optimized object is constrained by the boundary information, and the second loss is the loss incurred when the optimized object is constrained by the text line information.
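The filling step of claim 3 can be pictured with a minimal forward-mapping sketch. The integer `mapping` array and the nearest-pixel fill used here are simplifying assumptions, not the patent's actual mapping representation or interpolation.

```python
import numpy as np

def fill_template(image, mapping, template_shape):
    """Fill template pixels from the image to be corrected.

    mapping : int array of shape (H, W, 2); mapping[y, x] names the
        (row, col) of the template pixel that source pixel (y, x) fills.
    """
    template = np.zeros(template_shape, dtype=image.dtype)
    h, w = image.shape[:2]
    for y in range(h):
        for x in range(w):
            ty, tx = mapping[y, x]
            template[ty, tx] = image[y, x]   # carry the pixel value across
    return template

# 2x2 example: a horizontal flip expressed as pixel point mapping information
img = np.array([[1, 2],
                [3, 4]])
flip = np.array([[[0, 1], [0, 0]],
                 [[1, 1], [1, 0]]])
corrected = fill_template(img, flip, (2, 2))   # -> [[2, 1], [4, 3]]
```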
4. The image rectification method according to claim 3, wherein the optimizing a preset optimization object according to the first loss and the second loss to obtain the pixel point mapping information includes:
optimizing the optimized object according to the first loss, the second loss, and a regularization term to obtain the pixel point mapping information, wherein the regularization term is used for indicating the distance between pixel point pairs in the template image, each pair corresponding, under the optimized object, to two pixel points that are adjacent in the image to be corrected.
5. The image rectification method according to claim 4, wherein the optimizing the optimized object according to the first loss, the second loss and a regularization term to obtain the pixel point mapping information includes:
optimizing the optimized object according to the weighted sum of the first loss, the second loss, and the regularization term to obtain the pixel point mapping information.
6. The image rectification method according to claim 5, wherein the optimizing the optimized object according to the weighted sum of the first loss, the second loss and the regularization term to obtain the pixel point mapping information includes:
optimizing the optimized object to minimize the weighted sum of the first loss, the second loss, and the regularization term, so as to obtain the pixel point mapping information.
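The minimization target of claims 5 and 6 is simply a scalar weighted sum; a minimal sketch, where the weight values w1, w2, w3 are illustrative assumptions:

```python
def weighted_objective(first_loss, second_loss, reg_term,
                       w1=1.0, w2=1.0, w3=0.1):
    """Weighted sum that the optimized object is driven to minimize.
    The weight values are assumptions for illustration only."""
    return w1 * first_loss + w2 * second_loss + w3 * reg_term

# Example loss values; smaller weighted sums indicate better mappings.
total = weighted_objective(0.5, 0.25, 2.0)   # 0.5 + 0.25 + 0.2
```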
7. The image rectification method according to any one of claims 3 to 6, wherein the first loss is calculated by the following formula:
L_b = Σ_(i,j) Σ_(k=1..K) ( T_i(x_k, y_k) − j )^2
wherein L_b characterizes the first loss; K characterizes the number of sample pixel points determined on each boundary indicated by the boundary information; T_i(x_k, y_k) characterizes the position, in the width direction or the height direction of the template image, of the pixel point that the optimized object indicates as corresponding to the k-th sample pixel point (x_k, y_k); and the index pair (i, j) selects the boundary: when i=0 and j=0, the sample pixel point (x_k, y_k) is located on the upper boundary indicated by the boundary information; when i=1 and j=0, on the left boundary; when i=1 and j=1, on the right boundary; and when i=0 and j=1, on the lower boundary.
8. The image rectification method according to any one of claims 3 to 6, wherein the second loss is calculated by the following formula:
L_l = Σ_i Σ_(k=1..J−1) ( T_i(x_(k+1), y_(k+1)) − T_i(x_k, y_k) )^2
wherein L_l characterizes the second loss; J characterizes the number of pixel points on a horizontal line or a vertical line indicated by the text line information, the horizontal line indicating the position of a text line in the image to be corrected and the vertical line indicating the position of a text column in the image to be corrected; when i=0, the pixel points (x_k, y_k) and (x_(k+1), y_(k+1)) are two adjacent pixel points on the same horizontal line, and when i=1, they are two adjacent pixel points on the same vertical line; T_i(x_k, y_k) characterizes the position, in the width direction or the height direction of the template image, of the pixel point that the optimized object indicates as corresponding to the k-th sample pixel point (x_k, y_k); and T_i(x_(k+1), y_(k+1)) characterizes the corresponding position for the (k+1)-th sample pixel point (x_(k+1), y_(k+1)).
9. The image rectification method according to any one of claims 4 to 6, wherein the regularization term is calculated by the following formula:
d = Σ ( (∂^2 T/∂x^2)^2 + β · (∂^2 T/∂y^2)^2 )
wherein d characterizes the regularization term; ∂^2 T/∂x^2 characterizes the second derivative of the optimized object with respect to the horizontal coordinate of a pixel point in the image to be corrected; ∂^2 T/∂y^2 characterizes the second derivative of the optimized object with respect to the vertical coordinate of the pixel point in the image to be corrected; and β characterizes a preset weight coefficient.
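A discrete sketch of such a second-derivative smoothness term, using finite second differences on one coordinate channel of the mapping; placing beta on the vertical term is one assumed reading, not the patent's confirmed formula.

```python
import numpy as np

def regularization(channel, beta=1.0):
    """Smoothness penalty on one coordinate channel of the mapping.

    channel : 2-D array giving the mapped template coordinate at each
        pixel of the image to be corrected.
    """
    # discrete second derivatives along the horizontal and vertical axes
    d2x = channel[:, 2:] - 2 * channel[:, 1:-1] + channel[:, :-2]
    d2y = channel[2:, :] - 2 * channel[1:-1, :] + channel[:-2, :]
    return float(np.sum(d2x ** 2) + beta * np.sum(d2y ** 2))

# An affine mapping has zero second derivatives, so it is not penalized,
# while a bent mapping is.
ys, xs = np.mgrid[0:6, 0:6]
flat_penalty = regularization(2 * xs + 3 * ys)   # -> 0.0
bent_penalty = regularization(xs ** 2)           # -> 96.0
```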
10. A text recognition apparatus comprising:
a first acquisition module, used for acquiring an image to be recognized, wherein the image to be recognized comprises a document image, a bill image, or a card image;
a second acquisition module, used for acquiring boundary information of a text in the image to be recognized, wherein the boundary information is used for indicating the boundary of the text in the image to be recognized;
a third acquisition module, used for acquiring text line information of the text in the image to be recognized, wherein the text line information is used for indicating the position of each text line in the image to be recognized;
an optimization module, used for determining pixel point mapping information according to the boundary information and the text line information, wherein the pixel point mapping information is used for indicating the correspondence between pixel points in the image to be recognized and pixel points in a template image;
a filling module, used for filling pixel values of pixel points in the image to be recognized into the corresponding pixel points in the template image according to the pixel point mapping information, to obtain a corrected image; and
a recognition module, used for performing text recognition on the corrected image to obtain text information;
wherein, when determining the pixel point mapping information according to the boundary information and the text line information, the optimization module optimizes a preset optimized object according to a first loss and a second loss to obtain the pixel point mapping information, wherein the optimized object is initialized pixel point mapping information, the first loss is the loss incurred when the optimized object is constrained by the boundary information, and the second loss is the loss incurred when the optimized object is constrained by the text line information.
11. An electronic device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
wherein the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the text recognition method according to claim 1 or 2, or to the image rectification method according to any one of claims 3 to 9.
12. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements a text recognition method as claimed in any one of claims 1-2 or an image rectification method as claimed in any one of claims 3-9.
13. A computer program product comprising computer instructions that instruct a computing device to perform operations corresponding to the text recognition method of any of claims 1-2 or the image rectification method of any of claims 3-9.
CN202210128043.2A 2022-02-11 2022-02-11 Text recognition method, image correction method, electronic device, and storage medium Active CN114187437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210128043.2A CN114187437B (en) 2022-02-11 2022-02-11 Text recognition method, image correction method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210128043.2A CN114187437B (en) 2022-02-11 2022-02-11 Text recognition method, image correction method, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN114187437A CN114187437A (en) 2022-03-15
CN114187437B (en) 2022-05-13

Family

ID=80545785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210128043.2A Active CN114187437B (en) 2022-02-11 2022-02-11 Text recognition method, image correction method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN114187437B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110599427A (en) * 2019-09-20 2019-12-20 普联技术有限公司 Fisheye image correction method and device and terminal equipment
CN111223065A (en) * 2020-01-13 2020-06-02 中国科学院重庆绿色智能技术研究院 Image correction method, irregular text recognition device, storage medium and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10701336B2 (en) * 2018-08-24 2020-06-30 Samsung Electronics Co., Ltd. Rectifying a sequence of stereo images
CN110866871A (en) * 2019-11-15 2020-03-06 深圳市华云中盛科技股份有限公司 Text image correction method and device, computer equipment and storage medium
CN114005107A (en) * 2021-11-03 2022-02-01 深圳须弥云图空间科技有限公司 Document processing method and device, storage medium and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110599427A (en) * 2019-09-20 2019-12-20 普联技术有限公司 Fisheye image correction method and device and terminal equipment
CN111223065A (en) * 2020-01-13 2020-06-02 中国科学院重庆绿色智能技术研究院 Image correction method, irregular text recognition device, storage medium and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Fuzzy model-based clustering and its application in image segmentation";Choy Siu Kai等;《Pattern Recognition》;20170831;第68卷;第141-157页 *
"基于多级文本检测的复杂文档图像扭曲矫正算法";寇喜超等;《计算机科学》;20211231;第48卷(第12期);第249-255页 *

Also Published As

Publication number Publication date
CN114187437A (en) 2022-03-15

Similar Documents

Publication Publication Date Title
US20190043216A1 (en) Information processing apparatus and estimating method for estimating line-of-sight direction of person, and learning apparatus and learning method
RU2631765C1 (en) Method and system of correcting perspective distortions in images occupying double-page spread
JP5387193B2 (en) Image processing system, image processing apparatus, and program
CN108230376B (en) Remote sensing image processing method and device and electronic equipment
US9495587B2 (en) Document unbending and recoloring systems and methods
CN110570435B (en) Method and device for carrying out damage segmentation on vehicle damage image
CN110400278B (en) Full-automatic correction method, device and equipment for image color and geometric distortion
CN111291753B (en) Text recognition method and device based on image and storage medium
CN114155546B (en) Image correction method and device, electronic equipment and storage medium
JPH07192086A (en) Picture inclination detection method
WO2013040933A1 (en) Identification method for valuable file and identification device thereof
JP2021531571A (en) Certificate image extraction method and terminal equipment
EP2715579A2 (en) Document unbending systems and methods
CN111737478B (en) Text detection method, electronic device and computer readable medium
CN111832371A (en) Text picture correction method and device, electronic equipment and machine-readable storage medium
JP6797046B2 (en) Image processing equipment and image processing program
CN109035167A (en) Method, apparatus, equipment and the medium that multiple faces in image are handled
CN114742722A (en) Document correction method, device, electronic equipment and storage medium
JP6542230B2 (en) Method and system for correcting projected distortion
CN111783763A (en) Text positioning box correction method and system based on convolutional neural network
CN115937003A (en) Image processing method, image processing device, terminal equipment and readable storage medium
CN105335953B (en) Extract the apparatus and method of the background luminance figure of image, go shade apparatus and method
CN114187437B (en) Text recognition method, image correction method, electronic device, and storage medium
JP6872660B1 (en) Information processing equipment and programs
JP5284994B2 (en) Image processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant