CN116110054A - Training and image correction method, device, equipment and medium for document correction model - Google Patents


Info

Publication number
CN116110054A
CN116110054A (application CN202310116117.5A)
Authority
CN
China
Prior art keywords: image, document image, offset, document, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310116117.5A
Other languages
Chinese (zh)
Inventor
李星
谢群义
钦夏孟
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310116117.5A priority Critical patent/CN116110054A/en
Publication of CN116110054A publication Critical patent/CN116110054A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/16 - Image preprocessing
    • G06V30/1607 - Correcting image deformation, e.g. trapezoidal deformation caused by perspective
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/146 - Aligning or centring of the image pick-up or image-field
    • G06V30/147 - Determination of region of interest

Abstract

The disclosure provides a training and image correction method, device, equipment and medium for a document correction model, relates to the technical fields of deep learning, image processing and computer vision, and can be applied to scenes such as OCR. The specific implementation scheme is as follows: correcting a sample document image by adopting a document correction model to obtain a target document image; performing text line detection on the target document image to obtain a center line of at least one text line; and training the document correction model according to at least one of: differences between the image coordinates of the pixel points on the same center line in the target document image, and differences between the image coordinates of the pixel points at the same arrangement position on different center lines. In this way, the pixel points on the text-line center lines are used to constrain the center-line direction of each text line in the model-corrected document image to match the reading direction of the document, which improves the correction effect on the document image and thereby the accuracy of document-image correction and restoration.

Description

Training and image correction method, device, equipment and medium for document correction model
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, image processing and computer vision; it can be applied to scenes such as OCR (Optical Character Recognition), and particularly relates to a training and image correction method, device, equipment and medium for a document correction model.
Background
With the continuous popularization of electronic devices (such as smart phones and tablet computers), more and more users use the cameras of their electronic devices to acquire document images. For example, in an office setting, a user can invoke the camera of an electronic device through a product or application (such as an office application) on the device to photograph a paper document and thereby obtain a document image.
Because the photographing skill of different users varies, document images captured by cameras often exhibit distortions such as folding, bending and wrinkling. Correcting and restoring distorted document images, so as to improve the accuracy of character recognition in the document image and the efficiency of document digitization, is therefore very important.
Disclosure of Invention
The present disclosure provides a training and image correction method, device, apparatus and medium for a document correction model.
According to an aspect of the present disclosure, there is provided a training method of a document correction model, including:
acquiring a sample document image, and correcting the sample document image by adopting a document correction model to obtain a target document image;
performing text line detection on the target document image to obtain a central line of at least one text line area;
determining a target loss value according to at least one of: differences between the image coordinates of pixel points on the same center line in the target document image, and differences between the image coordinates of pixel points at the same arrangement position on different center lines in the target document image;
and training the document correction model according to the target loss value.
According to another aspect of the present disclosure, there is provided a document image correction method including:
acquiring a document image to be corrected;
correcting the document image by adopting a trained document correction model to obtain a corrected document image;
wherein, the document correction model is trained by the method according to the above aspect of the disclosure.
According to still another aspect of the present disclosure, there is provided a training apparatus of a document correction model, including:
The first acquisition module is used for acquiring a sample document image;
the correction module is used for correcting the sample document image by adopting a document correction model so as to obtain a target document image;
the detection module is used for detecting the text line of the target document image so as to obtain the central line of at least one text line area;
a determining module for determining a target loss value according to at least one of: differences between the image coordinates of pixel points on the same center line in the target document image, and differences between the image coordinates of pixel points at the same arrangement position on different center lines in the target document image;
and the training module is used for training the document correction model according to the target loss value.
According to still another aspect of the present disclosure, there is provided a document image correction apparatus including:
the second acquisition module is used for acquiring a document image to be corrected;
the processing module is used for correcting the document image by adopting the trained document correction model so as to obtain a corrected document image;
wherein the document correction model is trained using the apparatus according to the above-described further aspect of the present disclosure.
According to still another aspect of the present disclosure, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method of the document correction model set forth in the above aspect of the disclosure or to perform the document image correction method set forth in the above aspect of the disclosure.
According to still another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the training method of the document correction model set forth in the above aspect of the present disclosure, or to perform the document image correction method set forth in the above aspect of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer program product including a computer program which, when executed by a processor, implements the training method of the document correction model set forth in the above aspect of the present disclosure, or implements the document image correction method set forth in the above aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a training method of a document correction model according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a training method of a document correction model according to a second embodiment of the disclosure;
fig. 3 is a flowchart of a training method of a document correction model according to a third embodiment of the present disclosure;
fig. 4 is a flowchart of a training method of a document correction model according to a fourth embodiment of the present disclosure;
fig. 5 is a flowchart of a training method of a document correction model according to a fifth embodiment of the present disclosure;
FIG. 6 is a flowchart of a training method of a document correction model according to a sixth embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a correction flow of a document image according to an embodiment of the present disclosure;
FIG. 8 is a first schematic diagram of the correction effect of a document image according to an embodiment of the present disclosure;
FIG. 9 is a second schematic diagram of a correction effect of a document image according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of a document image correction method according to a seventh embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a training device for a document correction model according to an embodiment of the present disclosure;
FIG. 12 is a schematic view showing a structure of a document image correction apparatus according to a ninth embodiment of the present disclosure;
FIG. 13 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, correction and restoration of a document image can be realized in a mode of affine transformation of edge control points of the document image.
However, this approach has at least the following problems:
First, it can only improve the overall distortion of the image; its local correction effect is poor.
Second, it corrects the image as a whole without considering the correction effect on the text or characters in the image, so it cannot improve the accuracy of character or text detection and recognition.
In view of at least one of the above problems, the present disclosure proposes a method, apparatus, device, and medium for training and image correction of a document correction model.
The following describes a training method, an apparatus, a device and a medium for document correction model according to an embodiment of the present disclosure with reference to the accompanying drawings.
Fig. 1 is a flowchart of a training method of a document correction model according to an embodiment of the disclosure.
The embodiments of the disclosure are described taking as an example that the training method of the document correction model is configured in a training device for the document correction model, and this training device can be applied to any electronic device so that the electronic device can perform the training function for the document correction model.
The electronic device may be any device with computing capability, for example, may be a personal computer (Personal Computer, abbreviated as PC), a mobile terminal, a server, etc., and the mobile terminal may be, for example, a vehicle-mounted device, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, etc., and may be a hardware device with various operating systems, a touch screen, and/or a display screen.
As shown in fig. 1, the training method of the document correction model may include the following steps:
step 101, obtaining a sample document image, and correcting the sample document image by adopting a document correction model to obtain a target document image.
In the embodiments of the present disclosure, the manner of obtaining the sample document image is not limited. For example, the sample document image may be obtained from an existing training set; it may be collected online, for example by web-crawler technology; it may be collected offline, for example by photographing a paper document with an image acquisition device; it may be collected in real time; or it may be a manually synthesized document image.
In the embodiment of the present disclosure, a document correction model may be used to correct and restore a sample document image to obtain a corrected sample document image, which is referred to as a target document image in the present disclosure.
And 102, performing text line detection on the target document image to obtain a central line of at least one text line area.
In the embodiment of the disclosure, text line detection may be performed on the target document image to obtain at least one text line region, and for any one text line region, a center line of the text line region may be extracted to obtain a center line of each text line region. The direction of the center line matches the arrangement direction of the characters in the text line area, for example, when the arrangement direction of the characters in the text line area is horizontal, the direction of the center line may be horizontal, and for example, when the arrangement direction of the characters in the text line area is vertical, the direction of the center line may be vertical.
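As a minimal illustrative sketch (not the patent's implementation, whose detection model is unspecified): given a binary mask of a horizontally arranged text-line region, one simple way to extract its center line is to take, for each image column that contains text pixels, the mean row index. The function name and mask representation are assumptions for illustration.

```python
def extract_centerline(mask):
    """mask: 2D list of 0/1 values for one text-line region.

    Returns a list of (row, col) center-line points, one per column
    that contains at least one text pixel.
    """
    centerline = []
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    for col in range(n_cols):
        rows = [r for r in range(n_rows) if mask[r][col]]
        if rows:  # this column intersects the text-line region
            centerline.append((sum(rows) / len(rows), col))
    return centerline
```

For vertically arranged text the same idea applies with rows and columns swapped.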
Step 103, determining a target loss value according to at least one of the following: differences between image coordinates of pixels on the same central line in the target document image, differences between image coordinates of pixels on the same arrangement position on different central lines in the target document image.
In the embodiment of the present disclosure, the image coordinates of the pixel point may be the coordinate positions of the pixel point in the image coordinate system, or may also be the coordinate positions of the pixel point in the pixel coordinate system.
The origin of the image coordinate system may be the center point of the target document image, with the horizontal axis (X-axis) pointing to the right and the vertical axis (Y-axis) pointing downward, in units of pixels. The origin of the pixel coordinate system may be the upper-left corner of the target document image, with the X-axis pointing to the right and the Y-axis pointing downward, in units of pixels.
In one possible implementation manner of the embodiment of the present disclosure, the value of the loss function may be determined according to the difference between the image coordinates of each pixel point on the same central line in the target document image, which is referred to as a target loss value in the present disclosure.
For example, when the arrangement direction of the characters in the target document image is horizontal, that is, the reading direction is from left to right or from right to left, at this time, the target loss value may be determined according to the difference between the Y-axis values (that is, the ordinate) in the image coordinates of the pixels on the same central line, where the target loss value and the difference are in positive correlation, that is, the smaller the difference, the smaller the target loss value, and, conversely, the larger the difference, the larger the target loss value.
That is, when the arrangement direction of the characters in the target document image is horizontal, the Y-axis values (i.e., the ordinate) of the pixels on the same center line can be constrained to be on the same horizontal line, i.e., the ordinate of the pixels on the same center line can be constrained to be consistent.
For another example, when the arrangement direction of the characters in the target document image is vertical, that is, the reading direction is from top to bottom or from bottom to top, at this time, the target loss value may be determined according to the difference between the X-axis values (that is, the abscissa) in the image coordinates of the pixels on the same central line, where the target loss value and the difference are in positive correlation, that is, the smaller the difference, the smaller the target loss value, and, conversely, the larger the difference, the larger the target loss value.
That is, when the arrangement direction of the characters in the target document image is a vertical row, the X-axis values (i.e., the abscissa) of the pixels on the same center line can be constrained to lie on a vertical line, i.e., the abscissa of the pixels on the same center line can be constrained to be consistent.
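The same-center-line constraint above can be sketched as follows. This is a hedged illustration, not the patent's formula: the patent only requires a loss that grows with the spread of the relevant coordinate, so mean absolute deviation is used here as one possible choice, and the function name is invented for the example.

```python
def centerline_straightness_loss(points):
    """points: list of (x, y) image coordinates of pixel points sampled
    from ONE center line of a horizontally arranged text line.

    Returns the mean absolute deviation of the ordinates: 0 when all
    points lie on one horizontal line, larger as the line bends.
    """
    ys = [y for _, y in points]
    mean_y = sum(ys) / len(ys)
    return sum(abs(y - mean_y) for y in ys) / len(ys)
```

For vertically arranged text, apply the same measure to the abscissas instead.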
In another possible implementation of the embodiment of the disclosure, the target loss value may be determined according to a difference between image coordinates of pixels located at the same arrangement position on different centerlines in the target document image.
For example, when the arrangement direction of the characters in the target document image is horizontal, the target loss value may be determined according to the difference between the X-axis values (i.e., the horizontal coordinates) in the image coordinates of the pixels at the same arrangement position on different central lines, where the target loss value and the difference are in positive correlation, i.e., the smaller the difference is, the smaller the target loss value is, and conversely, the larger the difference is, the larger the target loss value is.
That is, when the arrangement direction of the characters in the target document image is horizontal, the X-axis value (i.e., the abscissa) of the ith pixel point on the plurality of center lines may be constrained to lie on one vertical line, i.e., the abscissa of the ith pixel point on the plurality of center lines may be constrained to be consistent. Where i may be a positive integer.
For another example, when the arrangement direction of the characters in the target document image is vertical, the target loss value may be determined according to the difference between the Y-axis values (i.e., the ordinate) in the image coordinates of the pixels at the same arrangement position on different centerlines, where the target loss value and the difference are in positive correlation, i.e., the smaller the difference is, the smaller the target loss value is, and conversely, the larger the difference is, the larger the target loss value is.
That is, when the arrangement direction of the characters in the target document image is the vertical row, the Y-axis value (i.e., the ordinate) of the ith pixel point on the plurality of center lines may be constrained to be located on one horizontal line, i.e., the ordinate of the ith pixel point on the plurality of center lines may be constrained to be consistent.
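The cross-center-line constraint can be sketched analogously. Again this is an assumed formulation for illustration only: it penalizes the spread of the abscissa of the i-th point across the different center lines (the horizontal-text case described above), using mean absolute deviation as one possible difference measure.

```python
def cross_line_alignment_loss(centerlines):
    """centerlines: list of center lines, each a list of (x, y) points
    sampled in reading order (assumed comparable lengths).

    For horizontally arranged text: for each arrangement position i,
    penalize the spread of the x coordinate of the i-th point across
    the different center lines, then average over positions.
    """
    n = min(len(c) for c in centerlines)
    total = 0.0
    for i in range(n):
        xs = [c[i][0] for c in centerlines]
        mean_x = sum(xs) / len(xs)
        total += sum(abs(x - mean_x) for x in xs) / len(xs)
    return total / n
```

For vertically arranged text, apply the same measure to the y coordinates instead.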
In another possible implementation manner of the embodiment of the present disclosure, the target loss value may also be determined simultaneously according to a difference (hereinafter referred to as a first difference) between image coordinates of pixels on the same central line in the target document image, and according to a difference (hereinafter referred to as a second difference) between image coordinates of pixels on different central lines in the target document image at the same arrangement position.
The target loss value and the first difference are in positive correlation, and the target loss value and the second difference are also in positive correlation.
And 104, training the document correction model according to the target loss value.
In embodiments of the present disclosure, a document correction model may be trained based on target loss values.
As one example, model parameters in the document correction model may be adjusted based on the target loss value to minimize the target loss value.
It should be noted that, the foregoing example is only implemented by taking the termination condition of model training as the target loss value minimization, and other termination conditions may also be set in practical application, for example, the termination conditions may further include: the number of training times reaches the set number of times, the training time reaches the set time, etc., which is not limited by the present disclosure.
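The termination conditions listed above can be checked with a helper like the following. This is a generic sketch, not part of the patent; the threshold names and default values are arbitrary assumptions.

```python
def should_stop(step, elapsed_s, loss, prev_loss,
                max_steps=10_000, max_seconds=3_600.0, min_delta=1e-6):
    """Return True when any configured termination condition is met."""
    if step >= max_steps:          # training count reaches the set number
        return True
    if elapsed_s >= max_seconds:   # training time reaches the set duration
        return True
    if prev_loss is not None and abs(prev_loss - loss) < min_delta:
        return True                # target loss has effectively stopped decreasing
    return False
```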
According to the training method of the document correction model, a sample document image is corrected by a document correction model to obtain a target document image; text line detection is performed on the target document image to obtain the center line of at least one text line area; and the document correction model is trained according to at least one of: differences between the image coordinates of the pixel points on the same center line in the target document image, and differences between the image coordinates of the pixel points at the same arrangement position on different center lines in the target document image. In this way, the pixel points on the text-line center lines are used to constrain the center-line direction of each text line in the model-corrected document image to match the reading direction of the document image, which improves the correction effect on the document image and thereby the accuracy of correction and restoration. That is, the document image can be corrected based on at least one of the lateral constraint and the longitudinal constraint of the text lines, improving the correction effect of the document image and thereby the character detection effect on the document image.
It should be noted that, in the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of the user's personal information are all performed on the premise of obtaining the user's consent, comply with the relevant laws and regulations, and do not violate public order and good morals.
In order to clearly illustrate how the above embodiments determine the target loss value, the present disclosure also proposes a training method of the document correction model.
Fig. 2 is a flowchart of a training method of a document correction model according to a second embodiment of the disclosure.
As shown in fig. 2, the training method of the document correction model may include the following steps:
step 201, a sample document image is obtained, and the sample document image is rectified by adopting a document rectification model to obtain a target document image.
Step 202, text line detection is performed on the target document image to obtain a center line of at least one text line area.
The explanation of steps 201 to 202 may be referred to the relevant descriptions in any embodiment of the present disclosure, and will not be repeated here.
Step 203, determining the sub-loss value of the same central line according to the difference between the image coordinates of each pixel point on the same central line in the target document image.
In the embodiment of the disclosure, for any text line area in the target text image, the sub-loss value of the center line of the text line area may be determined according to the difference between the image coordinates of each pixel point on the center line of the text line area.
For example, when the arrangement direction of the characters in the text line area is horizontal, that is, the reading direction is from left to right or from right to left, at this time, the sub-loss value of the center line of the text line area may be determined according to the difference between the Y-axis values (that is, the ordinate) in the image coordinates of the pixels on the center line of the text line area, where the sub-loss value of the center line of the text line area and the difference are in positive correlation, that is, the smaller the difference, the smaller the sub-loss value of the center line of the text line area, and conversely, the larger the difference, the larger the sub-loss value of the center line of the text line area.
That is, when the arrangement direction of the characters in the text line area is horizontal, the Y-axis value (i.e., the ordinate) of each pixel point on the same central line can be constrained to be located on a horizontal line, i.e., the ordinate of each pixel point on the same central line is constrained to be consistent.
For another example, when the arrangement direction of the characters in the text line area is vertical, that is, the reading direction is from top to bottom or from bottom to top, at this time, the sub-loss value of the center line of the text line area may be determined according to the difference between the X-axis values (that is, the abscissa) in the image coordinates of the pixels on the center line of the text line area, where the sub-loss value and the difference are in positive correlation, that is, the smaller the difference is, the smaller the sub-loss value is, and conversely, the larger the difference is, the larger the sub-loss value is.
That is, when the arrangement direction of the characters in the text line area is vertical, the X-axis value (i.e., the abscissa) of each pixel point on the same central line can be constrained to be located on a vertical line, i.e., the abscissa of each pixel point on the same central line is constrained to be consistent.
Step 204, determining a first loss value according to the sub-loss values of each center line.
In the embodiment of the disclosure, the first loss value may be determined according to the sub-loss value of each center line in the target document image.
As an example, the cumulative sum of the sub-loss values for each centerline may be taken as the first loss value.
As another example, the average of the sub-loss values of the respective centerlines may be taken as the first loss value.
As yet another example, the sub-loss values for each centerline may be weighted summed to obtain a first loss value.
Step 205, determining a second loss value according to the difference between the image coordinates of the pixel points at the same arrangement position on different central lines in the target document image.
In the embodiment of the disclosure, the second loss value may be determined according to a difference between image coordinates of pixel points at the same arrangement position on different central lines in the target document image.
For example, when the arrangement direction of the characters in the target document image is horizontal, the second loss value may be determined based on the difference between the X-axis values (i.e., horizontal coordinates) in the image coordinates of the pixel points at the same arrangement position on different center lines.
The second loss value and the difference are in positive correlation, namely, the smaller the difference is, the smaller the second loss value is, and conversely, the larger the difference is, the larger the second loss value is.
That is, when the arrangement direction of the characters in the target document image is horizontal, the X-axis value (i.e., the abscissa) of the ith pixel point on the plurality of center lines may be constrained to lie on one vertical line, i.e., the abscissa of the ith pixel point on the plurality of center lines may be constrained to be consistent.
For another example, when the arrangement direction of the characters in the target document image is vertical, the second loss value may be determined according to the difference between the Y-axis values (i.e., the ordinate) in the image coordinates of the pixels at the same arrangement position on different centerlines, where the second loss value and the difference are in positive correlation, i.e., the smaller the difference is, the smaller the second loss value is, and conversely, the larger the difference is, the larger the second loss value is.
That is, when the arrangement direction of the characters in the target document image is the vertical row, the Y-axis value (i.e., the ordinate) of the ith pixel point on the plurality of center lines may be constrained to be located on one horizontal line, i.e., the ordinate of the ith pixel point on the plurality of center lines may be constrained to be consistent.
Step 206, determining a target loss value according to at least one of the first loss value and the second loss value.
In one possible implementation manner of the embodiment of the present disclosure, the target loss value may be determined according to the first loss value, where the target loss value has a positive correlation with the first loss value, for example, the first loss value may be regarded as the target loss value.
In another possible implementation manner of the embodiment of the present disclosure, the target loss value may be determined according to the second loss value, where the target loss value and the second loss value have a positive correlation, for example, the second loss value may be regarded as the target loss value.
In another possible implementation of the disclosed embodiments, the target loss value may be determined from the first loss value and the second loss value.
As an example, the sum of the first loss value and the second loss value may be taken as the target loss value.
As another example, the average of the first loss value and the second loss value may be taken as the target loss value.
As yet another example, the first loss value and the second loss value may be weighted and summed to obtain the target loss value.
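The sum, average, and weighted-sum schemes above can all be expressed as one weighted combination. A minimal plain-Python sketch, where the function name and the default weight values are illustrative assumptions rather than anything specified by the disclosure:

```python
def target_loss(first_loss, second_loss, w1=0.5, w2=0.5):
    # Weighted sum of the two components: (0.5, 0.5) gives the average,
    # (1.0, 1.0) gives the plain sum, and other weights give a weighted sum.
    return w1 * first_loss + w2 * second_loss
```

With w1 = 1.0 and w2 = 0.0 this degenerates to using the first loss value alone as the target loss value, matching the first implementation manner above.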
Step 207, training the document correction model according to the target loss value.
For the explanation of step 207, reference may be made to the relevant description in any embodiment of the present disclosure, and details are not repeated here.
According to the training method of the document correction model, the direction of the center line of each text line on the model-corrected document image is constrained, on the basis of the pixel points on each center line, to match the reading direction of the document image. This improves not only the overall correction effect of the document image but also the local correction effect, so that the direction of each text line in the corrected document image matches the reading direction of the document image.
In one possible implementation manner of the embodiment of the present disclosure, in order to reduce the amount of computation and save computing resources, not every pixel point on a center line needs to participate in the computation; for example, for the center line of any text line region, the pixel points on the center line may be uniformly sampled, and the target loss value computed only from the sampled pixel points. This process is described in detail below with reference to fig. 3.
Fig. 3 is a flowchart of a training method of a document correction model according to a third embodiment of the present disclosure.
As shown in fig. 3, the training method of the document correction model may include the following steps:
step 301, obtaining a sample document image, and correcting the sample document image by adopting a document correction model to obtain a target document image.
Step 302, text line detection is performed on the target document image to obtain a center line of at least one text line area.
For the explanation of steps 301 to 302, reference may be made to the relevant descriptions in any embodiment of the present disclosure, and details are not repeated here.
Step 303, sampling all the pixel points on the same central line in the target document image at equal intervals according to the set interval to obtain a plurality of target pixel points.
The setting interval can be preset according to actual application requirements.
In the embodiment of the disclosure, for any text line area in the target document image, each pixel point on the central line of the text line area may be sampled at equal intervals (i.e., uniformly sampled) according to a set interval, so as to obtain a plurality of target pixel points on the central line of the text line area.
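A minimal sketch of the equal-interval sampling described above, in plain Python; the function name and the (x, y) tuple representation of centerline pixels are assumptions for illustration:

```python
def sample_centerline(points, interval):
    # Keep every `interval`-th pixel along one centerline (uniform sampling).
    # `points` is the ordered list of (x, y) image coordinates on the line.
    return points[::interval]
```

For a 100-pixel centerline and a set interval of 10, this yields 10 target pixel points.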
Step 304, determining a first coefficient according to a mean value of the first values of the first coordinate axes in the image coordinates of the plurality of target pixel points.
Wherein the first coefficient and the average value are in positive correlation.
In the embodiment of the present disclosure, the first coordinate axis is related to the reading direction of the target document image or the sample document image.
In a possible implementation manner of the embodiment of the present disclosure, the target document image may be identified based on an image recognition technology to determine the arrangement direction (such as horizontal rows or vertical rows) of the characters in the target document image, or the arrangement direction of the characters in the target document image may be specified manually.
Thus, in the present disclosure, the first coordinate axis that matches the arrangement direction may be determined according to the arrangement direction. For example, when the arrangement direction is a horizontal row, the first coordinate axis is a vertical axis (Y axis), and for example, when the arrangement direction is a vertical row, the first coordinate axis is a horizontal axis (X axis).
Therefore, a calculation mode for determining the loss value in a targeted manner based on the reading direction of the document image can be realized, and the training effect of the model, namely the prediction accuracy of the model, can be improved.
Step 305, determining the sub-loss value of the same center line according to the differences between the first values in the image coordinates of the plurality of target pixel points and the first coefficient.
In the embodiment of the present disclosure, the first coefficient may be determined according to a mean value of first values of first coordinate axes in image coordinates of a plurality of target pixel points on a central line of a same text line area, and the sub-loss value of the central line of the same text line area may be determined according to a difference between the first values of the image coordinates of the plurality of target pixel points on the central line of the same text line area and the first coefficient.
As an example, when the arrangement direction of the characters in the text line areas is horizontal, the sub-loss value for the center line of the j-th text line region in the target document image may be:

$$L_j = \frac{1}{W}\sum_{k=1}^{W}\left|y(j,k)-\bar{y}_j\right|$$

wherein W represents the number of target pixel points on the center line of the j-th text line region, y(j,k) represents the ordinate of the k-th target pixel point on the center line of the j-th text line region, and $\bar{y}_j$ represents the mean (i.e., the first coefficient) of the ordinates of the target pixel points on the center line of the j-th text line region.
Note that, when the arrangement direction of the characters in the text line area is a vertical row, the calculation manner of the sub-loss value of the center line of each text line area is similar to the calculation manner of the sub-loss value corresponding to the horizontal row, and will not be described herein.
Step 306, determining a first loss value according to the sub-loss values of each centerline.
For the explanation of step 306, reference may be made to the relevant descriptions in any embodiment of the present disclosure, and details are not repeated here.
Taking the mean of the sub-loss values of the center lines as the first loss value as an example, the first loss value $L_1$ is:

$$L_1 = \frac{1}{H}\sum_{j=1}^{H} L_j$$

where H is the number of text line regions in the target document image and $L_j$ is the sub-loss value of the j-th center line.
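A plain-Python sketch of computing the first loss value from the sampled ordinates of each centerline. The absolute-deviation form of the sub-loss is an assumption here; a squared deviation would serve the same constraining purpose:

```python
def sub_loss(ys):
    # Sub-loss of one centerline: mean deviation of the sampled target
    # pixels' ordinates from their mean (the first coefficient).
    mean_y = sum(ys) / len(ys)
    return sum(abs(y - mean_y) for y in ys) / len(ys)

def first_loss(lines_ys):
    # First loss: mean of the sub-loss values over the H centerlines.
    return sum(sub_loss(ys) for ys in lines_ys) / len(lines_ys)
```

A perfectly horizontal centerline contributes a zero sub-loss, so a distortion-free target document image minimizes the first loss.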
Step 307, determining a second loss value according to the difference between the image coordinates of the pixel points at the same arrangement position on different central lines in the target document image.
For the explanation of step 307, reference may be made to the relevant description in any embodiment of the present disclosure, and details are not repeated here.
In one possible implementation manner of the embodiment of the present disclosure, in order to reduce the amount of computation and save computing resources, the second loss value may also be determined according to the target pixel point.
As an example, the second coefficient may be determined according to a mean value of second values of second coordinate axes in image coordinates of the target pixel points at the same arrangement position on each center line, and the second loss value may be determined according to a difference between the second values and the second coefficients in image coordinates of the target pixel points at the same arrangement position on each center line.
The second axis is perpendicular to the first axis, for example, when the first axis is a horizontal axis, the second axis may be a vertical axis, or when the first axis is a vertical axis, the second axis may be a horizontal axis.
For example, when the arrangement direction of the characters in the text line areas is horizontal, the second coefficient may be determined according to the mean of the abscissas of the i-th target pixel points on the center lines of the text line regions, the second coefficient being in positive correlation with the mean. Accordingly, in the present disclosure, the second loss value may be determined according to the differences between the abscissas of the i-th target pixel points on the center lines of the text line regions and the second coefficient.
As an example, the second loss value $L_2$ is:

$$L_2 = \frac{1}{H}\sum_{j=1}^{H}\left|x(j,i)-\bar{x}_i\right|$$

wherein x(j,i) represents the abscissa of the i-th target pixel point on the center line of the j-th text line region, $\bar{x}_i$ represents the mean (i.e., the second coefficient) of the abscissas of the i-th target pixel points on the center lines of the text line regions, and i is a positive integer; for example, i may be 1, 2, 3, 4, and so on.
It should be noted that, when the arrangement direction of the characters in the text line area is a vertical row, the calculation manner of the second loss value is similar to the calculation manner of the second loss value corresponding to a horizontal row, which is not described herein.
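Analogously, a plain-Python sketch of the second loss for one arrangement position i, again with the absolute-deviation form as an assumption:

```python
def second_loss(lines_xs, i):
    # Cross-line alignment loss: deviation of the i-th target pixel's
    # abscissa on each centerline from the mean abscissa over all lines
    # (the second coefficient).
    xs = [line[i] for line in lines_xs]
    mean_x = sum(xs) / len(xs)
    return sum(abs(x - mean_x) for x in xs) / len(xs)
```

When the i-th target pixel points of all centerlines lie on one vertical line, the loss is zero.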
Step 308, determining a target loss value according to at least one of the first loss value and the second loss value.
Step 309, training the document correction model according to the target loss value.
For the explanation of steps 308 to 309, reference may be made to the relevant descriptions in any embodiment of the present disclosure, and details are not repeated here.
According to the training method of the document correction model, the sub-loss value of the central line is not required to be determined according to the image coordinates of each pixel point on the central line, the pixels on the central line are uniformly sampled, the sub-loss value of the central line is calculated according to the sampled pixels, the calculated amount can be reduced, and the calculation resources are saved.
In order to clearly illustrate how the above embodiment corrects the sample document image based on the document correction model, the present disclosure also proposes a training method of the document correction model.
Fig. 4 is a flowchart of a training method of a document correction model according to a fourth embodiment of the present disclosure.
As shown in fig. 4, the training method of the document correction model may include the following steps:
Step 401, a sample document image is acquired.
For the explanation of step 401, reference may be made to the relevant description in any embodiment of the present disclosure, and details are not repeated here.
Step 402, extracting features of the sample document image by adopting a first convolution network in the document correction model to obtain a feature map.
In the embodiment of the disclosure, the feature extraction may be performed on the sample document image by using a first convolution network in the document correction model, so as to obtain a feature map of the sample document image. For example, shallow features of the sample document image may be extracted based on the first convolution network to obtain a feature map of the sample document image.
And step 403, predicting the offset of the feature map by adopting a prediction network in the document correction model to obtain a first offset map, wherein the first offset map is used for indicating the coordinate offset of each pixel point in the sample document image.
In the embodiment of the disclosure, the feature map may be input to a prediction network in the document correction model to perform offset prediction, so as to obtain a first offset map, where the first offset map is used to indicate coordinate offsets of all pixel points in the sample document image.
For example, the first offset map may be a 2-channel offset map, where one channel is used to indicate a pixel offset value or a coordinate offset value (Δx) of the pixel point in the horizontal axis direction, and the other channel is used to indicate a pixel offset value or a coordinate offset value (Δy) of the pixel point in the vertical axis direction.
Step 404, correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the first offset map to obtain the target document image.
In the embodiment of the disclosure, the image position of the corresponding pixel point in the sample document image may be corrected according to the coordinate offset of each pixel point indicated by the first offset map, so as to obtain the target document image.
For example, for any one pixel point in the sample document image, the image coordinate (x, y) of the pixel point may be updated to (x+Δx, y+Δy) according to the horizontal axis offset Δx and the vertical axis offset Δy corresponding to the pixel point indicated by the first offset map.
In the image coordinate system or pixel coordinate system, the abscissa and ordinate of a pixel point are both integer values, while at least one of the model-predicted Δx and Δy may not be an integer value. In this case, after the image coordinates of the pixel points are corrected, some pixel points may have no pixel value; for such a pixel point, its pixel value may be calculated with an interpolation algorithm by interpolating the pixel values of the adjacent pixel points.
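The disclosure describes moving each pixel by its offset and filling the resulting holes by interpolating neighboring pixel values. A common way to achieve the same effect, sketched below under the assumption of backward mapping, is to resample each output pixel from the offset location in the source image with bilinear interpolation, which handles non-integer offsets and leaves no holes:

```python
def bilinear_sample(img, x, y):
    # Bilinearly interpolate a single-channel image (list of rows) at a
    # possibly non-integer location; out-of-range coordinates are clamped.
    h, w = len(img), len(img[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def apply_offsets(img, dx, dy):
    # Correct an image with a 2-channel offset map: the output value at
    # (x, y) is sampled from (x + dx[y][x], y + dy[y][x]) in the input.
    h, w = len(img), len(img[0])
    return [[bilinear_sample(img, x + dx[y][x], y + dy[y][x])
             for x in range(w)] for y in range(h)]
```

Production implementations would typically use an optimized equivalent, such as an image-remapping routine or a tensor grid-sampling operation, instead of this per-pixel loop.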
Step 405, performing text line detection on the target document image to obtain a center line of at least one text line area.
Step 406, determining a target loss value according to at least one of: differences between image coordinates of pixels on the same central line in the target document image, differences between image coordinates of pixels on the same arrangement position on different central lines in the target document image.
Step 407, training the document correction model according to the target loss value.
For the explanation of steps 405 to 407, reference may be made to the relevant description in any embodiment of the present disclosure, and details are not repeated here.
According to the training method of the document correction model, correction can be achieved on each pixel point in the document image, correction effects of the whole image can be improved, and local correction effects of the image can be improved.
In order to clearly illustrate the above embodiments of the present disclosure, the present disclosure further provides a training method of a document correction model.
Fig. 5 is a flowchart of a training method of a document correction model according to a fifth embodiment of the present disclosure.
As shown in fig. 5, the training method of the document correction model may include the following steps:
Step 501, obtaining a sample document image, and extracting features of the sample document image by adopting a first convolution network in a document correction model to obtain a feature map.
In any one of the embodiments of the present disclosure, in order to reduce the processing burden of the model, the sample document image may be further preprocessed (including but not limited to background removal processing, denoising processing, etc.), and the feature extraction may be performed on the preprocessed sample document image by using the first convolution network in the document correction model, so as to obtain the feature map.
Step 502, predicting the offset of the feature map by using a prediction network in the document correction model to obtain a first offset map, where the first offset map is used to indicate the coordinate offset of each pixel point in the sample document image.
For the explanation of steps 501 to 502, reference may be made to the relevant description in any embodiment of the present disclosure, and details are not repeated here.
Step 503, inputting the first offset map into a second convolution network in the document correction model to perform weight prediction so as to obtain a weight map, where the weight map is used to indicate coordinate correction weights of all pixel points in the sample document image.
In the embodiment of the disclosure, the first offset map may be further input into a second convolution network in the document correction model to perform weight prediction, so as to obtain a weight map, where the weight map is used to indicate coordinate correction weights of all pixel points in the sample document image.
The weight map may be a weight map of 1 channel, for example, Δx and Δy correspond to the same coordinate correction weight, or the weight map may be a weight map of 2 channels, where one channel (such as a first channel) is used to indicate the coordinate correction weight corresponding to Δx, and the other channel (such as a second channel) is used to indicate the coordinate correction weight corresponding to Δy.
Step 504, correcting the first offset map according to the weight map to obtain a second offset map; the second offset graph is used for indicating corrected coordinate offset corresponding to each pixel point in the sample document image.
In the embodiment of the present disclosure, the first offset map may be corrected according to the weight map, for example, the coordinate offset of the corresponding pixel indicated by the first offset map may be corrected according to the coordinate correction weight of each pixel indicated by the weight map, so as to obtain the second offset map.
For example, when the weight map is a 2-channel weight map, for any one pixel, the coordinate correction weight of the first channel of the pixel indicated by the weight map may be multiplied by Δx of the corresponding pixel indicated by the first offset map, and the coordinate correction weight of the second channel of the pixel indicated by the weight map may be multiplied by Δy of the corresponding pixel indicated by the first offset map, so as to obtain the second offset map.
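A minimal sketch of the element-wise correction for the 2-channel case, with offset and weight maps represented as nested lists of shape [channel][row][column]; the function name is illustrative:

```python
def correct_offsets(offset_map, weight_map):
    # Second offset map = first offset map * predicted per-pixel weights,
    # channel by channel (channel 0 corrects dx, channel 1 corrects dy).
    return [
        [[o * w for o, w in zip(row_o, row_w)]
         for row_o, row_w in zip(chan_o, chan_w)]
        for chan_o, chan_w in zip(offset_map, weight_map)
    ]
```

For a 1-channel weight map, the same weight values would simply be applied to both offset channels.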
Step 505, correcting the image position of the corresponding pixel point in the sample document image according to the corrected coordinate offset corresponding to each pixel point indicated by the second offset map, so as to obtain the target document image.
In the embodiment of the disclosure, the image position of the corresponding pixel point in the sample document image may be corrected according to the coordinate offset indicated by the second offset map after the correction, so as to obtain the target document image.
For example, for any one pixel point in the sample document image, the image coordinates (x, y) of the pixel point may be updated to (x+Δx₁, y+Δy₁) according to the corrected horizontal-axis offset Δx₁ and the corrected vertical-axis offset Δy₁ corresponding to the pixel point indicated by the second offset map.
In the image coordinate system or pixel coordinate system, the abscissa and ordinate of a pixel point are integer values, while at least one of the model-predicted Δx₁ and Δy₁ may not be an integer value. In this case, after the image coordinates of the pixel points are corrected, some pixel points may have no pixel value; for such pixel points in the corrected sample document image, the pixel value may be calculated with an interpolation algorithm by interpolating the pixel values of the adjacent pixel points.
Step 506, text line detection is performed on the target document image to obtain a centerline of at least one text line region.
Step 507, determining a target loss value according to at least one of: differences between image coordinates of pixels on the same central line in the target document image, differences between image coordinates of pixels on the same arrangement position on different central lines in the target document image.
Step 508, training the document correction model according to the target loss value.
For the explanation of steps 506 to 508, reference may be made to the relevant descriptions in any embodiment of the present disclosure, and details are not repeated here.
According to the training method of the document correction model, correction of the offset graph output by the prediction network can be achieved, and the fine-granularity offset graph is obtained, so that correction and restoration of the document image are carried out based on the fine-granularity offset graph, and the correction effect of the image can be improved.
In order to clearly illustrate any embodiment of the disclosure, the disclosure further provides a training method of the document correction model.
Fig. 6 is a flowchart of a training method of a document correction model according to a sixth embodiment of the present disclosure.
As shown in fig. 6, the training method of the document correction model may include the following steps:
Step 601, a sample document image is acquired.
Step 602, judging whether the first image size of the sample document image is matched with the input requirement of the document correction model, if so, executing step 603, and if not, executing step 604.
In the embodiment of the present disclosure, an image size (referred to as a first image size in the present disclosure) of a sample document image may be obtained, and an input requirement of a document correction model may be obtained, for example, the input requirement of the document correction model includes: the size of the input image is the set image size. Steps 603 and 611 to 613 may be performed in case the first image size matches the input requirement of the document correction model, and steps 604 to 613 may be performed in case the first image size does not match the input requirement of the document correction model.
It should be noted that steps 603 and 604 are two parallel (alternative) implementations; only one of them is executed in actual application.
Step 603, correcting the sample document image by adopting a document correction model to obtain a target document image.
In the embodiment of the disclosure, when the first image size of the sample document image matches the input requirement of the document correction model, the sample document image may be corrected by using the document correction model to obtain the target document image. For a detailed description, reference may be made to any of the foregoing embodiments; it is not repeated here.
Therefore, by inputting a document image whose size matches the input requirement of the document correction model into the model for correction and restoration, the effectiveness of model prediction can be improved, and further the effectiveness of image correction and restoration, avoiding the situation in which the image cannot be corrected because its size does not match the input requirement of the document correction model.
In step 604, the sample document image is resized to obtain an adjusted sample document image.
In the embodiment of the present disclosure, in the case that the first image size does not match the input requirement of the document correction model, the sample document image may be resized to obtain an adjusted sample document image, where the image size of the adjusted sample document image (referred to as the second image size in the present disclosure) matches the input requirement of the document correction model.
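A sketch of the size check and adjustment, using nearest-neighbour resizing as a stand-in for whatever resizing operation an implementation actually uses; the function names are illustrative:

```python
def resize_nearest(img, new_h, new_w):
    # Nearest-neighbour resize of a single-channel image (list of rows).
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

def prepare_input(img, required_hw):
    # Resize the sample document image only when its size does not match
    # the model's input requirement (the set image size).
    if (len(img), len(img[0])) == required_hw:
        return img
    return resize_nearest(img, *required_hw)
```

An image already at the required size passes through unchanged, matching the branch into step 603 above.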
Step 605, performing offset prediction on the adjusted sample document image by adopting a document correction model to obtain a third offset map.
The third offset graph is used for indicating the coordinate offset of each pixel point in the adjusted sample document image.
In the embodiment of the disclosure, the offset prediction may be performed on the adjusted sample document image by using a document correction model to obtain a third offset map, where the third offset map is used to indicate the coordinate offset of each pixel point in the adjusted sample document image.
As a possible implementation manner, the first convolution network in the document correction model may be used to perform feature extraction on the adjusted sample document image to obtain a target feature map, and the prediction network in the document correction model may be used to perform offset prediction on the target feature map to obtain a third offset map, where the third offset map is used to indicate coordinate offsets of all pixels in the adjusted sample document image.
As another possible implementation manner, the first convolution network in the document correction model may be used to perform feature extraction on the adjusted sample document image to obtain a target feature map, and the prediction network in the document correction model may be used to perform offset prediction on the target feature map to obtain an intermediate offset map, where the intermediate offset map is used to indicate an initial coordinate offset map of each pixel point in the adjusted sample document image. Then, the intermediate offset graph can be input into a second convolution network in the document correction model to conduct weight prediction so as to obtain a target weight graph, wherein the target weight graph is used for indicating coordinate correction weights of all pixel points in the adjusted sample document image, and the intermediate offset graph is corrected according to the target weight graph so as to obtain a third offset graph; the third offset graph is used for indicating the final coordinate offset corresponding to each pixel point in the adjusted sample document image.
Step 606, determining whether the second image size of the adjusted sample document image is smaller than the first image size, if so, performing steps 607 to 608, and if not, performing steps 609 to 610.
It should be noted that steps 607 to 608 and steps 609 to 610 are parallel (alternative) implementations; only one of them is executed in actual application.
Step 607 upsamples the third offset map to obtain a fourth offset map.
In the embodiment of the disclosure, when the second image size of the adjusted sample document image is smaller than the first image size, the third offset map may be upsampled to obtain a fourth offset map. The third image size of the fourth offset graph is matched with the first image size, and the fourth offset graph is used for indicating the coordinate offset of each pixel point in the sample document image.
Step 608, correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the fourth offset map to obtain the target document image.
In the embodiment of the disclosure, the image position of the corresponding pixel point in the sample document image may be corrected according to the coordinate offset of each pixel point indicated by the fourth offset map, so as to obtain the target document image.
For example, for any one pixel point in the sample document image, the image coordinates (x, y) of the pixel point may be updated to (x+Δx₂, y+Δy₂) according to the horizontal-axis offset Δx₂ and the vertical-axis offset Δy₂ corresponding to the pixel point indicated by the fourth offset map.
Therefore, under the condition that the image size of the offset graph is smaller than that of the sample document image, the image size of the offset graph can be aligned with that of the sample document image by means of upsampling the offset graph, so that the image positions of all pixel points in the sample document image are corrected based on the aligned offset graph, and the overall correction and restoration effect of the sample document image can be improved.
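A sketch of resampling the predicted offset map back to the original sample-image size; the same nearest-neighbour routine covers both the upsampling case here and the downsampling case below. One caveat, stated as an assumption: if the offsets are expressed in pixels of the resized image, they would also need rescaling by the size ratio, which this sketch omits:

```python
def resample_offsets(offsets, new_h, new_w):
    # Nearest-neighbour resampling of a 2-channel offset map
    # ([channel][row][column]) to the target spatial size.
    def resample(chan):
        h, w = len(chan), len(chan[0])
        return [[chan[y * h // new_h][x * w // new_w] for x in range(new_w)]
                for y in range(new_h)]
    return [resample(chan) for chan in offsets]
```

A production implementation would normally use bilinear interpolation for smoother offset fields; nearest-neighbour is used here only to keep the sketch dependency-free.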
Step 609, downsampling the third offset map to obtain a fifth offset map.
In an embodiment of the disclosure, in a case where the second image size is larger than the first image size, the third offset map may be downsampled to obtain a fifth offset map; the fourth image size of the fifth offset graph is matched with the first image size, and the fifth offset graph is used for indicating the coordinate offset of each pixel point in the sample document image.
Step 610, correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the fifth offset map to obtain the target document image.
In the embodiment of the disclosure, the image position of the corresponding pixel point in the sample document image may be corrected according to the coordinate offset of each pixel point indicated by the fifth offset map, so as to obtain the target document image.
For example, for any one pixel point in the sample document image, the image coordinates (x, y) of the pixel point may be updated to (x+Δx₃, y+Δy₃) according to the horizontal-axis offset Δx₃ and the vertical-axis offset Δy₃ corresponding to the pixel point indicated by the fifth offset map.
Step 611, text line detection is performed on the target document image to obtain a center line of at least one text line area.
Step 612, determining a target loss value according to at least one of: differences between image coordinates of pixels on the same central line in the target document image, differences between image coordinates of pixels on the same arrangement position on different central lines in the target document image.
Step 613, training the document correction model according to the target loss value.
For the explanation of steps 611 to 613, reference may be made to the relevant description in any embodiment of the present disclosure, and details are not repeated here.
In summary, when the image size of the offset map is larger than the image size of the sample document image, the image size of the offset map and the image size of the sample document image can be aligned by downsampling the offset map, so that the image positions of all pixels in the sample document image are corrected based on the aligned offset map, and the overall correction and restoration effect of the sample document image can be improved.
In any of the embodiments of the present disclosure, the document image may be corrected and restored based on text line constraints, where the text line constraints include at least one of the following: 1. for the pixel points on the center line of a text line in the distorted document image, each pixel point on the corresponding center line in the corrected image predicted by the document correction model is constrained to lie on one horizontal line, i.e., their ordinates are kept consistent; 2. the abscissas of the i-th pixel points on the center lines of the text lines in the corrected image predicted by the model are constrained to be consistent.
As an example, the document image correction flow may be as shown in fig. 7 and mainly includes the following steps: the distorted document image is preprocessed (for example, background removal) and resized; the processed document image is input into a first convolution network (or convolution block) in the document correction model for feature extraction to obtain shallow visual image features; the shallow visual image features are input into the encoder and decoder of a prediction network (such as a Transformer) in the document correction model for global association to obtain a coarse-grained first offset map; the first offset map is input into a second convolution network in the document correction model for weight prediction, and the first offset map is corrected based on the predicted weight map to obtain a second offset map; the second offset map may then be upsampled, and the distorted document image corrected and restored according to the upsampled second offset map to obtain the corrected document image.
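The flow above can be sketched structurally as a composition of stages. Every stage below is passed in as a callable stand-in; the names, signatures, and single-channel map representation are illustrative assumptions, not the disclosure's API:

```python
def rectify(image, preprocess, conv1, predictor, conv2, upsample, remap):
    # End-to-end correction flow: preprocess -> shallow features ->
    # coarse offset map -> weight map -> refined (second) offset map ->
    # upsample -> warp the distorted image into the corrected image.
    x = preprocess(image)          # e.g. background removal, resizing
    feats = conv1(x)               # shallow visual image features
    coarse = predictor(feats)      # coarse-grained first offset map
    weights = conv2(coarse)        # per-pixel correction weights
    refined = [[o * w for o, w in zip(row_o, row_w)]
               for row_o, row_w in zip(coarse, weights)]  # second offset map
    return remap(image, upsample(refined))
```

Here the offset and weight maps are simplified to single-channel nested lists so the weighting stage stays readable; a real implementation would operate on 2-channel tensors.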
In the method, a text line constraint loss function can be added in the training stage of the document correction model to enhance the correction effect on the document image. The purpose of document correction is to further improve text detection and character recognition; therefore, the correction effect at the granularity of text lines or characters is important to the document correction capability.
In the training stage, text line regularization constraints can be added. For example, the center line of each text line in the distorted document image can be uniformly sampled to obtain target pixel points (also called control points) for each text line, and after prediction by the document correction model, the control points of the same text line are constrained to lie on the same horizontal line, that is, their vertical coordinates are equal. For example, the document image before correction and the document image after correction may be as shown in fig. 8 or fig. 9.
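The uniform sampling of control points along a text line center line can be sketched as follows. This is an illustrative arc-length sampling with numpy; the function name is hypothetical and the disclosure does not fix the sampling method beyond equal intervals.

```python
import numpy as np

def sample_control_points(centerline, num_points):
    """Uniformly sample num_points control points along a polyline center line,
    given as an (M, 2) array of (x, y) pixel coordinates, by arc length."""
    seg = np.diff(centerline, axis=0)                  # per-segment deltas
    seglen = np.hypot(seg[:, 0], seg[:, 1])            # per-segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seglen)])   # cumulative arc length
    targets = np.linspace(0.0, cum[-1], num_points)    # equally spaced positions
    xs = np.interp(targets, cum, centerline[:, 0])
    ys = np.interp(targets, cum, centerline[:, 1])
    return np.stack([xs, ys], axis=1)

line = np.array([[0., 0.], [10., 0.]])
pts = sample_control_points(line, 5)  # 5 evenly spaced points along the segment
```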
The above embodiments correspond to the training method of the document correction model, and the disclosure further provides an application method of the document correction model, that is, a document image correction method.
Fig. 10 is a flowchart of a document image correction method according to a seventh embodiment of the present disclosure.
As shown in fig. 10, the document image correction method may include the steps of:
In step 1001, a document image to be corrected is acquired.
In the embodiment of the disclosure, the manner of acquiring the document image is not limited. For example, the document image may be an image acquired from an existing test set; or it may be an image acquired online, for example by a web crawler; or it may be an image acquired offline, for example by photographing a paper document with an image acquisition device; or it may be an image acquired in real time; or it may be a manually synthesized image; and so on, which is not limited in the embodiment of the disclosure.
Step 1002, a document image is rectified by using a trained document rectification model to obtain a rectified document image.
The document correction model is trained by using the method of any of the foregoing method embodiments.
In the embodiment of the disclosure, the document image can be rectified by adopting a trained document rectification model so as to obtain a rectified document image.
According to the document image correction method, a document image to be corrected is obtained; and correcting the document image by adopting the trained document correction model so as to obtain a corrected document image. Therefore, the correction and restoration of the document image based on the deep learning technology can be realized, the correction effect of the document image can be improved, namely the accuracy of the correction and restoration of the document image can be improved, and the character detection effect of the document image can be improved.
Corresponding to the training method of the document correction model provided by the embodiments of fig. 1 to 6, the present disclosure further provides a training apparatus for the document correction model. Since the training apparatus provided by the embodiments of the present disclosure corresponds to the training method provided by the embodiments of fig. 1 to 6, the implementation of the training method is also applicable to the training apparatus, and is not described in detail in the embodiments of the present disclosure.
Fig. 11 is a schematic structural diagram of a training device for a document correction model according to an embodiment of the present disclosure.
As shown in fig. 11, the training apparatus 1100 of the document correction model may include: a first acquisition module 1101, a correction module 1102, a detection module 1103, a determination module 1104, and a training module 1105.
Wherein, the first acquisition module 1101 is configured to acquire a sample document image.
A rectification module 1102, configured to rectify the sample document image by using the document rectification model to obtain a target document image.
A detection module 1103, configured to perform text line detection on the target document image to obtain a center line of at least one text line area.
A determining module 1104 for determining a target loss value according to at least one of: differences between image coordinates of pixels on the same central line in the target document image, differences between image coordinates of pixels on the same arrangement position on different central lines in the target document image.
The training module 1105 is configured to train the document correction model according to the target loss value.
In one possible implementation of the embodiments of the present disclosure, the determining module 1104 is configured to: determine a sub-loss value of the same center line according to the differences between the image coordinates of the pixel points on that center line in the target document image; determine a first loss value according to the sub-loss values of the respective center lines; determine a second loss value according to the differences between the image coordinates of the pixel points at the same arrangement position on different center lines in the target document image; and determine a target loss value according to at least one of the first loss value and the second loss value.
In one possible implementation of the embodiments of the present disclosure, the determining module 1104 is configured to: sampling each pixel point on the same central line in the target document image at equal intervals according to the set interval to obtain a plurality of target pixel points; determining a first coefficient according to the average value of the first values of the first coordinate axes in the image coordinates of the plurality of target pixel points; and determining a sub-loss value of the same central line according to the difference between the first value and the first coefficient in the image coordinates of the plurality of target pixel points.
In one possible implementation of the embodiments of the present disclosure, the determining module 1104 is configured to: determining a second coefficient according to the average value of the second values of the second coordinate axes in the image coordinates of the target pixel points at the same arrangement position on each central line; and determining a second loss value according to the difference between the second value and the second coefficient in the image coordinates of the target pixel points at the same arrangement position on each central line.
In one possible implementation of the embodiments of the present disclosure, the determining module 1104 is further configured to: acquiring the arrangement direction of characters in a target document image; and determining a first coordinate axis matched with the arrangement direction according to the arrangement direction.
In one possible implementation of the embodiments of the present disclosure, the correction module 1102 is configured to: extracting features of the sample document image by adopting a first convolution network in the document correction model to obtain a feature map; carrying out offset prediction on the feature images by adopting a prediction network in the document correction model to obtain a first offset image, wherein the first offset image is used for indicating the coordinate offset of each pixel point in the sample document image; and correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the first offset graph so as to obtain a target document image.
In one possible implementation of the embodiments of the present disclosure, the correction module 1102 is configured to: inputting the first offset map into a second convolution network in the document correction model to perform weight prediction so as to obtain a weight map, wherein the weight map is used for indicating coordinate correction weights of all pixel points in the sample document image; correcting the first offset map according to the weight map to obtain a second offset map; the second offset graph is used for indicating the corrected coordinate offset corresponding to each pixel point in the sample document image; and correcting the image positions of the corresponding pixel points in the sample document image according to the corrected coordinate offset corresponding to each pixel point indicated by the second offset graph so as to obtain a target document image.
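The weight-based correction of the offset map and the subsequent position correction can be sketched as follows. This numpy illustration assumes a grayscale image, per-pixel (dy, dx) offsets, and nearest-neighbor sampling with border clamping; an actual model would likely use differentiable bilinear sampling, and the function name is illustrative.

```python
import numpy as np

def rectify_with_weighted_offsets(image, offsets, weights):
    """Remap a grayscale image (H, W) using an offset map (H, W, 2) holding
    per-pixel (dy, dx), after scaling it element-wise by a weight map (H, W).
    Nearest-neighbor sampling; out-of-range source positions are clamped to
    the image border."""
    h, w = image.shape
    corrected = offsets * weights[..., None]  # the "second offset map"
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + corrected[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + corrected[..., 1]).astype(int), 0, w - 1)
    return image[src_y, src_x]

img = np.arange(16, dtype=float).reshape(4, 4)
offsets = np.zeros((4, 4, 2))
offsets[..., 1] = 1.0                 # sample each pixel from one column right
weights = np.ones((4, 4))             # weight map leaves offsets unchanged
out = rectify_with_weighted_offsets(img, offsets, weights)
```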
In one possible implementation of the embodiments of the present disclosure, the determining module 1104 is further configured to: determine that a first image size of the sample document image matches the input requirement of the document rectification model.
In one possible implementation of the embodiment of the present disclosure, the training apparatus 1100 of the document correction model may further include:
An adjusting module, configured to adjust the size of the sample document image when the first image size does not match the input requirement of the document correction model, so as to obtain an adjusted sample document image.
A prediction module, configured to perform offset prediction on the adjusted sample document image by using the document correction model, so as to obtain a third offset map; the third offset map is used for indicating the coordinate offset of each pixel point in the adjusted sample document image.
A sampling module, configured to upsample the third offset map to obtain a fourth offset map when the second image size of the adjusted sample document image is smaller than the first image size; the third image size of the fourth offset map is matched with the first image size, and the fourth offset map is used for indicating the coordinate offset of each pixel point in the sample document image.
The correction module 1102 is further configured to correct an image position of a corresponding pixel in the sample document image according to the coordinate offset of each pixel indicated by the fourth offset map, so as to obtain a target document image.
In a possible implementation manner of the embodiment of the present disclosure, the sampling module is further configured to downsample the third offset map to obtain a fifth offset map if the second image size is greater than the first image size; the fourth image size of the fifth offset graph is matched with the first image size, and the fifth offset graph is used for indicating the coordinate offset of each pixel point in the sample document image.
The correction module 1102 is further configured to correct an image position of a corresponding pixel in the sample document image according to the coordinate offset of each pixel indicated by the fifth offset map, so as to obtain a target document image.
According to the training device for the document correction model, the document correction model is adopted to correct a sample document image to obtain a target document image, text line detection is carried out on the target document image to obtain a central line of at least one text line area, and the document correction model is trained according to at least one of the following: differences between image coordinates of pixels on the same central line in the target document image, differences between image coordinates of pixels on the same arrangement position on different central lines in the target document image. Therefore, based on each pixel point on the text line center line, the center line direction of the text line on the document image corrected by the constraint model is matched with the reading direction of the document image, so that the correction effect of the document image is improved, and the correction and restoration accuracy of the document image is improved. That is, the document image can be corrected based on at least one of the lateral constraint and the longitudinal constraint of the text line, and the correction effect of the document image can be improved, thereby improving the character detection effect of the document image.
Corresponding to the document image correction method provided by the embodiment of fig. 10 described above, the present disclosure also provides a document image correction apparatus, and since the document image correction apparatus provided by the embodiment of the present disclosure corresponds to the document image correction method provided by the embodiment of fig. 10 described above, the implementation of the document image correction method is also applicable to the document image correction apparatus provided by the embodiment of the present disclosure, and will not be described in detail in the embodiment of the present disclosure.
Fig. 12 is a schematic structural view of a document image correction apparatus according to a ninth embodiment of the present disclosure.
As shown in fig. 12, the document image correction apparatus 1200 may include: a second acquisition module 1201 and a processing module 1202.
The second obtaining module 1201 is configured to obtain a document image to be rectified.
A processing module 1202, configured to rectify the document image using the trained document rectification model to obtain a rectified document image.
Wherein the document correction model is trained using the apparatus as shown in the embodiment of fig. 11 described above.
The document image correction device of the embodiment of the disclosure obtains a document image to be corrected; and correcting the document image by adopting the trained document correction model so as to obtain a corrected document image. Therefore, the correction and restoration of the document image based on the deep learning technology can be realized, the correction effect of the document image can be improved, namely the accuracy of the correction and restoration of the document image can be improved, and the character detection effect of the document image can be improved.
To achieve the above embodiments, the present disclosure also provides an electronic device that may include at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method or the document image correction method of the document correction model according to any one of the above embodiments of the present disclosure.
To achieve the above embodiments, the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the training method of the document correction model or the document image correction method set forth in any one of the above embodiments of the present disclosure.
To achieve the above embodiments, the present disclosure further provides a computer program product, which includes a computer program that, when executed by a processor, implements the training method of the document correction model or the document image correction method set forth in any of the above embodiments of the present disclosure.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 13 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure. The electronic device may include the server and the client in the above embodiments. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 13, the electronic device 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 1302 or a computer program loaded from a storage unit 1308 into a RAM (Random Access Memory) 1303. The RAM 1303 can also store various programs and data required for the operation of the electronic device 1300. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An I/O (Input/Output) interface 1305 is also connected to the bus 1304.
Various components in electronic device 1300 are connected to I/O interface 1305, including: an input unit 1306 such as a keyboard, a mouse, or the like; an output unit 1307 such as various types of displays, speakers, and the like; storage unit 1308, such as a magnetic disk, optical disk, etc.; and a communication unit 1309 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1309 allows the electronic device 1300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1301 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1301 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable processor, controller, microcontroller, etc. The computing unit 1301 executes the respective methods and processes described above, such as the training method of the document correction model or the document image correction method. For example, in some embodiments, the training method of the document correction model or the document image correction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program is loaded into the RAM 1303 and executed by the computing unit 1301, one or more steps of the training method of the document correction model or the document image correction method described above may be performed. Alternatively, in other embodiments, the computing unit 1301 may be configured to perform the training method of the document correction model or the document image correction method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application Specific Standard Products), SOCs (Systems On Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, RAM, ROM, EPROM (Erasable Programmable Read-Only Memory) or flash memory, an optical fiber, a CD-ROM (Compact Disc Read-Only Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode-Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LANs (Local Area Networks), WANs (Wide Area Networks), the Internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility of traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that artificial intelligence is the discipline of studying how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves technologies at both the hardware and software levels. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Deep learning is a new research direction in the field of machine learning. It learns the inherent laws and representation hierarchies of sample data, and the information obtained in the learning process greatly helps the interpretation of data such as text, images, and sounds. Its ultimate goal is to give machines analytical learning ability like a person, able to recognize text, image, and sound data.
According to the technical scheme of the embodiment of the disclosure, a sample document image is corrected by adopting a document correction model to obtain a target document image, text line detection is carried out on the target document image to obtain a central line of at least one text line area, and the document correction model is trained according to at least one of the following: differences between image coordinates of pixels on the same central line in the target document image, differences between image coordinates of pixels on the same arrangement position on different central lines in the target document image. Therefore, based on each pixel point on the text line center line, the center line direction of the text line on the document image corrected by the constraint model is matched with the reading direction of the document image, so that the correction effect of the document image is improved, and the correction and restoration accuracy of the document image is improved. That is, the document image can be corrected based on at least one of the lateral constraint and the longitudinal constraint of the text line, and the correction effect of the document image can be improved, thereby improving the character detection effect of the document image.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions presented in the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (25)

1. A training method of a document correction model, comprising:
acquiring a sample document image, and correcting the sample document image by adopting a document correction model to obtain a target document image;
performing text line detection on the target document image to obtain a central line of at least one text line area;
determining a target loss value from at least one of: differences between image coordinates of pixel points on the same central line in the target document image and differences between image coordinates of pixel points on the same arrangement position on different central lines in the target document image;
And training the document correction model according to the target loss value.
2. The method of claim 1, wherein the determining a target loss value according to at least one of the differences comprises:
determining a sub-loss value of the same central line according to the difference between the image coordinates of all pixel points on the same central line in the target document image;
determining a first loss value according to the sub-loss value of each center line;
determining a second loss value according to differences between image coordinates of pixel points which are positioned at the same arrangement position on different central lines in the target document image;
and determining the target loss value according to at least one of the first loss value and the second loss value.
3. The method of claim 2, wherein the determining the sub-loss value of the same center line based on differences between image coordinates of pixels on the same center line in the target document image comprises:
sampling all pixel points on the same central line in the target document image at equal intervals according to a set interval to obtain a plurality of target pixel points;
determining a first coefficient according to the average value of the first values of the first coordinate axes in the image coordinates of the plurality of target pixel points;
And determining a sub-loss value of the same center line according to the difference between the first value and the first coefficient in the image coordinates of the target pixel points.
4. A method according to claim 3, wherein said determining a second loss value based on differences between image coordinates of pixels at the same arrangement position on different centerlines in said target document image comprises:
determining a second coefficient according to the average value of the second values of the second coordinate axes in the image coordinates of the target pixel points which are positioned at the same arrangement position on each central line;
and determining a second loss value according to the difference between a second value in the image coordinates of the target pixel points which are positioned at the same arrangement position on each central line and the second coefficient.
5. A method according to claim 3, wherein before determining the first coefficient from the mean of the first values of the first coordinate axes of the image coordinates of the plurality of target pixel points, the method further comprises:
acquiring the arrangement direction of characters in the target document image;
and determining a first coordinate axis matched with the arrangement direction according to the arrangement direction.
6. The method of claim 1, wherein the rectifying the sample document image using the document rectification model to obtain a target document image comprises:
Extracting features of the sample document image by adopting a first convolution network in the document correction model to obtain a feature map;
performing offset prediction on the feature map by adopting a prediction network in the document correction model to obtain a first offset map, wherein the first offset map is used for indicating the coordinate offset of each pixel point in the sample document image;
and correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the first offset graph so as to obtain the target document image.
7. The method of claim 6, wherein the correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the first offset map to obtain the target document image comprises:
inputting the first offset map into a second convolution network in the document correction model to perform weight prediction so as to obtain a weight map, wherein the weight map is used for indicating the coordinate correction weight of each pixel point in the sample document image;
correcting the first offset map according to the weight map to obtain a second offset map; the second offset map is used for indicating the corrected coordinate offset corresponding to each pixel point in the sample document image;
and correcting the image position of the corresponding pixel point in the sample document image according to the corrected coordinate offset corresponding to each pixel point indicated by the second offset map so as to obtain the target document image.
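A minimal sketch of the correction step in claims 6-7, with `offsets` and `weights` standing in for the outputs of the prediction network and the second convolution network (the networks themselves are omitted; all names are illustrative):

```python
import numpy as np

def rectify(image, offsets, weights):
    """image: (H, W) grayscale; offsets: (H, W, 2) dy/dx per pixel; weights: (H, W)."""
    corrected = offsets * weights[..., None]           # second offset map (claim 7)
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]                        # target pixel grid
    # Nearest-neighbour remap: each target pixel samples its offset source position.
    src_y = np.clip(np.rint(ys + corrected[..., 0]), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xs + corrected[..., 1]), 0, w - 1).astype(int)
    return image[src_y, src_x]
```

With all-zero offsets the remap is the identity; a real model would predict per-pixel displacements that flatten page warp, and bilinear rather than nearest-neighbour sampling would normally be used.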
8. The method of claim 6, wherein before the feature extraction is performed on the sample document image using the first convolution network in the document correction model to obtain a feature map, the method further comprises:
determining whether a first image size of the sample document image matches an input requirement of the document correction model.
9. The method of claim 8, wherein the method further comprises:
under the condition that the first image size does not match the input requirement of the document correction model, resizing the sample document image to obtain an adjusted sample document image;
performing offset prediction on the adjusted sample document image by adopting the document correction model to obtain a third offset map; the third offset map is used for indicating the coordinate offset of each pixel point in the adjusted sample document image;
upsampling the third offset map to obtain a fourth offset map if the second image size of the adjusted sample document image is smaller than the first image size; the third image size of the fourth offset map matches the first image size, and the fourth offset map is used for indicating the coordinate offset of each pixel point in the sample document image;
and correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the fourth offset map so as to obtain the target document image.
10. The method of claim 9, wherein the method further comprises:
downsampling the third offset map to obtain a fifth offset map if the second image size is greater than the first image size; the fourth image size of the fifth offset map matches the first image size, and the fifth offset map is used for indicating the coordinate offset of each pixel point in the sample document image;
and correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the fifth offset map so as to obtain the target document image.
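The size-handling path of claims 8-10 can be sketched as a single resampling helper covering both the upsampling and downsampling branches. Rescaling the offset values by the size ratio is an assumption the claims leave implicit, and nearest-neighbour resampling is used only to keep the sketch dependency-free:

```python
import numpy as np

def resample_offsets(offsets, target_hw):
    """Resample an offset map (h, w, 2) to (H, W, 2); covers up- and downsampling."""
    h, w, _ = offsets.shape
    H, W = target_hw
    ys = np.arange(H) * h // H            # nearest-neighbour source rows
    xs = np.arange(W) * w // W            # nearest-neighbour source columns
    out = offsets[ys[:, None], xs[None, :]].astype(float)
    out[..., 0] *= H / h                  # assumed: scale dy with the height ratio
    out[..., 1] *= W / w                  # assumed: scale dx with the width ratio
    return out
```

Upsampling (claim 9) corresponds to `H > h`, downsampling (claim 10) to `H < h`; in both cases the resampled map matches the first image size so the original-resolution image can be corrected directly.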
11. A document image correction method, comprising:
acquiring a document image to be corrected;
correcting the document image by adopting a trained document correction model to obtain a corrected document image;
wherein the document correction model is trained using the method of any one of claims 1-10.
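A minimal inference sketch of claim 11, where `predict_offsets` is a hypothetical stand-in for the trained document correction model and the remapping step mirrors claim 6:

```python
import numpy as np

def correct_document(image, predict_offsets):
    """image: (H, W); predict_offsets: callable returning a (H, W, 2) dy/dx map."""
    offsets = predict_offsets(image)               # trained model's offset prediction
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(np.rint(ys + offsets[..., 0]), 0, h - 1).astype(int)
    sx = np.clip(np.rint(xs + offsets[..., 1]), 0, w - 1).astype(int)
    return image[sy, sx]                           # corrected document image
```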
12. A training device for a document correction model, comprising:
the first acquisition module is used for acquiring a sample document image;
the correction module is used for correcting the sample document image by adopting a document correction model so as to obtain a target document image;
the detection module is used for detecting the text line of the target document image so as to obtain the central line of at least one text line area;
a determining module for determining a target loss value according to at least one of: differences between image coordinates of pixel points on the same central line in the target document image and differences between image coordinates of pixel points on the same arrangement position on different central lines in the target document image;
and the training module is used for training the document correction model according to the target loss value.
13. The apparatus of claim 12, wherein the means for determining is configured to:
determining a sub-loss value of the same central line according to the difference between the image coordinates of all pixel points on the same central line in the target document image;
determining a first loss value according to the sub-loss value of each center line;
determining a second loss value according to differences between image coordinates of pixel points which are positioned at the same arrangement position on different central lines in the target document image;
and determining the target loss value according to at least one of the first loss value and the second loss value.
14. The apparatus of claim 13, wherein the means for determining is configured to:
sampling all pixel points on the same central line in the target document image at equal intervals according to a set interval to obtain a plurality of target pixel points;
determining a first coefficient according to the average value of the first values of the first coordinate axes in the image coordinates of the plurality of target pixel points;
and determining a sub-loss value for the same center line according to the difference between the first value in the image coordinates of each target pixel point and the first coefficient.
15. The apparatus of claim 14, wherein the means for determining is configured to:
determining a second coefficient according to the average value of the second values of the second coordinate axes in the image coordinates of the target pixel points which are positioned at the same arrangement position on each central line;
and determining a second loss value according to the difference between a second value in the image coordinates of the target pixel points which are positioned at the same arrangement position on each central line and the second coefficient.
16. The apparatus of claim 14, wherein the means for determining is further configured to:
acquiring the arrangement direction of characters in the target document image;
and determining a first coordinate axis matched with the arrangement direction according to the arrangement direction.
17. The apparatus of claim 12, wherein the correction module is configured to:
extracting features of the sample document image by adopting a first convolution network in the document correction model to obtain a feature map;
performing offset prediction on the feature map by adopting a prediction network in the document correction model to obtain a first offset map, wherein the first offset map is used for indicating the coordinate offset of each pixel point in the sample document image;
and correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the first offset map so as to obtain the target document image.
18. The apparatus of claim 17, wherein the correction module is configured to:
inputting the first offset map into a second convolution network in the document correction model to perform weight prediction so as to obtain a weight map, wherein the weight map is used for indicating the coordinate correction weight of each pixel point in the sample document image;
correcting the first offset map according to the weight map to obtain a second offset map; the second offset map is used for indicating the corrected coordinate offset corresponding to each pixel point in the sample document image;
and correcting the image position of the corresponding pixel point in the sample document image according to the corrected coordinate offset corresponding to each pixel point indicated by the second offset map so as to obtain the target document image.
19. The apparatus of claim 17, wherein the means for determining is further configured to:
determine whether a first image size of the sample document image matches an input requirement of the document correction model.
20. The apparatus of claim 19, wherein the apparatus further comprises:
the adjusting module is used for resizing the sample document image to obtain an adjusted sample document image under the condition that the first image size does not match the input requirement of the document correction model;
the prediction module is used for performing offset prediction on the adjusted sample document image by adopting the document correction model so as to obtain a third offset map; the third offset map is used for indicating the coordinate offset of each pixel point in the adjusted sample document image;
the sampling module is used for upsampling the third offset map to obtain a fourth offset map under the condition that the second image size of the adjusted sample document image is smaller than the first image size; the third image size of the fourth offset map matches the first image size, and the fourth offset map is used for indicating the coordinate offset of each pixel point in the sample document image;
and the correction module is further used for correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the fourth offset map so as to obtain the target document image.
21. The apparatus of claim 20, wherein,
the sampling module is further configured to downsample the third offset map to obtain a fifth offset map if the second image size is greater than the first image size; the fourth image size of the fifth offset map matches the first image size, and the fifth offset map is used for indicating the coordinate offset of each pixel point in the sample document image;
and the correction module is further used for correcting the image position of the corresponding pixel point in the sample document image according to the coordinate offset of each pixel point indicated by the fifth offset map so as to obtain the target document image.
22. A document image correction apparatus comprising:
the second acquisition module is used for acquiring a document image to be corrected;
the processing module is used for correcting the document image by adopting the trained document correction model so as to obtain a corrected document image;
wherein the document correction model is trained using the apparatus of any one of claims 12-21.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method of the document correction model of any one of claims 1-10 or to perform the document image correction method of claim 11.
24. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the training method of the document correction model according to any one of claims 1-10, or to perform the document image correction method of claim 11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the steps of a training method of a document correction model according to any one of claims 1-10, or implements the steps of a document image correction method according to claim 11.
CN202310116117.5A 2023-02-08 2023-02-08 Training and image correction method, device, equipment and medium for document correction model Pending CN116110054A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310116117.5A CN116110054A (en) 2023-02-08 2023-02-08 Training and image correction method, device, equipment and medium for document correction model

Publications (1)

Publication Number Publication Date
CN116110054A true CN116110054A (en) 2023-05-12

Family

ID=86267053

Country Status (1)

Country Link
CN (1) CN116110054A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination