CN113627428A - Document image correction method and device, storage medium and intelligent terminal device

Info

Publication number: CN113627428A
Authority: CN (China)
Prior art keywords: image, edge, document image, document, original document
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110921692.3A
Other languages: Chinese (zh)
Inventor: 江忠泽
Current Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110921692.3A
Publication of CN113627428A

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a document image correction method and device, a storage medium and an intelligent terminal device, belonging to the field of computer technology. The method is applied to an intelligent terminal device and comprises the following steps: obtaining an original document image; inputting the original document image into an edge detection model comprising a semantic segmentation branch and an edge detection branch to obtain an edge probability image corresponding to the original document image; determining the vertex coordinates of the document edge frame in the original document image from the edge probability image; calculating a perspective transformation matrix based on those vertex coordinates; and finally performing perspective correction on the original document image based on the perspective transformation matrix to obtain a target document image, i.e. a front-view document image with the distortion repaired. This facilitates reading and document archiving for the user and improves the user experience.

Description

Document image correction method and device, storage medium and intelligent terminal device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for correcting a document image, a storage medium, and an intelligent terminal device.
Background
In daily office work, paper documents often need to be converted into electronic documents, which was traditionally done with a scanner. With the popularization of intelligent terminal devices and the improvement of their camera quality, a paper document can now be converted into an electronic document by photographing it with an intelligent terminal device. However, because the shooting angle and viewing range are difficult to control precisely, a paper document photographed at an improper angle appears skewed in the captured image and suffers from perspective deformation.
These problems make the converted electronic document inconvenient to read and archive.
Disclosure of Invention
The embodiment of the application provides a document image correction method and device, a storage medium and an intelligent terminal device. Applied on an intelligent terminal device, the method can correct a distorted and skewed document in a document image, making it convenient for a user to read and file the document. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a document image rectification method, where the method is applied to an intelligent terminal device, and the method includes:
acquiring an original document image;
inputting the original document image into an edge detection model to obtain an edge probability image corresponding to the original document image, wherein the edge detection model comprises a semantic segmentation branch and an edge detection branch, the semantic segmentation branch is used for obtaining a first edge image based on semantic information of the original document image, the edge detection branch is used for performing edge detection on the original document image to obtain a second edge image, and the edge probability image is a fusion image of the first edge image and the second edge image;
determining the vertex coordinates of the document in the original document image from the edge probability image;
and calculating a perspective transformation matrix based on the vertex coordinates of the document in the original document image, and performing perspective correction on the original document image according to the perspective transformation matrix to obtain a target document image.
In a second aspect, an embodiment of the present application provides a document image rectification apparatus, including:
the original image acquisition module is used for acquiring an original document image;
an edge detection module, configured to input the original document image into an edge detection model, to obtain an edge probability image corresponding to the original document image, where the edge detection model includes a semantic segmentation branch and an edge detection branch, the semantic segmentation branch is used to obtain a first edge image based on semantic information of the original document image, the edge detection branch is used to perform edge detection on the original document image to obtain a second edge image, and the edge probability image is a fused image of the first edge image and the second edge image;
the first coordinate acquisition module is used for determining the vertex coordinates of the document in the original document image from the edge probability image;
and the matrix correction module is used for calculating a perspective transformation matrix based on the vertex coordinates of the document in the original document image and carrying out perspective correction on the original document image according to the perspective transformation matrix to obtain a target document image.
In a third aspect, embodiments of the present application provide a storage medium having at least one instruction stored thereon, where the at least one instruction is adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides an intelligent terminal device, which may include: a processor and a memory; wherein the memory stores at least one instruction adapted to be loaded by the processor and to perform the above-mentioned method steps.
The beneficial effects brought by the technical scheme provided by some embodiments of the application at least comprise:
the document image correction method provided by the embodiment of the application is adopted, firstly, an original document image is obtained, then the original document image is input into an edge detection model comprising a semantic segmentation branch and an edge detection branch, an edge probability image corresponding to the original document image is obtained, vertex coordinates of a document in the original document image are determined from the edge probability image, a perspective transformation matrix is calculated based on the vertex coordinates of the document in the original document image, and finally, the original document image is subjected to perspective correction based on the perspective transformation matrix, so that a target document image comprising a distorted and repaired front view angle document image is obtained, no skew and distortion exist, reading and document archiving of a user are facilitated, and user experience is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 provides an exemplary illustration of document distortion in a document image according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a document image rectification method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating an example of obtaining an edge probability map according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a document image rectification method according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an example of a straight line set and a first intersection set with filtered intersections according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating an example of obtaining vertex coordinates from four types of intersections according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating an example of computing vertex coordinates according to an embodiment of the present disclosure;
FIG. 8 provides an exemplary illustration of perspective correction for an embodiment of the present application;
FIG. 9 provides an exemplary diagram of target document clipping according to an embodiment of the present application;
FIG. 10 is a flowchart illustrating a document image rectification method according to an embodiment of the present application;
FIG. 11 provides an exemplary illustration of a refinement process in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a document image rectification device according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an intelligent terminal device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It is also noted that, unless explicitly stated or limited otherwise, "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art. Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Before describing the embodiments of the present invention more clearly, some concepts of the present invention will be described in detail to better understand the present invention.
Edge detection: a fundamental problem in image processing and computer vision. The purpose of edge detection is to identify pixels in a digital image where the brightness changes significantly (typically the edges of objects). An edge is a set of pixels in an image whose gray levels change rapidly.
Semantic segmentation: a pixel-level classification of an image that classifies each pixel and segments the image according to its semantics, where "semantics" refers to the content of the image.
Edge probability image: a gray-scale image in which the gray value of each pixel represents the probability that the pixel is an edge pixel.
Perspective transformation: the process of projecting an image from one viewing plane to another, so perspective transformation is also called projection mapping. For example, when shooting a document, if the plane of the document is not parallel to the imaging plane of the camera, the document appears distorted in the image. This is one type of perspective distortion, which typically shows up as points closer to the camera appearing larger and points farther away appearing smaller.
Correcting a distorted image by perspective transformation requires the coordinates of a group of four points in the distorted image and the coordinates of the corresponding group of four points in the target image. A perspective transformation matrix can be calculated from these two groups of coordinate points, and applying the transformation to the whole original image then realizes the correction.
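A minimal sketch of this computation in plain NumPy (the corner coordinates are hypothetical, and the solver mirrors what a library routine such as OpenCV's getPerspectiveTransform computes; this is an illustration, not the patent's implementation):

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve the 3x3 perspective-transform matrix mapping src -> dst.

    src, dst: lists of four (x, y) corner points. Each point pair gives two
    linear equations in the eight unknown matrix entries (the ninth is fixed
    to 1), so the four pairs yield an 8x8 linear system.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_perspective(H, point):
    """Map a single (x, y) point through the homography H."""
    x, y = point
    u, v, w = H @ np.array([x, y, 1.0])
    return (u / w, v / w)

# Hypothetical skewed document corners and the upright target rectangle,
# both ordered top-left, top-right, bottom-right, bottom-left.
src = [(120, 80), (520, 60), (560, 440), (90, 460)]
dst = [(0, 0), (400, 0), (400, 380), (0, 380)]
H = perspective_matrix(src, dst)
print(apply_perspective(H, (120, 80)))  # maps to approximately (0.0, 0.0)
```

Applying the same mapping to every pixel of the original image (as a warp routine does) produces the corrected image.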
Referring to fig. 1, an exemplary diagram of document distortion in a document image is provided for an embodiment of the present application. As shown in fig. 1, an intelligent terminal photographs a paper document to obtain an electronic version of it. Because the shooting angle and range were not controlled properly, the actual result is the document image shown in fig. 1: the paper document is obviously skewed in the electronic document and surrounded by a large background area, which is inconvenient for reading and archiving, and it would be difficult to achieve a good recognition effect if OCR character recognition were subsequently performed on the document.
Based on this, embodiments of the present application provide a document image rectification method, an apparatus, a storage medium, and an intelligent terminal device, where the execution subject of the document image rectification method may be the document image rectification apparatus provided in embodiments of the present application, or the intelligent terminal device integrated with the document image rectification apparatus, and where the document image rectification apparatus may be implemented in hardware or software. The intelligent terminal device can be a smart phone, tablet computer, palmtop computer, notebook computer, intelligent wearable device, or any other device that is equipped with a processor (including but not limited to a general-purpose or customized processor) and a camera and is capable of capturing and processing images.
In the embodiment of the application, an original document image is obtained first, and edge detection is then performed on it with an edge detection model comprising a semantic segmentation branch and an edge detection branch, which makes the resulting edge probability image more accurate. The vertex coordinates of the document in the original document image are determined from the edge probability image, a perspective transformation matrix is calculated based on those vertex coordinates, and finally the original document image is perspective-corrected based on the matrix to obtain a target document image: a front-view document image with the distortion repaired. This facilitates reading and document archiving for the user and improves the user experience.
For convenience of description, the following embodiments all take a smart phone as the example of the intelligent terminal device. The embodiments described below do not represent all embodiments consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims. The flow diagrams depicted in the figures are merely exemplary and need not be performed in the order of the steps shown. For example, some steps are logically parallel with no strict sequential relationship, so the actual execution order may vary.
Referring to fig. 2, a flowchart of a document image rectification method is provided according to an embodiment of the present application. As shown in fig. 2, the document image rectification method may include the following steps S101 to S104.
S101, acquiring an original document image;
Here, the original document image is a document image with distortion, for example a document image captured with a portable camera or a video camera.
It is understood that when shooting is performed by using a portable camera or a camera or other shooting equipment, the shot original document image has distortion problems to a greater or lesser extent due to the difficulty in controlling the shooting angle and the view range.
For example, the original document image may be referred to specifically as the electronic version document shown in fig. 1.
Optionally, the document types in the original document image include, but are not limited to, books (cover/inner page), cards (business/identification/bank card), tickets (train ticket/air ticket/invoice/receipt, etc.), paper (paper/poster/menu/leaflet/newspaper), PPT projection/TV/computer screen, etc.
S102, inputting the original document image into an edge detection model to obtain an edge probability image corresponding to the original document image, wherein the edge detection model comprises a semantic segmentation branch and an edge detection branch, the semantic segmentation branch is used for obtaining a first edge image based on semantic information of the original document image, the edge detection branch is used for performing edge detection on the original document image to obtain a second edge image, and the edge probability image is a fusion image of the first edge image and the second edge image;
specifically, the original document image is input into a trained edge detection model, a semantic segmentation branch in the edge detection model obtains a first edge image based on semantic information of the original document image, an edge detection branch in the edge detection model performs edge detection on the original document image to obtain a second edge image, and a final edge probability map is obtained based on a segmentation result in the first edge image and an edge detection result in the second edge image.
The edge detection model includes a semantic segmentation branch and an edge detection branch. The semantic segmentation branch performs semantic segmentation on the original document image: it classifies each pixel and segments the image according to the semantics of each part. In the embodiment of the application it mainly segments out the pixels whose semantics is "document", yielding the first edge image. The edge detection branch detects pixels that may be edges, where an edge is a set of pixels with rapid gray-level change in the original document image; performing edge detection on the original document image yields the second edge image.
It is understood that the edges in the second edge image include not only the edges of the document but also edges that may exist in the background region outside the document, while the first edge image contains only the segmentation result of the document in the original document image. Combining the two therefore filters out the edges in the second edge image that do not belong to the document, producing the final edge probability image, which contains the document edge frame corresponding to the document.
In this embodiment of the application, the edge probability map may be a gray-scale image, where the gray value of each pixel indicates the probability that the pixel is an edge. For example, if the probability that pixel 1 is an edge is 0.5, the gray value displayed in the edge probability map is 255 × 0.5 = 127.5.
Referring to fig. 3, an exemplary schematic diagram of obtaining an edge probability image is provided for the embodiment of the present application. As shown in fig. 3, the edge detection model includes an encoding network, a decoding network, a semantic segmentation branch and an edge detection branch. The original document image is input into the edge detection model; the semantic segmentation branch produces the first edge image and the edge detection branch produces the second edge image, and combining the two results yields the edge probability image shown in fig. 3. The first edge image is obtained by semantically segmenting the original document image: it marks the document region and the non-document region, the white region in fig. 3 being the document region and the black region being the non-document region. The second edge image is obtained by edge detection on the original document image and contains not only the edges of the document but also invalid edges elsewhere; combining it with the first edge image from the semantic segmentation branch filters out the invalid edges that do not belong to the document, leaving an edge probability image that contains only the document edges.
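The fusion step described above can be sketched as masking the raw edge map with the document-region segmentation. The tiny arrays below are toy stand-ins for the two branch outputs, not actual network results:

```python
import numpy as np

# Hypothetical first edge image: the semantic-segmentation branch marks the
# document region with 1.0 (white) and the background with 0.0 (black).
first_edge = np.zeros((6, 6), dtype=float)
first_edge[1:5, 1:5] = 1.0

# Hypothetical second edge image: raw edge probabilities from the edge-
# detection branch, including a spurious edge in the background (row 0).
second_edge = np.zeros((6, 6), dtype=float)
second_edge[1, 1:5] = 0.9   # top edge of the document
second_edge[0, :] = 0.8     # invalid background edge

# Keep only the edge responses that fall inside the document region.
edge_probability = second_edge * first_edge
print(edge_probability[1, 2])  # document edge kept (0.9)
print(edge_probability[0, 2])  # background edge filtered out (0.0)
```

Multiplying by the binary region mask is one simple way to realize the "fusion"; the patent does not specify the exact combination rule, so this is only an illustration of the filtering idea.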
In the embodiment of the present application, document edges in the edge probability image are collectively referred to as a document edge frame.
S103, determining the vertex coordinates of the document in the original document image from the edge probability image;
In one implementation, straight-line detection is performed on the document edge frame in the edge probability image to obtain a set of straight lines based on the document edge frame. The intersection points between every two straight lines in the set can be roughly divided into different regions; the centroid of the intersections in each region is then computed, and the resulting centroids are used as the vertex coordinates of the document in the original document image.
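The centroid step can be sketched in a few lines; the candidate points below are hypothetical intersections that all fall near one document corner:

```python
import numpy as np

# Hypothetical intersection points clustered near the same document corner.
corner_candidates = np.array([(98.0, 102.0), (102.0, 98.0), (100.0, 100.0)])

# The centroid (mean position) of the cluster is taken as that vertex.
vertex = corner_candidates.mean(axis=0)
print(vertex)  # centroid is (100.0, 100.0)
```

The same averaging is repeated for each of the four corner regions to obtain the four vertex coordinates.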
In another implementation, straight-line detection is performed on the document edge frame in the edge probability image to obtain a set of straight lines based on the document edge frame; the set of rectangular frames formed by any four straight lines in the set is then enumerated, and the vertex coordinates of the largest rectangular frame are used as the vertex coordinates of the document in the original document image.
And S104, calculating a perspective transformation matrix based on the vertex coordinates of the document in the original document image, and carrying out perspective correction on the original document image according to the perspective transformation matrix to obtain a target document image.
Specifically, the vertex coordinates of the document in the target document image are calculated from the vertex coordinates of the document in the original document image; a perspective transformation matrix is then calculated from the correspondence between the two sets of vertex coordinates, and the original document image is perspective-corrected with this matrix to obtain the target document image.
In the embodiment of the application, an original document image is obtained first, and edge detection is then performed on it with an edge detection model comprising a semantic segmentation branch and an edge detection branch, which makes the resulting edge probability image more accurate. The vertex coordinates of the document in the original document image are determined from the edge probability image, a perspective transformation matrix is calculated based on those vertex coordinates, and finally the original document image is perspective-corrected based on the matrix to obtain a target document image: a front-view document image with the distortion repaired. This facilitates reading and document archiving for the user and improves the user experience.
Referring to fig. 4, a flowchart of a document image rectification method is provided according to an embodiment of the present application. The execution main body of the embodiment of the application is intelligent terminal equipment. As shown in fig. 4, the document image rectification method may include the following steps.
S201, acquiring an original document image;
specifically, step S201 may refer to the detailed description in step S101, which is not repeated herein.
S202, inputting the original document image into an edge detection model to obtain an edge probability image corresponding to the original document image, wherein the edge detection model comprises a semantic segmentation branch and an edge detection branch, the semantic segmentation branch is used for obtaining a first edge image based on semantic information of the original document image, the edge detection branch is used for performing edge detection on the original document image to obtain a second edge image, and the edge probability image is a fusion image of the first edge image and the second edge image;
specifically, step S202 may refer to the detailed description in step S102, which is not repeated herein.
S203, carrying out straight line detection on the document edge frame in the edge probability image by using Hough transform to obtain a straight line set;
specifically, each edge of the document edge frame in the edge probability image can be regarded as a set of a plurality of straight lines, and straight line detection is performed on the document edge frame in the edge probability image by using hough transform to obtain a set of all the straight lines in the document edge frame.
And each straight line in the straight line set is a straight line in an image coordinate system of the edge probability image.
Optionally, straight-line detection on the document edge frame in the edge probability image may also be implemented by other line detection algorithms, for example the Hough_line detection algorithm, the LSD line detection algorithm, or the FLD line detection algorithm.
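To make the Hough voting idea concrete, here is a miniature from-scratch sketch in NumPy (the image, bin counts and threshold are toy assumptions; in practice a library routine would be used on the edge probability image):

```python
import numpy as np

def hough_lines(edge_img, theta_bins=180, threshold=4):
    """Minimal Hough transform: each edge pixel votes for every (rho, theta)
    line passing through it, where rho = x*cos(theta) + y*sin(theta); bins
    whose vote count reaches the threshold are returned as detected lines."""
    h, w = edge_img.shape
    diag = int(np.ceil(np.hypot(h, w)))           # largest possible |rho|
    thetas = np.linspace(0, np.pi, theta_bins, endpoint=False)
    acc = np.zeros((2 * diag + 1, theta_bins), dtype=int)
    ys, xs = np.nonzero(edge_img)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(theta_bins)] += 1
    peaks = np.argwhere(acc >= threshold)
    return [(int(rho) - diag, float(thetas[t])) for rho, t in peaks]

# A 10x10 toy edge image with one horizontal edge segment on row 3.
img = np.zeros((10, 10), dtype=np.uint8)
img[3, 2:8] = 1
lines = hough_lines(img, threshold=6)
# Among the detected lines is theta ~ pi/2 (horizontal) with rho = 3.
print(lines)
```

Each returned (rho, theta) pair describes one straight line in the image coordinate system, which is exactly the form used for the intersection computation below.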
S204, calculating the intersection point of every two straight lines in the straight line set to obtain a second intersection point set;
specifically, based on a linear equation of each straight line in the straight line set in an image coordinate system of the edge probability image, an intersection point of every two straight lines in the straight line set is calculated to obtain a second straight line set.
S205, performing intersection filtering processing on the second intersection set to obtain a first intersection set;
specifically, each intersection point in the second intersection point set is traversed, a first target intersection point where an included angle between two straight lines corresponding to the intersection point does not satisfy an included angle interval and a second target intersection point which is not located in a document edge frame in the edge probability image are determined, and the first target intersection point and the second target intersection point are filtered out to obtain a first intersection point set.
It is understood that, since edge detection and line detection both carry some error, the second intersection point set may contain unreasonable intersections that lie too far from the vertices of the document edge frame in the edge probability image.
The reasoning behind the first target intersection points is as follows. In reality, any two adjacent sides of a document are generally perpendicular, i.e. the angle between adjacent sides is 90 degrees. A captured original document image, however, exhibits perspective deformation, skew and distortion, so the angle between adjacent document sides as presented in the image lies near 90 degrees, generally deviating by no more than ±30 degrees. Each straight line detected in the edge probability image can be approximated as a side of the document, and the intersection of every two straight lines as a vertex of the document; an intersection is therefore considered reasonable only if the angle between its two straight lines falls within a certain angle interval. Intersections formed by two straight lines whose angle does not satisfy the interval are taken as first target intersection points and filtered out, which improves the reasonableness of the intersections in the first intersection point set.
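This angle check can be sketched directly from the Hough theta values; the 90 ± 30 degree interval follows the description above, and the function names are illustrative:

```python
import numpy as np

def angle_between(theta1, theta2):
    """Acute angle in degrees between two Hough lines, given their theta
    values (line direction is periodic in pi, hence the wrap-around)."""
    d = abs(theta1 - theta2) % np.pi
    return np.degrees(min(d, np.pi - d))

def passes_angle_filter(theta1, theta2, center=90.0, tolerance=30.0):
    """Keep an intersection only if its two lines meet at roughly a right
    angle (90 degrees +/- 30 degrees here)."""
    return abs(angle_between(theta1, theta2) - center) <= tolerance

print(passes_angle_filter(0.0, np.pi / 2))       # 90 degrees: kept
print(passes_angle_filter(0.0, np.radians(20)))  # 20 degrees: filtered out
```

Intersections failing this test become first target intersection points and are removed from the candidate set.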
As for determining the second target intersection points, which are not located on the document edge frame in the edge probability image: owing to the uncertainty of straight line detection and the perspective deformation of the document in the original document image, when the intersection points between every two straight lines in the straight line set are calculated, some intersection points may fall outside the document edge frame in the edge probability image, or even outside the edge probability image altogether. Such unreasonable intersection points are taken as second target intersection points and filtered out, which likewise improves the reasonableness of each intersection point in the first intersection point set.
Specifically, whether an intersection point is located on the document edge frame in the edge probability image may be determined by judging whether the gray value of the pixel corresponding to the intersection point is greater than a gray value threshold: if it is greater than the threshold, the intersection point is located on the document edge frame; otherwise, it is not.
Optionally, whether an intersection point is located on the document edge frame in the edge probability image may instead be determined by judging whether the probability value that the corresponding pixel is an edge is greater than a probability value threshold: if it is greater than the threshold, the intersection point is located on the document edge frame; otherwise, it is not.
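The two filtering criteria above can be sketched in a minimal pure-Python form. The (a, b, c) straight-line representation, the folded angle interval of at least 60 degrees (corresponding to 90° ± 30°), the probability threshold, and all names are illustrative assumptions rather than details taken from the embodiment:

```python
import math

def lines_angle_deg(l1, l2):
    # Each line is (a, b, c) with a*x + b*y + c = 0; a direction vector is (b, -a).
    d1, d2 = (l1[1], -l1[0]), (l2[1], -l2[0])
    cosang = (d1[0] * d2[0] + d1[1] * d2[1]) / (math.hypot(*d1) * math.hypot(*d2))
    ang = math.degrees(math.acos(max(-1.0, min(1.0, cosang))))
    return min(ang, 180.0 - ang)  # lines are undirected: fold the angle into [0, 90]

def filter_intersections(points, line_pairs, prob_map, min_angle=60.0, p_thr=0.5):
    """Keep only intersections whose line pair satisfies the angle interval and
    which land on the document edge frame (high edge probability at that pixel)."""
    h, w = len(prob_map), len(prob_map[0])
    kept = []
    for (x, y), (l1, l2) in zip(points, line_pairs):
        if lines_angle_deg(l1, l2) < min_angle:      # first target intersection
            continue
        xi, yi = int(round(x)), int(round(y))
        if not (0 <= xi < w and 0 <= yi < h):        # outside the image entirely
            continue
        if prob_map[yi][xi] <= p_thr:                # second target intersection
            continue
        kept.append((x, y))
    return kept
```

A perpendicular pair sitting on a high-probability pixel survives; a 45-degree pair or an out-of-frame point is discarded.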
Referring to fig. 5 in conjunction with steps S203 to S205, an exemplary schematic diagram of the straight line set and of the first intersection point set obtained after intersection filtering is provided for the embodiment of the present application.
As shown in FIG. 5, the straight line set is obtained by performing straight line detection on the document edge frame of the edge probability image, and the first intersection point set is an intersection point of every two straight lines in the straight line set.
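The straight line detection producing this line set is a Hough transform over the edge pixels (the practical choice would be OpenCV's `cv2.HoughLines`). The deliberately tiny pure-Python accumulator below only illustrates the voting idea; the sampling resolution, vote threshold, and all names are assumptions:

```python
import math

def hough_lines(binary, theta_steps=180, votes_thr=10):
    """Tiny Hough accumulator over (rho, theta); `binary` is a list of rows."""
    acc = {}
    for y, row in enumerate(binary):
        for x, v in enumerate(row):
            if not v:
                continue
            for t in range(theta_steps):          # theta sampled over [0, pi)
                theta = t * math.pi / theta_steps
                rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
                acc[(rho, t)] = acc.get((rho, t), 0) + 1
    # Keep every (rho, theta) cell that collected enough collinear votes.
    return [(rho, t * math.pi / theta_steps)
            for (rho, t), votes in acc.items() if votes >= votes_thr]
```

A horizontal run of edge pixels at row y = 3, for example, produces a detected line with theta = pi/2 and rho = 3.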
S206, classifying each intersection point in the first intersection point set by adopting a clustering algorithm to obtain four types of intersection points;
specifically, the centroid of the first intersection point set is calculated to obtain a centroid coordinate of the centroid of the first intersection point set, the centroid coordinate is used as a coordinate origin to draw a rectangular coordinate system, and each intersection point in the first intersection point set is divided into four types of intersection points based on four quadrants of the rectangular coordinate system.
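The centroid-and-quadrant classification described above can be sketched as follows. Image coordinates with y increasing downward are assumed, and the exact quadrant numbering is an illustrative choice, since the embodiment does not fix it:

```python
def cluster_by_quadrant(points):
    """Split points into four classes by the quadrant they occupy around the
    centroid of the whole set, and return each class together with its mean."""
    cx = sum(p[0] for p in points) / len(points)   # centroid x
    cy = sum(p[1] for p in points) / len(points)   # centroid y
    classes = {1: [], 2: [], 3: [], 4: []}
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Assumed numbering: 1 = upper right, 2 = upper left,
        # 3 = lower left, 4 = lower right (y axis points down).
        q = 1 if (dx >= 0 and dy < 0) else 2 if (dx < 0 and dy < 0) \
            else 3 if dx < 0 else 4
        classes[q].append((x, y))
    centers = {q: (sum(x for x, _ in pts) / len(pts),
                   sum(y for _, y in pts) / len(pts))
               for q, pts in classes.items() if pts}
    return classes, centers
```

The per-class means play the role of the cluster centers used as vertex coordinates in the next step.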
S207, calculating a clustering center of each of the four types of intersection points to obtain four clustering centers, and taking coordinates of the four clustering centers as vertex coordinates of the document in the original document image;
referring to fig. 6, an exemplary diagram of obtaining vertex coordinates from four types of intersection points is provided for the embodiment of the present application. As shown in fig. 6, a rectangular coordinate system is established with the centroid of the first intersection point set as the coordinate origin, each intersection point in the first intersection point set is divided into four classes according to the four quadrants of the rectangular coordinate system, and the cluster center of each class is calculated to obtain four vertex coordinates: in the figure, the cluster center of the first class of intersection points is the vertex V1, that of the second class is the vertex V2, that of the third class is the vertex V3, and that of the fourth class is the vertex V4.
S208, calculating the vertex coordinates of the document in the target document image based on the vertex coordinates of the document in the original document image;
specifically, the vertex coordinates of the document in the target document image may be obtained from the vertex coordinates of the document in the original document image in the following manner. Assuming that the vertex coordinates of the document in the original document image are V1(x1, y1), V2(x2, y2), V3(x3, y3), V4(x4, y4), the vertex coordinates of the document in the target document image are V'1(x'1, y'1), V'2(x'2, y'2), V'3(x'3, y'3), V'4(x'4, y'4).
wherein each target vertex coordinate (x'1, y'1) through (x'4, y'4) is computed from the original vertex coordinates (x1, y1) through (x4, y4); the four defining formulas are rendered as equation images in the source document and are not reproduced here.
referring to fig. 7, an exemplary diagram for calculating vertex coordinates is provided according to an embodiment of the present disclosure. As shown in fig. 7, according to the algorithm described above, vertex V'1 is calculated from vertex V1, vertex V'2 from vertex V2, vertex V'3 from vertex V3, and vertex V'4 from vertex V4.
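Since the four defining formulas survive only as equation images, the sketch below shows one common, hypothetical choice for the target vertices: the axis-aligned rectangle spanned by the outermost coordinates of the detected quadrilateral. It is not necessarily the formula used by this embodiment:

```python
def target_vertices(v1, v2, v3, v4):
    """One plausible target layout: the axis-aligned rectangle spanned by the
    detected quad. Assumed ordering: v1 top-left, v2 top-right, v3 bottom-right,
    v4 bottom-left, in image coordinates with y pointing down."""
    xs = [v1[0], v2[0], v3[0], v4[0]]
    ys = [v1[1], v2[1], v3[1], v4[1]]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    return (left, top), (right, top), (right, bottom), (left, bottom)
```

Mapping a skewed quadrilateral through this function yields the upright rectangle the perspective correction aims at.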
S209, calculating to obtain a perspective transformation matrix based on the corresponding transformation relation between the vertex coordinates of the document in the original document image and the vertex coordinates of the document in the target document image;
specifically, taking the vertex coordinates V1 of the document in the original document image in step S208 and the vertex coordinates V'1 of the document in the target document image as an example, according to the principle of perspective transformation, assume that the coordinate in three-dimensional space corresponding to V1 and V'1 is (X1, Y1, Z1). The process of perspective-transforming V1 into three-dimensional space can then be expressed as:

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = M \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}, \qquad M = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

where M is the perspective transformation matrix.

Projecting (X1, Y1, Z1) onto the target plane gives

$$x'_1 = \frac{X_1}{Z_1}, \qquad y'_1 = \frac{Y_1}{Z_1},$$

namely

$$x'_1 = \frac{a_{11}x_1 + a_{12}y_1 + a_{13}}{a_{31}x_1 + a_{32}y_1 + a_{33}}, \qquad y'_1 = \frac{a_{21}x_1 + a_{22}y_1 + a_{23}}{a_{31}x_1 + a_{32}y_1 + a_{33}}.$$

Letting a33 = 1 and expanding the above formulas yields:

$$\begin{aligned} a_{11}x_1 + a_{12}y_1 + a_{13} - a_{31}x_1x'_1 - a_{32}y_1x'_1 &= x'_1, \\ a_{21}x_1 + a_{22}y_1 + a_{23} - a_{31}x_1y'_1 - a_{32}y_1y'_1 &= y'_1. \end{aligned}$$
From the vertex coordinates V1 of the document in the original document image and the vertex coordinates V'1 of the document in the target document image, two equations can be determined. Similarly, the three remaining vertex pairs, V2 and V'2, V3 and V'3, V4 and V'4, determine six more equations. The four vertex pairs thus yield 8 equations, from which the 8 unknowns can be solved, that is, the perspective transformation matrix M is obtained.
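The eight-equation system described in this step can be assembled and solved directly. The sketch below uses plain Gaussian elimination under the a33 = 1 normalization; in practice a call such as OpenCV's `cv2.getPerspectiveTransform` performs the equivalent computation, and all function names here are illustrative:

```python
def perspective_matrix(src, dst):
    """Build the 8x8 linear system from four point pairs (a33 = 1) and solve it."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # x' row:  a11*x + a12*y + a13 - a31*x*x' - a32*y*x' = x'
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp]); b.append(xp)
        # y' row:  a21*x + a22*y + a23 - a31*x*y' - a32*y*y' = y'
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp]); b.append(yp)
    n = 8
    for col in range(n):                      # Gaussian elimination, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):            # back substitution
        u[r] = (b[r] - sum(A[r][c] * u[c] for c in range(r + 1, n))) / A[r][r]
    a = u + [1.0]                             # append a33 = 1
    return [a[0:3], a[3:6], a[6:9]]

def apply_perspective(M, x, y):
    """Map (x, y) through M and divide by the homogeneous coordinate Z."""
    X = M[0][0] * x + M[0][1] * y + M[0][2]
    Y = M[1][0] * x + M[1][1] * y + M[1][2]
    Z = M[2][0] * x + M[2][1] * y + M[2][2]
    return X / Z, Y / Z
```

Feeding the four detected vertices as `src` and the four target vertices as `dst` recovers an M that maps each source vertex exactly onto its target.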
S210, performing perspective correction on the original document image according to the perspective transformation matrix to obtain a target document image;
referring to fig. 8, an exemplary perspective correction is provided for an embodiment of the present application. As shown in fig. 8, the original document image is processed by the perspective transformation matrix obtained in step S209 to obtain the target document image, and it is easy to see that the document in the target document image is presented at a front-view angle.
S211, cutting the target document image based on the vertex coordinates of the document in the target document image to obtain the target document.
Specifically, the target document image is cut, and a background area outside the document in the target document image is cut according to the vertex coordinates of the document in the target document image, so that a final target document is obtained.
Referring to fig. 9, an exemplary diagram of target document clipping is provided according to an embodiment of the present application. As shown in FIG. 9, the target document shown in the figure is obtained by cutting the target document image, and the target document does not include a background area, so that the target document is more convenient to read and store.
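Given the vertex coordinates of the document in the target document image, cutting away the background reduces to slicing the axis-aligned bounds of those vertices. A minimal sketch with assumed names, representing the image as a list of pixel rows:

```python
def crop_to_document(image, vertices):
    """Cut the background outside the document: keep only the axis-aligned
    bounding box of the target vertex coordinates, clamped to the image."""
    xs = [int(round(x)) for x, _ in vertices]
    ys = [int(round(y)) for _, y in vertices]
    left, right = max(min(xs), 0), min(max(xs), len(image[0]))
    top, bottom = max(min(ys), 0), min(max(ys), len(image))
    return [row[left:right] for row in image[top:bottom]]
```

After perspective correction the document is an upright rectangle, so this bounding-box cut removes exactly the background region.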
In the embodiment of the application, an original document image is obtained first, and edge detection is then performed on the original document image by using an edge detection model comprising a semantic segmentation branch and an edge detection branch, so that the obtained edge probability image is more accurate, which reduces both the error of document image correction and the amount of computation in the document image correction process. Next, a straight line set is obtained by performing straight line detection on the document edge frame in the edge probability image, the intersection points of every two straight lines in the straight line set are calculated to obtain a second intersection point set, and intersection point filtering processing is performed on each intersection point in the second intersection point set, so that the reasonableness of each intersection point in the first intersection point set is fully guaranteed and the accuracy of document image correction is improved. All the intersection points in the first intersection point set are then clustered into four types of intersection points, the cluster centers of the four types of intersection points are taken as the vertex coordinates of the document in the original document image, the vertex coordinates of the document in the target document image are calculated from the vertex coordinates of the document in the original document image, a perspective transformation matrix is calculated from these two sets of vertex coordinates, and finally the original document image is perspective-corrected based on the perspective transformation matrix to obtain a target document image comprising a distortion-repaired front-view-angle document image, which facilitates reading and document archiving for the user and improves the user experience.
In an implementable manner, after the edge probability map is generated, the edge probability map may be first refined to obtain a refined edge map after refinement, and then the operations of obtaining vertex coordinates and a perspective transformation matrix are performed based on the refined edge map obtained through refinement.
Referring to fig. 10, a flowchart of a document image rectification method is provided according to an embodiment of the present application. As shown in fig. 10, the document image rectification method may include the following steps.
S301, acquiring an original document image;
specifically, step S301 may refer to the detailed description in step S101, which is not repeated herein.
S302, inputting the original document image into an edge detection model to obtain an edge probability image corresponding to the original document image, wherein the edge detection model comprises a semantic segmentation branch and an edge detection branch, the semantic segmentation branch is used for obtaining a first edge image based on semantic information of the original document image, the edge detection branch is used for performing edge detection on the original document image to obtain a second edge image, and the edge probability image is a fusion image of the first edge image and the second edge image;
specifically, step S302 may refer to the detailed description in step S102, which is not repeated herein.
S303, thinning the edge probability image to obtain a thinned edge image;
specifically, binarization processing is performed on the edge probability image to obtain a binarization edge image, edge filtering processing is performed on the binarization edge image to obtain a filtering edge image, and an image thinning algorithm is used for thinning the filtering edge image to obtain a thinned edge image.
The edge probability image may be a grayscale image in which the gray value of each pixel represents the probability that the pixel is an edge; for example, if the probability that pixel 1 is an edge is 0.5, the gray value displayed in the edge probability image is 255 × 0.5 = 127.5. Binarizing the edge probability image to obtain the binarized edge image may be done by traversing all pixels in the edge probability image, judging whether the probability value of each pixel being an edge is greater than a probability value threshold, treating the pixels whose probability value exceeds the threshold as edge, and treating the remaining pixels as background.
Optionally, the binarization processing is performed on the edge probability image to obtain a binarization edge image, and the binarization edge image may also be obtained by traversing all pixel points in the edge probability image, judging whether the gray value of the pixel point is greater than a gray value threshold, using the pixel point with the gray value greater than the gray value threshold as an edge, and using the pixel point with the gray value not greater than the gray value threshold as a background, so as to obtain a binarization edge image.
The edge filtering processing on the binarized edge image addresses the fact that, during edge detection on the original document image, some edges that do not belong to the document edge may be misidentified. These misidentified edges usually appear as edge blocks of relatively small area in the binarized edge image, so by setting an area threshold or a perimeter threshold, edge blocks smaller than the threshold can be filtered out of the binarized edge image to obtain the filtered edge image.
Thinning the filtered edge image with an image thinning algorithm further thins the document edge frame in the filtered edge image.
Referring to fig. 11, an exemplary schematic diagram of a refinement process is provided in the embodiment of the present application. As shown in fig. 11, the binarization processing and the edge filtering processing are performed on the edge probability map to refine edges and filter out edges that are misrecognized, so as to obtain a filtered edge image, and then the image refinement is performed on the filtered edge image, so as to obtain a refined edge image in which a document edge frame is further refined.
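The binarization and edge-filtering stages of this pipeline can be sketched as follows; the thinning stage itself (for example the Zhang-Suen algorithm, or OpenCV's `cv2.ximgproc.thinning`) is omitted, and the thresholds and names are assumptions:

```python
from collections import deque

def binarize(prob_map, p_thr=0.5):
    """Pixels whose edge probability exceeds the threshold become edge (1)."""
    return [[1 if p > p_thr else 0 for p in row] for row in prob_map]

def filter_small_blobs(binary, min_area=20):
    """Remove 4-connected edge blocks smaller than the area threshold."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                blob, q = [], deque([(x, y)])    # flood-fill one component
                seen[y][x] = True
                while q:
                    cx, cy = q.popleft()
                    blob.append((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((nx, ny))
                if len(blob) < min_area:         # misrecognized small edge block
                    for bx, by in blob:
                        out[by][bx] = 0
    return out
```

A long edge segment survives the area filter while an isolated misdetected pixel is erased, matching the behavior described above.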
S304, determining the vertex coordinates of the document in the original document image from the refined edge image;
specifically, step S304 may refer to the descriptions in step S203 to step S207, and the edge probability image in step S203 to step S207 may be replaced by a refined edge image, which is not described herein again.
S305, calculating a perspective transformation matrix based on the vertex coordinates of the document in the original document image;
specifically, step S305 may refer to the descriptions in step S208 to step S209, which are not repeated herein.
S306, performing perspective correction on the original document image according to the perspective transformation matrix to obtain a target document image;
S307, the target document image is cut based on the vertex coordinates of the document in the target document image to obtain the target document.
Specifically, the target document image is cut, and a background area outside the document in the target document image is cut according to the vertex coordinates of the document in the target document image, so that a final target document is obtained.
In the embodiment of the application, an acquired original document image is input into an edge detection model comprising a semantic segmentation branch and an edge detection branch to obtain an edge probability image corresponding to the original document image. The edge probability image is then subjected to thinning processing, further thinning its edges, to obtain a refined edge image; the vertex coordinates of the document in the original document image are determined from the refined edge image; a perspective transformation matrix is calculated based on those vertex coordinates; the original document image is perspective-corrected according to the perspective transformation matrix to obtain a target document image; and finally the target document image is cut to obtain the target document. Because the edge probability image is thinned, the accuracy of document image correction is improved, the amount of computation in the straight line detection and intersection point calculation processes is greatly reduced, the speed of document image correction is increased, and a better user experience is achieved.
Referring to fig. 12, a schematic structural diagram of a document image rectification device according to an embodiment of the present application is provided. As shown in fig. 12, the document image rectification apparatus 1 may be implemented by software, hardware, or a combination of both as all or a part of an intelligent terminal device. According to some embodiments, the document image rectification device 1 includes an original image acquisition module 11, an edge detection module 12, a first coordinate acquisition module 13, and a matrix rectification module 14, and specifically includes:
an original image obtaining module 11, configured to obtain an original document image;
an edge detection module 12, configured to input the original document image into an edge detection model, to obtain an edge probability image corresponding to the original document image, where the edge detection model includes a semantic segmentation branch and an edge detection branch, the semantic segmentation branch is used to obtain a first edge image based on semantic information of the original document image, the edge detection branch is used to perform edge detection on the original document image to obtain a second edge image, and the edge probability image is a fused image of the first edge image and the second edge image;
a first coordinate obtaining module 13, configured to determine vertex coordinates of a document in the original document image from the edge probability image;
and the matrix correction module 14 is configured to calculate a perspective transformation matrix based on the vertex coordinates of the document in the original document image, and perform perspective correction on the original document image according to the perspective transformation matrix to obtain a target document image.
Optionally, the apparatus further comprises:
a thinning processing module 15, configured to perform thinning processing on the edge probability image to obtain a thinned edge image;
optionally, the first coordinate obtaining module 13 is specifically configured to:
determining the vertex coordinates of the document in the original document image from the refined edge image;
optionally, the refining processing module 15 is specifically configured to:
carrying out binarization processing on the edge probability image to obtain a binarization edge image;
carrying out edge filtering on the binary edge image to obtain a filtered edge image;
and thinning the filtered edge image by using an image thinning algorithm to obtain a thinned edge image.
Optionally, the first coordinate obtaining module 13 includes:
a line detection unit 131, configured to perform straight line detection on the document edge frame in the edge probability image by using the Hough transform to obtain a straight line set;
an intersection set obtaining unit 132, configured to calculate an intersection of every two straight lines in the straight line set, so as to obtain a first intersection set;
an intersection clustering unit 133, configured to classify each intersection point in the first intersection point set by using a clustering algorithm to obtain four types of intersection points;
the first coordinate obtaining unit 134 is configured to calculate a cluster center of each of the four types of intersection points to obtain four cluster centers, and use coordinates of the four cluster centers as vertex coordinates of a document in the original document image.
Optionally, the intersection set obtaining unit 132 further includes:
an intersection point set obtaining subunit 1321, configured to calculate the intersection point of every two straight lines in the straight line set to obtain a second intersection point set;
and an intersection filtering subunit 1322 is configured to perform intersection filtering processing on the second intersection set to obtain a first intersection set.
Optionally, the intersection filtering subunit 1322 is specifically configured to:
traversing each intersection point in the second intersection point set, and determining a first target intersection point of an included angle between two straight lines corresponding to the intersection point, which does not meet the included angle interval, and a second target intersection point which is not located in a document edge frame in the edge probability image;
and filtering the first target intersection point and the second target intersection point to obtain a first intersection point set.
Optionally, the intersection clustering unit 133 is specifically configured to:
calculating the centroid of the first intersection point set to obtain the centroid coordinate of the centroid of the first intersection point set;
and drawing a rectangular coordinate system by taking the centroid coordinate as a coordinate origin, and dividing each intersection point in the first intersection point set into four types of intersection points based on four quadrants of the rectangular coordinate system.
Optionally, the matrix rectification module 14 includes:
a second coordinate obtaining unit 141, configured to calculate vertex coordinates of a document in a target document image based on the vertex coordinates of the document in the original document image;
and a matrix obtaining unit 142, configured to calculate a perspective transformation matrix based on a corresponding transformation relationship between the vertex coordinates of the document in the original document image and the vertex coordinates of the document in the target document image.
Optionally, the apparatus further comprises:
and the image clipping module 16 is configured to clip the target document image based on the vertex coordinates of the document in the target document image, so as to obtain the target document.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiment of the application, an original document image is obtained firstly, and then edge detection is carried out on the original document image by using an edge detection model comprising a semantic segmentation branch and an edge detection branch, so that the obtained edge probability image is more accurate; determining the vertex coordinates of the document in the original document image from the edge probability image, calculating a perspective transformation matrix based on the vertex coordinates of the document in the original document image, and finally performing perspective correction on the original document image based on the perspective transformation matrix to obtain a target document image comprising the distorted and repaired front view angle document image, so that convenience is brought to reading and document archiving of a user, and the user experience is improved; optionally, the edge probability image is subjected to thinning processing, so that the document image correction precision is improved, the calculation amount of straight line detection and intersection point calculation is reduced, and the document image correction speed is improved; optionally, before the vertex coordinates of the document in the original document image are calculated based on the first intersection point set, intersection points in the second intersection point set obtained by calculating the intersection points of every two straight lines are subjected to intersection point filtering processing to obtain the first intersection point set, so that the reasonability of each intersection point in the first intersection point set is ensured, and the accuracy of document image correction is ensured.
An embodiment of the present application further provides a computer storage medium. The computer storage medium may store a plurality of instructions suitable for being loaded by a processor to execute the document image rectification method of the embodiments shown in fig. 1 to 11; for the specific execution process, reference may be made to the specific descriptions of those embodiments, which are not repeated herein.
The present application further provides a computer program product in which at least one instruction is stored, the at least one instruction being loaded by a processor to execute the document image rectification method of the embodiments shown in fig. 1 to 11; for the specific execution process, reference may be made to the specific descriptions of those embodiments, which are not repeated herein.
Referring to fig. 13, a block diagram of an intelligent terminal device according to an exemplary embodiment of the present application is shown. The intelligent terminal device in the application can comprise one or more of the following components: a processor 110, a memory 120, an input device 130, an output device 140, and a bus 150. The processor 110, memory 120, input device 130, and output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 connects various parts within the entire intelligent terminal apparatus using various interfaces and lines, and performs various functions of the intelligent terminal apparatus 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and calling data stored in the memory 120. Alternatively, the processor 110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing display content; the modem is used to handle wireless communications. It is understood that the modem may also not be integrated into the processor 110 but instead be implemented by a separate communication chip.
The Memory 120 may include a Random Access Memory (RAM) or a read-only Memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable medium. The memory 120 may be used to store instructions, programs, code sets, or instruction sets.
The input device 130 is used for receiving input instructions or data, and the input device 130 includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used for outputting instructions or data, and the output device 140 includes, but is not limited to, a display device, a speaker, and the like. In the embodiment of the present application, the input device 130 may be a temperature sensor, and is configured to obtain an operating temperature of the intelligent terminal device. The output device 140 may be a speaker for outputting audio signals.
In addition, those skilled in the art will appreciate that the structure of the intelligent terminal device shown in the above figures does not constitute a limitation of the intelligent terminal device, and the intelligent terminal device may include more or less components than those shown, or combine some components, or arrange different components. For example, the intelligent terminal device further includes a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power supply, a bluetooth module, and other components, which are not described herein again.
In the embodiment of the present application, the execution subject of each step may be the intelligent terminal device described above. Optionally, the execution subject of each step is an operating system of the intelligent terminal device. The operating system may be an android system, an IOS system, or another operating system, which is not limited in this embodiment of the present application.
In the intelligent terminal device shown in fig. 13, the processor 110 may be configured to call the document image rectification program stored in the memory 120 and execute the program to implement the document image rectification method according to the various method embodiments of the present application.
In the embodiment of the application, an original document image is obtained firstly, and then edge detection is carried out on the original document image by using an edge detection model comprising a semantic segmentation branch and an edge detection branch, so that the obtained edge probability image is more accurate; determining the vertex coordinates of the document in the original document image from the edge probability image, calculating a perspective transformation matrix based on the vertex coordinates of the document in the original document image, and finally performing perspective correction on the original document image based on the perspective transformation matrix to obtain a target document image comprising the distorted and repaired front view angle document image, so that convenience is brought to reading and document archiving of a user, and user experience is improved; optionally, the edge probability image is subjected to thinning processing, so that the document image correction precision is improved, the calculation amount of straight line detection and intersection point calculation is reduced, and the document image correction speed is improved; optionally, before the vertex coordinates of the document in the original document image are calculated based on the first intersection point set, intersection points in the second intersection point set obtained by calculating the intersection points of every two straight lines are subjected to intersection point filtering processing to obtain the first intersection point set, so that the reasonability of each intersection point in the first intersection point set is ensured, and the accuracy of document image correction is ensured.
It is clear to a person skilled in the art that the solution of the present application can be implemented by means of software and/or hardware. The "unit" and "module" in this specification refer to software and/or hardware that can perform a specific function independently or in cooperation with other components, where the hardware may be, for example, a Field-Programmable Gate Array (FPGA), an Integrated Circuit (IC), or the like.
It should be noted that, for simplicity of description, the above method embodiments are described as a series or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only one way of dividing logical functions, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through service interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program stored in a computer-readable memory, where the memory may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (12)

1. A document image rectification method, applied to an intelligent terminal device, the method comprising:
acquiring an original document image;
inputting the original document image into an edge detection model to obtain an edge probability image corresponding to the original document image, wherein the edge detection model comprises a semantic segmentation branch and an edge detection branch, the semantic segmentation branch is used for obtaining a first edge image based on semantic information of the original document image, the edge detection branch is used for performing edge detection on the original document image to obtain a second edge image, and the edge probability image is a fused image of the first edge image and the second edge image;
determining the vertex coordinates of the document in the original document image from the edge probability image;
and calculating a perspective transformation matrix based on the vertex coordinates of the document in the original document image, and performing perspective correction on the original document image according to the perspective transformation matrix to obtain a target document image.
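Claim 1 states that the edge probability image is a fusion of the two branch outputs but leaves the fusion operation unspecified. A minimal sketch, assuming an element-wise weighted average of the two edge probability maps (the weight `alpha` is an assumption of this sketch, not taken from the claim):

```python
import numpy as np

def fuse_edge_maps(seg_edges, det_edges, alpha=0.5):
    """Fuse the semantic-segmentation edge map with the edge-detection map.

    Both inputs are 2-D float arrays of edge probabilities in [0, 1];
    an element-wise weighted average is assumed as the fusion rule.
    """
    seg_edges = np.asarray(seg_edges, dtype=np.float64)
    det_edges = np.asarray(det_edges, dtype=np.float64)
    fused = alpha * seg_edges + (1.0 - alpha) * det_edges
    return np.clip(fused, 0.0, 1.0)
```

Any pixel-wise combination (maximum, learned 1x1 convolution, etc.) would equally satisfy the claim wording; the average is chosen here only for illustration.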
2. The method of claim 1, wherein determining the vertex coordinates of the document in the original document image from the edge probability image comprises:
thinning the edge probability image to obtain a thinned edge image;
determining the vertex coordinates of the document in the original document image from the thinned edge image.
3. The method according to claim 2, wherein thinning the edge probability image to obtain the thinned edge image comprises:
carrying out binarization processing on the edge probability image to obtain a binarized edge image;
performing edge filtering processing on the binarized edge image to obtain a filtered edge image;
and thinning the filtered edge image by using an image thinning algorithm to obtain the thinned edge image.
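Claim 3 names an image thinning algorithm without fixing a particular one; Zhang-Suen thinning is a common choice, sketched below in pure NumPy under the assumption that the binarized input holds 0/1 values:

```python
import numpy as np

def zhang_suen_thin(binary):
    """Zhang-Suen thinning: iteratively peel boundary pixels in two
    sub-iterations until the edge is one pixel wide."""
    img = (np.asarray(binary) > 0).astype(np.uint8)
    img = np.pad(img, 1)  # zero border so every foreground pixel has 8 neighbors
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            rows, cols = np.nonzero(img)
            for r, c in zip(rows, cols):
                # 8-neighborhood p2..p9, clockwise starting from north
                p = [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
                     img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]
                b = int(sum(p))                       # nonzero neighbors
                if not (2 <= b <= 6):
                    continue
                # number of 0 -> 1 transitions around the neighborhood
                a = sum(int(p[i] == 0 and p[(i + 1) % 8] == 1) for i in range(8))
                if a != 1:
                    continue
                if step == 0:
                    if p[0]*p[2]*p[4] != 0 or p[2]*p[4]*p[6] != 0:
                        continue
                else:
                    if p[0]*p[2]*p[6] != 0 or p[0]*p[4]*p[6] != 0:
                        continue
                to_delete.append((r, c))
            for r, c in to_delete:
                img[r, c] = 0
            if to_delete:
                changed = True
    return img[1:-1, 1:-1]
```

Production implementations would typically use a vectorized or lookup-table variant; this per-pixel loop keeps the two deletion conditions of each sub-iteration explicit.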
4. The method of claim 1, wherein determining the vertex coordinates of the document in the original document image from the edge probability image comprises:
carrying out straight line detection on the document edge frame in the edge probability image by using the Hough transform to obtain a straight line set;
acquiring the intersection point of any two straight lines in the straight line set to obtain a first intersection point set;
classifying each intersection point in the first intersection point set by using a clustering algorithm to obtain four types of intersection points;
and calculating the clustering center of each of the four types of intersection points to obtain four clustering centers, and taking the coordinates of the four clustering centers as the vertex coordinates of the document in the original document image.
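Assuming the Hough transform returns lines in the usual (rho, theta) normal form, x·cos(theta) + y·sin(theta) = rho, the pairwise intersection step of claim 4 can be sketched as:

```python
import numpy as np

def line_intersection(line1, line2, eps=1e-9):
    """Intersect two lines given in Hough normal form (rho, theta).
    Returns (x, y), or None if the lines are (near-)parallel."""
    r1, t1 = line1
    r2, t2 = line2
    a = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    if abs(np.linalg.det(a)) < eps:
        return None
    x, y = np.linalg.solve(a, np.array([r1, r2]))
    return float(x), float(y)

def pairwise_intersections(lines):
    """Intersections of every pair of detected lines (the claim's
    intersection set before any filtering)."""
    pts = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            p = line_intersection(lines[i], lines[j])
            if p is not None:
                pts.append(p)
    return pts
```

For the four edges of a document this yields at most six candidate points (parallel opposite edges contribute none), which the subsequent filtering and clustering steps reduce to four vertices.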
5. The method of claim 4, wherein acquiring the intersection point of any two straight lines in the straight line set to obtain the first intersection point set comprises:
calculating the intersection point of every two straight lines in the straight line set to obtain a second intersection point set;
and performing intersection filtering processing on the second intersection point set to obtain the first intersection point set.
6. The method of claim 5, wherein performing intersection filtering processing on the second intersection point set to obtain the first intersection point set comprises:
traversing each intersection point in the second intersection point set, and determining first target intersection points, for which the included angle between the two corresponding straight lines does not fall within a preset included-angle interval, and second target intersection points, which are not located within the document edge frame in the edge probability image;
and filtering out the first target intersection points and the second target intersection points to obtain the first intersection point set.
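A sketch of this filtering step, under two assumptions the claim does not fix: the included-angle condition is taken as "at least 45 degrees between the two lines", and the image bounds stand in for the document edge frame:

```python
import numpy as np

def filter_intersections(candidates, shape, min_angle=np.deg2rad(45)):
    """Drop intersections whose two lines meet too obliquely, and
    intersections lying outside the image.

    `candidates` is a list of ((x, y), theta1, theta2), where the thetas
    are the Hough angles of the two intersecting lines; `shape` is the
    (height, width) of the edge probability image.
    """
    h, w = shape
    kept = []
    for (x, y), t1, t2 in candidates:
        diff = abs(t1 - t2) % np.pi
        included = min(diff, np.pi - diff)  # fold angle into [0, pi/2]
        if included < min_angle:
            continue                        # nearly parallel edges: spurious corner
        if not (0 <= x < w and 0 <= y < h):
            continue                        # outside the image / edge frame
        kept.append((x, y))
    return kept
```

For a photographed rectangular document, genuine corners come from roughly perpendicular edges, so a lower bound on the included angle removes most accidental line crossings.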
7. The method of claim 4, wherein classifying each intersection point in the first intersection point set by using a clustering algorithm to obtain four types of intersection points comprises:
calculating the centroid of the first intersection point set to obtain the centroid coordinates of the first intersection point set;
and establishing a rectangular coordinate system with the centroid coordinates as the origin, and dividing the intersection points in the first intersection point set into four types of intersection points based on the four quadrants of the rectangular coordinate system.
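The quadrant-based clustering of claims 4 and 7 can be sketched as follows; the returned ordering of the four centers and the assumption that every quadrant contains at least one intersection point are choices of this sketch:

```python
import numpy as np

def quadrant_cluster_centers(points):
    """Split intersection points by quadrant around their centroid and
    return the four cluster centers, ordered top-left, top-right,
    bottom-right, bottom-left (image coordinates: y grows downward).

    Assumes each quadrant is non-empty, which holds when the filtered
    intersections cluster around the four document corners.
    """
    pts = np.asarray(points, dtype=np.float64)
    cx, cy = pts.mean(axis=0)          # centroid = origin of the new axes
    quads = {k: [] for k in range(4)}  # 0 TL, 1 TR, 2 BL, 3 BR
    for x, y in pts:
        k = (0 if x < cx else 1) + (0 if y < cy else 2)
        quads[k].append((x, y))
    centers = {k: np.mean(v, axis=0) for k, v in quads.items() if v}
    return [tuple(centers[k]) for k in (0, 1, 3, 2)]
```

Compared with a general k-means, this centroid-and-quadrants rule needs no iteration and cannot mis-assign a point across the document's center, which is presumably why the claim singles it out.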
8. The method of claim 1, wherein calculating the perspective transformation matrix based on the vertex coordinates of the document in the original document image comprises:
calculating the vertex coordinates of the document in the target document image based on the vertex coordinates of the document in the original document image;
and calculating to obtain a perspective transformation matrix based on the corresponding transformation relation between the vertex coordinates of the document in the original document image and the vertex coordinates of the document in the target document image.
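Claim 8 describes the standard construction of a homography from four point correspondences (what OpenCV exposes as `getPerspectiveTransform`). A sketch solving the usual 8x8 linear system with the bottom-right matrix entry fixed to 1:

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 perspective matrix H mapping four source
    vertices to four target vertices. Each correspondence contributes
    two linear equations in the eight unknowns h11..h32."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(h_mat, pt):
    """Apply the homography to one point via homogeneous coordinates."""
    x, y, w = h_mat @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

Perspective correction of the whole image then amounts to applying the inverse mapping per output pixel, which in practice is delegated to a library routine such as `cv2.warpPerspective`.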
9. The method according to claim 1, wherein after the perspective correction is performed on the original document image according to the perspective transformation matrix to obtain the target document image, the method further comprises:
and cutting the target document image based on the vertex coordinates of the document in the target document image to obtain the target document.
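One plausible reading of the cropping in claim 9: after perspective correction the document vertices bound an axis-aligned rectangle, so a bounding-box crop suffices. This interpretation is an assumption of the sketch:

```python
import numpy as np

def crop_to_document(image, vertices):
    """Crop the corrected image to the bounding box of the document
    vertices, clamped to the image bounds. `vertices` are (x, y) pairs."""
    xs = [int(round(x)) for x, _ in vertices]
    ys = [int(round(y)) for _, y in vertices]
    x0, x1 = max(min(xs), 0), min(max(xs), image.shape[1])
    y0, y1 = max(min(ys), 0), min(max(ys), image.shape[0])
    return image[y0:y1, x0:x1]
```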
10. A document image rectification apparatus, characterized in that the apparatus comprises:
the original image acquisition module is used for acquiring an original document image;
an edge detection module, configured to input the original document image into an edge detection model, to obtain an edge probability image corresponding to the original document image, where the edge detection model includes a semantic segmentation branch and an edge detection branch, the semantic segmentation branch is used to obtain a first edge image based on semantic information of the original document image, the edge detection branch is used to perform edge detection on the original document image to obtain a second edge image, and the edge probability image is a fused image of the first edge image and the second edge image;
the first coordinate acquisition module is used for determining the vertex coordinates of the document in the original document image from the edge probability image;
and the matrix correction module is used for calculating a perspective transformation matrix based on the vertex coordinates of the document in the original document image and carrying out perspective correction on the original document image according to the perspective transformation matrix to obtain a target document image.
11. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.
12. An intelligent terminal device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the steps of the method according to any of claims 1-9.
CN202110921692.3A 2021-08-11 2021-08-11 Document image correction method and device, storage medium and intelligent terminal device Pending CN113627428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110921692.3A CN113627428A (en) 2021-08-11 2021-08-11 Document image correction method and device, storage medium and intelligent terminal device


Publications (1)

Publication Number Publication Date
CN113627428A 2021-11-09

Family

ID=78384645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110921692.3A Pending CN113627428A (en) 2021-08-11 2021-08-11 Document image correction method and device, storage medium and intelligent terminal device

Country Status (1)

Country Link
CN (1) CN113627428A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898388A (en) * 2022-03-28 2022-08-12 支付宝(杭州)信息技术有限公司 Document and picture classification method and device, storage medium and electronic equipment
CN115190240A (en) * 2022-06-29 2022-10-14 广东小天才科技有限公司 Terminal shooting method and device, computer equipment and computer readable storage medium
CN115187995A (en) * 2022-07-08 2022-10-14 北京百度网讯科技有限公司 Document correction method, device, electronic equipment and storage medium
CN115457559A (en) * 2022-08-19 2022-12-09 上海通办信息服务有限公司 Method, device and equipment for intelligently correcting text and license pictures
CN115457559B (en) * 2022-08-19 2024-01-16 上海通办信息服务有限公司 Method, device and equipment for intelligently correcting texts and license pictures
CN115471846A (en) * 2022-09-22 2022-12-13 中电金信软件有限公司 Image correction method and device, electronic equipment and readable storage medium
CN115471846B (en) * 2022-09-22 2023-06-27 中电金信软件有限公司 Image correction method and device, electronic equipment and readable storage medium
CN116883461A (en) * 2023-05-18 2023-10-13 珠海移科智能科技有限公司 Method for acquiring clear document image and terminal device thereof
CN116883461B (en) * 2023-05-18 2024-03-01 珠海移科智能科技有限公司 Method for acquiring clear document image and terminal device thereof

Similar Documents

Publication Publication Date Title
CN113627428A (en) Document image correction method and device, storage medium and intelligent terminal device
CN108681729B (en) Text image correction method, device, storage medium and equipment
WO2020082731A1 (en) Electronic device, credential recognition method and storage medium
CN108830186B (en) Text image content extraction method, device, equipment and storage medium
US11367310B2 (en) Method and apparatus for identity verification, electronic device, computer program, and storage medium
CN110119733B (en) Page identification method and device, terminal equipment and computer readable storage medium
CN111209827B (en) Method and system for OCR (optical character recognition) bill problem based on feature detection
CN108961267B (en) Picture processing method, picture processing device and terminal equipment
CN111290684B (en) Image display method, image display device and terminal equipment
CN112396050B (en) Image processing method, device and storage medium
CN111008935B (en) Face image enhancement method, device, system and storage medium
CN110619656B (en) Face detection tracking method and device based on binocular camera and electronic equipment
CN110211195B (en) Method, device, electronic equipment and computer-readable storage medium for generating image set
WO2023284502A1 (en) Image processing method and apparatus, device, and storage medium
CN111191582A (en) Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium
CN111209856B (en) Invoice information identification method and device, electronic equipment and storage medium
CN114399781A (en) Document image processing method and device, electronic equipment and storage medium
CN110431563B (en) Method and device for correcting image
CN111325220B (en) Image generation method, device, equipment and storage medium
CN111199567B (en) Lane line drawing method and device and terminal equipment
CN111898610A (en) Card unfilled corner detection method and device, computer equipment and storage medium
CN112597940B (en) Certificate image recognition method and device and storage medium
CN110827301A (en) Method and apparatus for processing image
WO2022095318A1 (en) Character detection method and apparatus, electronic device, storage medium, and program
US9483834B1 (en) Object boundary detection in an image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination