CN115862044A - Method, apparatus, and medium for extracting a target document portion from an image
- Publication number
- CN115862044A CN115862044A CN202211448474.3A CN202211448474A CN115862044A CN 115862044 A CN115862044 A CN 115862044A CN 202211448474 A CN202211448474 A CN 202211448474A CN 115862044 A CN115862044 A CN 115862044A
- Authority
- CN
- China
- Prior art keywords
- image
- target document
- coordinates
- corner points
- determined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Character Input (AREA)
Abstract
Embodiments of the present disclosure relate to methods, apparatuses, and media for extracting a target document portion from an image. According to the method, an image to be processed is acquired, the image containing a target document portion and a background portion. The image is binarized, and the binarized image is then subjected to a first filtering process and a second filtering process to obtain a first image and a second image, respectively, where the first image has less noise than the second image and the second image has higher sharpness than the first image. Coordinates of the four corner points of the target document portion are determined based on the first image, and the target document portion is extracted from the second image based on the determined coordinates of the four corner points. In this way, a clear, usable target document portion can be extracted from the image accurately and efficiently.
Description
Technical Field
Embodiments of the present disclosure relate generally to the field of image processing, and more particularly, to a method, apparatus, and medium for extracting a target document portion from an image.
Background
When an image of a document is captured by an image capturing device such as a camera, the captured image generally contains both a document portion corresponding to the document and a background portion corresponding to the surrounding environment (e.g., the desktop) on which the document is placed. Such images also commonly contain various distortions, such as shadows cast by the surroundings, geometric distortion, edge blur, or lighting artifacts, all of which strongly interfere with accurately extracting the document portion from the image. Currently, the user usually has to extract the required document portion from such an image manually, which is inefficient; because of the aforementioned distortions, it is difficult for the user to locate the precise border of the target document, and therefore difficult to extract the target document completely and without any of the background portion. Moreover, the user usually also needs to correct the distortion of the target document after extracting it, which further reduces processing efficiency.
Accordingly, there is a need for a technique that automatically extracts the target document portion from an image, so that a clear, usable document portion can be extracted accurately and efficiently.
Disclosure of Invention
In view of the above problems, the present disclosure provides a method, apparatus, and medium for extracting a target document portion from an image, so that a clear, usable document portion can be extracted from the image accurately and efficiently.
According to a first aspect of the present disclosure, there is provided a method for extracting a target document portion from an image, comprising: acquiring an image to be processed, the image to be processed having a target document portion and a background portion; performing binarization processing on the image to be processed, and performing a first filtering process and a second filtering process on the binarized image to obtain a first image and a second image, respectively, wherein the first image has less noise than the second image, and the second image has higher sharpness than the first image; determining coordinates of four corner points of the target document portion based on the first image; and extracting the target document portion from the second image based on the determined coordinates of the four corner points.
According to a second aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect of the disclosure.
In a third aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of the first aspect of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numbers indicate like or similar elements.
FIG. 1 shows a schematic diagram of an exemplary system 100 for implementing a method for extracting a target document portion from an image according to an embodiment of the present disclosure.
FIG. 2 shows a flow diagram of a method 200 for extracting a target document portion from an image, according to an embodiment of the present disclosure.
FIG. 3 shows a flowchart of a method 300 for determining coordinates of four corner points of a target document portion based on a first image resulting from a first filtering process, according to an embodiment of the present disclosure.
FIG. 4 shows a flowchart of a method 400 for extracting a target document portion from a second image resulting from a second filtering process, according to an embodiment of the present disclosure.
FIG. 5 illustrates a flow diagram of a method 500 for determining coordinates of four corner points of a tilt-corrected target document portion, in accordance with an embodiment of the present disclosure.
Fig. 6A shows a schematic diagram of an exemplary binarized image to be processed according to an embodiment of the present disclosure.
fig. 6B shows a schematic diagram of an exemplary third image according to an embodiment of the disclosure.
Fig. 6C shows a schematic diagram of an exemplary fourth image, in accordance with an embodiment of the present disclosure.
FIG. 6D shows a schematic diagram of a final extracted target document portion according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of an electronic device 700 according to an embodiment of the disclosure.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In the following description, for the purposes of illustrating various inventive embodiments, certain specific details are set forth in order to provide a thorough understanding of the various inventive embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details. In other instances, well-known devices, structures and techniques associated with this application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.
Throughout the specification and claims, the word "comprise" and variations thereof, such as "comprises" and "comprising", will be understood to have an open, inclusive meaning, i.e., will be interpreted to mean "including, but not limited to", unless the context requires otherwise.
Reference throughout this specification to "one embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in some embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms first, second, third, fourth, etc. used in the description and in the claims, are used for distinguishing between various objects for clarity of description only and do not limit the size, other order, etc. of the objects described therein.
As described above, when an image of a document is captured by an image capturing device such as a camera, the captured image generally contains both a document portion corresponding to the document and a background portion corresponding to the surrounding environment (e.g., the desktop) on which the document is placed, and such images commonly contain various distortions, such as shadows cast by the surroundings, geometric distortion, edge blur, or lighting artifacts, which strongly interfere with accurately extracting the document portion from the image. At present, the user generally extracts the required document portion from such an image manually, which is inefficient; because of the aforementioned distortions, it is difficult for the user to locate the precise border of the target document, and therefore difficult to extract the target document completely and without any of the background portion. Moreover, the user usually also needs to correct the distortion of the target document after extracting it, which further reduces processing efficiency.
To address, at least in part, one or more of the above problems and other potential problems, an example embodiment of the present disclosure proposes a method for extracting a target document portion from an image, comprising: acquiring an image to be processed, the image having a target document portion and a background portion; binarizing the image to be processed, and performing a first filtering process and a second filtering process on the binarized image to obtain a first image and a second image, respectively, wherein the first image has less noise than the second image, and the second image has higher sharpness than the first image; determining coordinates of four corner points of the target document portion based on the first image; and extracting the target document portion from the second image based on the determined coordinates of the four corner points. In this way, a clear, usable target document portion can be extracted from the image accurately and efficiently.
Hereinafter, specific examples of the present scheme will be described in more detail with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of an exemplary system 100 for implementing a method for extracting a target document portion from an image according to an embodiment of the present disclosure. As shown in fig. 1, system 100 includes a computing device 110, a network 120, and a server 130. Computing device 110 and server 130 may exchange data via network 120 (e.g., the Internet). In the present disclosure, server 130 may provide a service that supplies images to be processed to computing device 110. Computing device 110 may communicate with server 130 via network 120 to extract a target document portion from an acquired image to be processed. The computing device 110 may include at least one processor 112 and at least one memory 114 coupled to the at least one processor 112, the memory 114 having stored therein instructions 116 executable by the at least one processor 112; the instructions 116, when executed by the at least one processor 112, perform the method 200 described below. Note that, herein, computing device 110 may be part of server 130 or may be separate from server 130. Of course, in some embodiments, server 130 and network 120 may be omitted, and the image to be processed may be loaded directly into computing device 110 by the user so that computing device 110 extracts the target document portion from it. The specific structure of the computing device 110 or the server 130 is described, for example, in connection with fig. 7 below.
FIG. 2 shows a flow diagram of a method 200 for extracting a target document portion from an image, according to an embodiment of the present disclosure. The method 200 may be performed by the computing device 110 as shown in FIG. 1, or may be performed at the electronic device 700 shown in FIG. 7. It should be understood that method 200 may also include additional blocks not shown and/or may omit blocks shown, as the scope of the present disclosure is not limited in this respect.
In step 202, a to-be-processed image is acquired, the to-be-processed image having a target document portion and a background portion.
In the present disclosure, the image to be processed may be an image obtained by capturing a document with an image capturing device such as a camera.
Included in the image to be processed are a target document portion captured in respect of the document and a background portion captured in respect of the surrounding environment (e.g. the desktop) in which the document is located.
An object of the present disclosure is to extract a target document portion in which information is recognizable from such an image to be processed.
In some embodiments, before any other processing is performed on the image to be processed, it may first be determined whether the image is usable at all, mainly whether it is rendered unusable by excessive shadow. If the image to be processed contains an excessively large (i.e., dense) shadow, then even after the target document portion is extracted, content such as text cannot be effectively recognized from it because of that shadow. In some implementations, to determine whether the image to be processed is unusable due to excessive shadow, a grayscale histogram of the image may first be generated. Then, based on the grayscale histogram, it is determined whether the proportion of pixels with a gray value less than or equal to a predetermined value (e.g., 49), relative to all pixels of the image, is greater than or equal to a predetermined threshold (e.g., 24%). In response to determining that the proportion is greater than or equal to the predetermined threshold, the image to be processed is determined to be unusable due to excessive shadow; otherwise it is determined to be usable, and subsequent processing can proceed.
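By way of illustration, the following Python sketch (using OpenCV) implements this availability check. It is a minimal sketch under the assumptions stated above: the function name is_image_usable is introduced here for illustration, and the default cutoff (gray value 49) and ratio (24%) simply mirror the example values in the text.

```python
import cv2

def is_image_usable(image_bgr, gray_cutoff=49, dark_ratio_threshold=0.24):
    """Return False if dense shadow makes the image unusable (example check)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Grayscale histogram over all 256 gray levels.
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    # Proportion of pixels at or below the darkness cutoff.
    dark_ratio = hist[: gray_cutoff + 1].sum() / gray.size
    return dark_ratio < dark_ratio_threshold
```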
In step 204, the image to be processed is binarized, and the binarized image (for example, as shown in fig. 6A) is subjected to a first filtering process and a second filtering process to obtain a first image and a second image, respectively. In the present disclosure, the first image resulting from the first filtering process has less noise than the second image resulting from the second filtering process, but the second image has higher sharpness than the first image.
In the present disclosure, binarizing the image to be processed helps speed up subsequent processing.
In some embodiments, the binarization processing of the image to be processed may include the following steps.
First, the image to be processed is converted into a grayscale image. In the present disclosure, converting the image to be processed into a grayscale image helps speed up subsequent processing.
Then, a blur determination is performed on the grayscale image to determine whether it is sufficiently sharp. In some implementations, the blur determination may include convolving the grayscale image with a preset Laplacian convolution kernel to obtain a corresponding response map. In the present disclosure, the preset Laplacian convolution kernel may be, for example, a 3 × 3 convolution kernel, such as the standard Laplacian kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]]. After the response map is obtained, the variance of the response map can be calculated. If the variance is smaller than a predetermined threshold, the document image to be processed is determined to be blurred; otherwise it is determined to be sufficiently sharp. The predetermined threshold may be, for example, the median of the gray values of all pixels in the image to be processed.
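The blur determination can be sketched as follows. This is an illustrative sketch, not the disclosure's exact implementation: the kernel shown is the standard 3 × 3 Laplacian assumed above, and the per-image median gray value is used as the predetermined threshold, per the example in the text.

```python
import cv2
import numpy as np

def is_sharp_enough(gray):
    """Blur determination: Laplacian response variance vs. a threshold."""
    kernel = np.array([[0, 1, 0],
                       [1, -4, 1],
                       [0, 1, 0]], dtype=np.float32)  # standard 3x3 Laplacian
    response = cv2.filter2D(gray.astype(np.float32), -1, kernel)
    # Example threshold from the text: median gray value of the image.
    threshold = float(np.median(gray))
    return response.var() >= threshold
```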
If the blur determination finds the grayscale image sufficiently sharp, the image to be processed can be binarized based on the grayscale image, for example using an adaptive threshold algorithm. For example, a threshold for the grayscale image may be determined with a binarization threshold algorithm; the gray value of each pixel whose gray value is greater than the determined threshold is then set to 1, and the gray value of each pixel whose gray value is not greater than the determined threshold is set to 0, thereby binarizing the grayscale image.
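A minimal sketch of this binarization step is shown below. Otsu's method is used here as one possible choice of binarization threshold algorithm; the disclosure does not mandate a specific algorithm, so treat this as an assumption.

```python
import cv2

def binarize(gray):
    """Binarize the grayscale image to {0, 1}; Otsu picks the threshold."""
    _, binary = cv2.threshold(gray, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```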
At present, no single filtering process can optimize both the noise and the sharpness of an image at the same time. In the present method, therefore, two different images, obtained by applying different filtering processes to the same image to be processed, are used respectively when determining the coordinates of the four corner points of the target document portion and when extracting the target document portion; this both ensures the accuracy of the determined coordinates and preserves the clarity of the extracted document portion. As described above, in step 204 the first image is obtained by applying the first filtering process to the binarized image, and the second image is obtained by applying the second filtering process to it; the first image has less noise than the second image, but the second image has higher sharpness than the first image. On the one hand, since the first image has less noise, using the first image in subsequent processing to determine the coordinates of the four corner points of the target document portion helps improve the accuracy of the identified corner points. On the other hand, since the second image has higher sharpness, the content (e.g., text) it contains is sharper and free of shadow, so extracting the target document portion from the second image yields a better-quality document image from which useful content such as text can be more reliably recognized.
In some embodiments, the first filtering process may be, for example, median filtering, and the second filtering process may be, for example, bilateral filtering. Median filtering slides a window (e.g., 5 × 5 pixels) over the image and replaces the value of the pixel at the window's center with the median of all pixel values in the window. Median filtering effectively removes most of the noise in the image to be processed, so using the first image obtained by median filtering to determine the coordinates of the four corner points of the target document portion effectively ensures the accuracy of those coordinates. However, while removing noise, median filtering may also blur the useful content in the image (e.g., the text in the document portion); if the target document portion were extracted from the first image, the useful information in it might therefore not be sufficiently clear.
Bilateral filtering can remove shadows in the image and keep content (e.g., text) sharp, but some other noise remains that it cannot remove; using the second image obtained by bilateral filtering to extract the target document portion therefore helps improve the clarity of the useful information in the extracted document portion.
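The two filtering branches can be sketched as follows; the window size and bilateral-filter parameters are illustrative assumptions (the text only specifies a 5 × 5 window for the median filter).

```python
import cv2

def make_filtered_pair(binarized):
    """First image: median-filtered (less noise). Second: bilateral (sharper)."""
    first_image = cv2.medianBlur(binarized, 5)                # 5x5 median window
    second_image = cv2.bilateralFilter(binarized, 9, 75, 75)  # edge-preserving
    return first_image, second_image
```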
At step 206, coordinates of four corner points of the target document portion are determined based on the first image.
As described earlier, since the first image contains less noise, determining the coordinates of the four corner points of the target document portion based on the first image ensures the accuracy of the determined coordinates.
Step 206 is described in further detail below in conjunction with fig. 3.
At step 208, a target document portion is extracted from the second image based on the determined coordinates of the four corner points.
Since the second image is obtained by processing the same image to be processed as the first image, the coordinates of the four corner points of the target document portion determined based on the first image are actually the coordinates of the four corner points of the target document portion in the second image, and therefore the desired target document portion can be accurately extracted from the second image based on the coordinates determined in step 206.
Also, as previously described, since the sharpness of the second image is relatively higher, the target document portion extracted from the second image may be clearer relative to the first image.
Step 208 is described in further detail below based on FIG. 4.
FIG. 3 shows a flowchart of a method 300 for determining coordinates of four corner points of a target document portion based on a first image resulting from a first filtering process, according to an embodiment of the present disclosure. The method 300 may be performed by the computing device 110 as shown in FIG. 1, or may be performed at the electronic device 700 shown in FIG. 7. It should be understood that method 300 may also include additional blocks not shown and/or may omit blocks shown, as the scope of the disclosure is not limited in this respect.
At step 302, based on the first image, a degree of tilt of the target document portion is determined.
This first image is the first image previously obtained at step 204.
In some embodiments, determining the inclination of the target document portion based on the first image may include detecting straight lines in the first image to determine the inclination of the target document portion by determining an inclination angle of a lateral straight line of the detected straight lines with respect to a horizontal border of the first image. In other embodiments, determining the inclination of the target document portion may include detecting straight lines in the first image to determine the inclination of the target document portion by determining an inclination angle of a longitudinal line of the detected straight lines with respect to a vertical border of the first image.
In the above embodiments, the detection of straight lines in the first image may be implemented, for example, using a Hough transform algorithm.
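As an illustration, a tilt estimate based on a probabilistic Hough transform might look like the following sketch. All parameter values, and the choice of taking the median angle of near-horizontal segments, are assumptions introduced here rather than values from the disclosure.

```python
import cv2
import numpy as np

def estimate_tilt_degrees(first_image):
    """Estimate tilt from near-horizontal Hough line segments (illustrative)."""
    edges = cv2.Canny(first_image * 255, 50, 150)  # scale {0,1} image to 8-bit
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=100, maxLineGap=10)
    if lines is None:
        return 0.0
    angles = []
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 45:  # keep lateral (near-horizontal) segments
            angles.append(angle)
    return float(np.median(angles)) if angles else 0.0
```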
In step 304, the first image is rotated based on the tilt determined in step 302 to obtain a third image having a tilt-corrected target document portion.
After determining the inclination of the target document portion, the inclination of the target document portion is corrected by rotating the first image by a corresponding angle, and the resulting third image has the inclination-corrected target document portion. For example, one example of the third image is shown in fig. 6B. According to fig. 6B, the target document portion in the third image has been corrected for tilt, and the noise therein is small, but the text portion is not particularly clear.
At step 306, edge detection is performed on the third image to preliminarily determine four border lines of the tilt-corrected target document portion.
In some embodiments, the third image may be edge detected, for example, using a Canny edge detection algorithm.
In the present disclosure, to further eliminate noise so as to improve the accuracy of the corner coordinates of the target document portion determined in the subsequent processing, the third image may be further filtered (e.g., mean filtered, the filtering kernel of which may be, for example, 5 × 5) after step 306, and then subjected to two dilation and erosion operations.
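A sketch of this edge-detection and cleanup chain is given below; the Canny thresholds and the 3 × 3 structuring element are illustrative assumptions, while the 5 × 5 mean filter and the two dilation and erosion passes follow the text.

```python
import cv2
import numpy as np

def detect_edges_and_clean(third_image):
    """Canny edges, 5x5 mean filter, then two dilation and erosion passes."""
    edges = cv2.Canny(third_image * 255, 50, 150)  # scale {0,1} image to 8-bit
    smoothed = cv2.blur(edges, (5, 5))             # 5x5 mean (box) filter
    kernel = np.ones((3, 3), np.uint8)
    cleaned = cv2.dilate(smoothed, kernel, iterations=2)
    cleaned = cv2.erode(cleaned, kernel, iterations=2)
    return cleaned
```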
At step 308, connected component analysis is performed on the third image to determine coordinates of four corner points of the tilt-corrected target document portion based on the plurality of connected components determined from the third image and the preliminarily determined four bounding lines.
In the present disclosure, a plurality of connected regions may be determined from the third image by performing a connected region analysis on the third image. In the present disclosure, based on at least these connected regions, the coordinates of the four corner points of the tilt-corrected target document portion may be determined.
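For illustration, the connected-region analysis can be sketched with OpenCV's connected-component labeling; sorting the regions by area prepares for the largest-region test described in connection with FIG. 5.

```python
import cv2

def connected_regions_by_area(cleaned):
    """Label connected regions and sort their stats by area, largest first."""
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(cleaned)
    # Row 0 is the background; each stats row is (x, y, width, height, area).
    return sorted(stats[1:], key=lambda s: s[cv2.CC_STAT_AREA], reverse=True)
```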
Step 308 is described in further detail below in conjunction with fig. 5.
By adopting the above means, the present disclosure can accurately and efficiently determine the position of the target document portion.
FIG. 4 shows a flowchart of a method 400 for extracting a target document portion from a second image resulting from a second filtering process, according to an embodiment of the present disclosure. The method 400 may be performed by the computing device 110 as shown in FIG. 1, or may be performed at the electronic device 700 shown in FIG. 7. It should be understood that method 400 may also include additional blocks not shown and/or may omit blocks shown, as the scope of the disclosure is not limited in this respect.
At step 402, the second image is rotated based on the tilt determined at step 302 to obtain a fourth image having a tilt-corrected target document portion.
The second image is the second image previously obtained at step 204.
One example of the fourth image is shown in fig. 6C. As shown in fig. 6C, the target document portion in the fourth image has been tilt-corrected, and the text in it is clearer than in the third image, but the fourth image contains more noise than the third image.
Since the same processing is performed on the second image in step 402 as that performed on the first image in step 304, the positions of the tilt-corrected target document portion in the thus-obtained third image and fourth image are practically the same. Accordingly, using the coordinates of the tilt-corrected target document portion determined in step 308 relative to the third image, the coordinates of the tilt-corrected target document portion in the fourth image may be known, thereby facilitating the extraction of the tilt-corrected target document portion from the fourth image based on these coordinates.
At step 404, based on the determined coordinates of the four corner points, a bounding box of the tilt-corrected target document portion in the fourth image is determined.
Specifically, by connecting the corresponding coordinate points in the fourth image in order by straight lines, the frame of the tilt-corrected target document portion in the fourth image can be obtained.
In step 406, affine transformation is performed on the fourth image based on the determined bounding box and the bounding box of the fourth image itself using perspective transformation to extract an image including only the tilt-corrected target document portion from the fourth image, and the extracted image is enlarged to have a horizontal width equal to that of the minimum bounding rectangle of the fourth image.
In the present disclosure, the image extracted from the fourth image by affine transformation is an image from which the background portion has been removed, that is, an image including only the tilt-corrected target document portion. In addition, in the present disclosure, by the affine transformation, the four sides of the extracted image may be stretched in equal proportion so that the horizontal width of the image is equal to the horizontal width of the minimum bounding rectangle of the fourth image.
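A hedged sketch of this extraction step follows. The corner ordering and the output size parameters are assumptions introduced for the example; the text ties the output's horizontal width to that of the minimum bounding rectangle of the fourth image.

```python
import cv2
import numpy as np

def extract_document(fourth_image, corners, out_width, out_height):
    """Warp the region inside `corners` to an upright out_width x out_height image."""
    # Assumed corner order: top-left, top-right, bottom-right, bottom-left.
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_width, 0],
                      [out_width, out_height], [0, out_height]])
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(fourth_image, matrix, (out_width, out_height))
```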
Fig. 6D, for example, shows an image of the finally extracted target document portion.
By adopting the above means, the present disclosure can obtain a desired clear image of the target document portion.
FIG. 5 illustrates a flow diagram of a method 500 for determining coordinates of four corner points of a tilt-corrected target document portion, in accordance with an embodiment of the present disclosure. The method 500 may be performed by the computing device 110 as shown in FIG. 1, or may be performed at the electronic device 700 shown in FIG. 7. It should be understood that method 500 may also include additional blocks not shown and/or may omit blocks shown, as the scope of the disclosure is not limited in this respect.
At step 502, it is determined whether the ratio of the area of the largest connected region of the plurality of connected regions to the area of the third image (i.e., the third image obtained at step 304) is greater than a predetermined first ratio.
The plurality of connected regions referred to in step 502 are the plurality of connected regions determined by the connected region analysis of the third image as referred to above in step 308. These connected regions may be ordered by size of area, so that the connected region with the largest area of these connected regions, i.e., the largest connected region, may be determined.
In the present disclosure, the predetermined first ratio may be determined empirically, for example from statistics over a large number of recognition runs, and may be, for example, 1/2 or 2/3.
In one aspect, in step 504, if the ratio of the area of the largest connected region to the area of the third image is greater than the predetermined first ratio, the coordinates of the four corner points of the smallest bounding rectangle of the largest connected region are determined.
In step 506, if at least one of the four corner points of the minimum bounding rectangle of the maximum connected region is outside the maximum connected region, the corner point is replaced by the point in the maximum connected region that is closest to the corner point.
It should be understood that if two or more of the four corner points of the minimum bounding rectangle of the maximum connected region are outside the maximum connected region, the operation mentioned in step 506 may be performed for each of these corner points.
In step 508, the resulting coordinates of the four points are taken as the coordinates of the four corner points of the tilt-corrected target document portion.
Conversely, if the four corner points of the minimum bounding rectangle of the maximum connected region are all located within the maximum connected region, steps 506 and 508 are not performed, and the coordinates of the four corner points of the minimum bounding rectangle of the maximum connected region are directly taken as the coordinates of the four corner points of the tilt-corrected target document portion.
On the other hand, in step 510, if the ratio of the area of the largest connected region of the plurality of connected regions (i.e., the plurality of connected regions obtained by analyzing the connected regions of the third image) to the area of the third image is smaller than the predetermined first ratio, the plurality of connected regions are stitched to obtain a stitched connected region.
In the present disclosure, before stitching the above connected regions, the third image may be subjected to two further dilation and erosion operations to remove additional noise from the third image, making subsequent processing more accurate.
In some embodiments, stitching the plurality of connected regions includes the following steps. First, among the plurality of connected regions (i.e., the connected regions obtained by the connected-region analysis of the third image), one or more connected regions are determined whose minimum bounding rectangle has a horizontal width smaller than a predetermined second proportion of the horizontal width of the minimum bounding rectangle of the largest connected region. In the present disclosure, the predetermined second proportion may be, for example, 1/2. Connected regions that are too small in this sense are often caused by noise in the image and would generally reduce the accuracy of the coordinates determined for the target document portion, so such small connected regions must first be found and eliminated.
There is, however, a special case: some of these small connected regions may correspond to titles. If a title's connected region is eliminated, the corresponding title is lost, and the finally extracted document portion is inaccurate. It should be noted that titles typically appear either at the top of an article (e.g., a document title) or inside it (e.g., a chapter title). Analysis shows that titles inside an article, such as chapter titles, are usually merged with surrounding text into a larger connected region, so the present disclosure does not mistakenly eliminate them. A title at the top of an article, such as a document title, however, usually yields a small connected region of its own and is therefore easily removed along with the other small regions; a corresponding check is thus needed before elimination. The check exploits the fact that connected regions associated with top-of-article titles typically lie in the upper half of the image. Specifically, if one of the determined small connected regions (i.e., those whose minimum bounding rectangle's horizontal width is smaller than the predetermined second proportion of that of the largest connected region) is located in the upper half of the third image, optical character recognition (OCR) is performed on that region to determine whether it contains a title.
In one case, in response to determining that the connected region contains a title, the coordinates of the upper-left and lower-right corner points of that region are determined. The coordinates of the four corner points of each of the other connected regions (i.e., those of the plurality of connected regions other than the determined small regions) are also determined. The coordinates of the four corner points of the stitched connected region can then be determined from the corner coordinates of each of the other connected regions, the upper-left and lower-right corner coordinates of the title region, and the preliminarily determined four border lines (i.e., the four border lines determined in step 306). Specifically, among all these candidate coordinates, the four points closest to the preliminarily determined four border lines are taken as the four corner points of the stitched connected region: for example, the candidate point closest to both the top border line and the left border line is taken as the top-left corner of the stitched region, the candidate point closest to both the top border line and the right border line as its top-right corner, and so on, until all four corner coordinates of the stitched region are determined.
In the other case, in response to determining that the connected region does not contain a title, the determined small connected regions (i.e., those whose minimum bounding rectangle's horizontal width is smaller than the predetermined second proportion of that of the largest connected region) are removed. The coordinates of the four corner points of the minimum bounding rectangle of each remaining connected region are then determined, and the four corner points of the stitched connected region are determined from these coordinates and the preliminarily determined four border lines (i.e., the four border lines determined in step 306). Specifically, among the corner coordinates of the minimum bounding rectangles of the remaining connected regions, the four points closest to the preliminarily determined four border lines are taken as the four corner points of the stitched connected region, in the same manner as described above.
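The corner-selection rule shared by both cases above can be sketched as follows. The helper name stitched_corners and the representation of the four border lines by their corner intersections (border_corners) are assumptions introduced to keep the example self-contained.

```python
import numpy as np

def stitched_corners(candidates, border_corners):
    """Pick, for each preliminary border corner, the closest candidate point.

    candidates:     (N, 2) array of candidate corner coordinates.
    border_corners: (4, 2) array of the intersections of the four preliminary
                    border lines, ordered TL, TR, BR, BL (an assumption).
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    picked = []
    for ref in np.asarray(border_corners, dtype=np.float64):
        dists = np.linalg.norm(candidates - ref, axis=1)
        picked.append(candidates[int(np.argmin(dists))])
    return np.array(picked)  # TL, TR, BR, BL of the stitched region
```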
At step 512, coordinates of four corner points of the tilt-corrected target document portion are determined based on the stitched connected regions.
The shape obtained by connecting the four corner points of the stitched connected region in order is generally a trapezoid. To prevent the top of the trapezoid from being too narrow, the coordinates of its two top corner points must be checked: if the distance between them is smaller than the horizontal width of the minimum bounding rectangle of the largest connected region, the top coordinates are replaced with the top coordinates of that rectangle. Specifically, if the distance between the top-left and top-right corner points of the stitched connected region is smaller than the horizontal width of the minimum bounding rectangle of the largest connected region, the coordinates of the top-left and top-right corner points of the stitched region are replaced with the coordinates of the top-left and top-right corner points of the minimum bounding rectangle of the largest connected region, respectively, and these are used as the finally determined top-left and top-right corner points of the target document portion.
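This top-corner safeguard can be sketched as follows; the function name and argument layout are illustrative.

```python
import numpy as np

def fix_top_corners(stitched, rect_top_left, rect_top_right):
    """Replace a too-narrow trapezoid top with the largest region's rectangle top.

    stitched: (4, 2) array ordered TL, TR, BR, BL.
    """
    stitched = np.asarray(stitched, dtype=np.float64).copy()
    top_width = np.linalg.norm(stitched[1] - stitched[0])
    rect_width = np.linalg.norm(np.asarray(rect_top_right, dtype=np.float64)
                                - np.asarray(rect_top_left, dtype=np.float64))
    if top_width < rect_width:
        stitched[0] = rect_top_left
        stitched[1] = rect_top_right
    return stitched
```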
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. For example, the computing device 110 as shown in fig. 1 may be implemented by the electronic device 700. As shown, electronic device 700 includes a Central Processing Unit (CPU) 701 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 702 or computer program instructions loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the random access memory 703, various programs and data necessary for the operation of the electronic apparatus 700 can also be stored. The central processing unit 701, the read only memory 702 and the random access memory 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
A plurality of components in the electronic apparatus 700 are connected to the input/output interface 705, including: an input unit 706 such as a keyboard, a mouse, a microphone, and the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The various procedures and processing described above, such as methods 200-500, may be performed by the central processing unit 701. For example, in some embodiments, methods 200-500 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the read only memory 702 and/or the communication unit 709. When the computer program is loaded into the random access memory 703 and executed by the central processing unit 701, one or more of the actions of the methods 200-500 described above may be performed.
The present disclosure relates to methods, apparatuses, systems, electronic devices, computer-readable storage media and/or computer program products. The computer program product may include computer-readable program instructions for performing various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge computing devices. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (13)
1. A method for extracting a target document portion from an image, comprising:
acquiring an image to be processed, wherein the image to be processed is provided with a target document part and a background part;
performing binarization processing on the image to be processed so as to respectively perform first filtering processing and second filtering processing on the image to be processed after binarization processing to respectively obtain a first image and a second image, wherein the first image has less noise than the second image, and the second image has higher sharpness than the first image;
determining coordinates of four corner points of the target document portion based on the first image; and
extracting the target document portion from the second image based on the determined coordinates of the four corner points.
2. The method of claim 1, wherein the first filtering process is median filtering and the second filtering process is bilateral filtering.
3. The method of claim 1, wherein determining the coordinates of the four corner points of the target document portion based on the first image comprises:
determining a tilt of the target document portion based on the first image;
rotating the first image based on the inclination to obtain a third image having the inclination-corrected target document portion;
performing edge detection on the third image to preliminarily determine four frame lines of the target document portion subjected to inclination correction; and
connected region analysis is performed on the third image to determine coordinates of four corner points of the tilt-corrected target document portion based on the plurality of connected regions determined from the third image and the preliminarily determined four bounding lines.
4. The method of claim 3, wherein extracting the target document portion from the second image based on the determined coordinates of the four corner points comprises:
rotating the second image based on the inclination to obtain a fourth image having the inclination-corrected target document portion;
determining a frame of the tilt-corrected target document portion in the fourth image based on the determined coordinates of the four corner points; and
performing affine transformation on the fourth image based on the determined bounding box and the bounding box of the fourth image itself using perspective transformation to extract an image including only the tilt-corrected target document portion from the fourth image, and enlarging the extracted image to have a horizontal width equal to that of a minimum bounding rectangle of the fourth image.
5. The method of claim 3, wherein determining the tilt of the target document portion based on the first image comprises:
detecting straight lines in the first image to determine the tilt of the target document portion by determining a tilt angle of a lateral straight line among the detected straight lines with respect to a horizontal border of the first image; or
detecting straight lines in the first image to determine the tilt of the target document portion by determining a tilt angle of a longitudinal straight line among the detected straight lines with respect to a vertical border of the first image.
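The first alternative of claim 5 could be sketched with a probabilistic Hough transform as follows; the Hough parameters and the 45-degree gate separating lateral from longitudinal lines are assumptions:

```python
import cv2
import numpy as np

def estimate_tilt(binary: np.ndarray) -> float:
    """Estimate the tilt from near-horizontal detected lines (first
    alternative of claim 5). Parameters are illustrative assumptions."""
    edges = cv2.Canny(binary, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=100, maxLineGap=10)
    if lines is None:
        return 0.0
    angles = []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 45:  # keep lateral (near-horizontal) lines
            angles.append(angle)
    # The tilt angle relative to the horizontal border of the image.
    return float(np.median(angles)) if angles else 0.0
```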
6. The method of claim 3, wherein determining coordinates of four corner points of the tilt-corrected target document portion based on the plurality of connected regions determined from the third image and the preliminarily determined four frame lines comprises:
if the ratio of the area of the largest connected region among the plurality of connected regions to the area of the third image is greater than a predetermined first ratio, determining the coordinates of the four corner points of the minimum bounding rectangle of the largest connected region;
if at least one of the four corner points of the minimum bounding rectangle of the largest connected region lies outside the largest connected region, replacing that corner point with the point in the largest connected region closest to it; and
taking the coordinates of the four points finally obtained as the coordinates of the four corner points of the tilt-corrected target document portion.
7. The method of claim 6, wherein determining coordinates of four corner points of the tilt-corrected target document portion based on the plurality of connected regions determined from the third image and the preliminarily determined four frame lines further comprises:
if all four corner points of the minimum bounding rectangle of the largest connected region lie within the largest connected region, taking the coordinates of the four corner points of the minimum bounding rectangle as the coordinates of the four corner points of the tilt-corrected target document portion.
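A sketch of the corner-point logic of claims 6 and 7, assuming the largest connected region is given as a binary mask; the use of Euclidean distance for the nearest-point substitution is an assumption the claims do not specify:

```python
import cv2
import numpy as np

def region_corners(largest_region_mask: np.ndarray) -> np.ndarray:
    """Take the minimum bounding rectangle of the largest connected
    region and, for any corner lying outside the region, substitute the
    nearest point inside it (claims 6 and 7)."""
    points = cv2.findNonZero(largest_region_mask)  # Nx1x2 array of (x, y)
    rect = cv2.minAreaRect(points)
    corners = cv2.boxPoints(rect)  # four corners of the rectangle
    region_pts = points.reshape(-1, 2).astype(np.float32)
    fixed = []
    for corner in corners:
        x, y = int(round(corner[0])), int(round(corner[1]))
        inside = (0 <= y < largest_region_mask.shape[0]
                  and 0 <= x < largest_region_mask.shape[1]
                  and largest_region_mask[y, x] > 0)
        if inside:
            fixed.append(corner)
        else:
            # Replace with the region point closest to this corner.
            dists = np.linalg.norm(region_pts - corner, axis=1)
            fixed.append(region_pts[np.argmin(dists)])
    return np.array(fixed, dtype=np.float32)
```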
8. The method of claim 3, wherein determining coordinates of four corner points of the tilt-corrected target document portion based on the plurality of connected regions determined from the third image and the preliminarily determined four frame lines comprises:
if the ratio of the area of the largest connected region among the plurality of connected regions to the area of the third image is smaller than a predetermined first ratio, stitching the plurality of connected regions to obtain a stitched connected region; and
determining coordinates of four corner points of the tilt-corrected target document portion based on the stitched connected region.
9. The method of claim 8, wherein stitching the plurality of connected regions comprises:
determining, among the plurality of connected regions, one or more connected regions whose minimum bounding rectangles have a horizontal width less than a predetermined second proportion of the horizontal width of the minimum bounding rectangle of the largest connected region;
if one of the determined one or more connected regions is located in the upper half of the third image, performing optical character recognition on that connected region to determine whether it contains a title;
in response to determining that the connected region contains a title, determining the coordinates of the upper-left and lower-right corner points of that connected region;
determining the coordinates of the four corner points of each of the other connected regions among the plurality of connected regions, apart from the determined one or more connected regions; and
determining the coordinates of the four corner points of the stitched connected region based on the determined coordinates of the four corner points of each other connected region, the coordinates of the upper-left and lower-right corner points of the connected region containing the title, and the preliminarily determined four frame lines.
10. The method of claim 9, wherein determining coordinates of four corner points of the tilt-corrected target document portion based on the plurality of connected regions determined from the third image and the preliminarily determined four frame lines further comprises:
in response to determining that the connected region contains no title, removing the determined one or more connected regions;
determining the coordinates of the four corner points of the minimum bounding rectangle of each of the remaining connected regions; and
determining the coordinates of the four corner points of the stitched connected region based on the coordinates of the four corner points of the minimum bounding rectangle of each remaining connected region and the preliminarily determined four frame lines.
11. The method of claim 8, wherein determining coordinates of four corner points of the tilt-corrected target document portion based on the stitched connected region comprises:
if the distance between the upper-left and upper-right corner points among the four corner points of the stitched connected region is smaller than the horizontal width of the minimum bounding rectangle of the largest connected region, replacing the coordinates of the upper-left and upper-right corner points of the stitched connected region with the coordinates of the upper-left and upper-right corner points of the minimum bounding rectangle of the largest connected region, respectively, as the finally determined coordinates of the upper-left and upper-right corner points of the target document portion.
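The stitching of claims 8 through 10 might be sketched in simplified form as follows; the `stats` layout follows `cv2.connectedComponentsWithStats`, the OCR engine (pytesseract) and the title heuristic are assumptions, and claim 11's frame-line reconciliation is omitted:

```python
import numpy as np
import pytesseract  # OCR engine choice is an assumption

def stitch_regions(stats, image_height, largest_idx, second_ratio, crop_fn):
    """Simplified sketch of the stitching of claims 8-10.

    stats is the array returned by cv2.connectedComponentsWithStats
    (columns x, y, width, height, area; row 0 is the background);
    crop_fn(i) returns the grayscale crop of region i for OCR."""
    widest = stats[largest_idx, 2]
    # Narrow regions per claim 9: minimum-bounding-rectangle width below
    # a second proportion of the largest region's width.
    narrow = [i for i in range(1, len(stats))
              if i != largest_idx and stats[i, 2] < second_ratio * widest]
    kept_boxes = []
    for i in narrow:
        x, y, w, h = stats[i, :4]
        if y + h / 2 < image_height / 2:  # upper half of the third image
            text = pytesseract.image_to_string(crop_fn(i)).strip()
            if text:  # treated as containing a title (heuristic)
                kept_boxes.append((x, y, x + w, y + h))
        # Narrow regions without a title are dropped (claim 10).
    for i in range(1, len(stats)):
        if i not in narrow:  # all other regions keep their four corners
            x, y, w, h = stats[i, :4]
            kept_boxes.append((x, y, x + w, y + h))
    boxes = np.array(kept_boxes)
    # The stitched region's corners are taken here as the envelope of
    # the kept boxes; the frame-line adjustment is omitted for brevity.
    x0, y0 = boxes[:, 0].min(), boxes[:, 1].min()
    x1, y1 = boxes[:, 2].max(), boxes[:, 3].max()
    return np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=float)
```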
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
13. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211448474.3A CN115862044A (en) | 2022-11-18 | 2022-11-18 | Method, apparatus, and medium for extracting target document part from image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115862044A true CN115862044A (en) | 2023-03-28 |
Family
ID=85664173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211448474.3A Pending CN115862044A (en) | 2022-11-18 | 2022-11-18 | Method, apparatus, and medium for extracting target document part from image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115862044A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116578376A (en) * | 2023-07-12 | 2023-08-11 | 福昕鲲鹏(北京)信息科技有限公司 | Open format document OFD page display method, device and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670500B (en) | Text region acquisition method and device, storage medium and terminal equipment | |
US9275030B1 (en) | Horizontal and vertical line detection and removal for document images | |
JP5775225B2 (en) | Text detection using multi-layer connected components with histograms | |
US7715628B2 (en) | Precise grayscale character segmentation apparatus and method | |
CN111079772B (en) | Image edge extraction processing method, device and storage medium | |
CN108805128B (en) | Character segmentation method and device | |
CN114529459B (en) | Method, system and medium for enhancing image edge | |
US10748023B2 (en) | Region-of-interest detection apparatus, region-of-interest detection method, and recording medium | |
EP3002712A2 (en) | Horizontal and vertical line detection and removal for document images | |
CN110942074A (en) | Character segmentation recognition method and device, electronic equipment and storage medium | |
US8787690B2 (en) | Binarizing an image | |
CN110647882A (en) | Image correction method, device, equipment and storage medium | |
CN110708568B (en) | Video content mutation detection method and device | |
CN112419207A (en) | Image correction method, device and system | |
CN108304840B (en) | Image data processing method and device | |
CN115862044A (en) | Method, apparatus, and medium for extracting target document part from image | |
CN111160358A (en) | Image binarization method, device, equipment and medium | |
CN110807457A (en) | OSD character recognition method, device and storage device | |
CN110751156A (en) | Method, system, device and medium for table line bulk interference removal | |
CN113076952B (en) | Text automatic recognition and enhancement method and device | |
CN116524503A (en) | Multi-line text line extraction method, device, equipment and readable storage medium | |
Bhaskar et al. | Implementing optical character recognition on the android operating system for business cards | |
CN104715248B (en) | A kind of recognition methods to email advertisement picture | |
CN107845080A (en) | Card image enhancement method | |
CN114529570A (en) | Image segmentation method, image identification method, user certificate subsidizing method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||