CN106803269B - Method and device for perspective correction of document image


Info

Publication number: CN106803269B (application CN201510830447.6A; application publication CN106803269A)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 李鑫, 刘伟, 范伟, 孙俊
Applicant / Assignee: Fujitsu Ltd
Filing and grant: application CN201510830447.6A filed by Fujitsu Ltd; published as CN106803269A; granted and published as CN106803269B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document

Abstract

The invention discloses a method and equipment for perspective correction of a document image. The method comprises the following steps: determining the part of the long Arabic numeral string in the document image and the content of the long Arabic numeral string; creating a reference image according to the content of the long Arabic numeral string; calculating a correction function according to the part of the long Arabic numeral string and the reference image; and carrying out perspective correction on the document image according to the correction function.

Description

Method and device for perspective correction of document image
Technical Field
The present invention relates generally to the field of image processing. In particular, the present invention relates to a method and apparatus for enabling perspective correction of document images containing long Arabic numeral strings.
Background
In modern society, there are a variety of certificates, cards, documents, etc., such as identification cards, business cards, bank cards, household registers, driver's licenses, passports, and local household registration documents. Some entities or individuals need to collect or archive such information frequently, requiring these certificates, cards, and documents to be retained in electronic form. Apart from some special reading tools, the usual way of doing this is to take a picture and then either store the image directly, or recognize the image and store the recognized information.
In the process of capturing and keeping such images, the problem of perspective transformation often needs to be solved. Due to environmental or equipment limitations, when these certificates, cards, and documents are photographed, the camera is usually not aimed squarely at the surface of the subject but is tilted at some angle from the surface normal, so the captured image is distorted; this distortion is generally called a perspective transformation. Before the next steps of recognition and storage, perspective correction must be performed on the image; only then can subsequent processing such as layout analysis and recognition be carried out.
The traditional method analyzes the captured image to find edges and corner points, or directly recognizes the content of the image, compares this information with a standard template, establishes a transformation formula according to a perspective transformation model, and performs the perspective projection transformation. Therefore, before each correction the conventional method requires a standard template specific to the certificate, card, or document concerned, and correction is performed by matching the captured image against the standard template image. Building a standard template is often very cumbersome, requiring a ruler to measure the relative distances between all the corner points to be used. These methods are not suitable when there are many types of certificates, cards, or documents to be electronized, or when the position of the information in objects of the same kind is not fixed. In addition, if the user holds an identification card by hand while shooting, the hand can easily block the four corner points of the card, and corner-based processing becomes impossible.
That is, conventional methods and apparatuses for perspective correction depend heavily on a standard template: the workload of preparing the standard template is large, the template's adaptability is narrow, flexibility is low, and the perspective correction effect is unstable.
Disclosure of Invention
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to determine the key or critical elements of the present invention, nor is it intended to limit the scope of the present invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
The invention aims to provide a method and equipment for flexibly performing perspective correction on a document image containing a long Arabic numeral string without depending on a standard template prepared in advance.
To achieve the above object, according to one aspect of the present invention, there is provided a method of perspective-correcting a document image containing a long arabic numeral string, the method including: determining the part of the long Arabic numeral string in the document image and the content of the long Arabic numeral string; creating a reference image according to the content of the long Arabic numeral string; calculating a correction function according to the part of the long Arabic numeral string and the reference image; and carrying out perspective correction on the document image according to the correction function.
According to another aspect of the present invention, there is provided an apparatus for perspective correction of an image of a document containing a long arabic numeral string, the apparatus comprising: a numeric string determination device configured to: determining the part of the long Arabic numeral string in the document image and the content of the long Arabic numeral string; a reference image creating device configured to: creating a reference image according to the content of the long Arabic numeral string; a correction function calculation device configured to: calculating a correction function according to the part of the long Arabic numeral string and the reference image; and a perspective correction device configured to: and carrying out perspective correction on the document image according to the correction function.
In addition, according to another aspect of the present invention, there is also provided a storage medium. The storage medium includes a program code readable by a machine, which, when executed on an information processing apparatus, causes the information processing apparatus to execute the above-described method according to the present invention.
Further, according to still another aspect of the present invention, there is provided a program product. The program product comprises machine-executable instructions which, when executed on an information processing apparatus, cause the information processing apparatus to perform the above-described method according to the invention.
Drawings
The above and other objects, features and advantages of the present invention will be more readily understood by reference to the following description of the embodiments of the present invention taken in conjunction with the accompanying drawings. The components in the figures are meant to illustrate the principles of the present invention. In the drawings, the same or similar technical features or components will be denoted by the same or similar reference numerals. In the drawings:
FIG. 1 illustrates a flow diagram of a method for perspective correction of a document image containing a long Arabic numeral string, in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a process for determining a portion of the document image where the long Arabic numeral string is located;
FIG. 3 shows the input document image and the intermediate result after processing in step S1;
FIG. 4 illustrates an example of a portion where a long Arabic number string is located;
fig. 5 shows an example of feature point extraction;
FIGS. 6(a) and 6(b) show an input document image before perspective correction and a transformation result after perspective correction, respectively;
fig. 6(c) shows the recognition result;
FIG. 7 is a block diagram illustrating an apparatus for perspective correction of a document image containing a long Arabic numeral string according to an embodiment of the present invention; and
FIG. 8 shows a schematic block diagram of a computer that may be used to implement methods and apparatus according to embodiments of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the device structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted. In addition, it is also noted that elements and features depicted in one drawing or one embodiment of the invention may be combined with elements and features depicted in one or more other drawings or embodiments.
A flow of a method of perspective correction of a document image containing a long arabic numeral string according to an embodiment of the present invention will be described below with reference to fig. 1.
FIG. 1 illustrates a flow diagram of a method for perspective correction of a document image containing a long Arabic numeral string, according to an embodiment of the present invention. As shown in fig. 1, the method for perspective correction of a document image containing a long arabic numeral string according to an embodiment of the present invention includes the following steps: determining a portion of the document image where the long arabic numeral string is located and contents of the long arabic numeral string (step S1); creating a reference image according to the content of the long arabic numeral string (step S2); calculating a correction function according to the part where the long Arabic numeral string is located and the reference image (step S3); and perspective correction is performed on the document image in accordance with the correction function (step S4).
As described above, the document image includes images of various certificates, cards, documents, and the like, including images of identification cards, business cards, bank cards, household registers, driver's licenses, passports, local household registration documents, and so on.
The common feature of these document images is that they contain a long Arabic numeral string, such as an identification number or a card number. Since some identification numbers end with the letter X rather than a digit, hereinafter, for convenience of processing, only the purely numeric portion of the identification number is taken as the example of a long Arabic numeral string.
Since the long arabic numeral string is significantly different from other portions of the document image, the portion where the long arabic numeral string is located can be relatively easily and accurately located and analyzed to determine a correction function for perspective correction.
Therefore, according to the present invention, first, in step S1, the portion of the document image where the long arabic numeral string is located and the content of the long arabic numeral string are determined.
Specifically, referring to fig. 2, determining the portion of the document image where the long arabic numeral string is located includes: performing binarization processing on the document image to obtain a binarized image (step S11); extracting all connected domains in the binarized image (step S12); performing OCR recognition on the extracted connected domain using a digital OCR engine (step S13); searching a group of connected domains which have high confidence degree, are close to each other and form a longest string in the OCR result (step S14); an area surrounded by the circumscribed rectangle of the set of connected components is determined as a portion of the document image where the long arabic numeral string is located (step S15).
In step S11, the document image is subjected to binarization processing to obtain a binarized image.
When the document image itself is a grayscale image, a binarization threshold can be used directly to binarize the grayscale document image and obtain a binarized image.
When the document image itself is a color image, the color document image may be converted into a grayscale image, and then the grayscale document image may be binarized by using a binarization threshold value to obtain a binarized image.
As for the method of converting a color document image into a grayscale image, two preferred embodiments are given here. The present invention is not limited thereto as long as the conversion of the color document image into the gradation image can be achieved.
A first preferred embodiment of the method of converting a color document image into a grayscale image is to take, for each pixel in the document image, the maximum value in R, G, B of that pixel as the pixel value of the corresponding location in the grayscale image to obtain a grayscale image.
A second preferred embodiment of the method of converting a color document image into a grayscale image is to take, for each pixel in the document image, the smaller of 255 and the product of the maximum value among R, G, B of that pixel and a predetermined constant larger than 1, as the pixel value of the corresponding position in the grayscale image.
I.e., the following equation.
g=min(255,max(r,g,b)*1.25)
where the left-hand g is the resulting gray value, min() takes the minimum, max() takes the maximum, and r, g, b are the three color channel values of the pixel. The value 1.25 in the above formula is an example of the predetermined constant larger than 1; it may be specified empirically and is not limited thereto.
With either of the above two methods, and especially with the second embodiment, non-black pixels become whiter after graying. Taking the identity card as an example, some characters are printed in color while the identification number is black, so this facilitates a preliminary screening of pixels (removing colored, non-black characters) and improves the binarization result.
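For illustration, the following is a minimal sketch of the two graying embodiments and of the subsequent binarization, written with NumPy and OpenCV. The constant 1.25 matches the example above; the use of Otsu's method to pick the binarization threshold is an assumption of this sketch, since the text only requires some binarization threshold.

import cv2
import numpy as np

def to_gray_max_channel(bgr):
    """First embodiment: gray value = max(R, G, B) of each pixel."""
    return bgr.max(axis=2).astype(np.uint8)

def to_gray_scaled_max(bgr, c=1.25):
    """Second embodiment: gray value = min(255, max(R, G, B) * c), with c > 1."""
    g = bgr.max(axis=2).astype(np.float32) * c
    return np.minimum(g, 255).astype(np.uint8)

def binarize(gray):
    """Binarize the grayscale image; Otsu's method chooses the threshold here."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return binary  # 255 marks the originally dark (ink) pixels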
In order to further improve the processing effect, black pixel enhancement processing may be performed on the grayscale image before binarization.
Specifically, each gray value p_i in the grayscale image is updated using the following formula.

New gray value:

[black pixel enhancement formula, given only as an image in the original]

where p_i denotes the gray value, and α and β are predetermined positive integers; α takes a value close to 127.5, and β takes, for example, 5 (without being limited thereto) and mainly plays an amplifying role.

α is taken near the middle of the range 0 to 255, so that when p_i is far greater than α, the updated p_i is closer to 255 (white pixels become whiter); when p_i is far less than α, the updated p_i is closer to 0 (black pixels become darker); and when p_i is close to α, the updated p_i stays approximately 127.5, changing little.
By the black pixel enhancement processing, the black pixels can be effectively enhanced, and the gray pixels can be weakened.
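The exact enhancement formula is reproduced only as an image in the original, so the sketch below uses a logistic mapping as one plausible realization that matches the described behavior (values far above α pushed toward 255, values far below α pushed toward 0, values near α left near 127.5); the functional form itself is an assumption of this sketch.

import numpy as np

def enhance_black_pixels(gray, alpha=127.5, beta=5.0):
    """Plausible black-pixel enhancement: a logistic mapping (assumed form).

    Values far above alpha approach 255 (white becomes whiter), values far
    below alpha approach 0 (black becomes darker), and values near alpha stay
    close to 127.5, as the text describes.
    """
    p = gray.astype(np.float64)
    new_p = 255.0 / (1.0 + np.exp(-(p - alpha) / beta))
    return np.round(new_p).astype(np.uint8)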
In step S12, all connected domains in the binarized image are extracted.
Connected component extraction is a well-known technique in the art and will not be described in detail herein.
In step S13, OCR recognition is performed on the extracted connected component using a digital OCR engine. OCR recognition is a technique well known in the art and will not be described in detail herein.
It should be noted that a digital OCR engine is employed here. Since the purpose of step S1 is to locate the part of the document image where the long Arabic numeral string lies and to identify the content of that string, a digital OCR engine suffices. Meanwhile, the black pixels in the binarized image also include text. For a digital OCR engine, text characters are noise, so their recognition confidence is low, which in turn helps to locate the part where the long Arabic numeral string lies.
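A sketch of steps S12 and S13 follows. OpenCV's connected-component analysis and Tesseract restricted to the digits 0-9 stand in for the unspecified digital OCR engine; the minimum-area filter and the page-segmentation mode are illustrative choices of this sketch.

import cv2
import pytesseract

def recognize_connected_domains(binary):
    """Extract connected domains from the binarized image and run digit-only OCR on each."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    results = []
    for i in range(1, n):                      # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < 20:                          # drop tiny specks (heuristic)
            continue
        roi = binary[y:y + h, x:x + w]
        data = pytesseract.image_to_data(
            255 - roi,                         # Tesseract expects dark text on a light background
            config="--psm 10 -c tessedit_char_whitelist=0123456789",
            output_type=pytesseract.Output.DICT)
        # keep the best single-character hypothesis and its confidence
        cands = [(t, float(c)) for t, c in zip(data["text"], data["conf"])
                 if t.strip() and float(c) >= 0]
        if cands:
            text, conf = max(cands, key=lambda tc: tc[1])
            results.append({"box": (x, y, w, h), "text": text, "conf": conf})
    return results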
In step S14, a set of connected domains with high confidence, close to each other, forming a longest string is searched for in the OCR result.
Specifically, connected domains corresponding to characters, noise, and the like are first removed according to the confidence. Then, for the remaining connected domains, the adjacency relation between every two connected domains is calculated; if two connected domains are adjacent to each other they are marked, and in this way the longest string formed by connected domains is obtained, because in the processing object the identification number is the longest consecutive run of digits. The group of recognition results with the highest confidence among the recognition results in the OCR output corresponding to the part where the long Arabic numeral string is located is determined as the content of the long Arabic numeral string. The connected domains of this string are denoted cc_0, …, cc_n and the corresponding recognition results a_0, …, a_n, where the string is assumed to consist of n connected domains.
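The following sketch illustrates the search of step S14 on the per-domain results of the previous sketch. The confidence threshold and the adjacency criterion (horizontal gap bounded by the character height, overlapping vertical extent) are assumptions of this sketch; the text only requires high confidence and mutual closeness.

def longest_digit_string(results, min_conf=60, max_gap_ratio=1.5):
    """Find the longest run of mutually adjacent, high-confidence digit domains."""
    good = sorted((r for r in results if r["conf"] >= min_conf),
                  key=lambda r: r["box"][0])          # left-to-right order
    best, current = [], []
    for r in good:
        if current:
            px, py, pw, ph = current[-1]["box"]
            x, y, w, h = r["box"]
            gap = x - (px + pw)
            v_overlap = min(py + ph, y + h) - max(py, y)
            if not (0 <= gap <= max_gap_ratio * ph and v_overlap > 0):
                current = []                          # chain broken, start a new one
        current = current + [r]
        if len(current) > len(best):
            best = current
    content = "".join(r["text"] for r in best)
    return best, content   # e.g. the identification-number domains and "123456789987654321"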
In step S15, the area surrounded by the circumscribed rectangle of the set of connected components is determined as the portion of the document image where the long arabic numeral string is located.
Fig. 3 shows the input document image and the intermediate result obtained after the processing in step S1. The characters that appear on the ID card in colored print, such as the name, sex, nationality, date of birth, address, and citizen identification number labels, as well as the square and circular noise, are eliminated. The location and content of the identification number 123456789987654321 are confirmed.
In step S2, a reference image is created according to the content of the long arabic numeral string.
Since the content of the long arabic numeral string has been determined in step S1, a standard reference image without perspective transformation problem can be created based on the content as a basis for calculating the correction function.
Specifically, a corresponding reference image is rendered in a predetermined font according to the content of the long Arabic numeral string. The predetermined font is the font specified for the numeric part of the processing object. Compared with the traditional technique, the invention only needs to know the font information and does not require a standard template.
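A sketch of step S2 using Pillow is shown below; the font file name is a placeholder for whatever predetermined font the processed document actually uses, and is not something specified by the text.

from PIL import Image, ImageDraw, ImageFont

def create_reference_image(digits, font_path="ocr-b.ttf", size=48, margin=8):
    """Render the recognized digit string in the predetermined font on a white canvas."""
    font = ImageFont.truetype(font_path, size)          # font_path is an assumed placeholder
    x0, y0, x1, y1 = font.getbbox(digits)
    img = Image.new("L", (x1 - x0 + 2 * margin, y1 - y0 + 2 * margin), color=255)
    ImageDraw.Draw(img).text((margin - x0, margin - y0), digits, font=font, fill=0)
    return img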
In step S3, a correction function is calculated according to the part where the long arabic numeral string is located and the reference image.
The correction function is, for example, a perspective transformation equation.
The perspective transformation equation is shown below.
U = (a11*X + a12*Y + a13) / (a31*X + a32*Y + a33)

V = (a21*X + a22*Y + a23) / (a31*X + a32*Y + a33)
where a11, a12, a13, a21, a22, a23, a31, a32, a33 are the perspective transformation parameters, X and Y are the abscissa and ordinate of a pixel before transformation, and U and V are the abscissa and ordinate of that pixel after transformation. Therefore, it is only necessary to substitute the feature points p_i = (X, Y) of the part where the long Arabic numeral string is located and the corresponding feature points P_i = (U, V) of the reference image into the equations to solve for the perspective transformation parameters. In the actual calculation, a33 defaults to 1. There are two perspective transformation equations with eight unknown parameters in total, which can be solved by substituting four groups of corresponding coordinates (eight values).
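With four pairs of corresponding feature points, the eight parameters can be obtained in a single call; the sketch below uses OpenCV's getPerspectiveTransform, which solves exactly the two equations above with a33 fixed to 1. The coordinate values are illustrative only.

import cv2
import numpy as np

# Four feature points measured on the photographed digit string (X, Y) and the
# four corresponding points on the reference image (U, V); values are illustrative.
src = np.float32([[210, 340], [212, 395], [700, 310], [705, 360]])   # p1..p4 (photograph)
dst = np.float32([[10, 10], [10, 58], [520, 10], [520, 58]])         # P1..P4 (reference)

# Solves the two equations above for the eight unknowns, with a33 fixed to 1.
M = cv2.getPerspectiveTransform(src, dst)
print(M)   # 3x3 perspective transformation matrix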
Step S3 can be realized, for example, by the following steps: acquiring four feature points of the part where the long arabic numeral string is located (step S31); acquiring four feature points of the reference image (step S32); from the eight feature points acquired, a correction function is calculated (step S33).
The method of selecting the feature points is described below.
In step S31: the center point of the upper side of the circumscribed rectangle of the left connected domain near the left side of the portion where the long Arabic numeral string is located is extracted as a first point, and the center point of its lower side as a second point; the center point of the upper side of the circumscribed rectangle of the right connected domain near the right side of that portion is extracted as a third point, and the center point of its lower side as a fourth point; the average pixel position of the intersection of the left connected domain with a first straight line, which connects the first point and the third point and is shifted down as a whole by several pixels, is determined as a first feature point; the average pixel position of the intersection of the left connected domain with a second straight line, which connects the second point and the fourth point and is shifted up as a whole by several pixels, is determined as a second feature point; the average pixel position of the intersection of the right connected domain with the first straight line shifted down as a whole by several pixels is determined as a third feature point; and the average pixel position of the intersection of the right connected domain with the second straight line shifted up as a whole by several pixels is determined as a fourth feature point.
In fig. 4, the left connected domain is the leftmost connected domain (corresponding to the digit 1), and the right connected domain is the rightmost connected domain (corresponding to the digit 2). As shown in fig. 4, the first straight line shifted down as a whole by several pixels and the second straight line shifted up as a whole appear as the two horizontal lines in fig. 4.
The first to fourth feature points extracted are shown in the upper part of fig. 5.
The left connected domain and the right connected domain need not be the connected domains at the very sides; domains closer to the middle may be used instead, provided the interval between the left connected domain and the right connected domain remains relatively large.
The first straight line is shifted down as a whole by several pixels and the second straight line is shifted up in order to remove the influence of noise near the highest and lowest points and thus prevent misjudgment.
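A sketch of the feature-point extraction of step S31 is given below, assuming the binarized mask of the digit-string region and the circumscribed rectangles of the chosen left and right connected domains are already available. The two-pixel shift and the interpretation of the "average pixel position of the intersection" (averaging the foreground pixels that the shifted line crosses inside each domain's rectangle) are assumptions of this sketch rather than values fixed by the text.

import numpy as np

def digit_string_feature_points(mask, left_box, right_box, shift=2):
    """Approximate the first to fourth feature points of the digit-string region.

    mask      : 2-D boolean array, True where digit-string pixels are foreground.
    left_box  : (x0, y0, x1, y1) circumscribed rectangle of the left connected domain.
    right_box : (x0, y0, x1, y1) circumscribed rectangle of the right connected domain.
    shift     : pixels by which the top line is moved down and the bottom line up.
    """
    # First to fourth points: centers of the top/bottom sides of the two rectangles.
    p1 = ((left_box[0] + left_box[2]) / 2.0, left_box[1])     # top center, left domain
    p2 = ((left_box[0] + left_box[2]) / 2.0, left_box[3])     # bottom center, left domain
    p3 = ((right_box[0] + right_box[2]) / 2.0, right_box[1])  # top center, right domain
    p4 = ((right_box[0] + right_box[2]) / 2.0, right_box[3])  # bottom center, right domain

    def avg_intersection(pa, pb, box, dy):
        """Average position of foreground pixels inside box crossed by line pa-pb shifted by dy."""
        xs = np.arange(int(box[0]), int(box[2]) + 1)
        ys = pa[1] + (pb[1] - pa[1]) * (xs - pa[0]) / (pb[0] - pa[0]) + dy
        ys = np.round(ys).astype(int)
        hits = [(x, y) for x, y in zip(xs, ys)
                if 0 <= y < mask.shape[0] and 0 <= x < mask.shape[1] and mask[y, x]]
        return tuple(np.mean(hits, axis=0)) if hits else None

    f1 = avg_intersection(p1, p3, left_box, +shift)    # first feature point
    f2 = avg_intersection(p2, p4, left_box, -shift)    # second feature point
    f3 = avg_intersection(p1, p3, right_box, +shift)   # third feature point
    f4 = avg_intersection(p2, p4, right_box, -shift)   # fourth feature point
    return f1, f2, f3, f4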
Similar methods can be used to extract feature points of the reference image.
Of course, since the reference image is a standard image and has no noise interference, four points at both ends may be directly selected.
That is, in step S32, the reference image is subjected to binarization processing and connected components are extracted; extracting the central point of the upper side of the circumscribed rectangle of the connected domain close to the left side of the reference image as a fifth characteristic point and the central point of the lower side of the circumscribed rectangle of the connected domain close to the left side of the reference image as a sixth characteristic point; and extracting the central point of the upper side of the circumscribed rectangle of the connected domain close to the right side of the reference image as a seventh characteristic point and the central point of the lower side as an eighth characteristic point.
Similarly, the connected domain near the left side may be the leftmost connected domain (corresponding to the digit 1) and the connected domain near the right side may be the rightmost connected domain (corresponding to the digit 2); both may also be connected domains closer to the middle rather than at the sides, but the interval between them needs to be relatively large.
The extracted fifth to eighth feature points are shown in the lower part of fig. 5.
Of course, other geometric methods or image processing methods can be used to obtain enough corresponding points on the digital string image and the reference image as feature points.
In step S33, a correction function is calculated from the eight feature points acquired.
As described above, only the coordinates of the four sets of feature points are required to calculate the correction function.
Having obtained the correction function, the document image can be perspective-corrected in accordance with the correction function in step S4.
Specifically, using the perspective transformation equations, the coordinates of each pixel in the original image, i.e. the input document image, are substituted into the equations to obtain the pixel position of that pixel after perspective correction, and the pixel value of the pixel is assigned to the corrected pixel position.
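Applying the correction to the whole image amounts to a single warp; the sketch below keeps the output canvas the same size as the input, which is a simplifying assumption of this example.

import cv2

def correct_document(document_img, M):
    """Apply the perspective correction matrix M to the whole document image."""
    h, w = document_img.shape[:2]
    return cv2.warpPerspective(document_img, M, (w, h))

# Example usage (file names are illustrative):
# img = cv2.imread("id_card_photo.jpg")
# cv2.imwrite("corrected.png", correct_document(img, M))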
Fig. 6(a) and 6(b) show an input document image before perspective correction and a transformation result after perspective correction, respectively.
The perspective-corrected document image may be subjected to subsequent processing.
For example, the portion of the document image where the content of interest is located may be determined from the position of the portion where the long Arabic numeral string is located, because the relative positional relationship of the respective contents of interest in the document image is fixed. Then, text line extraction and mixed-text OCR recognition are carried out on the portion of the document image where the content of interest is located, so as to obtain the content of interest. Note that, since the recognition objects of the OCR performed at this point include both characters and digits, recognition is performed using a mixed-text OCR engine rather than the digital OCR engine used before. Fig. 6(c) shows the recognition result.
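As a sketch of this subsequent processing, the field of interest can be cropped at a fixed offset from the corrected digit-string box and passed to a mixed-text OCR engine. The offsets and the use of Tesseract with a Chinese-plus-English model are assumptions of this example, not choices made by the text.

import pytesseract

def read_field_of_interest(corrected_img, digits_box, dx0, dy0, dx1, dy1):
    """Crop a field at a fixed offset from the digit-string box, then run mixed-text OCR.

    digits_box         : (x0, y0, x1, y1) of the long numeral string in the corrected image.
    dx0, dy0, dx1, dy1 : offsets of the field of interest relative to digits_box
                         (assumed known for the given document type).
    """
    x0, y0, x1, y1 = digits_box
    field = corrected_img[y0 + dy0 : y1 + dy1, x0 + dx0 : x1 + dx1]
    # Tesseract with a Chinese+English model stands in for the mixed-text OCR engine.
    return pytesseract.image_to_string(field, lang="chi_sim+eng")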
Next, an apparatus for perspective correction of a document image containing a long arabic numeral string according to an embodiment of the present invention will be described with reference to fig. 7.
Fig. 7 is a block diagram illustrating a configuration of an apparatus for perspective correction of a document image containing a long arabic numeral string according to an embodiment of the present invention. As shown in fig. 7, a perspective correction apparatus 700 for perspective-correcting a document image containing a long arabic numeral string according to the present invention includes: a numeric string determination device 71 configured to: determining the part of the long Arabic numeral string in the document image and the content of the long Arabic numeral string; a reference image creating device 72 configured to: creating a reference image according to the content of the long Arabic numeral string; a correction function calculation device 73 configured to: calculating a correction function according to the part of the long Arabic numeral string and the reference image; and perspective correction means 74 configured to: and carrying out perspective correction on the document image according to the correction function.
In one embodiment, the digital string determining means 71 comprises: an area determination unit, comprising: a binarization processing subunit configured to: carrying out binarization processing on the document image to obtain a binarized image; a connected component extracting subunit configured to: extracting all connected domains in the binary image; a digital OCR engine configured to: performing OCR recognition on the extracted connected domain; a search subunit configured to: searching a group of connected domains which have high confidence degree and are close to each other and form a longest string in the OCR result; a determination subunit configured to: and determining the area surrounded by the circumscribed rectangle of the group of connected domains as the part of the long Arabic numeral string in the document image.
In one embodiment, the digital string determining means 71 further comprises: a content determination unit configured to: and determining a group of recognition results with highest confidence degrees in the recognition results corresponding to the part of the long Arabic numeral string in the OCR results as the content of the long Arabic numeral string.
In one embodiment, the binarization processing sub-unit is further configured to: for each pixel in the document image, taking the maximum value in R, G, B of the pixel as the pixel value of the corresponding position in the gray scale image to obtain a gray scale image; and carrying out binarization on the gray level image by using a binarization threshold value to obtain a binarization image.
In one embodiment, the binarization processing sub-unit is further configured to: for each pixel in the document image, taking the smaller of the product of the maximum value in R, G, B of the pixel and a predetermined constant larger than 1 and 255 as the pixel value of the corresponding position in the gray-scale image to obtain a gray-scale image; and carrying out binarization on the gray level image by using a binarization threshold value to obtain a binarization image.
In one embodiment, the binarization processing sub-unit is further configured to: before binarization, also perform, on each gray value p_i in the grayscale image, the black pixel enhancement processing shown by the following formula:

[black pixel enhancement formula, given only as an image in the original]

where p_i represents the gray value, α and β are predetermined positive integers, and α is close to 127.5.
In one embodiment, the reference image creating device 72 is further configured to: and forming a corresponding reference image by using a preset font according to the content of the long Arabic numeral string.
In one embodiment, the correction function calculation means 73 comprises: a first feature point acquisition unit configured to: acquiring four characteristic points of a part where the long Arabic numeral string is located; a second feature point acquisition unit configured to: acquiring four characteristic points of the reference image; a correction function calculation unit configured to: and calculating a correction function according to the obtained eight characteristic points.
In one embodiment, the first feature point acquisition unit is further configured to: extracting the central point of the upper side of the circumscribed rectangle of the left connected domain of the left side of the part where the long Arabic numeral string is located as a first point and the central point of the lower side as a second point; extracting the central point of the upper side of the circumscribed rectangle of the right connected domain of the part of the long Arabic numeral string close to the right side as a third point and the central point of the lower side as a fourth point; determining the average pixel position of the intersection point of a first straight line connecting the first point and the third point and the left connected domain after the first straight line integrally moves downwards by a plurality of pixels as a first characteristic point; determining the average pixel position of the intersection point of a second straight line connecting the second point and the fourth point and the left connected domain after moving upwards by a plurality of pixels as a second characteristic point; determining the average pixel position of the intersection point of the first straight line and the right connected domain after the first straight line is wholly shifted down by a plurality of pixels as a third feature point; and determining the average pixel position of the intersection point of the second straight line and the right connected domain after moving upwards by a plurality of pixels as a fourth characteristic point.
In one embodiment, the second feature point acquisition unit is further configured to: carrying out binarization processing on the reference image and extracting a connected domain; extracting the central point of the upper side of the circumscribed rectangle of the connected domain close to the left side of the reference image as a fifth characteristic point and the central point of the lower side of the circumscribed rectangle of the connected domain close to the left side of the reference image as a sixth characteristic point; and extracting the central point of the upper side of the circumscribed rectangle of the connected domain close to the right side of the reference image as a seventh characteristic point and the central point of the lower side as an eighth characteristic point.
In one embodiment, the left-most connected component is the leftmost connected component, and the right-most connected component is the rightmost connected component.
In one embodiment, the perspective correction apparatus 700 further includes: an interesting content acquiring apparatus configured to: determining the part where the interested content in the document image is located according to the position of the part where the long Arabic numeral string is located in the document image; and performing text line extraction and mixed text OCR recognition on the part of the document image where the content of interest is located, thereby obtaining the content of interest.
In one embodiment, the document image includes images of identity cards and household registers, and the long Arabic numeral string includes an identification number.
Since the processes in the respective devices and units included in the perspective correction apparatus 700 according to the present invention are respectively similar to the processes in the respective steps included in the perspective correction method described above, a detailed description of these devices and units is omitted here for the sake of brevity.
Further, it should be noted that each constituent device and unit in the above-described apparatus may be configured by software, firmware, hardware, or a combination thereof. The specific means or manner in which the configuration can be used is well known to those skilled in the art and will not be described further herein. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network to a computer (for example, a general-purpose computer 800 shown in fig. 8) having a dedicated hardware configuration, and the computer can execute various functions and the like when various programs are installed.
FIG. 8 shows a schematic block diagram of a computer that may be used to implement methods and apparatus according to embodiments of the present invention.
In fig. 8, a Central Processing Unit (CPU)801 executes various processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 to a Random Access Memory (RAM) 803. In the RAM 803, data necessary when the CPU 801 executes various processes and the like is also stored as necessary. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output interface 805 is also connected to the bus 804.
The following components are connected to the input/output interface 805: an input section 806 (including a keyboard, a mouse, and the like), an output section 807 (including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker and the like), a storage section 808 (including a hard disk and the like), a communication section 809 (including a network interface card such as a LAN card, a modem, and the like). The communication section 809 performs communication processing via a network such as the internet. A drive 810 may also be connected to the input/output interface 805 as desired. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like can be mounted on the drive 810 as necessary, so that the computer program read out therefrom is installed into the storage portion 808 as necessary.
In the case where the above-described series of processes is realized by software, a program constituting the software is installed from a network such as the internet or a storage medium such as the removable medium 811.
It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 811 shown in fig. 8 in which the program is stored, distributed separately from the apparatus to provide the program to the user. Examples of the removable medium 811 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disk read only memory (CD-ROM) and a Digital Versatile Disk (DVD)), a magneto-optical disk (including a Mini Disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 802, a hard disk included in the storage section 808, or the like, in which programs are stored and which are distributed to users together with the apparatus including them.
The invention also provides a program product with machine readable instruction codes stored. The instruction codes are read and executed by a machine, and can execute the method according to the embodiment of the invention.
Accordingly, a storage medium carrying the above-described program product having machine-readable instruction code stored thereon is also included in the present disclosure. Including, but not limited to, floppy disks, optical disks, magneto-optical disks, memory cards, memory sticks, and the like.
In the foregoing description of specific embodiments of the invention, features described and/or illustrated with respect to one embodiment may be used in the same or similar manner in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
In addition, the method of the present invention is not limited to be performed in the time sequence described in the specification, and may be performed in other time sequences, in parallel, or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the present invention.
While the present invention has been disclosed above by the description of specific embodiments thereof, it should be understood that all of the embodiments and examples described above are illustrative and not restrictive. Various modifications, improvements and equivalents of the invention may be devised by those skilled in the art within the spirit and scope of the appended claims. Such modifications, improvements and equivalents are also intended to be included within the scope of the present invention.
Supplementary note
1. A method of perspective correction of a document image containing a long arabic numeral string, comprising:
determining the part of the long Arabic numeral string in the document image and the content of the long Arabic numeral string;
creating a reference image according to the content of the long Arabic numeral string;
calculating a correction function according to the part of the long Arabic numeral string and the reference image; and
and carrying out perspective correction on the document image according to the correction function.
2. The method of supplementary note 1, wherein determining the portion of the document image where the long arabic numeral string is located and the content of the long arabic numeral string comprises:
carrying out binarization processing on the document image to obtain a binarized image;
extracting all connected domains in the binary image;
performing OCR recognition on the extracted connected domain by utilizing a digital OCR engine;
searching a group of connected domains which have high confidence degree and are close to each other and form a longest string in the OCR result;
and determining the area surrounded by the circumscribed rectangle of the group of connected domains as the part of the long Arabic numeral string in the document image.
3. The method of supplementary note 2, wherein determining the portion of the document image where the long arabic numeral string is located and the content of the long arabic numeral string further comprises:
and determining a group of recognition results with highest confidence degrees in the recognition results corresponding to the part of the long Arabic numeral string in the OCR results as the content of the long Arabic numeral string.
4. The method of supplementary note 2, wherein the binarizing process on the document image to obtain a binarized image comprises:
for each pixel in the document image, taking the maximum value in R, G, B of the pixel as the pixel value of the corresponding position in the gray scale image to obtain a gray scale image;
and carrying out binarization on the gray level image by using a binarization threshold value to obtain a binarization image.
5. The method of supplementary note 2, wherein the binarizing process on the document image to obtain a binarized image comprises:
for each pixel in the document image, taking the smaller of the product of the maximum value in R, G, B of the pixel and a predetermined constant larger than 1 and 255 as the pixel value of the corresponding position in the gray-scale image to obtain a gray-scale image;
and carrying out binarization on the gray level image by using a binarization threshold value to obtain a binarization image.
6. The method according to supplementary note 4 or 5, wherein, before the binarization, black pixel enhancement processing shown by the following formula is also performed on each gray value p_i in the grayscale image:

[black pixel enhancement formula, given only as an image in the original]

where p_i represents the gray value, α and β are predetermined positive integers, and α is close to 127.5.
7. The method of supplementary note 1, wherein creating a reference picture according to contents of the long arabic numeral string comprises:
and forming a corresponding reference image by using a preset font according to the content of the long Arabic numeral string.
8. The method according to supplementary note 1, wherein calculating the correction function according to the portion of the long arabic numeral string and the reference image comprises:
acquiring four characteristic points of a part where the long Arabic numeral string is located;
acquiring four characteristic points of the reference image;
and calculating a correction function according to the obtained eight characteristic points.
9. The method of supplementary note 8, wherein obtaining four feature points of the portion where the long arabic numeral string is located includes:
extracting the central point of the upper side of the circumscribed rectangle of the left connected domain of the left side of the part where the long Arabic numeral string is located as a first point and the central point of the lower side as a second point;
extracting the central point of the upper side of the circumscribed rectangle of the right connected domain of the part of the long Arabic numeral string close to the right side as a third point and the central point of the lower side as a fourth point;
determining the average pixel position of the intersection point of a first straight line connecting the first point and the third point and the left connected domain after the first straight line integrally moves downwards by a plurality of pixels as a first characteristic point;
determining the average pixel position of the intersection point of a second straight line connecting the second point and the fourth point and the left connected domain after moving upwards by a plurality of pixels as a second characteristic point;
determining the average pixel position of the intersection point of the first straight line and the right connected domain after the first straight line is wholly shifted down by a plurality of pixels as a third feature point;
and determining the average pixel position of the intersection point of the second straight line and the right connected domain after moving upwards by a plurality of pixels as a fourth characteristic point.
10. The method according to supplementary note 8, wherein acquiring four feature points of the reference image includes:
carrying out binarization processing on the reference image and extracting a connected domain;
extracting the central point of the upper side of the circumscribed rectangle of the connected domain close to the left side of the reference image as a fifth characteristic point and the central point of the lower side of the circumscribed rectangle of the connected domain close to the left side of the reference image as a sixth characteristic point;
and extracting the central point of the upper side of the circumscribed rectangle of the connected domain close to the right side of the reference image as a seventh characteristic point and the central point of the lower side as an eighth characteristic point.
11. An apparatus for perspective correction of a document image containing a long arabic numeral string, comprising:
a numeric string determination device configured to: determining the part of the long Arabic numeral string in the document image and the content of the long Arabic numeral string;
a reference image creating device configured to: creating a reference image according to the content of the long Arabic numeral string;
a correction function calculation device configured to: calculating a correction function according to the part of the long Arabic numeral string and the reference image; and
a perspective correction device configured to: and carrying out perspective correction on the document image according to the correction function.
12. The apparatus according to supplementary note 11, wherein the number string determining means includes: an area determination unit, comprising:
a binarization processing subunit configured to: carrying out binarization processing on the document image to obtain a binarized image;
a connected component extracting subunit configured to: extracting all connected domains in the binary image;
a digital OCR engine configured to: performing OCR recognition on the extracted connected domain;
a search subunit configured to: searching a group of connected domains which have high confidence degree and are close to each other and form a longest string in the OCR result;
a determination subunit configured to: and determining the area surrounded by the circumscribed rectangle of the group of connected domains as the part of the long Arabic numeral string in the document image.
13. The apparatus according to supplementary note 12, wherein the number string determining means further comprises: a content determination unit configured to:
and determining a group of recognition results with highest confidence degrees in the recognition results corresponding to the part of the long Arabic numeral string in the OCR results as the content of the long Arabic numeral string.
14. The device described in supplementary note 12, wherein the binarization processing sub-unit is further configured to:
for each pixel in the document image, taking the maximum value in R, G, B of the pixel as the pixel value of the corresponding position in the gray scale image to obtain a gray scale image;
and carrying out binarization on the gray level image by using a binarization threshold value to obtain a binarization image.
15. The device described in supplementary note 12, wherein the binarization processing sub-unit is further configured to:
for each pixel in the document image, taking the smaller of the product of the maximum value in R, G, B of the pixel and a predetermined constant larger than 1 and 255 as the pixel value of the corresponding position in the gray-scale image to obtain a gray-scale image;
and carrying out binarization on the gray level image by using a binarization threshold value to obtain a binarization image.
16. The device of supplementary note 14 or 15, wherein the binarization processing sub-unit is further configured to: before binarization, also perform, on each gray value p_i in the grayscale image, the black pixel enhancement processing shown by the following formula:

[black pixel enhancement formula, given only as an image in the original]

where p_i represents the gray value, α and β are predetermined positive integers, and α is close to 127.5.
17. The apparatus according to supplementary note 11, wherein the reference image creating means is further configured to:
and forming a corresponding reference image by using a preset font according to the content of the long Arabic numeral string.
18. The apparatus according to supplementary note 11, wherein the correction function calculation means includes:
a first feature point acquisition unit configured to: acquiring four characteristic points of a part where the long Arabic numeral string is located;
a second feature point acquisition unit configured to: acquiring four characteristic points of the reference image;
a correction function calculation unit configured to: and calculating a correction function according to the obtained eight characteristic points.
19. The apparatus according to supplementary note 18, wherein the first feature point acquisition unit is further configured to:
extracting the central point of the upper side of the circumscribed rectangle of the left connected domain of the left side of the part where the long Arabic numeral string is located as a first point and the central point of the lower side as a second point;
extracting the central point of the upper side of the circumscribed rectangle of the right connected domain of the part of the long Arabic numeral string close to the right side as a third point and the central point of the lower side as a fourth point;
determining the average pixel position of the intersection point of a first straight line connecting the first point and the third point and the left connected domain after the first straight line integrally moves downwards by a plurality of pixels as a first characteristic point;
determining the average pixel position of the intersection point of a second straight line connecting the second point and the fourth point and the left connected domain after moving upwards by a plurality of pixels as a second characteristic point;
determining the average pixel position of the intersection point of the first straight line and the right connected domain after the first straight line is wholly shifted down by a plurality of pixels as a third feature point;
and determining the average pixel position of the intersection point of the second straight line and the right connected domain after moving upwards by a plurality of pixels as a fourth characteristic point.
20. The apparatus according to supplementary note 18, wherein the second feature point acquisition unit is further configured to:
carrying out binarization processing on the reference image and extracting a connected domain;
extracting the central point of the upper side of the circumscribed rectangle of the connected domain close to the left side of the reference image as a fifth characteristic point and the central point of the lower side of the circumscribed rectangle of the connected domain close to the left side of the reference image as a sixth characteristic point;
and extracting the central point of the upper side of the circumscribed rectangle of the connected domain close to the right side of the reference image as a seventh characteristic point and the central point of the lower side as an eighth characteristic point.

Claims (9)

1. A method of perspective correction of a document image containing a long arabic numeral string, comprising:
determining the part of the long Arabic numeral string in the document image and the content of the long Arabic numeral string;
creating a reference image according to the content of the long Arabic numeral string;
calculating a correction function according to the part of the long Arabic numeral string and the reference image; and
perspective correction is performed on the document image according to the correction function,
wherein determining the part of the document image where the long arabic numeral string is located and the content of the long arabic numeral string includes:
carrying out binarization processing on the document image to obtain a binarized image;
extracting all connected domains in the binary image;
performing OCR recognition on the extracted connected domain by utilizing a digital OCR engine;
searching a group of connected domains which have high confidence degree and are close to each other and form a longest string in the OCR result;
and determining the area surrounded by the circumscribed rectangle of the group of connected domains as the part of the long Arabic numeral string in the document image.
2. The method of claim 1, wherein determining the portion of the document image in which the long arabic numeral string is located and the content of the long arabic numeral string further comprises:
and determining a group of recognition results with highest confidence degrees in the recognition results corresponding to the part of the long Arabic numeral string in the OCR results as the content of the long Arabic numeral string.
3. The method of claim 1, wherein binarizing the document image to obtain a binarized image comprises:
for each pixel in the document image, taking the maximum of the R, G, B values of the pixel as the pixel value of the corresponding position in a grayscale image, to obtain the grayscale image;
and binarizing the grayscale image with a binarization threshold value to obtain the binarized image.
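For instance, the max-channel graying of claim 3 can be written directly over the image array; the sketch below uses Otsu's method for the binarization threshold, which the claim itself does not prescribe:

    import cv2

    def binarize_max_channel(bgr):
        gray = bgr.max(axis=2)  # per-pixel maximum of the R, G, B values
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return binary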
4. The method of claim 1, wherein binarizing the document image to obtain a binarized image comprises:
for each pixel in the document image, taking the smaller of 255 and the product of the maximum of the R, G, B values of the pixel and a predetermined constant larger than 1 as the pixel value of the corresponding position in a grayscale image, to obtain the grayscale image;
and binarizing the grayscale image with a binarization threshold value to obtain the binarized image.
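Claim 4's variant brightens the background before thresholding by scaling the max channel and clipping at 255; a sketch, where the constant c = 1.2 and the fixed threshold are assumed values (the claim only requires a constant larger than 1):

    import cv2
    import numpy as np

    def binarize_scaled_max_channel(bgr, c=1.2, thresh=200):
        # min(c * max(R, G, B), 255) per pixel, then a plain fixed-threshold binarization.
        gray = np.minimum(bgr.max(axis=2).astype(np.float32) * c, 255).astype(np.uint8)
        _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
        return binary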
5. The method according to claim 3 or 4, wherein, before binarization, each gray value p_i in the grayscale image is also subjected to black pixel enhancement processing as shown by the following formula:
(formula image FDA0002206056950000021, not reproduced here)
wherein p_i represents a gray value, and α and β are predetermined positive integers.
6. The method of claim 1, wherein calculating a correction function according to the part of the long Arabic numeral string and the reference image comprises:
acquiring four feature points of the part where the long Arabic numeral string is located;
acquiring four feature points of the reference image;
and calculating a correction function according to the eight acquired feature points.
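With the four feature points of the distorted numeral string paired against the four feature points of the reference image, the correction function amounts to a 3x3 perspective (homography) matrix; a sketch with OpenCV, assuming the points are supplied in matching order (first to fourth against fifth to eighth):

    import cv2
    import numpy as np

    def correction_function(src_pts, ref_pts):
        # src_pts: feature points 1-4 measured in the document image,
        # ref_pts: feature points 5-8 measured in the reference image,
        # both as 4x2 arrays of (x, y) coordinates in matching order.
        return cv2.getPerspectiveTransform(np.float32(src_pts), np.float32(ref_pts))

    def correct_document(document, homography, out_size):
        # Resample the whole document image with the computed correction function.
        return cv2.warpPerspective(document, homography, out_size)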
7. The method of claim 6, wherein acquiring four feature points of the part where the long Arabic numeral string is located comprises:
extracting, for the connected domain closest to the left side of the part where the long Arabic numeral string is located, the center point of the upper side of its circumscribed rectangle as a first point and the center point of the lower side as a second point;
extracting, for the connected domain closest to the right side of the part where the long Arabic numeral string is located, the center point of the upper side of its circumscribed rectangle as a third point and the center point of the lower side as a fourth point;
determining, as a first feature point, the average pixel position of the intersection points of the left connected domain with a first straight line connecting the first point and the third point, after the first straight line is shifted downwards as a whole by a plurality of pixels;
determining, as a second feature point, the average pixel position of the intersection points of the left connected domain with a second straight line connecting the second point and the fourth point, after the second straight line is shifted upwards as a whole by a plurality of pixels;
determining, as a third feature point, the average pixel position of the intersection points of the right connected domain with the first straight line after it is shifted downwards as a whole by a plurality of pixels;
and determining, as a fourth feature point, the average pixel position of the intersection points of the right connected domain with the second straight line after it is shifted upwards as a whole by a plurality of pixels.
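One way to read claim 7 in code: connect the top and bottom box corners of the leftmost and rightmost digit components, nudge each line into the ink by a few pixels, and average the positions where the shifted line crosses each component. This is a sketch: the shift of "a plurality of pixels" is taken to be a small constant here, and the two end components are passed in as binary masks.

    import numpy as np

    def _mean_intersection(p, q, component_mask, shift):
        # Sample the straight line from p to q, shifted vertically by `shift` pixels,
        # and average the sample positions that land on foreground pixels of the mask.
        (x0, y0), (x1, y1) = p, q
        n = max(abs(int(x1) - int(x0)), 1)
        hits = []
        for t in np.linspace(0.0, 1.0, n + 1):
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0))) + shift
            if 0 <= y < component_mask.shape[0] and 0 <= x < component_mask.shape[1] \
                    and component_mask[y, x]:
                hits.append((x, y))
        return tuple(np.mean(hits, axis=0)) if hits else None

    def string_feature_points(p1, p2, p3, p4, left_mask, right_mask, shift=3):
        # p1/p2: top/bottom edge centers of the leftmost component's bounding box,
        # p3/p4: top/bottom edge centers of the rightmost component's bounding box.
        f1 = _mean_intersection(p1, p3, left_mask, +shift)   # first line moved down, left domain
        f2 = _mean_intersection(p2, p4, left_mask, -shift)   # second line moved up, left domain
        f3 = _mean_intersection(p1, p3, right_mask, +shift)  # first line moved down, right domain
        f4 = _mean_intersection(p2, p4, right_mask, -shift)  # second line moved up, right domain
        return f1, f2, f3, f4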
8. The method of claim 6, wherein acquiring four feature points of the reference image comprises:
performing binarization processing on the reference image and extracting connected domains;
extracting, for the connected domain closest to the left side of the reference image, the center point of the upper side of its circumscribed rectangle as a fifth feature point and the center point of the lower side as a sixth feature point;
and extracting, for the connected domain closest to the right side of the reference image, the center point of the upper side of its circumscribed rectangle as a seventh feature point and the center point of the lower side as an eighth feature point.
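The reference-image side is simpler because the rendered digits carry no distortion: binarize, take the leftmost and rightmost connected domains, and read the midpoints of the top and bottom edges of their circumscribed rectangles. A sketch, assuming the reference image is rendered as black digits on a white background:

    import cv2

    def reference_feature_points(reference_gray):
        _, binary = cv2.threshold(reference_gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
        boxes = sorted(stats[1:n, :4].tolist(), key=lambda b: b[0])  # left to right
        lx, ly, lw, lh = boxes[0]    # leftmost connected domain
        rx, ry, rw, rh = boxes[-1]   # rightmost connected domain
        p5 = (lx + lw / 2.0, ly)         # fifth feature point: top-edge center, left
        p6 = (lx + lw / 2.0, ly + lh)    # sixth feature point: bottom-edge center, left
        p7 = (rx + rw / 2.0, ry)         # seventh feature point: top-edge center, right
        p8 = (rx + rw / 2.0, ry + rh)    # eighth feature point: bottom-edge center, right
        return p5, p6, p7, p8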
9. An apparatus for perspective correction of a document image containing a long Arabic numeral string, comprising:
a numeral string determination device configured to: determine the part of the document image where the long Arabic numeral string is located and the content of the long Arabic numeral string;
a reference image creating device configured to: create a reference image according to the content of the long Arabic numeral string;
a correction function calculation device configured to: calculate a correction function according to the part of the long Arabic numeral string and the reference image; and
a perspective correction device configured to: perform perspective correction on the document image according to the correction function,
wherein determining the part of the document image where the long Arabic numeral string is located and the content of the long Arabic numeral string includes:
performing binarization processing on the document image to obtain a binarized image;
extracting all connected domains in the binarized image;
performing OCR on the extracted connected domains using a digit OCR engine;
searching the OCR results for a group of connected domains that have high confidence, are close to each other, and form the longest string;
and determining the area enclosed by the circumscribed rectangle of the group of connected domains as the part of the document image where the long Arabic numeral string is located.
CN201510830447.6A 2015-11-25 2015-11-25 Method and device for perspective correction of document image Active CN106803269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510830447.6A CN106803269B (en) 2015-11-25 2015-11-25 Method and device for perspective correction of document image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510830447.6A CN106803269B (en) 2015-11-25 2015-11-25 Method and device for perspective correction of document image

Publications (2)

Publication Number Publication Date
CN106803269A CN106803269A (en) 2017-06-06
CN106803269B true CN106803269B (en) 2020-03-10

Family

ID=58975977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510830447.6A Active CN106803269B (en) 2015-11-25 2015-11-25 Method and device for perspective correction of document image

Country Status (1)

Country Link
CN (1) CN106803269B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510477B (en) * 2018-03-15 2021-08-24 深圳市飞点健康管理有限公司 Method and device for positioning color block of test paper
CN111507354B (en) * 2020-04-17 2023-12-12 北京百度网讯科技有限公司 Information extraction method, device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1215881A1 (en) * 2000-12-14 2002-06-19 Ricoh Company, Ltd. Method and apparatus for Image distortion correction
CN1957370A (en) * 2004-03-23 2007-05-02 慧眼3D公司 Method for extracting raw data of a photographed image
CN101267493A (en) * 2007-03-16 2008-09-17 富士通株式会社 Correction device and method for perspective distortion document image
CN101952837A * 2008-01-15 2011-01-19 艾可瑞公司 Use of a single radiographic image in a three-dimensional imaging pair for quality assurance of tracking
CN102236789A (en) * 2010-04-26 2011-11-09 富士通株式会社 Method and device for correcting table image
CN103839058A (en) * 2012-11-21 2014-06-04 方正国际软件(北京)有限公司 Information locating method for document image based on standard template

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Automatic Perspective Correction of Manuscript Images; Ryan Baumann et al.; 14th International Conference on Asia-Pacific Digital Libraries; 2012-11-15; 11-18 *
RS-based study of the urban heat island effect in mountainous cities and its relationship with land cover change: a case study of the main urban area of Chongqing; Yi Jia et al.; Journal of Yunnan Normal University; 2008-11-30; Vol. 28, No. 6; 62-69 *
Change detection method for remote sensing images based on target matching; Su Juan et al.; Journal of Tsinghua University (Science and Technology); 2007-10-31; Vol. 47, No. 10; 1610-1613 *
Matching error correction algorithm for DIBR; Zhang Ling et al.; Journal of South China University of Technology (Natural Science Edition); 2011-12-31; Vol. 39, No. 12; 51-55 *

Also Published As

Publication number Publication date
CN106803269A (en) 2017-06-06

Similar Documents

Publication Publication Date Title
CN111476227B (en) Target field identification method and device based on OCR and storage medium
US8494273B2 (en) Adaptive optical character recognition on a document with distorted characters
US9922247B2 (en) Comparing documents using a trusted source
JP5972468B2 (en) Detect labels from images
Bulatov et al. MIDV-2019: challenges of the modern mobile-based document OCR
US8634644B2 (en) System and method for identifying pictures in documents
Gebhardt et al. Document authentication using printing technique features and unsupervised anomaly detection
CN105760901B An automatic language identification method for multilingual skewed document images
WO2014160433A2 (en) Systems and methods for classifying objects in digital images captured using mobile devices
EP2973226A1 (en) Classifying objects in digital images captured using mobile devices
CN107368829B (en) Method and apparatus for determining rectangular target area in input image
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
CN110717497A (en) Image similarity matching method and device and computer readable storage medium
RU2581786C1 (en) Determination of image transformations to increase quality of optical character recognition
RU2656573C2 (en) Methods of detecting the user-integrated check marks
CN107305682B (en) Method and device for splicing images
CN111915635A (en) Test question analysis information generation method and system supporting self-examination paper marking
Bouillon et al. Grayification: a meaningful grayscale conversion to improve handwritten historical documents analysis
CN108197624A Certificate image rectification and recognition method and device, and computer storage medium
RU2633182C1 (en) Determination of text line orientation
CN106803269B (en) Method and device for perspective correction of document image
RU2597163C2 (en) Comparing documents using reliable source
RU2603495C1 (en) Classification of document images based on parameters of colour layers
CN103617423B (en) Image segmentation and recognition method based on color parameter
CN115410191B (en) Text image recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant