CN115797938A

CN115797938A - Automatic correction method of file picture, electronic equipment and storage medium

Info

Publication number: CN115797938A
Application number: CN202211323131.4A
Authority: CN
Inventors: 简仁贤
Original assignee: Emotibot Technologies Ltd
Current assignee: Emotibot Technologies Ltd
Priority date: 2022-10-27
Filing date: 2022-10-27
Publication date: 2023-03-14

Abstract

The application provides an automatic correction method for a document picture, an electronic device and a storage medium. The method comprises: extracting the bounding box coordinates of each text block in the document picture and determining the boundary inclination angle of each text block from them; determining the average inclination angles of the upper, lower, left and right sides of the document picture from the boundary inclination angles of the text blocks; determining the text region of the document picture from these four average inclination angles together with the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks; and performing a perspective transformation on the text region to obtain a corrected text region. The scheme extracts the text region of the document picture accurately without a large training cost, and performing OCR on the corrected text region can improve recognition accuracy.

Description

Automatic correction method of file picture, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to an automatic correction method for a file picture, an electronic device, and a computer-readable storage medium.

Background

A document image is a document in picture format. Such a picture contains a large amount of text, but text in picture form cannot be read directly by a computer: Optical Character Recognition (OCR) must be used to detect the text regions in the picture and recognize the text images as text. Recognizing text from document pictures has great application value, for example automatically analyzing the text to reduce manual editing cost, supporting document picture search, and extracting, classifying and comparing document picture information.

Document pictures come from diverse sources; for example, they may be photographed or scanned by users. Depending on the shooting or scanning angle, the pictures may show distortions such as tilt, rotation and elevation angle, and these distortions can cause OCR to detect text regions incorrectly, which lowers the accuracy of the whole OCR process.

Conventionally, computer vision edge detection can be used to detect the paper region in a picture, which is then cropped and perspective-transformed. This works poorly in practice: not every picture contains a paper region, and in many cases the edge between the paper and the background is not distinct, so edge detection fails to crop the text region. Another approach uses deep learning: a large number of pictures are collected to train an object detection model, but such a large number of training pictures is difficult to obtain. How to correct a document picture with a small training set, or with none at all, is therefore a practical pain point.

Disclosure of Invention

The embodiments of the application provide an automatic correction method for a document picture that reduces training cost and extracts the text region accurately.

The embodiment of the application provides an automatic correction method of a file picture, which comprises the following steps:

extracting the text blocks in the document picture and the bounding box coordinates of each text block by OCR;

determining the boundary inclination angle of each text block from its bounding box coordinates;

determining the average inclination angles of the upper, lower, left and right sides of the document picture from the boundary inclination angles of the text blocks;

determining the text region of the document picture from the average inclination angles of its upper, lower, left and right sides together with the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks;

and performing a perspective transformation on the text region of the document picture to obtain a corrected text region.

In one embodiment, determining the boundary inclination angle of each text block from its bounding box coordinates includes:

for each text block, calculating the direction vectors of its four frames (upper, lower, left and right) from its bounding box coordinates;

and determining the inclination angles of the upper, lower, left and right frames of the text block from these direction vectors and preset reference vectors, thereby obtaining the boundary inclination angle of each text block.

In one embodiment, determining the average inclination angles of the upper, lower, left and right sides of the document picture from the boundary inclination angle of each text block includes:

removing, from the inclination angles of the upper, lower, left and right frames of all text blocks, the angle outliers of each of the four frames separately;

and calculating the mean of the remaining inclination angles for each of the upper, lower, left and right frames to obtain the average inclination angles of the upper, lower, left and right sides of the document picture.

In an embodiment, determining the text region of the document picture from the average inclination angles of its upper, lower, left and right sides and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks includes:

determining the upper boundary of a character area of the file picture according to the average inclination angle of the upper side of the file picture and the highest point coordinates of all character blocks in the file picture;

determining the lower boundary of a character area of the file picture according to the average inclination angle of the lower edge of the file picture and the lowest point coordinates of all character blocks in the file picture;

determining a left boundary of a character area of the file picture according to the average inclination angle of the left side of the file picture and the leftmost point coordinates of all character blocks in the file picture;

and determining the right boundary of the character area of the file picture according to the average inclination angle of the right side of the file picture and the rightmost point coordinates of all character blocks in the file picture.

In an embodiment, determining the text region of the document picture from the average inclination angles of its upper, lower, left and right sides and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks includes:

rotating the file picture according to the average inclination angle of the left side of the file picture to obtain a target picture;

calculating the average inclination angles of the upper part, the lower part, the left part and the right part of the target picture, and finding out the highest point coordinate, the lowest point coordinate, the leftmost point coordinate and the rightmost point coordinate of all character blocks in the target picture;

and determining a character area of the target picture as the character area of the file picture according to the average inclination angles of the upper part, the lower part, the left side and the right side of the target picture and the highest point coordinate, the lowest point coordinate, the leftmost point coordinate and the rightmost point coordinate of all character blocks in the target picture.

In one embodiment, calculating the average inclination angles of the upper, lower, left and right sides of the target picture includes:

extracting character blocks in the target picture and the boundary box coordinates of each character block through an OCR technology;

determining the boundary inclination angle of each character block in the target picture according to the boundary frame coordinates of each character block in the target picture;

and determining the average inclination angle of the upper part, the lower part, the left part and the right part of the target picture according to the boundary inclination angle of each character block in the target picture.

In an embodiment, finding the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks in the target picture includes:

finding a coordinate point with the minimum y coordinate in all the character blocks as the highest point coordinate according to the boundary frame coordinate of each character block in the target picture;

finding a coordinate point with the maximum x coordinate in all the character blocks as the coordinate of the rightmost point according to the coordinate of the boundary frame of each character block in the target picture;

finding a coordinate point with the maximum y coordinate in all the character blocks as the coordinate of the lowest point according to the coordinate of the boundary frame of each character block in the target picture;

and finding a coordinate point with the minimum x coordinate in all the character blocks as the coordinate of the leftmost point according to the coordinate of the boundary frame of each character block in the target picture.

In an embodiment, determining the text region of the target picture from the average inclination angles of its upper, lower, left and right sides and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks in the target picture includes:

determining the upper boundary of a character area of the target picture according to the average inclination angle of the upper side of the target picture and the highest point coordinates of all character blocks in the target picture;

determining the lower boundary of a character area of the target picture according to the average inclination angle of the lower side of the target picture and the lowest point coordinates of all character blocks in the target picture;

determining the left boundary of the character area of the target picture according to the average inclination angle of the left side of the target picture and the leftmost point coordinates of all character blocks in the target picture;

and determining the right boundary of the character area of the target picture according to the average inclination angle of the right side of the target picture and the rightmost point coordinates of all character blocks in the target picture.

An embodiment of the present application further provides an electronic device, where the electronic device includes:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute the automatic correction method of the file picture.

The embodiment of the application also provides a computer readable storage medium, wherein the storage medium stores a computer program, and the computer program can be executed by a processor to complete the automatic correction method of the file picture.

According to the technical solution provided by the embodiments of the application, the bounding box coordinates of each text block in the document picture are extracted to determine the boundary inclination angle of each text block, and the average inclination angles of the upper, lower, left and right sides of the document picture are then determined from these boundary inclination angles. The text region of the document picture is determined from the four average inclination angles together with the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks, and a perspective transformation of the text region yields the corrected text region. The text region can thus be extracted accurately without a large training cost, and performing OCR on the corrected text region can improve recognition accuracy.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the embodiments of the present application will be briefly described below.

Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating a method for automatically correcting a document picture according to an embodiment of the present disclosure;

FIG. 3 is a diagram of bounding box coordinates of a text block provided by an embodiment of the present application;

FIG. 4 is a schematic diagram illustrating an inclination angle of a left frame of a text block according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a boundary of a text region provided in an embodiment of the present application;

FIG. 6 is a detailed flowchart of step S240 provided in an embodiment of the present application;

FIG. 7 is a schematic view of a document picture provided by an embodiment of the present application;

FIG. 8 is a diagram of bounding box coordinates for a block of characters in the picture of the file of FIG. 7;

FIG. 9 is a diagram illustrating the left frame of all text blocks in the file picture shown in FIG. 7;

FIG. 10 is a schematic view of a rotated target picture of the document shown in FIG. 7;

FIG. 11 is a schematic diagram of the highest point, the rightmost point, the lowest point, and the leftmost point in the target picture of FIG. 10;

FIG. 12 is a diagram of four boundaries of a text region in the target picture shown in FIG. 10;

FIG. 13 is a schematic diagram of the corrected text area of FIG. 12;

FIG. 14 is a block diagram of an apparatus for automatically correcting a document picture according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

Fig. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 100 may be configured to execute the automatic correction method for a document picture provided in the embodiment of the present application. As shown in fig. 1, the electronic device 100 includes: one or more processors 102, and one or more memories 104 storing processor-executable instructions. The processor 102 is configured to execute an automatic correction method for a document picture provided by the following embodiments of the present application.

The processor 102 may be a gateway, or may be an intelligent terminal, or may be a device including a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capability and/or instruction execution capability, and may process data of other components in the electronic device 100, and may control other components in the electronic device 100 to perform desired functions.

The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the automatic correction method for document pictures described below. Various applications and data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.

In one embodiment, the electronic device 100 shown in FIG. 1 may further include an input device 106, an output device 108, and a data acquisition device 110, which may be interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are merely exemplary and not limiting, and the electronic device 100 may have other components and structures as desired.

The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like. The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like. The data acquisition device 110 may acquire an image of a subject and store the acquired image in the memory 104 for use by other components. Illustratively, the data acquisition device 110 may be a camera.

In an embodiment, the devices in the example electronic device 100 for implementing the automatic document picture correction method according to the embodiment of the present application may be integrally disposed, or may be disposed separately, such as integrally disposing the processor 102, the memory 104, the input device 106, and the output device 108, and disposing the data acquisition device 110 separately.

In an embodiment, the example electronic device 100 for implementing the automatic correction method of the document picture according to the embodiment of the present application may be implemented as an intelligent terminal, such as a smart phone, a tablet computer, a desktop computer, a server, an in-vehicle device, and the like.

Fig. 2 is a schematic flowchart of a method for automatically correcting a document picture according to an embodiment of the present application. The method is performed by an electronic device, and as shown in fig. 2, the method comprises the following steps S210-S250.

Step S210: extracting the text blocks in the document picture and the bounding box coordinates of each text block by OCR.

Optical Character Recognition (OCR) is an existing technique for detecting the positions of text blocks and recognizing their content. A text block is the minimum circumscribed rectangle of a character string. The document picture is text in picture format and contains multiple text blocks; the bounding box coordinates of a text block are the coordinates of the four vertices of the minimum circumscribed rectangle of its string. FIG. 3 shows the bounding box coordinates of a text block: the top-left vertex is $(x_0, y_0)$, the top-right vertex is $(x_1, y_1)$, the bottom-right vertex is $(x_2, y_2)$ and the bottom-left vertex is $(x_3, y_3)$, and the top-left corner can be taken as the coordinate origin $(0, 0)$.

Step S220: determining the boundary inclination angle of each text block from its bounding box coordinates.

The boundary inclination angle comprises the inclination angles of the four frames of the text block: upper, lower, left and right. In an embodiment, step S220 specifically includes: for each text block, calculating the direction vectors of its upper, lower, left and right frames from its bounding box coordinates; and determining the inclination angles of the four frames from these direction vectors and the preset reference vectors.

For example, for any text block, (1) first calculate the angle of the left frame of the text block (pointing from the bottom-left vertex to the top-left vertex). Let the direction vector of the left frame be $\vec{v}_{left} = (x_0 - x_3,\ y_0 - y_3)$ and its preset reference vector be $\vec{r}_{left}$. The signed included angle between $\vec{v}_{left}$ and $\vec{r}_{left}$, obtained from the arc tangent angles (radians) of the two vectors, is the inclination angle of the left frame, $\theta_{left}$, which lies between $-180^\circ$ and $180^\circ$. As shown in FIG. 4 (1), $\theta > 0$ indicates that the text block needs to be rotated counterclockwise by $\theta$ degrees; as shown in FIG. 4 (2), $\theta < 0$ indicates that the text block needs to be rotated clockwise by $|\theta|$ degrees. The inclination angle of the left frame of each text block is recorded as $\theta_{left}$.

(2) Calculate the angle of the upper frame of the text block (pointing from the top-right vertex to the top-left vertex). Let the direction vector of the upper frame be $\vec{v}_{top} = (x_0 - x_1,\ y_0 - y_1)$ and its reference vector be $\vec{r}_{top}$, and compute the radian of each vector. Using the same method as step (1), the included angle between $\vec{v}_{top}$ and $\vec{r}_{top}$ is the inclination angle of the upper frame, recorded as $\theta_{top}$ for each text block.

(3) Calculate the angle of the right frame of each text block (pointing from the bottom-right vertex to the top-right vertex). Let the direction vector of the right frame be $\vec{v}_{right} = (x_1 - x_2,\ y_1 - y_2)$ and its reference vector be $\vec{r}_{right}$, and compute the radian of each vector. The included angle between $\vec{v}_{right}$ and $\vec{r}_{right}$ is the inclination angle of the right frame, recorded as $\theta_{right}$ for each text block.

(4) Calculate the angle of the lower frame of each text block (pointing from the bottom-left vertex to the bottom-right vertex). Let the direction vector of the lower frame be $\vec{v}_{bottom} = (x_2 - x_3,\ y_2 - y_3)$ and its reference vector be $\vec{r}_{bottom}$, and compute the radian of each vector. The included angle between $\vec{v}_{bottom}$ and $\vec{r}_{bottom}$ is the inclination angle of the lower frame, recorded as $\theta_{bottom}$ for each text block.
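The four frame angles of steps (1) to (4) can be sketched in Python as follows. The reference vectors of the original formulas are not recoverable from this text, so the values in the hypothetical `REFERENCES` table are assumptions: straight up for the left and right frames, right-to-left for the upper frame and left-to-right for the lower frame, in image coordinates where y grows downward. The signed angle is taken as the difference of the two vectors' `atan2` angles, wrapped into (-180°, 180°].

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical reference vectors (assumption, not from the patent text).
# Image coordinates: x grows to the right, y grows downward.
REFERENCES: Dict[str, Point] = {
    "left": (0.0, -1.0),    # straight up
    "right": (0.0, -1.0),   # straight up
    "top": (-1.0, 0.0),     # right to left
    "bottom": (1.0, 0.0),   # left to right
}


def signed_angle_deg(v: Point, ref: Point) -> float:
    """Signed angle from ref to v in degrees, wrapped into (-180, 180]."""
    diff = math.degrees(math.atan2(v[1], v[0]) - math.atan2(ref[1], ref[0]))
    while diff <= -180.0:
        diff += 360.0
    while diff > 180.0:
        diff -= 360.0
    return diff


def frame_angles(box: Sequence[Point]) -> Dict[str, float]:
    """Inclination angles of the four frames of one text block.

    box holds the vertices in FIG. 3 order: top-left (x0, y0),
    top-right (x1, y1), bottom-right (x2, y2), bottom-left (x3, y3).
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = box
    vectors = {
        "left": (x0 - x3, y0 - y3),    # bottom-left -> top-left
        "top": (x0 - x1, y0 - y1),     # top-right -> top-left
        "right": (x1 - x2, y1 - y2),   # bottom-right -> top-right
        "bottom": (x2 - x3, y2 - y3),  # bottom-left -> bottom-right
    }
    return {side: signed_angle_deg(v, REFERENCES[side]) for side, v in vectors.items()}
```

With these assumed references an axis-aligned box gives 0° on every frame, while a box whose top edge is shifted to the right gives a positive `left` angle; rotating that frame counterclockwise by the angle makes it vertical again, which matches the sign convention described for FIG. 4.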

Step S230: determining the average inclination angles of the upper, lower, left and right sides of the document picture from the boundary inclination angles of the text blocks.

The average inclination angles of the upper, lower, left and right sides of the document picture comprise the mean of the upper-frame inclination angles of all text blocks, the mean of the lower-frame inclination angles of all text blocks, the mean of the left-frame inclination angles of all text blocks and the mean of the right-frame inclination angles of all text blocks.

In an embodiment, the step S230 specifically includes: according to the inclination angles of the upper frame, the lower frame, the left frame and the right frame of each character block, respectively removing the angle outliers of the upper frame, the lower frame, the left frame and the right frame; and calculating the average value of the residual inclination angles aiming at the upper, lower, left and right side frames respectively to obtain the average inclination angles of the upper, lower, left and right sides of the file picture.

An angle outlier is an inclination angle outside a preset range; for example, the maximum and minimum values may be removed. In one embodiment, the left-frame inclination angles $\theta_{left}$ of all text blocks are sorted in ascending order and the 5th and 95th percentile values are found. Only the $\theta_{left}$ values between the 5th and 95th percentiles are kept; the values outside this range are the angle outliers and are removed. The same method removes the angle outliers of $\theta_{top}$, $\theta_{right}$ and $\theta_{bottom}$ over all text blocks.

After the angle outliers of $\theta_{top}$ have been removed for all text blocks, the mean of the remaining $\theta_{top}$ values is taken as the average inclination angle of the upper side of the document picture, $\bar{\theta}_{top}$.

After the angle outliers of $\theta_{bottom}$ have been removed for all text blocks, the mean of the remaining $\theta_{bottom}$ values is taken as the average inclination angle of the lower side of the document picture, $\bar{\theta}_{bottom}$.

After the angle outliers of $\theta_{left}$ have been removed for all text blocks, the mean of the remaining $\theta_{left}$ values is taken as the average inclination angle of the left side of the document picture, $\bar{\theta}_{left}$.

After the angle outliers of $\theta_{right}$ have been removed for all text blocks, the mean of the remaining $\theta_{right}$ values is taken as the average inclination angle of the right side of the document picture, $\bar{\theta}_{right}$.
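The outlier removal and averaging of step S230 amount to a trimmed mean, sketched below for one frame. The percentile is computed by linear interpolation between sorted values (the same rule as NumPy's default `percentile`); that rule, the fallback for very small inputs and the function names are assumptions of this sketch rather than details given in the text.

```python
import math
from typing import Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of an ascending list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def trimmed_mean_angle(angles: Sequence[float],
                       low_q: float = 5.0, high_q: float = 95.0) -> float:
    """Drop angles outside the [low_q, high_q] percentile band, then average."""
    if not angles:
        raise ValueError("no inclination angles given")
    ordered = sorted(angles)
    lo = percentile(ordered, low_q)
    hi = percentile(ordered, high_q)
    kept = [a for a in ordered if lo <= a <= hi]
    # With only two samples the interpolated band can exclude both values;
    # fall back to the plain mean in that case (assumption of this sketch).
    if not kept:
        kept = ordered
    return sum(kept) / len(kept)
```

Calling `trimmed_mean_angle` once per frame list (`θ_top`, `θ_bottom`, `θ_left`, `θ_right`) yields the four average inclination angles of the picture.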

Step S240: determining the text region of the document picture from the average inclination angles of its upper, lower, left and right sides and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks.

Specifically, the coordinate point with the minimum y coordinate over all text blocks in the document picture (the highest-point coordinate) is denoted $P_{top} = (x_0, y_0)$; the point with the maximum x coordinate (the rightmost-point coordinate) is denoted $P_{right} = (x_1, y_1)$; the point with the maximum y coordinate (the lowest-point coordinate) is denoted $P_{bottom} = (x_2, y_2)$; and the point with the minimum x coordinate (the leftmost-point coordinate) is denoted $P_{left} = (x_3, y_3)$.
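Finding the four extreme points is a scan over every bounding box vertex. A minimal sketch, assuming the boxes are given as lists of `(x, y)` tuples in image coordinates (y grows downward, so the highest point has the smallest y); the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def extreme_points(boxes: Sequence[Sequence[Point]]) -> Dict[str, Point]:
    """Highest, rightmost, lowest and leftmost vertices over all text blocks."""
    pts: List[Point] = [p for box in boxes for p in box]
    if not pts:
        raise ValueError("no text blocks given")
    return {
        "top": min(pts, key=lambda p: p[1]),     # P_top: smallest y
        "right": max(pts, key=lambda p: p[0]),   # P_right: largest x
        "bottom": max(pts, key=lambda p: p[1]),  # P_bottom: largest y
        "left": min(pts, key=lambda p: p[0]),    # P_left: smallest x
    }
```

On ties, `min` and `max` return the first matching vertex, so the result depends on box order; the text does not say how ties are broken.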

In an embodiment, the step S240 specifically includes:

(1) Determine the upper boundary of the text region of the document picture from the average inclination angle of the upper side of the document picture and the highest-point coordinate of all text blocks in the document picture.

Specifically, $P'_{top} = P_{top} + [\cos(a), \sin(a)] = (x_0 + \cos(a),\ y_0 + \sin(a))$, and extending the line through $P_{top}$ and $P'_{top}$ gives the upper boundary of the text region, as shown in FIG. 5. Here $a$ is the average inclination angle of the upper side of the document picture, in radians; $[\cos(a), \sin(a)]$ is the angle vector, i.e. the point at radian $a$ on the unit circle centered at $(0, 0)$; and $P_{top}$ is the highest-point coordinate.

(2) Determine the lower boundary of the text region of the document picture from the average inclination angle of the lower side of the document picture and the lowest-point coordinate of all text blocks in the document picture.

Specifically, $P'_{bottom} = P_{bottom} + [\cos(a), \sin(a)] = (x_2 + \cos(a),\ y_2 + \sin(a))$, and extending the line through $P_{bottom}$ and $P'_{bottom}$ gives the lower boundary of the text region, as shown in FIG. 5. Here $a$ is the average inclination angle of the lower side of the document picture, in radians; $[\cos(a), \sin(a)]$ is the unit angle vector at radian $a$ centered at $(0, 0)$; and $P_{bottom}$ is the lowest-point coordinate.

(3) Determine the left boundary of the text region of the document picture from the average inclination angle of the left side of the document picture and the leftmost-point coordinate of all text blocks in the document picture.

Specifically, $P'_{left} = P_{left} + [\cos(a), \sin(a)] = (x_3 + \cos(a),\ y_3 + \sin(a))$, and extending the line through $P_{left}$ and $P'_{left}$ gives the left boundary of the text region, as shown in FIG. 5. Here $a$ is the average inclination angle of the left side of the document picture, in radians; $[\cos(a), \sin(a)]$ is the unit angle vector at radian $a$ centered at $(0, 0)$; and $P_{left}$ is the leftmost-point coordinate.

(4) Determine the right boundary of the text region of the document picture from the average inclination angle of the right side of the document picture and the rightmost-point coordinate of all text blocks in the document picture.

Specifically, $P'_{right} = P_{right} + [\cos(a), \sin(a)] = (x_1 + \cos(a),\ y_1 + \sin(a))$, and extending the line through $P_{right}$ and $P'_{right}$ gives the right boundary of the text region, as shown in FIG. 5. Here $a$ is the average inclination angle of the right side of the document picture, in radians; $[\cos(a), \sin(a)]$ is the unit angle vector at radian $a$ centered at $(0, 0)$; and $P_{right}$ is the rightmost-point coordinate.

As shown in FIG. 5, extending the straight lines of the upper, lower, left and right boundaries of the text region yields 4 intersection points, and the region enclosed by these 4 points is the text region of the document picture.
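The boundary construction and the corner intersections can be sketched as follows. How each frame's average inclination angle is converted into the radian `a` used in $[\cos(a), \sin(a)]$ is not recoverable here, so this sketch assumes `a` is already the direction of the boundary line measured from the +x axis (near 0 for the upper and lower boundaries, near π/2 for the left and right ones). Each line through `P` and `P' = P + (cos a, sin a)` is stored as coefficients of `A*x + B*y = C` and intersected with Cramer's rule; all names are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float, float]  # (A, B, C) for A*x + B*y = C


def boundary_line(p: Point, a: float) -> Line:
    """Line through p and p' = p + (cos a, sin a); a in radians (assumed)."""
    x, y = p
    xp, yp = x + math.cos(a), y + math.sin(a)  # the second point P'
    A = yp - y
    B = x - xp
    return A, B, A * x + B * y


def intersect(l1: Line, l2: Line) -> Point:
    """Intersection of two lines by Cramer's rule."""
    A1, B1, C1 = l1
    A2, B2, C2 = l2
    det = A1 * B2 - A2 * B1
    if abs(det) < 1e-12:
        raise ValueError("boundary lines are parallel")
    return (C1 * B2 - C2 * B1) / det, (A1 * C2 - A2 * C1) / det


def text_region_corners(points: Dict[str, Point],
                        angles: Dict[str, float]) -> Dict[str, Point]:
    """Four corners of the text region from extreme points and line angles."""
    lines = {s: boundary_line(points[s], angles[s])
             for s in ("top", "right", "bottom", "left")}
    return {
        "top_left": intersect(lines["top"], lines["left"]),
        "top_right": intersect(lines["top"], lines["right"]),
        "bottom_right": intersect(lines["bottom"], lines["right"]),
        "bottom_left": intersect(lines["bottom"], lines["left"]),
    }
```

For an upright page (angles 0 for the upper and lower boundaries, π/2 for the left and right ones) the corners reduce to the axis-aligned box spanned by the four extreme points.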

Step S250: and carrying out perspective transformation on the character area of the file picture to obtain a corrected character area.

Specifically, an image corresponding to the text region may be captured from the document picture, and then the image is projected to the rectangle by perspective transformation, so that the text region picture with the elevation angle corrected may be obtained. The perspective transformation can be realized by adopting the prior art, specifically, the width and the height of the regular rectangle can be determined according to the coordinates of the four vertexes of the character area before correction, and the coordinates of the four vertexes of the regular rectangle can be obtained. And calculating a transformation matrix of perspective transformation according to the four vertex coordinates of the character area before correction and the four vertex coordinates of the regular rectangle, and then performing transformation of the transformation matrix on the whole character area image before correction to realize image correction.
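In practice the perspective step is usually done with OpenCV (`cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`). To show what the transformation matrix is, the sketch below computes it in pure Python by fixing the last entry to 1 and solving the resulting 8x8 linear system from the four corner correspondences. Taking the rectangle's width and height as the longer of each pair of opposite sides is an assumed, common choice; the text only says they are determined from the four vertices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def target_rectangle(tl: Point, tr: Point, br: Point, bl: Point):
    """Width, height and corners of the upright rectangle (assumed sizing rule)."""
    w = int(round(max(math.dist(tl, tr), math.dist(bl, br))))
    h = int(round(max(math.dist(tl, bl), math.dist(tr, br))))
    corners = [(0.0, 0.0), (w - 1.0, 0.0), (w - 1.0, h - 1.0), (0.0, h - 1.0)]
    return w, h, corners


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H (H[2][2] = 1) mapping the four src corners onto dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_perspective(m: List[List[float]], p: Point) -> Point:
    """Map one point through H, including the projective division."""
    x, y = p
    d = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / d,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / d)
```

Warping the whole image then means sampling, for every output pixel, the source pixel given by the inverse of this matrix, which is what `cv2.warpPerspective` does.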

According to the technical solution provided by the embodiments of the present application, the bounding box coordinates of each text block in the file picture are extracted to determine the boundary inclination angle of each text block, from which the average inclination angles of the upper, lower, left and right sides of the file picture are determined; the text region of the file picture is then determined from these average inclination angles together with the highest point, lowest point, leftmost point and rightmost point coordinates of all text blocks; and the corrected text region is obtained by performing perspective transformation on that text region. The text region of the file picture can thus be extracted accurately without a large training cost, and performing OCR recognition on the corrected text region can improve recognition accuracy.

In other embodiments, if the file picture is severely tilted, text block detection may be inaccurate, which in turn causes errors in text region detection. To improve the accuracy of text region detection, as shown in Fig. 6, step S240 may include the following steps S241 to S243.

Step S241: Rotate the file picture according to the average inclination angle of the left side of the file picture to obtain a target picture.

Specifically, the file picture Pic is rotated as a whole, using the average inclination angle of the left side of the file picture as the rotation angle, to obtain a target picture Pic'.
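Rotating the whole picture amounts to applying one rotation about the picture center to every pixel. The sketch below builds that 2x3 affine matrix in pure Python, following the formula documented for OpenCV's `cv2.getRotationMatrix2D` with scale 1, and applies it to points (for example, text-block corners). The helper names are hypothetical, and the sign of the angle is an assumption: whether the left-side average angle should be applied as θ or -θ depends on how the border angles were measured, which this section does not fix.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotation_matrix(center: Point, angle_deg: float) -> List[List[float]]:
    """2x3 affine matrix rotating about `center` by `angle_deg`.

    Positive angles turn counter-clockwise on screen (x right, y down), matching the
    formula documented for cv2.getRotationMatrix2D with scale = 1.
    """
    cx, cy = center
    alpha = math.cos(math.radians(angle_deg))
    beta = math.sin(math.radians(angle_deg))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def rotate_points(points: List[Point], m: List[List[float]]) -> List[Point]:
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    return [
        (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])
        for x, y in points
    ]
```

For example, rotating the point (51, 40) by 90 degrees about (50, 40) moves it to approximately (50, 39), i.e. one pixel upward on screen. For resampling the pixels themselves, the same matrix can be passed to `cv2.warpAffine`.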

Step S242: Calculate the average inclination angles of the upper, lower, left and right sides of the target picture, and find the highest point, lowest point, leftmost point and rightmost point coordinates of all text blocks in the target picture.

The average inclination angles of the upper, lower, left and right sides of the target picture Pic' are calculated in the same way as those of the file picture in the above embodiment. Specifically, the text blocks in the target picture and the bounding box coordinates of each text block are extracted through OCR technology; the boundary inclination angle of each text block in the target picture is determined according to its bounding box coordinates; and the average inclination angles of the upper, lower, left and right sides of the target picture are determined according to the boundary inclination angle of each text block in the target picture. For the specific process, refer to steps S210 to S230 above; details are not repeated here.
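Steps S210 to S230 can be sketched in Python as follows, following claim 2 (the direction vector of each border is compared with a preset reference vector) and step (4) of the worked example (values outside the 5th to 95th percentile are discarded before averaging). This is a sketch under stated assumptions, not the applicant's code: the corner order top-left, top-right, bottom-right, bottom-left, the clockwise reference vectors, the degree units and the sign convention are all assumptions, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Assumed reference vectors (image coordinates, y pointing down): each border is walked
# clockwise around the box, so an axis-aligned box gives 0 degrees on every side.
REFERENCE = {"top": (1.0, 0.0), "right": (0.0, 1.0), "bottom": (-1.0, 0.0), "left": (0.0, -1.0)}


def border_angles(box: Sequence[Point]) -> Dict[str, float]:
    """Signed angle (degrees) between each border's direction vector and its reference vector."""
    tl, tr, br, bl = box
    borders = {"top": (tl, tr), "right": (tr, br), "bottom": (br, bl), "left": (bl, tl)}
    angles = {}
    for side, ((x0, y0), (x1, y1)) in borders.items():
        vx, vy = x1 - x0, y1 - y0
        rx, ry = REFERENCE[side]
        # atan2(cross, dot) gives the signed angle from the reference to the border.
        angles[side] = math.degrees(math.atan2(rx * vy - ry * vx, rx * vx + ry * vy))
    return angles


def percentile(values: List[float], q: float) -> float:
    """Linearly interpolated percentile (same rule as NumPy's default method)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def trimmed_mean(values: List[float], low: float = 5, high: float = 95) -> float:
    """Mean of the values between the low-th and high-th percentiles (inclusive)."""
    lo, hi = percentile(values, low), percentile(values, high)
    # Fallback for very short lists, where no sample may fall inside [lo, hi].
    kept = [v for v in values if lo <= v <= hi] or list(values)
    return sum(kept) / len(kept)


def average_tilt(boxes: List[Sequence[Point]]) -> Dict[str, float]:
    """Average inclination angle of each side over all text blocks, after outlier removal."""
    per_block = [border_angles(b) for b in boxes]
    return {side: trimmed_mean([a[side] for a in per_block]) for side in REFERENCE}
```

Trimming by percentiles rather than by a fixed angle threshold keeps the filter independent of how strongly the picture as a whole is tilted; only blocks that disagree with the majority are dropped.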

For distinction, the average inclination angles of the upper, lower, left and right sides of the file picture are denoted as

and the average inclination angles of the upper, lower, left and right sides of the target picture are denoted as

Further, according to the bounding box coordinates of each text block in the target picture, the coordinate point with the smallest y coordinate among all text blocks, $P_{top} = (x_0, y_0)$, is taken as the highest point coordinate; the coordinate point with the largest x coordinate, $P_{right} = (x_1, y_1)$, is taken as the rightmost point coordinate; the coordinate point with the largest y coordinate, $P_{bottom} = (x_2, y_2)$, is taken as the lowest point coordinate; and the coordinate point with the smallest x coordinate, $P_{left} = (x_3, y_3)$, is taken as the leftmost point coordinate.
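The search for the four extreme points can be sketched as a scan over every bounding-box corner. This is a minimal sketch that assumes each box is a list of (x, y) corner tuples in image coordinates, where y grows downward so the "highest" point has the smallest y; the function name and the sample boxes are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def extreme_points(boxes: List[Sequence[Point]]) -> Dict[str, Point]:
    """Highest (min y), rightmost (max x), lowest (max y) and leftmost (min x) corner over all boxes."""
    corners = [p for box in boxes for p in box]
    return {
        "top": min(corners, key=lambda p: p[1]),
        "right": max(corners, key=lambda p: p[0]),
        "bottom": max(corners, key=lambda p: p[1]),
        "left": min(corners, key=lambda p: p[0]),
    }


boxes = [
    [(120, 40), (300, 35), (302, 60), (121, 66)],  # hypothetical OCR boxes
    [(100, 200), (280, 190), (283, 220), (102, 231)],
]
print(extreme_points(boxes))
# {'top': (300, 35), 'right': (302, 60), 'bottom': (102, 231), 'left': (100, 200)}
```

Ties are resolved by taking the first corner encountered, which is sufficient here because only one point on each boundary line is needed.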

Step S243: Determine the text region of the target picture, which is taken as the text region of the file picture, according to the average inclination angles of the upper, lower, left and right sides of the target picture and the highest point, lowest point, leftmost point and rightmost point coordinates of all text blocks in the target picture.

Specifically, step S243 includes the following steps:

(1) Determine the upper boundary of the text region of the target picture Pic' according to the average inclination angle of the upper side of the target picture Pic' and the highest point coordinate $P_{top} = (x_0, y_0)$ of all text blocks in the target picture Pic'.

Specifically,

$P'_{top} = P_{top} + [\cos(a), \sin(a)] = (x_0 + \cos(a), y_0 + \sin(a))$, and extending the line connecting $P_{top}$ and $P'_{top}$ gives the upper boundary of the text region, as shown in Fig. 5, where

a represents the average inclination angle of the upper side of the target picture Pic'; $[\cos(a), \sin(a)]$ represents the angle vector, i.e. the point at radian a on the unit circle centered at (0, 0); and $P_{top}$ represents the coordinates of the highest point of all text blocks in the target picture Pic'.

(2) Determine the lower boundary of the text region of the target picture Pic' according to the average inclination angle of the lower side of the target picture Pic' and the lowest point coordinate $P_{bottom} = (x_2, y_2)$ of all text blocks in the target picture Pic'.

Specifically,

$P'_{bottom} = P_{bottom} + [\cos(a), \sin(a)] = (x_2 + \cos(a), y_2 + \sin(a))$, and extending the line connecting $P_{bottom}$ and $P'_{bottom}$ gives the lower boundary of the text region, as shown in Fig. 5, where

a represents the average inclination angle of the lower side of the target picture Pic'; $[\cos(a), \sin(a)]$ represents the angle vector, i.e. the point at radian a on the unit circle centered at (0, 0); and $P_{bottom}$ represents the coordinates of the lowest point of all text blocks in the target picture Pic'.

(3) Determine the left boundary of the text region of the target picture Pic' according to the average inclination angle of the left side of the target picture Pic' and the leftmost point coordinate $P_{left} = (x_3, y_3)$ of all text blocks in the target picture Pic'.

Specifically,

$P'_{left} = P_{left} + [\cos(a), \sin(a)] = (x_3 + \cos(a), y_3 + \sin(a))$, and extending the line connecting $P_{left}$ and $P'_{left}$ gives the left boundary of the text region, as shown in Fig. 5, where

a represents the average inclination angle of the left side of the target picture Pic'; $[\cos(a), \sin(a)]$ represents the angle vector, i.e. the point at radian a on the unit circle centered at (0, 0); and $P_{left}$ represents the coordinates of the leftmost point of all text blocks in the target picture Pic'.

(4) Determine the right boundary of the text region of the target picture Pic' according to the average inclination angle of the right side of the target picture Pic' and the rightmost point coordinate $P_{right} = (x_1, y_1)$ of all text blocks in the target picture Pic'.

Specifically,

$P'_{right} = P_{right} + [\cos(a), \sin(a)] = (x_1 + \cos(a), y_1 + \sin(a))$, and extending the line connecting $P_{right}$ and $P'_{right}$ gives the right boundary of the text region, as shown in Fig. 5, where

a represents the average inclination angle of the right side of the target picture Pic'; $[\cos(a), \sin(a)]$ represents the angle vector, i.e. the point at radian a on the unit circle centered at (0, 0); and $P_{right}$ represents the coordinates of the rightmost point of all text blocks in the target picture Pic'.

As shown in Fig. 5, the straight lines corresponding to the upper, lower, left and right boundaries of the text region are extended to obtain 4 intersection points, and the region enclosed by these 4 intersection points is the text region of the file picture. Step S250 above may then be executed to perform perspective transformation on the text region of the file picture to obtain a corrected text region.

A practical application scenario is described below. Suppose the input is a document picture taken by a user; because of the shooting angle, the picture exhibits image distortions such as tilt, rotation and elevation angle, as shown in Fig. 7.

(1) The text blocks of the document picture and their bounding box coordinates are obtained using optical character recognition (OCR), as shown in Fig. 8.

(2) The inclination angle of the left border of each text block (the border running from its bottom-left corner up to its top-left corner) is calculated from the bounding box coordinates of the block, giving the left inclination angles of all text blocks, $\theta_{left}$ = [-9.13, -9.5, -0.0, -4.95, -6.54, -0.0, -3.69, -0.0, -0.0, -6.34, -5.71, -4.57, ...], as shown in Fig. 9.

(3) The inclination angles of the upper, right and lower borders of all text blocks, $\theta_{top}$, $\theta_{right}$ and $\theta_{bottom}$, are calculated in the same way.

(4) Angle outliers are removed. Assuming the 5th percentile of $\theta_{left}$ is -6.66 and the 95th percentile is 5.60, only the values of $\theta_{left}$ between the 5th and 95th percentiles are retained, giving $\theta_{left}$ = [-0.0, -4.95, -6.54, -0.0, -3.69, -0.0, -0.0, -6.34, -5.71, -4.57, ...]. The same processing is applied to $\theta_{top}$, $\theta_{right}$ and $\theta_{bottom}$.

(5) The averages of $\theta_{left}$, $\theta_{top}$, $\theta_{right}$ and $\theta_{bottom}$ are calculated respectively to obtain

(6) The document picture is rotated by the average left inclination angle obtained in step (5) to obtain the target picture Pic', as shown in Fig. 10.

(7) An OCR module is used to obtain the text blocks and their bounding box coordinates on the target picture Pic', and steps (2) to (5) are then executed to obtain the average inclination angles of the upper, lower, left and right sides of the target picture Pic'.

(8) The coordinate point with the smallest y coordinate among all text blocks in the target picture Pic' is taken as $P_{top} = (x_0, y_0) = (1864, 588)$, the highest point shown in Fig. 11.

The coordinate point with the largest x coordinate among all text blocks is taken as $P_{right} = (x_1, y_1) = (2294, 882)$, the rightmost point shown in Fig. 11.

The coordinate point with the largest y coordinate among all text blocks is taken as $P_{bottom} = (x_2, y_2) = (1416, 2974)$, the lowest point shown in Fig. 11.

The coordinate point with the smallest x coordinate among all text blocks is taken as $P_{left} = (x_3, y_3) = (598, 2862)$, the leftmost point shown in Fig. 11.

(9) Adding the angle vector $[\cos(a), \sin(a)]$ to each of the 4 coordinate points gives a second point on each boundary, from which the 4 outermost lines of the text region are obtained.

Specifically, step 9-1:

$P'_{top} = P_{top} + [\cos(a), \sin(a)] = (1864 + \cos(-0.12), 588 + \sin(-0.12)) = (1864.99, 587.88)$; extending the line connecting $P_{top}$ and $P'_{top}$ gives the upper boundary of the text region, as shown in Fig. 12.

Step 9-2:

$P'_{right} = P_{right} + [\cos(a), \sin(a)] = (2294 + \cos(1.585), 882 + \sin(1.585)) = (2293.98, 882.99)$; extending the line connecting $P_{right}$ and $P'_{right}$ gives the right boundary of the text region, as shown in Fig. 12.

Step 9-3:

$P'_{bottom} = P_{bottom} + [\cos(a), \sin(a)] = (1416 + \cos(3.275), 2974 + \sin(3.275)) = (1415.009, 2973.867)$; extending the line connecting $P_{bottom}$ and $P'_{bottom}$ gives the lower boundary of the text region, as shown in Fig. 12.

Step 9-4:

$P'_{left} = P_{left} + [\cos(a), \sin(a)] = (598 + \cos(4.814), 2862 + \sin(4.814)) = (598.10, 2861.005)$; extending the line connecting $P_{left}$ and $P'_{left}$ gives the left boundary of the text region, as shown in Fig. 12.

(10) The 4 outermost lines of the text region are extended to obtain 4 intersection points, shown as the 4 black dots in Fig. 12; the region enclosed by these 4 intersection points is the text region of the document picture.

(11) The text region is projected onto a rectangle by perspective transformation, yielding a picture with corrected elevation angle, as shown in Fig. 13.
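The arithmetic of step (9) can be rechecked with a few lines of Python using the extreme points from step (8) and the per-side angles that appear in steps 9-1 to 9-4. This is only a verification sketch: the variable names are hypothetical, and the remark that the four radian values point roughly right, down, left and up (about 0, 90, 180 and 270 degrees in image coordinates with y downward) is an observation about these numbers, not a convention stated in the text.

```python
import math

# Extreme points from step (8) and side angles (radians) from steps 9-1 to 9-4.
extremes = {"top": (1864, 588), "right": (2294, 882), "bottom": (1416, 2974), "left": (598, 2862)}
angles = {"top": -0.12, "right": 1.585, "bottom": 3.275, "left": 4.814}

# Second point on each boundary: P' = P + [cos(a), sin(a)].
second_points = {
    side: (x + math.cos(angles[side]), y + math.sin(angles[side]))
    for side, (x, y) in extremes.items()
}

for side, (x, y) in second_points.items():
    print(f"P'_{side} = ({x:.4f}, {y:.4f})  direction = {math.degrees(angles[side]):.1f} deg")
```

The recomputed points agree with the values given in steps 9-1 to 9-4 to within about 0.01; the small differences come from the example truncating rather than rounding some decimals.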

With the technical solution provided by the embodiments of the present application, correcting the document picture can improve the accuracy of subsequent OCR recognition. In addition, for document pictures that share the same template (such as an internal business request form of a company), once the text region has been cropped, the text block at a specific coordinate can be obtained.

The following is an apparatus embodiment of the present application, which can be used to execute the above embodiments of the automatic correction method for a file picture. For details not disclosed in the apparatus embodiments, please refer to the embodiments of the automatic correction method for a file picture of the present application.

Fig. 14 is a block diagram of an apparatus for automatically correcting a document picture according to an embodiment of the present application. As shown in fig. 14, the apparatus includes:

a text detection module 1410, configured to extract the text blocks in the file picture and the bounding box coordinates of each text block through OCR technology;

an angle calculation module 1420, configured to determine the boundary inclination angle of each text block according to the bounding box coordinates of each text block;

an inclination calculation module 1430, configured to determine the average inclination angles of the upper, lower, left and right sides of the file picture according to the boundary inclination angle of each text block;

a region calculation module 1440, configured to determine the text region of the file picture according to the average inclination angles of the file picture and the highest point, lowest point, leftmost point and rightmost point coordinates of all text blocks;

a region transformation module 1450, configured to perform perspective transformation on the text region of the file picture to obtain a corrected text region.
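The data flow between modules 1410 to 1450 can be sketched as a small pipeline in which each module is supplied as a callable. This is a hypothetical wiring for illustration only: the class name `DocumentCorrector`, its field names and the data shapes (corner lists for boxes, per-side angle dictionaries) are assumptions, and a real OCR engine and image warp would be plugged in as the first and last stages.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Sequence[Point]


@dataclass
class DocumentCorrector:
    """Hypothetical wiring of modules 1410-1450; each stage is supplied as a callable."""

    detect_text: Callable[[Any], List[Box]]                             # module 1410
    block_angles: Callable[[Box], Dict[str, float]]                     # module 1420
    average_tilt: Callable[[List[Dict[str, float]]], Dict[str, float]]  # module 1430
    text_region: Callable[[Dict[str, float], List[Box]], List[Point]]   # module 1440
    warp_region: Callable[[Any, List[Point]], Any]                      # module 1450

    def correct(self, picture: Any) -> Any:
        boxes = self.detect_text(picture)                   # text blocks and bounding boxes
        angles = [self.block_angles(box) for box in boxes]  # per-block border angles
        tilt = self.average_tilt(angles)                    # per-side average angles
        region = self.text_region(tilt, boxes)              # 4 corners of the text region
        return self.warp_region(picture, region)            # perspective-corrected region
```

Injecting each stage keeps the modules independently replaceable, which mirrors the statement below that the modules may be integrated together or exist separately.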

For the implementation of the functions and effects of each module in the apparatus, see the implementation of the corresponding steps in the above automatic correction method for a file picture; details are not repeated here.

In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims

1. An automatic correction method for file pictures is characterized by comprising the following steps:

extracting character blocks in the file picture and the boundary box coordinates of each character block by an OCR technology;

2. The method of claim 1, wherein determining the boundary tilt angle of each text block according to the bounding box coordinates of each text block comprises:

for each character block, calculating direction vectors of four borders on the upper side, the lower side, the left side and the right side of the character block according to the coordinate of the boundary frame of the character block;

and determining the inclination angles of the upper, lower, left and right side frames of the character block according to the direction vectors of the upper, lower, left and right side frames of the character block and a preset reference vector to obtain the boundary inclination angle of each character block.

3. The method of claim 2, wherein determining the average inclination angles of the upper, lower, left and right sides of the file picture according to the boundary inclination angle of each text block comprises:

and for each of the upper, lower, left and right borders, calculating the average value of the remaining inclination angles to obtain the average inclination angles of the upper, lower, left and right sides of the file picture.

4. The method according to claim 1, wherein determining the text region of the document picture according to the average inclination angles of the document picture and the highest point coordinate, the lowest point coordinate, the leftmost point coordinate and the rightmost point coordinate of all text blocks comprises:

determining the left boundary of a character area of the file picture according to the average inclination angle of the left side of the file picture and the coordinates of the leftmost point of all character blocks in the file picture;

5. The method according to claim 1, wherein determining the text region of the document picture according to the average inclination angles of the document picture and the highest point coordinate, the lowest point coordinate, the leftmost point coordinate and the rightmost point coordinate of all text blocks comprises:

and determining the character area of the target picture as the character area of the file picture according to the average inclination angles of the upper part, the lower part, the left part and the right part of the target picture and the highest point coordinate, the lowest point coordinate, the leftmost point coordinate and the rightmost point coordinate of all character blocks in the target picture.

6. The method of claim 5, wherein calculating the average inclination angles of the upper, lower, left and right sides of the target picture comprises:

7. The method of claim 6, wherein the finding of the highest point coordinate, the lowest point coordinate, the leftmost point coordinate, and the rightmost point coordinate of all text blocks in the target picture comprises:

8. The method of claim 5, wherein determining the text region of the target picture according to the average inclination angles of the upper, lower, left and right sides of the target picture and the highest point, lowest point, leftmost point and rightmost point coordinates of all text blocks in the target picture comprises:

determining a left boundary of a character area of the target picture according to the average inclination angle of the left side of the target picture and the leftmost point coordinates of all character blocks in the target picture;

and determining the right boundary of the character area of the target picture according to the average inclination angle of the right of the target picture and the coordinates of the rightmost point of all character blocks in the target picture.

9. An electronic device, characterized in that the electronic device comprises:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute the automatic correction method of the document picture according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that the storage medium stores a computer program executable by a processor to perform the method for automatic correction of a document picture according to any one of claims 1 to 8.