CN115797938A - Automatic correction method of file picture, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115797938A
Authority
CN
China
Prior art keywords
character
picture
coordinate
boundary
target picture
Legal status
Pending
Application number
CN202211323131.4A
Other languages
Chinese (zh)
Inventor
简仁贤
Current Assignee
Emotibot Technologies Ltd
Original Assignee
Emotibot Technologies Ltd
Application filed by Emotibot Technologies Ltd
Priority to CN202211323131.4A
Publication of CN115797938A
Legal status: Pending (current)

Abstract

The application provides an automatic correction method for a file picture, an electronic device and a storage medium. The method comprises: extracting the bounding box coordinates of each text block in the file picture and determining the boundary inclination angle of each text block from them; determining the average inclination angles of the top, bottom, left and right sides of the file picture from the boundary inclination angles of the text blocks; determining the text region of the file picture from these average inclination angles together with the highest, lowest, leftmost and rightmost point coordinates of all text blocks; and performing a perspective transformation on the text region to obtain a corrected text region. The scheme extracts the text region of the file picture accurately without a large training cost, and performing OCR on the corrected text region can improve recognition accuracy.

Description

Automatic correction method of file picture, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an automatic correction method for a file picture, an electronic device, and a computer-readable storage medium.
Background
A document image is a document stored in picture format. Such a picture contains a large amount of text, but the text cannot be read directly by a computer: Optical Character Recognition (OCR) must first detect the text regions in the picture and then recognize the text images as text. Recognizing text from document pictures has great application value, for example automatically analyzing the text to reduce manual editing cost, supporting document picture search, and extracting, classifying and comparing document picture information.
Document pictures come from diverse sources; for example, a user may photograph or scan them. Depending on the photographing or scanning angle, the picture may show distortions such as tilt, rotation and elevation angle, which can cause OCR to detect text regions incorrectly and thus lower the accuracy of the whole OCR process.
A common approach uses computer vision edge detection to detect the paper region in the picture, then crops that region and applies a perspective transformation. In practice this works poorly, because not every picture contains a distinct paper region, and in many cases the edge between the paper and the background is not obvious, so edge detection fails to crop the text region. Another approach uses deep learning: collect a large number of pictures and train an object detection model, which has the drawback that a large number of training pictures is hard to obtain. How to correct a file picture with a small training set, or with none at all, is therefore a practical pain point.
Disclosure of Invention
The embodiments of the present application provide an automatic correction method for a file picture, which reduces training cost and extracts the text region accurately.
An embodiment of the present application provides an automatic correction method for a file picture, comprising the following steps:
extracting, by OCR, the text blocks in the file picture and the bounding box coordinates of each text block;
determining the boundary inclination angle of each text block from its bounding box coordinates;
determining the average inclination angles of the top, bottom, left and right sides of the file picture from the boundary inclination angles of the text blocks;
determining the text region of the file picture from the average inclination angles of the top, bottom, left and right sides of the file picture together with the highest, lowest, leftmost and rightmost point coordinates of all text blocks; and
performing a perspective transformation on the text region of the file picture to obtain a corrected text region.
In one embodiment, determining the boundary inclination angle of each text block from its bounding box coordinates includes:
for each text block, calculating the direction vectors of its four edges, namely the upper, lower, left and right frames, from the bounding box coordinates of the text block; and
determining the inclination angles of the upper, lower, left and right frames of the text block from their direction vectors and preset reference vectors, to obtain the boundary inclination angle of each text block.
In one embodiment, determining the average inclination angles of the top, bottom, left and right sides of the file picture from the boundary inclination angles of the text blocks includes:
removing the angle outliers separately for the upper, lower, left and right frames, based on the inclination angles of those frames of each text block; and
for each of the upper, lower, left and right frames, averaging the remaining inclination angles to obtain the average inclination angles of the top, bottom, left and right sides of the file picture.
In one embodiment, determining the text region of the file picture from the average inclination angles of the top, bottom, left and right sides of the file picture and the highest, lowest, leftmost and rightmost point coordinates of all text blocks includes:
determining the upper boundary of the text region of the file picture from the average inclination angle of the top side of the file picture and the highest point coordinate of all text blocks in the file picture;
determining the lower boundary of the text region of the file picture from the average inclination angle of the bottom side of the file picture and the lowest point coordinate of all text blocks in the file picture;
determining the left boundary of the text region of the file picture from the average inclination angle of the left side of the file picture and the leftmost point coordinate of all text blocks in the file picture; and
determining the right boundary of the text region of the file picture from the average inclination angle of the right side of the file picture and the rightmost point coordinate of all text blocks in the file picture.
In another embodiment, determining the text region of the file picture from the average inclination angles of the top, bottom, left and right sides of the file picture and the highest, lowest, leftmost and rightmost point coordinates of all text blocks includes:
rotating the file picture according to the average inclination angle of its left side to obtain a target picture;
calculating the average inclination angles of the top, bottom, left and right sides of the target picture, and finding the highest, lowest, leftmost and rightmost point coordinates of all text blocks in the target picture; and
determining the text region of the target picture from those average inclination angles and point coordinates, and taking it as the text region of the file picture.
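The rotation step can be illustrated on point coordinates. The sketch below is a minimal Python illustration under stated assumptions: image coordinates with y pointing down, and a positive average left-side angle meaning the content must be turned counterclockwise (the sign convention described for the left frame in step S220). It only rotates coordinates; the embodiment itself rotates the whole picture and then re-runs OCR on it. The function name `rotate_points_ccw`, the rotation center and the sample box are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_points_ccw(points: List[Point], angle_deg: float, center: Point) -> List[Point]:
    """Rotate points counterclockwise (as seen on screen) by angle_deg about center.

    Assumes image coordinates: x grows to the right and y grows downward,
    which is why the sine terms carry the signs below.
    """
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * c + dy * s, cy - dx * s + dy * c))
    return rotated


# Hypothetical text block tilted clockwise by atan(0.1), about 5.71 degrees.
# Vertex order: top-left, top-right, bottom-right, bottom-left.
box = [(1.0, 0.0), (11.0, 1.0), (10.0, 11.0), (0.0, 10.0)]
theta_left = math.degrees(math.atan(0.1))  # stands in for the average left-side angle
upright = rotate_points_ccw(box, theta_left, center=(5.5, 5.5))
print([(round(x, 3), round(y, 3)) for x, y in upright])
```

After the rotation the left edge of the sample box is vertical and its top edge horizontal, which is the situation the target picture is meant to approximate.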
In one embodiment, calculating the average inclination angles of the target picture includes:
extracting, by OCR, the text blocks in the target picture and the bounding box coordinates of each text block;
determining the boundary inclination angle of each text block in the target picture from its bounding box coordinates; and
determining the average inclination angles of the top, bottom, left and right sides of the target picture from the boundary inclination angles of those text blocks.
In one embodiment, finding the highest, lowest, leftmost and rightmost point coordinates of all text blocks in the target picture includes, based on the bounding box coordinates of each text block in the target picture:
taking the point with the minimum y coordinate among all text blocks as the highest point coordinate;
taking the point with the maximum x coordinate among all text blocks as the rightmost point coordinate;
taking the point with the maximum y coordinate among all text blocks as the lowest point coordinate; and
taking the point with the minimum x coordinate among all text blocks as the leftmost point coordinate.
In one embodiment, determining the text region of the target picture from the average inclination angles of the top, bottom, left and right sides of the target picture and the highest, lowest, leftmost and rightmost point coordinates of all text blocks in the target picture includes:
determining the upper boundary of the text region of the target picture from the average inclination angle of the top side of the target picture and the highest point coordinate of all text blocks in the target picture;
determining the lower boundary of the text region of the target picture from the average inclination angle of the bottom side of the target picture and the lowest point coordinate of all text blocks in the target picture;
determining the left boundary of the text region of the target picture from the average inclination angle of the left side of the target picture and the leftmost point coordinate of all text blocks in the target picture; and
determining the right boundary of the text region of the target picture from the average inclination angle of the right side of the target picture and the rightmost point coordinate of all text blocks in the target picture.
An embodiment of the present application further provides an electronic device, where the electronic device includes:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the automatic correction method of the file picture.
The embodiment of the application also provides a computer readable storage medium, wherein the storage medium stores a computer program, and the computer program can be executed by a processor to complete the automatic correction method of the file picture.
According to the technical solution provided by the embodiments of the present application, the bounding box coordinates of each text block in the file picture are extracted to determine the boundary inclination angle of each text block. The average inclination angles of the top, bottom, left and right sides of the file picture are then determined from these boundary inclination angles. Next, the text region of the file picture is determined from the average inclination angles together with the highest, lowest, leftmost and rightmost point coordinates of all text blocks, and a perspective transformation of the text region yields a corrected text region. The text region of the file picture can thus be extracted accurately without a large training cost, and performing OCR on the corrected text region can improve recognition accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for automatically correcting a file picture according to an embodiment of the present application;
FIG. 3 is a diagram of the bounding box coordinates of a text block according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the inclination angle of the left frame of a text block according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the boundaries of a text region according to an embodiment of the present application;
FIG. 6 is a detailed flowchart of step S240 according to an embodiment of the present application;
FIG. 7 is a schematic view of a file picture according to an embodiment of the present application;
FIG. 8 is a diagram of the bounding box coordinates of the text blocks in the file picture of FIG. 7;
FIG. 9 is a diagram of the left frames of all text blocks in the file picture of FIG. 7;
FIG. 10 is a schematic view of the target picture obtained by rotating the file picture of FIG. 7;
FIG. 11 is a schematic diagram of the highest, rightmost, lowest and leftmost points in the target picture of FIG. 10;
FIG. 12 is a diagram of the four boundaries of the text region in the target picture of FIG. 10;
FIG. 13 is a schematic diagram of the corrected text region of FIG. 12;
FIG. 14 is a block diagram of an apparatus for automatically correcting a file picture according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Fig. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 100 may be configured to execute the automatic correction method for a document picture provided in the embodiment of the present application. As shown in fig. 1, the electronic device 100 includes: one or more processors 102, and one or more memories 104 storing processor-executable instructions. The processor 102 is configured to execute an automatic correction method for a document picture provided by the following embodiments of the present application.
The processor 102 may be a gateway, or may be an intelligent terminal, or may be a device including a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capability and/or instruction execution capability, and may process data of other components in the electronic device 100, and may control other components in the electronic device 100 to perform desired functions.
The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the automatic correction method for file pictures described below. Various applications and data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
In one embodiment, the electronic device 100 shown in FIG. 1 may further include an input device 106, an output device 108, and a data acquisition device 110, which may be interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are merely exemplary and not limiting, and the electronic device 100 may have other components and structures as desired.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like. The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like. The data acquisition device 110 may acquire an image of a subject and store the acquired image in the memory 104 for use by other components. Illustratively, the data acquisition device 110 may be a camera.
In an embodiment, the devices in the example electronic device 100 for implementing the automatic document picture correction method according to the embodiment of the present application may be integrally disposed, or may be disposed separately, such as integrally disposing the processor 102, the memory 104, the input device 106, and the output device 108, and disposing the data acquisition device 110 separately.
In an embodiment, the example electronic device 100 for implementing the automatic correction method of the document picture according to the embodiment of the present application may be implemented as an intelligent terminal, such as a smart phone, a tablet computer, a desktop computer, a server, an in-vehicle device, and the like.
Fig. 2 is a schematic flowchart of a method for automatically correcting a document picture according to an embodiment of the present application. The method is performed by an electronic device, and as shown in fig. 2, the method comprises the following steps S210-S250.
Step S210: Extract, by OCR, the text blocks in the file picture and the bounding box coordinates of each text block.
Optical Character Recognition (OCR) is an existing technique for detecting the locations of text blocks and recognizing their content. A text block is the minimum bounding rectangle of a character string. The file picture is a document in picture format and contains a plurality of text blocks; the bounding box coordinates of a text block are the coordinates of the four vertices of the minimum bounding rectangle of its character string. FIG. 3 shows the bounding box coordinates of a text block: the top-left vertex is (x0, y0), the top-right vertex is (x1, y1), the bottom-right vertex is (x2, y2), and the bottom-left vertex is (x3, y3); the top-left vertex can be used as the coordinate origin (0, 0).
Step S220: Determine the boundary inclination angle of each text block from its bounding box coordinates.
The boundary inclination angle comprises the inclination angles of the four frames (upper, lower, left and right) of the text block. In one embodiment, step S220 specifically includes: for each text block, calculating the direction vectors of its upper, lower, left and right frames from its bounding box coordinates; and determining the inclination angles of these four frames from their direction vectors and preset reference vectors.
For example, for any text block: (1) First calculate the angle of the left frame of the text block (from bottom left to top left). Let the direction vector of the left frame be v_left = (x0 - x3, y0 - y3); the included angle θ between v_left and the preset reference vector is the inclination angle of the left frame [formula images for the reference vector and for θ not reproduced]. θ lies between -180° and 180°. As shown in FIG. 4 (1), θ > 0 indicates that the text block needs to be rotated counterclockwise by θ degrees; as shown in FIG. 4 (2), θ < 0 indicates that it needs to be rotated clockwise by |θ| degrees. The inclination angle of the left frame of each text block is recorded as θ_left.
(2) Calculate the angle of the upper frame of the text block (from top right to top left). Let the direction vector of the upper frame be v_top = (x0 - x1, y0 - y1) and take the preset reference vector for the upper frame. Compute the arc (radian direction) of v_top and of the reference vector, and then, with the same calculation as in step (1), obtain the included angle between them, i.e., the inclination angle of the upper frame [formula images not reproduced]. The inclination angle of the upper frame of each text block is recorded as θ_top.
(3) Calculate the angle of the right frame of each text block (from bottom right to top right). Let the direction vector of the right frame be v_right = (x1 - x2, y1 - y2) and take the preset reference vector for the right frame. Compute the arc (radian direction) of v_right and of the reference vector, and then, with the same calculation as in step (1), obtain the included angle between them, i.e., the inclination angle of the right frame [formula images not reproduced]. The inclination angle of the right frame of each text block is recorded as θ_right.
(4) Calculate the angle of the lower frame of each text block (from left to right, i.e., from bottom left to bottom right). Let the direction vector of the lower frame be v_bottom = (x2 - x3, y2 - y3) and take the preset reference vector for the lower frame. Compute the arc (radian direction) of v_bottom and of the reference vector, and then, with the same calculation as in step (1), obtain the included angle between them, i.e., the inclination angle of the lower frame [formula images not reproduced]. The inclination angle of the lower frame of each text block is recorded as θ_bottom.
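The four edge angles of step S220 can be sketched as follows. This is a minimal Python illustration, not the patent's exact formula, whose reference vectors are only available as images. It assumes image coordinates (y pointing down) and reference vectors equal to each edge's direction in an upright box: (0, -1) for the left and right frames, (-1, 0) for the upper frame and (1, 0) for the lower frame. With these choices an upright box gives 0° on every edge, and a box turned clockwise on screen gives a positive angle, matching the "θ > 0 means rotate counterclockwise" convention. `REFERENCE_VECTORS`, `signed_angle_deg` and `edge_tilt_angles` are hypothetical names.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]

# Assumed reference vectors (image coordinates, y pointing down): the direction
# each edge vector has in an upright, unrotated box.
REFERENCE_VECTORS: Dict[str, Point] = {
    "left": (0.0, -1.0),    # bottom-left -> top-left points up
    "top": (-1.0, 0.0),     # top-right -> top-left points left
    "right": (0.0, -1.0),   # bottom-right -> top-right points up
    "bottom": (1.0, 0.0),   # bottom-left -> bottom-right points right
}


def signed_angle_deg(v: Point, ref: Point) -> float:
    """Angle from ref to v in degrees, from the difference of their atan2 arcs,
    normalised to [-180, 180)."""
    diff = math.degrees(math.atan2(v[1], v[0]) - math.atan2(ref[1], ref[0]))
    return (diff + 180.0) % 360.0 - 180.0


def edge_tilt_angles(box: Sequence[Point]) -> Dict[str, float]:
    """Tilt angle of each edge of one text block.

    box lists the vertices as top-left (x0, y0), top-right (x1, y1),
    bottom-right (x2, y2), bottom-left (x3, y3).
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = box
    vectors = {
        "left": (x0 - x3, y0 - y3),    # bottom-left -> top-left
        "top": (x0 - x1, y0 - y1),     # top-right -> top-left
        "right": (x1 - x2, y1 - y2),   # bottom-right -> top-right
        "bottom": (x2 - x3, y2 - y3),  # bottom-left -> bottom-right
    }
    return {side: signed_angle_deg(v, REFERENCE_VECTORS[side]) for side, v in vectors.items()}


# Hypothetical box turned clockwise on screen by atan(0.1), about 5.71 degrees.
print(edge_tilt_angles([(1.0, 0.0), (11.0, 1.0), (10.0, 11.0), (0.0, 10.0)]))
```

The modulo normalisation matters for the upper frame, whose reference direction sits at ±180°. Without it, a slight tilt would produce a difference close to ±360° rather than a small angle.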
Step S230: Determine the average inclination angles of the top, bottom, left and right sides of the file picture from the boundary inclination angles of the text blocks.
The average inclination angles of the top, bottom, left and right sides of the file picture are, respectively, the average inclination angle of the upper frames of all text blocks, that of the lower frames, that of the left frames and that of the right frames.
In one embodiment, step S230 specifically includes: removing the angle outliers separately for the upper, lower, left and right frames based on the inclination angles of those frames of each text block; and, for each of the four frames, averaging the remaining inclination angles to obtain the average inclination angles of the top, bottom, left and right sides of the file picture.
An angle outlier is an inclination angle outside a preset range; for example, the maximum and minimum values may be removed. In one embodiment, the left-frame inclination angles θ_left of all text blocks are sorted in ascending order, and the values at the 5th and 95th percentiles are found. Only the θ_left values between the 5th and 95th percentiles are retained, i.e., the θ_left values outside this range are removed as angle outliers. In the same way, the angle outliers of θ_top, θ_right and θ_bottom of all text blocks are removed.
After the angle outliers of θ_top of all text blocks are removed, the mean of the remaining θ_top values is taken as the average inclination angle of the top side of the file picture, θ̄_top.
After the angle outliers of θ_bottom of all text blocks are removed, the mean of the remaining θ_bottom values is taken as the average inclination angle of the bottom side of the file picture, θ̄_bottom.
After the angle outliers of θ_left of all text blocks are removed, the mean of the remaining θ_left values is taken as the average inclination angle of the left side of the file picture, θ̄_left.
After the angle outliers of θ_right of all text blocks are removed, the mean of the remaining θ_right values is taken as the average inclination angle of the right side of the file picture, θ̄_right.
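The outlier removal and averaging of step S230 amount to a percentile-trimmed mean per side. Below is a minimal Python sketch. The linear interpolation between ranks used for the percentiles is an assumption, because the text does not say how the 5th and 95th percentiles are computed. The same goes for the fallback that uses all angles when only a few blocks exist and no value falls inside the band. `percentile`, `trimmed_mean` and `average_side_angles` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) of sorted data, interpolating linearly between ranks."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def trimmed_mean(angles: Sequence[float], lower: float = 5.0, upper: float = 95.0) -> float:
    """Mean of the angles lying between the lower and upper percentiles."""
    if not angles:
        raise ValueError("no inclination angles to average")
    s = sorted(angles)
    lo_v, hi_v = percentile(s, lower), percentile(s, upper)
    kept = [a for a in s if lo_v <= a <= hi_v]
    # With very few blocks the band can fall strictly between two samples;
    # in that case fall back to averaging every angle.
    if not kept:
        kept = s
    return sum(kept) / len(kept)


def average_side_angles(per_side: Dict[str, List[float]]) -> Dict[str, float]:
    """per_side maps 'top', 'bottom', 'left', 'right' to the angles of all blocks."""
    return {side: trimmed_mean(vals) for side, vals in per_side.items()}


# Hypothetical left-frame angles: 19 consistent blocks plus two misdetections.
print(trimmed_mean([1.0] * 19 + [90.0, -90.0]))
```

Trimming by percentile rather than by a fixed angle threshold lets the band adapt to how strongly the picture is tilted overall.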
Step S240: Determine the text region of the file picture from the average inclination angles of the top, bottom, left and right sides of the file picture and the highest, lowest, leftmost and rightmost point coordinates of all text blocks.
Specifically, the point with the minimum y coordinate over all text blocks in the file picture (the highest point coordinate) is denoted P_top = (x0, y0). The point with the maximum x coordinate (the rightmost point coordinate) is denoted P_right = (x1, y1), the point with the maximum y coordinate (the lowest point coordinate) is denoted P_bottom = (x2, y2), and the point with the minimum x coordinate (the leftmost point coordinate) is denoted P_left = (x3, y3).
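Finding the four extreme points is a scan over every bounding box vertex. The Python sketch below assumes, as the text implies, image coordinates in which y grows downward, so the highest point is the one with the smallest y. The function name `extreme_points` and the sample boxes are hypothetical, and ties are resolved by taking the first vertex encountered.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def extreme_points(boxes: Sequence[Sequence[Point]]) -> Tuple[Point, Point, Point, Point]:
    """Return (P_top, P_right, P_bottom, P_left) over all bounding box vertices.

    With y pointing down, the highest point has the smallest y and the
    lowest point has the largest y.
    """
    pts: List[Point] = [p for box in boxes for p in box]
    if not pts:
        raise ValueError("no text blocks")
    p_top = min(pts, key=lambda p: p[1])
    p_right = max(pts, key=lambda p: p[0])
    p_bottom = max(pts, key=lambda p: p[1])
    p_left = min(pts, key=lambda p: p[0])
    return p_top, p_right, p_bottom, p_left


# Two hypothetical text blocks (top-left, top-right, bottom-right, bottom-left).
blocks = [
    [(10.0, 5.0), (50.0, 6.0), (50.0, 16.0), (10.0, 15.0)],
    [(12.0, 30.0), (40.0, 29.0), (40.0, 39.0), (12.0, 40.0)],
]
print(extreme_points(blocks))
```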
In an embodiment, the step S240 specifically includes:
(1) Determine the upper boundary of the text region of the file picture from the average inclination angle of the top side of the file picture and the highest point coordinate of all text blocks in the file picture.
Specifically, let a be the radian obtained from θ̄_top [formula image not reproduced], and let P′_top = P_top + [cos(a), sin(a)] = (x0 + cos(a), y0 + sin(a)). Extending the line through P_top and P′_top gives the upper boundary of the text region, as shown in FIG. 5. Here θ̄_top is the average inclination angle of the top side of the file picture; [cos(a), sin(a)] is the angle vector, i.e., the point at radian a on the unit circle centered at (0, 0); and P_top is the highest point coordinate.
(2) Determine the lower boundary of the text region of the file picture from the average inclination angle of the bottom side of the file picture and the lowest point coordinate of all text blocks in the file picture.
Specifically, let a be the radian obtained from θ̄_bottom [formula image not reproduced], and let P′_bottom = P_bottom + [cos(a), sin(a)] = (x2 + cos(a), y2 + sin(a)). Extending the line through P_bottom and P′_bottom gives the lower boundary of the text region, as shown in FIG. 5. Here θ̄_bottom is the average inclination angle of the bottom side of the file picture; [cos(a), sin(a)] is the angle vector, i.e., the point at radian a on the unit circle centered at (0, 0); and P_bottom is the lowest point coordinate.
(3) Determine the left boundary of the text region of the file picture from the average inclination angle of the left side of the file picture and the leftmost point coordinate of all text blocks in the file picture.
Specifically, let a be the radian obtained from θ̄_left [formula image not reproduced], and let P′_left = P_left + [cos(a), sin(a)] = (x3 + cos(a), y3 + sin(a)). Extending the line through P_left and P′_left gives the left boundary of the text region, as shown in FIG. 5. Here θ̄_left is the average inclination angle of the left side of the file picture; [cos(a), sin(a)] is the angle vector, i.e., the point at radian a on the unit circle centered at (0, 0); and P_left is the leftmost point coordinate.
(4) Determine the right boundary of the text region of the file picture from the average inclination angle of the right side of the file picture and the rightmost point coordinate of all text blocks in the file picture.
Specifically, let a be the radian obtained from θ̄_right [formula image not reproduced], and let P′_right = P_right + [cos(a), sin(a)] = (x1 + cos(a), y1 + sin(a)). Extending the line through P_right and P′_right gives the right boundary of the text region, as shown in FIG. 5. Here θ̄_right is the average inclination angle of the right side of the file picture; [cos(a), sin(a)] is the angle vector, i.e., the point at radian a on the unit circle centered at (0, 0); and P_right is the rightmost point coordinate.
As shown in FIG. 5, extending the straight lines of the upper, lower, left and right boundaries of the text region yields 4 intersection points, and the region enclosed by these 4 points is the text region of the file picture.
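The boundary construction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: image coordinates with y pointing down, each boundary angle $a$ already given in radians, and hypothetical helper names (`direction`, `intersect`, `text_region`). Each boundary is the line through an extreme point $P$ with direction $[\cos a, \sin a]$, so intersecting adjacent boundaries yields the four corners.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def direction(a: float) -> Point:
    """Angle vector [cos(a), sin(a)]: the point at angle a (radians) on the unit circle."""
    return (math.cos(a), math.sin(a))


def intersect(p: Point, a: float, q: Point, b: float) -> Point:
    """Intersection of the line through p with angle a and the line through q with angle b.

    Solves p + t*d1 = q + s*d2 for t using 2-D cross products.
    """
    d1, d2 = direction(a), direction(b)
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # cross(d1, d2)
    if abs(denom) < 1e-12:
        raise ValueError("boundary lines are parallel")
    wx, wy = q[0] - p[0], q[1] - p[1]
    t = (wx * d2[1] - wy * d2[0]) / denom  # cross(q - p, d2) / cross(d1, d2)
    return (p[0] + t * d1[0], p[1] + t * d1[1])


def text_region(p_top: Point, a_top: float, p_right: Point, a_right: float,
                p_bottom: Point, a_bottom: float, p_left: Point, a_left: float) -> List[Point]:
    """Corners of the text region: top-left, top-right, bottom-right, bottom-left."""
    return [
        intersect(p_left, a_left, p_top, a_top),
        intersect(p_top, a_top, p_right, a_right),
        intersect(p_right, a_right, p_bottom, a_bottom),
        intersect(p_bottom, a_bottom, p_left, a_left),
    ]


# Usage: an untilted region whose extreme points lie on the square [0, 10] x [0, 10].
corners = text_region((5, 0), 0.0, (10, 5), math.pi / 2,
                      (5, 10), math.pi, (0, 5), 3 * math.pi / 2)
```

With the borders traversed clockwise, the four direction angles of an untilted region are 0, π/2, π and 3π/2; small tilts shift each angle slightly away from these values.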
Step S250: Perform perspective transformation on the text region of the file picture to obtain a corrected text region.
Specifically, the image corresponding to the text region may be cropped from the file picture and projected onto a rectangle by perspective transformation, which yields a text-region picture with the elevation angle corrected. The perspective transformation can be implemented with existing techniques: the width and height of the target rectangle are determined from the four vertex coordinates of the text region before correction, which gives the four vertex coordinates of the rectangle; a perspective transformation matrix is then calculated from the four vertex coordinates of the text region before correction and the four vertex coordinates of the rectangle; finally, this matrix is applied to the whole pre-correction text-region image to correct it.
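As an illustration of the matrix computation described above, the sketch below solves for the 3x3 perspective matrix from four point correspondences (the standard eight-unknown linear system with the bottom-right entry fixed to 1) in plain Python. The rectangle sizing rule in `target_rectangle` (the longer side of each opposite pair) is an assumption, since the text only says the width and height are determined from the four vertices; all function names are hypothetical. In practice, OpenCV's `cv2.getPerspectiveTransform(src, dst)` computes the same kind of matrix from four point pairs, and `cv2.warpPerspective(img, M, (width, height))` applies it to the image pixels.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 matrix H (H[2][2] = 1) that maps the 4 src points onto the 4 dst points."""
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply(h: Matrix, p: Point) -> Point:
    """Map one point through H, dividing by the homogeneous coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def target_rectangle(corners: Sequence[Point]) -> List[Point]:
    """Upright rectangle sized by the longer side of each opposite pair (one common choice)."""
    tl, tr, br, bl = corners

    def dist(p: Point, q: Point) -> float:
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    w = max(dist(tl, tr), dist(bl, br))
    h = max(dist(tl, bl), dist(tr, br))
    return [(0.0, 0.0), (w, 0.0), (w, h), (0.0, h)]


quad = [(10.0, 20.0), (200.0, 30.0), (210.0, 300.0), (5.0, 280.0)]  # corners before correction
rect = target_rectangle(quad)
H = perspective_matrix(quad, rect)
```

Each pixel of the corrected image is then sampled from the original picture through the inverse of `H`, which is what a library warp routine does internally.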
In the technical solution provided by the embodiments of this application, the bounding-box coordinates of each text block in the file picture are extracted to determine the boundary inclination angle of each text block, from which the average inclination angles of the upper, lower, left and right sides of the file picture are determined. The text region of the file picture is then determined from these average inclination angles together with the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks, and a corrected text region is obtained by perspective transformation of that text region. The text region of the file picture can thus be extracted accurately without a large training cost, and performing OCR on the corrected text region can improve recognition accuracy.
In other embodiments, if the file picture is severely tilted, text block detection may be inaccurate, which in turn causes errors in text region detection. To improve the accuracy of text region detection, as shown in FIG. 6, step S240 may include the following steps S241 to S243.
Step S241: Rotate the file picture according to the average inclination angle of its left side to obtain a target picture.
Specifically, the file picture Pic is rotated, using the average inclination angle of its left side $\bar{\theta}_{left}$ as the rotation angle of the whole picture, to obtain the target picture Pic'.
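Rotating the whole picture amounts to applying one 2-D rotation to every pixel position. The sketch below shows the coordinate side of that operation: mapping a point into the rotated frame and sizing a canvas that keeps the whole rotated picture. It assumes rotation about the picture center, an angle in degrees and a y-down image frame; the direction needed to level the left side depends on how the inclination angle is signed, which this section does not state. Function names are hypothetical. For the pixels themselves, a library routine such as OpenCV's `cv2.getRotationMatrix2D(center, angle, scale)` with `cv2.warpAffine` would typically be used; OpenCV treats a positive angle as counter-clockwise, the opposite of `rotate_point` below.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def rotate_point(p: Point, angle_deg: float, center: Point) -> Point:
    """Rotate p about center by angle_deg in image coordinates (x right, y down).

    Because y points down, a positive angle turns points clockwise on screen.
    """
    t = math.radians(angle_deg)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))


def rotated_canvas_size(width: int, height: int, angle_deg: float) -> Tuple[int, int]:
    """Smallest canvas that holds the whole rotated picture without cropping."""
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    # The small epsilon stops floating-point noise (cos(90 deg) is about 6e-17) adding a pixel.
    return (math.ceil(width * c + height * s - 1e-9),
            math.ceil(width * s + height * c - 1e-9))
```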
Step S242: Calculate the average inclination angles of the upper, lower, left and right sides of the target picture, and find the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks in the target picture.
The average inclination angles of the upper, lower, left and right sides of the target picture Pic' are calculated in the same way as those of the file picture in the embodiment above. Specifically, the text blocks in the target picture and the bounding-box coordinates of each text block are extracted by OCR; the boundary inclination angle of each text block in the target picture is determined from its bounding-box coordinates; and the average inclination angles of the upper, lower, left and right sides of the target picture are determined from these boundary inclination angles. For the specific process, refer to steps S210 to S230 above; it is not repeated here.
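According to claims 2 and 3, each border's inclination angle is obtained from its direction vector and a preset reference vector, and each side's average is taken after removing angle outliers (the worked example keeps the values between the 5th and 95th percentiles). A minimal sketch follows. The corner order of a box, the clockwise traversal of the borders, the reference vectors in `REFERENCE` and the sign convention (positive means turned clockwise on screen in a y-down frame) are all assumptions, since their exact definitions are not given in this section; the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Assumed reference vector per border in image coordinates (x right, y down).
# Borders are traversed clockwise: top-left -> top-right -> bottom-right -> bottom-left.
REFERENCE: Dict[str, Point] = {
    "top": (1.0, 0.0), "right": (0.0, 1.0), "bottom": (-1.0, 0.0), "left": (0.0, -1.0),
}


def border_angles(box: Sequence[Point]) -> Dict[str, float]:
    """Signed angle in degrees from each border's reference vector to its direction vector.

    box = [top_left, top_right, bottom_right, bottom_left] (assumed corner order).
    """
    tl, tr, br, bl = box
    edges = {"top": (tl, tr), "right": (tr, br), "bottom": (br, bl), "left": (bl, tl)}
    angles = {}
    for side, (p, q) in edges.items():
        ex, ey = q[0] - p[0], q[1] - p[1]  # direction vector of the border
        rx, ry = REFERENCE[side]
        cross, dot = rx * ey - ry * ex, rx * ex + ry * ey
        angles[side] = math.degrees(math.atan2(cross, dot))
    return angles


def trimmed_mean(values: List[float]) -> float:
    """Mean of the values lying between the 5th and 95th percentiles (inclusive)."""
    if len(values) < 2:
        return statistics.fmean(values)
    cuts = statistics.quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    lo, hi = cuts[0], cuts[-1]
    kept = [v for v in values if lo <= v <= hi]
    return statistics.fmean(kept or values)  # fall back if rounding excluded every value


def average_angles(boxes: Sequence[Sequence[Point]]) -> Dict[str, float]:
    """Outlier-trimmed average inclination angle of each side over all text blocks."""
    per_box = [border_angles(b) for b in boxes]
    return {side: trimmed_mean([a[side] for a in per_box]) for side in REFERENCE}
```

Usage: for a box rotated 5 degrees clockwise on screen, all four border angles come out as 5.0 under these conventions, and an axis-aligned box gives 0 on every side.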
For distinction, the average inclination angles of the upper, lower, left and right sides of the file picture are denoted $\bar{\theta}_{top}$, $\bar{\theta}_{bottom}$, $\bar{\theta}_{left}$ and $\bar{\theta}_{right}$, and those of the target picture are denoted $\bar{\theta}'_{top}$, $\bar{\theta}'_{bottom}$, $\bar{\theta}'_{left}$ and $\bar{\theta}'_{right}$.
Further, according to the bounding-box coordinates of each text block in the target picture, the point with the minimum y coordinate among all text blocks, $P_{top} = (x_0, y_0)$, is taken as the highest-point coordinate; the point with the maximum x coordinate, $P_{right} = (x_1, y_1)$, as the rightmost-point coordinate; the point with the maximum y coordinate, $P_{bottom} = (x_2, y_2)$, as the lowest-point coordinate; and the point with the minimum x coordinate, $P_{left} = (x_3, y_3)$, as the leftmost-point coordinate.
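The extreme-point search is a plain minimum/maximum over all bounding-box corners. A sketch, assuming each box is given as a list of (x, y) corner points in image coordinates with y pointing down (so the highest point has the smallest y); `extreme_points` is a hypothetical name, and ties go to whichever point is met first.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def extreme_points(boxes: Sequence[Sequence[Point]]) -> Tuple[Point, Point, Point, Point]:
    """Return (P_top, P_right, P_bottom, P_left) over all corner points of all boxes."""
    pts: List[Point] = [p for box in boxes for p in box]
    p_top = min(pts, key=lambda p: p[1])     # smallest y: highest point
    p_right = max(pts, key=lambda p: p[0])   # largest x: rightmost point
    p_bottom = max(pts, key=lambda p: p[1])  # largest y: lowest point
    p_left = min(pts, key=lambda p: p[0])    # smallest x: leftmost point
    return p_top, p_right, p_bottom, p_left
```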
Step S243: Determine the text region of the target picture according to the average inclination angles of the upper, lower, left and right sides of the target picture and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks in the target picture, and take it as the text region of the file picture.
Specifically, step S243 includes the following steps:
(1) Determine the upper boundary of the text region of the target picture Pic' according to the average inclination angle of the upper side of Pic', $\bar{\theta}'_{top}$, and the highest-point coordinate $P_{top} = (x_0, y_0)$ of all text blocks in Pic'.
Specifically, let $a$ be the direction angle (in radians) of the upper boundary, derived from $\bar{\theta}'_{top}$, and compute
$P'_{top} = P_{top} + [\cos a, \sin a] = (x_0 + \cos a,\; y_0 + \sin a)$.
Extending the line through $P_{top}$ and $P'_{top}$ gives the upper boundary of the text region, as shown in FIG. 5. Here, $\bar{\theta}'_{top}$ denotes the average inclination angle of the upper side of the target picture Pic'; $[\cos a, \sin a]$ is the angle vector (the point at angle $a$ on the unit circle centered at (0, 0)); and $P_{top}$ is the highest-point coordinate of all text blocks in Pic'.
(2) Determine the lower boundary of the text region of the target picture Pic' according to the average inclination angle of the lower side of Pic', $\bar{\theta}'_{bottom}$, and the lowest-point coordinate $P_{bottom} = (x_2, y_2)$ of all text blocks in Pic'.
Specifically, let $a$ be the direction angle (in radians) of the lower boundary, derived from $\bar{\theta}'_{bottom}$, and compute
$P'_{bottom} = P_{bottom} + [\cos a, \sin a] = (x_2 + \cos a,\; y_2 + \sin a)$.
Extending the line through $P_{bottom}$ and $P'_{bottom}$ gives the lower boundary of the text region, as shown in FIG. 5. Here, $\bar{\theta}'_{bottom}$ denotes the average inclination angle of the lower side of the target picture Pic'; $[\cos a, \sin a]$ is the angle vector (the point at angle $a$ on the unit circle centered at (0, 0)); and $P_{bottom}$ is the lowest-point coordinate of all text blocks in Pic'.
(3) Determine the left boundary of the text region of the target picture Pic' according to the average inclination angle of the left side of Pic', $\bar{\theta}'_{left}$, and the leftmost-point coordinate $P_{left} = (x_3, y_3)$ of all text blocks in Pic'.
Specifically, let $a$ be the direction angle (in radians) of the left boundary, derived from $\bar{\theta}'_{left}$, and compute
$P'_{left} = P_{left} + [\cos a, \sin a] = (x_3 + \cos a,\; y_3 + \sin a)$.
Extending the line through $P_{left}$ and $P'_{left}$ gives the left boundary of the text region, as shown in FIG. 5. Here, $\bar{\theta}'_{left}$ denotes the average inclination angle of the left side of the target picture Pic'; $[\cos a, \sin a]$ is the angle vector (the point at angle $a$ on the unit circle centered at (0, 0)); and $P_{left}$ is the leftmost-point coordinate of all text blocks in Pic'.
(4) Determine the right boundary of the text region of the target picture Pic' according to the average inclination angle of the right side of Pic', $\bar{\theta}'_{right}$, and the rightmost-point coordinate $P_{right} = (x_1, y_1)$ of all text blocks in Pic'.
Specifically, let $a$ be the direction angle (in radians) of the right boundary, derived from $\bar{\theta}'_{right}$, and compute
$P'_{right} = P_{right} + [\cos a, \sin a] = (x_1 + \cos a,\; y_1 + \sin a)$.
Extending the line through $P_{right}$ and $P'_{right}$ gives the right boundary of the text region, as shown in FIG. 5. Here, $\bar{\theta}'_{right}$ denotes the average inclination angle of the right side of the target picture Pic'; $[\cos a, \sin a]$ is the angle vector (the point at angle $a$ on the unit circle centered at (0, 0)); and $P_{right}$ is the rightmost-point coordinate of all text blocks in Pic'.
As shown in FIG. 5, extending the straight lines of the upper, lower, left and right boundaries of the text region yields 4 intersection points, and the region enclosed by these 4 points is taken as the text region of the file picture. Step S250 above can then be executed to perform perspective transformation on the text region of the file picture and obtain a corrected text region.
A practical application scenario follows. The input is a document picture photographed by a user; because of the shooting angle, the picture exhibits distortions such as tilt, rotation and elevation-angle distortion, as shown in FIG. 7.
(1) Obtain the text blocks of the document picture and their bounding-box coordinates using optical character recognition (OCR), as shown in FIG. 8.
(2) From the bounding-box coordinates of each text block, calculate the inclination angle of its left border (the direction from the lower-left corner to the upper-left corner), giving the left inclination angles of all text blocks $\theta_{left} = [-9.13, -9.5, -0.0, -4.95, -6.54, -0.0, -3.69, -0.0, -0.0, -6.34, -5.71, -4.57, ...]$, as shown in FIG. 9.
(3) In the same way, calculate the inclination angles of the upper, right and lower borders of all text blocks, $\theta_{top}$, $\theta_{right}$ and $\theta_{bottom}$.
(4) Remove angle outliers. Suppose the 5th percentile of $\theta_{left}$ is -6.66 and its 95th percentile is 5.60; only the values of $\theta_{left}$ between the 5th and 95th percentiles are retained, giving $\theta_{left} = [-0.0, -4.95, -6.54, -0.0, -3.69, -0.0, -0.0, -6.34, -5.71, -4.57, ...]$. The same processing is applied to $\theta_{top}$, $\theta_{right}$ and $\theta_{bottom}$.
(5) Calculate the means of $\theta_{left}$, $\theta_{top}$, $\theta_{right}$ and $\theta_{bottom}$ to obtain the average inclination angles $\bar{\theta}_{left}$, $\bar{\theta}_{top}$, $\bar{\theta}_{right}$ and $\bar{\theta}_{bottom}$.
(6) Rotate the document picture by the average left inclination angle $\bar{\theta}_{left}$ to obtain the target picture Pic', as shown in FIG. 10.
(7) Use the OCR module to obtain the text blocks on the target picture Pic' and their bounding-box coordinates, then repeat steps (2) to (5) to obtain the average inclination angles of Pic', $\bar{\theta}'_{left}$, $\bar{\theta}'_{top}$, $\bar{\theta}'_{right}$ and $\bar{\theta}'_{bottom}$.
(8) Find the point with the minimum y coordinate among all text blocks in the target picture Pic' and denote it $P_{top} = (x_0, y_0) = (1864, 588)$, the highest point shown in FIG. 11.
The point with the maximum x coordinate among all text blocks is denoted $P_{right} = (x_1, y_1) = (2294, 882)$, the rightmost point shown in FIG. 11.
The point with the maximum y coordinate among all text blocks is denoted $P_{bottom} = (x_2, y_2) = (1416, 2974)$, the lowest point shown in FIG. 11.
The point with the minimum x coordinate among all text blocks is denoted $P_{left} = (x_3, y_3) = (598, 2862)$, the leftmost point shown in FIG. 11.
(9) Adding the angle vector $[\cos a, \sin a]$ to each of these 4 points gives a second point on each of the 4 outermost lines of the text region.
Specifically, step 9-1: with $a = -0.12$ for the upper boundary,
$P'_{top} = P_{top} + [\cos a, \sin a] = (1864 + \cos(-0.12),\; 588 + \sin(-0.12)) \approx (1864.99, 587.88)$.
Extending the line through $P_{top}$ and $P'_{top}$ gives the upper boundary of the text region, as shown in FIG. 12.
Step 9-2: with $a = 1.585$ for the right boundary,
$P'_{right} = P_{right} + [\cos a, \sin a] = (2294 + \cos(1.585),\; 882 + \sin(1.585)) \approx (2293.98, 882.99)$.
Extending the line through $P_{right}$ and $P'_{right}$ gives the right boundary of the text region, as shown in FIG. 12.
Step 9-3: with $a = 3.275$ for the lower boundary,
$P'_{bottom} = P_{bottom} + [\cos a, \sin a] = (1416 + \cos(3.275),\; 2974 + \sin(3.275)) \approx (1415.009, 2973.867)$.
Extending the line through $P_{bottom}$ and $P'_{bottom}$ gives the lower boundary of the text region, as shown in FIG. 12.
Step 9-4: with $a = 4.814$ for the left boundary,
$P'_{left} = P_{left} + [\cos a, \sin a] = (598 + \cos(4.814),\; 2862 + \sin(4.814)) \approx (598.10, 2861.005)$.
Extending the line through $P_{left}$ and $P'_{left}$ gives the left boundary of the text region, as shown in FIG. 12.
(10) Extend the 4 outermost lines of the text region to obtain 4 intersection points (the 4 black dots in FIG. 12); the region enclosed by these 4 points is the text region of the document picture.
(11) Project the text region onto a rectangle by perspective transformation to obtain a picture with the elevation angle corrected, as shown in FIG. 13.
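The arithmetic of steps (9) and (10) can be reproduced with the short script below, which uses only the extreme points from step (8) and the angles $a$ that appear inside $\cos$ and $\sin$ in steps 9-1 to 9-4. The computed $P'$ values agree with those stated above to within 0.01. The text does not list the four corner coordinates, so the script only computes them; the line-intersection helper is an illustrative assumption (a standard 2-D cross-product solve), not code from this application.

```python
import math

# Extreme points from step (8) and boundary angles a (radians) from steps 9-1 to 9-4.
points = {"top": (1864, 588), "right": (2294, 882), "bottom": (1416, 2974), "left": (598, 2862)}
angles = {"top": -0.12, "right": 1.585, "bottom": 3.275, "left": 4.814}

# Step (9): second point on each boundary, P' = P + [cos(a), sin(a)].
second = {side: (x + math.cos(angles[side]), y + math.sin(angles[side]))
          for side, (x, y) in points.items()}


def intersect(side1: str, side2: str):
    """Intersection of two boundary lines, each given by its point and angle (step 10)."""
    (px, py), (qx, qy) = points[side1], points[side2]
    d1 = (math.cos(angles[side1]), math.sin(angles[side1]))
    d2 = (math.cos(angles[side2]), math.sin(angles[side2]))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # cross(d1, d2)
    t = ((qx - px) * d2[1] - (qy - py) * d2[0]) / denom
    return (px + t * d1[0], py + t * d1[1])


# Step (10): corners in the order top-left, top-right, bottom-right, bottom-left.
corners = [intersect("left", "top"), intersect("top", "right"),
           intersect("right", "bottom"), intersect("bottom", "left")]

for side, (x, y) in second.items():
    print(f"P'_{side} = ({x:.3f}, {y:.3f})")
for x, y in corners:
    print(f"corner ({x:.1f}, {y:.1f})")
```

These four corners are the source points for the perspective transformation of step (11).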
With the technical solution provided by this embodiment, correcting the file picture can improve the accuracy of subsequent OCR. Moreover, for document pictures that share the same template (such as an internal business request form of a company), once the text region has been cropped and corrected, the text block at a specific coordinate can be obtained.
The following is an apparatus embodiment of this application, which can be used to execute the embodiments of the automatic correction method for file pictures described above. For details not disclosed in the apparatus embodiment, refer to the method embodiments of this application.
Fig. 14 is a block diagram of an apparatus for automatically correcting a document picture according to an embodiment of the present application. As shown in fig. 14, the apparatus includes:
a character detection module 1410, configured to extract, by OCR, the text blocks in the file picture and the bounding-box coordinates of each text block;
an angle calculation module 1420, configured to determine the boundary inclination angle of each text block according to its bounding-box coordinates;
an inclination calculation module 1430, configured to determine the average inclination angles of the upper, lower, left and right sides of the file picture according to the boundary inclination angle of each text block;
an area calculation module 1440, configured to determine the text region of the file picture according to the average inclination angles of the upper, lower, left and right sides of the file picture and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks;
and a region transformation module 1450, configured to perform perspective transformation on the text region of the file picture to obtain a corrected text region.
For the implementation of the functions and actions of each module in the apparatus, refer to the implementation of the corresponding steps in the automatic correction method for file pictures above; details are not repeated here.
In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. An automatic correction method for a file picture, characterized by comprising the following steps:
extracting, by OCR, the text blocks in the file picture and the bounding-box coordinates of each text block;
determining the boundary inclination angle of each text block according to the bounding-box coordinates of each text block;
determining the average inclination angles of the upper, lower, left and right sides of the file picture according to the boundary inclination angle of each text block;
determining a text region of the file picture according to the average inclination angles of the upper, lower, left and right sides of the file picture and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks;
and performing perspective transformation on the text region of the file picture to obtain a corrected text region.
2. The method of claim 1, wherein determining the boundary inclination angle of each text block according to the bounding-box coordinates of each text block comprises:
for each text block, calculating the direction vectors of its upper, lower, left and right borders according to its bounding-box coordinates;
and determining the inclination angles of the upper, lower, left and right borders of the text block according to their direction vectors and a preset reference vector, to obtain the boundary inclination angle of each text block.
3. The method of claim 2, wherein determining the average inclination angles of the upper, lower, left and right sides of the file picture according to the boundary inclination angle of each text block comprises:
removing angle outliers separately from the inclination angles of the upper, lower, left and right borders of the text blocks;
and calculating, separately for the upper, lower, left and right borders, the mean of the remaining inclination angles, to obtain the average inclination angles of the upper, lower, left and right sides of the file picture.
4. The method according to claim 1, wherein determining the text region of the file picture according to the average inclination angles of the upper, lower, left and right sides of the file picture and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks comprises:
determining the upper boundary of the text region of the file picture according to the average inclination angle of the upper side of the file picture and the highest-point coordinate of all text blocks in the file picture;
determining the lower boundary of the text region of the file picture according to the average inclination angle of the lower side of the file picture and the lowest-point coordinate of all text blocks in the file picture;
determining the left boundary of the text region of the file picture according to the average inclination angle of the left side of the file picture and the leftmost-point coordinate of all text blocks in the file picture;
and determining the right boundary of the text region of the file picture according to the average inclination angle of the right side of the file picture and the rightmost-point coordinate of all text blocks in the file picture.
5. The method according to claim 1, wherein determining the text region of the file picture according to the average inclination angles of the upper, lower, left and right sides of the file picture and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks comprises:
rotating the file picture according to the average inclination angle of the left side of the file picture to obtain a target picture;
calculating the average inclination angles of the upper, lower, left and right sides of the target picture, and finding the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks in the target picture;
and determining the text region of the target picture, according to the average inclination angles of the upper, lower, left and right sides of the target picture and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks in the target picture, as the text region of the file picture.
6. The method of claim 5, wherein calculating the average inclination angles of the upper, lower, left and right sides of the target picture comprises:
extracting, by OCR, the text blocks in the target picture and the bounding-box coordinates of each text block;
determining the boundary inclination angle of each text block in the target picture according to its bounding-box coordinates;
and determining the average inclination angles of the upper, lower, left and right sides of the target picture according to the boundary inclination angle of each text block in the target picture.
7. The method of claim 6, wherein finding the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks in the target picture comprises:
taking, according to the bounding-box coordinates of each text block in the target picture, the point with the minimum y coordinate among all text blocks as the highest-point coordinate;
taking, according to the bounding-box coordinates of each text block in the target picture, the point with the maximum x coordinate among all text blocks as the rightmost-point coordinate;
taking, according to the bounding-box coordinates of each text block in the target picture, the point with the maximum y coordinate among all text blocks as the lowest-point coordinate;
and taking, according to the bounding-box coordinates of each text block in the target picture, the point with the minimum x coordinate among all text blocks as the leftmost-point coordinate.
8. The method as claimed in claim 5, wherein determining the text region of the target picture according to the average inclination angles of the upper, lower, left and right sides of the target picture and the highest-point, lowest-point, leftmost-point and rightmost-point coordinates of all text blocks in the target picture comprises:
determining the upper boundary of the text region of the target picture according to the average inclination angle of the upper side of the target picture and the highest-point coordinate of all text blocks in the target picture;
determining the lower boundary of the text region of the target picture according to the average inclination angle of the lower side of the target picture and the lowest-point coordinate of all text blocks in the target picture;
determining the left boundary of the text region of the target picture according to the average inclination angle of the left side of the target picture and the leftmost-point coordinate of all text blocks in the target picture;
and determining the right boundary of the text region of the target picture according to the average inclination angle of the right side of the target picture and the rightmost-point coordinate of all text blocks in the target picture.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the automatic correction method of the document picture according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program executable by a processor to perform the method for automatic correction of a document picture according to any one of claims 1 to 8.
CN202211323131.4A 2022-10-27 2022-10-27 Automatic correction method of file picture, electronic equipment and storage medium Pending CN115797938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211323131.4A CN115797938A (en) 2022-10-27 2022-10-27 Automatic correction method of file picture, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211323131.4A CN115797938A (en) 2022-10-27 2022-10-27 Automatic correction method of file picture, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115797938A true CN115797938A (en) 2023-03-14

Family

ID=85434004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211323131.4A Pending CN115797938A (en) 2022-10-27 2022-10-27 Automatic correction method of file picture, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115797938A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination