CN113065536B - Method of processing table, computing device, and computer-readable storage medium - Google Patents

Method of processing table, computing device, and computer-readable storage medium

Info

Publication number
CN113065536B
Authority
CN
China
Prior art keywords
cell
text box
target
determining
cells
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110616829.4A
Other languages
Chinese (zh)
Other versions
CN113065536A (en
Inventor
张世坤
李景阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ALLIN TECHNOLOGY CO.,LTD.
Original Assignee
Beijing Ouying Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ouying Information Technology Co Ltd filed Critical Beijing Ouying Information Technology Co Ltd
Priority to CN202110616829.4A priority Critical patent/CN113065536B/en
Publication of CN113065536A publication Critical patent/CN113065536A/en
Application granted granted Critical
Publication of CN113065536B publication Critical patent/CN113065536B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/412Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method, a computing device, and a computer-readable storage medium for processing a table. The method comprises the following steps: performing text box detection on the distorted image to acquire position data of a plurality of text boxes in the distorted image; performing table line detection on the distorted image to acquire a plurality of table lines in the distorted image; determining a plurality of first cells through a maximum matching row and column algorithm based on the position data of the plurality of text boxes; curve fitting the plurality of first cells based on the plurality of table lines to determine fitted curve segments and extended line data around each first cell; and performing connected component correction on the plurality of first cells based on the fitted curve segments and the extended line data around each first cell to determine a plurality of cells of the table and text content in each cell.

Description

Method of processing table, computing device, and computer-readable storage medium
Technical Field
The present invention relates generally to the field of image processing, and more particularly, to a method, computing device, and computer-readable storage medium for processing a table.
Background
Many application fields need to recognize text in images, and various Optical Character Recognition (OCR) technologies have been developed for this purpose. Academic papers and scientific reports often include tables to describe academic progress or experimental results, and in the medical field various test results are also commonly presented as tables. Various schemes have been proposed for recognizing tables in images. In practical application scenarios, however, images are often distorted, so the tables in them are distorted as well, and there is currently no good solution for accurately recognizing such distorted tables and reconstructing them as structured tables.
For distorted table recognition, the mainstream approach at present is to apply distortion correction, affine transformation, and the like to the image on top of conventional table recognition technology, which handles only simple deformation. There are also graph convolution methods based on deep learning, detection methods based on table lines, and the like.
However, the conventional approach is inefficient and not robust, and it offers no algorithm that actually solves warped tables. The graph convolution method based on deep learning depends too heavily on the text box detection result, so if part of that result is missing, the overall result is prone to error. The detection method based on table lines needs training data for too many types of warped pictures, which makes the data labeling workload huge; it also suffers from broken lines, and because the relation between the broken line segments cannot be judged accurately, the table structure cannot be determined.
Layout analysis after OCR recognition in natural scene applications is strongly affected by picture distortion. It is therefore desirable to effectively solve the problem that structured extraction cannot be performed after distortion, so as to improve the adaptability of OCR.
Disclosure of Invention
In view of at least one of the above problems, the present invention provides a scheme for processing a table that accurately determines each cell of a warped table and its text content by combining text box detection information and table line detection information according to a topological principle.
According to one aspect of the present invention, a method of processing a table is provided. The method comprises the following steps: performing text box detection on a distorted image to acquire position data of a plurality of text boxes in the distorted image; performing table line detection on the distorted image to acquire a plurality of table lines in the distorted image; determining a plurality of first cells through a maximum matching row and column algorithm based on the position data of the plurality of text boxes; curve fitting the plurality of first cells based on the plurality of table lines to determine fitted curve segments and extended line data around each first cell; and performing connected component correction on the plurality of first cells based on the fitted curve segments and the extended line data around each first cell to determine a plurality of cells of the table and text content in each cell.
According to another aspect of the invention, a computing device is provided. The computing device includes: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform steps according to the above-described method.
According to yet another aspect of the present invention, a computer-readable storage medium is provided, having stored thereon computer program code, which when executed performs the method as described above.
In some embodiments, the position data of the plurality of text boxes comprises vertex coordinates, center point coordinates, width data, and height data for each text box, and determining the plurality of first cells through the maximum matching row and column algorithm comprises: arranging the plurality of text boxes according to the center point coordinates; determining at least one horizontal pairing combination and at least one vertical pairing combination of the plurality of text boxes respectively, wherein each horizontal pairing combination comprises one or more horizontal pairings, each horizontal pairing comprises one text box and its next horizontal text box, each vertical pairing combination comprises one or more vertical pairings, and each vertical pairing comprises one text box and its next vertical text box; selecting the horizontal pairing combination with the largest number of horizontal pairings from the at least one horizontal pairing combination to determine horizontal vectors of the plurality of text boxes, and selecting the vertical pairing combination with the largest number of vertical pairings from the at least one vertical pairing combination to determine vertical vectors of the plurality of text boxes; determining a first direction and a second direction of the plurality of text boxes based on the horizontal vectors and the vertical vectors of the plurality of text boxes, wherein the second direction is perpendicular to the first direction; and determining the plurality of first cells based on the position data of the plurality of text boxes, the first direction, and the second direction.
In some embodiments, determining at least one horizontal pairing combination and at least one vertical pairing combination of the plurality of text boxes, respectively, comprises: for a target text box of the plurality of text boxes, determining whether the top-margin distance or the bottom-margin distance between the target text box and its next horizontal text box is less than a first predetermined threshold; if the top-margin distance or the bottom-margin distance is determined to be less than the first predetermined threshold, determining whether the center position distance between the next horizontal text box and the target text box is greater than a second predetermined threshold; if the center position distance is determined to be greater than the second predetermined threshold, organizing the target text box and the next horizontal text box into a horizontal pairing labeled with a first value; and if the top-margin distance or the bottom-margin distance is determined to be greater than or equal to the first predetermined threshold, or the center position distance is determined to be less than or equal to the second predetermined threshold, organizing the target text box and the next horizontal text box into a horizontal pairing labeled with a second value.
In some embodiments, determining at least one horizontal pairing combination and at least one vertical pairing combination of the plurality of text boxes, respectively, comprises: for a target text box of the plurality of text boxes, determining whether the left-margin distance or the right-margin distance between the target text box and its next vertical text box is less than a first predetermined threshold; if the left-margin distance or the right-margin distance is determined to be less than the first predetermined threshold, determining whether the center position distance between the next vertical text box and the target text box is greater than a second predetermined threshold; if the center position distance is determined to be greater than the second predetermined threshold, organizing the target text box and the next vertical text box into a vertical pairing labeled with a first value; and if the left-margin distance or the right-margin distance is determined to be greater than or equal to the first predetermined threshold, or the center position distance is determined to be less than or equal to the second predetermined threshold, organizing the target text box and the next vertical text box into a vertical pairing labeled with a second value.
In some embodiments, determining fitted curve segments and extended line data around each first cell comprises: fitting a peripheral curve of the plurality of first cells based on the table lines around the plurality of first cells and recording, for each first cell, a fitted curve segment of the peripheral curve around the first cell; and determining, based on the peripheral curves of the plurality of first cells, a fitted curve segment of a previous first cell of the first cell as extended line data of the first cell.
In some embodiments, determining the plurality of cells of the table and the text content in each cell comprises: for each first cell of the plurality of first cells, determining a perimeter line score for four perimeter lines of the first cell; determining a text box score for the first cell based on the location data for the text box; determining a total score for the first cell based on the first cell's peripheral line score and text box score; selecting a first cell with the highest total score from the plurality of first cells as a target first cell; determining row and column connected domains of the target first cell based on the fitted curve segment and the extended line data around the target first cell; and correcting the plurality of first cells based on the row and column connected domains of the target first cell to determine a plurality of cells of the table.
In some embodiments, determining the total score for the first cell based on the peripheral line score and the text box score for the first cell comprises: determining a plurality of primary intersection points based on the plurality of table lines; determining a starting first cell among the plurality of first cells based on the positional relationship between the plurality of primary intersection points and the plurality of first cells; and sequentially determining the total score of each first cell in the plurality of first cells, beginning from the starting first cell.
In some embodiments, selecting the first cell with the highest total score from the plurality of first cells as the target first cell comprises: determining whether there are at least two first cells with the highest total score among the plurality of first cells; and in response to determining that there are at least two highest overall score first cells in the plurality of first cells, determining the target first cell based on location data of the at least two highest overall score first cells.
In some embodiments, determining the row connected domain of the target first cell comprises: determining a pixel abscissa and a pixel ordinate of pixels of the plurality of table lines in the region of the target first cell; determining a fitted ordinate of the pixel based on the pixel abscissa and fitted curve segment or horizontally extended line data of the target first cell; determining whether an average difference between the fitted ordinate and the pixel ordinate is less than a predetermined threshold; and in response to determining that the average difference between the fitted ordinate and the pixel ordinate is less than the predetermined threshold, row-connecting the target first cell with its horizontally-oriented previous first cell.
In some embodiments, determining the column connected domain of the target first cell comprises: determining a pixel abscissa and a pixel ordinate of pixels of the plurality of table lines in the region of the target first cell; determining a fitted abscissa of the pixel based on the pixel ordinate and fitted curve segment or vertically extended line data of the target first cell; determining whether an average difference between the fitted abscissa and the pixel abscissa is less than a predetermined threshold; and column communicating the target first cell with a first cell vertically preceding the target first cell in response to determining that the average difference between the fitted abscissa and the pixel abscissa is less than the predetermined threshold.
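The row and column connectivity tests described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the fitted curve segment is assumed to be stored as polynomial coefficients, the "average difference" is taken to be the mean absolute difference, and the names `evaluate` and `is_connected` are hypothetical.

```python
from typing import List, Sequence, Tuple


def evaluate(coeffs: Sequence[float], t: float) -> float:
    """Evaluate a_0 + a_1*t + ... + a_k*t**k with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result


def is_connected(pixels: List[Tuple[float, float]],
                 coeffs: Sequence[float],
                 threshold: float,
                 axis: str = "row") -> bool:
    """Return True if table-line pixels lie close enough to the fitted curve.

    pixels are (abscissa, ordinate) pairs of table-line pixels in the cell region.
    For axis="row" the curve gives a fitted ordinate from the pixel abscissa;
    for axis="column" it gives a fitted abscissa from the pixel ordinate.
    """
    if not pixels:
        return False  # assumption: no line pixels means no evidence of connectivity
    if axis == "row":
        diffs = [abs(evaluate(coeffs, x) - y) for x, y in pixels]
    else:
        diffs = [abs(evaluate(coeffs, y) - x) for x, y in pixels]
    return sum(diffs) / len(diffs) < threshold


# Hypothetical data: a gently sloped horizontal line y = 10 + 0.1 * x.
row_curve = [10.0, 0.1]
near_pixels = [(0, 10.0), (10, 11.0), (20, 12.5)]
far_pixels = [(0, 30.0), (10, 31.0)]
row_ok = is_connected(near_pixels, row_curve, threshold=1.0, axis="row")
row_bad = is_connected(far_pixels, row_curve, threshold=1.0, axis="row")
col_ok = is_connected([(5, 0), (6, 10), (4, 20)], [5.0], threshold=1.0, axis="column")
```

When the test succeeds, the target first cell would be connected with its preceding first cell in the corresponding direction; the threshold value here is purely illustrative.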
In some embodiments, determining the plurality of cells and the text content in each cell of the table further comprises: determining whether the cell contains one text box or at least two text boxes; in response to determining that the cell contains a text box, taking the text content in the text box as the text content in the cell; and in response to determining that the cell contains at least two text boxes, merging text content in the at least two text boxes as text content in the cell.
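As a small illustration of the merging step described above, the following sketch joins the text of all text boxes assigned to one cell. The reading order (top to bottom, then left to right) and the space separator are assumptions, since the text only states that the contents are merged; `merge_cell_text` is a hypothetical helper name.

```python
from typing import List, Tuple

# Each entry is (center_x, center_y, text) for a text box inside one cell.
BoxText = Tuple[float, float, str]


def merge_cell_text(boxes: List[BoxText]) -> str:
    """Return the cell text: one box is used as-is, several boxes are merged."""
    # Assumed reading order: sort by vertical position first, then horizontal.
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return " ".join(text for _, _, text in ordered)


single = merge_cell_text([(10, 10, "Hemoglobin")])
merged = merge_cell_text([(40, 10, "blood"), (10, 10, "White"), (10, 30, "cells")])
```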
Drawings
The invention will be better understood and other objects, details, features and advantages thereof will become more apparent from the following description of specific embodiments of the invention given with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a system for implementing a method of processing a table according to an embodiment of the invention.
FIG. 2 illustrates a flow diagram of a method for processing a table according to some embodiments of the invention.
FIG. 3 illustrates a flow diagram of an exemplary method for determining a plurality of first cells through a maximum matching row and column algorithm according to embodiments of the present invention.
FIG. 4 illustrates a flow diagram of an exemplary method for determining a horizontal pairing in a plurality of text boxes according to some embodiments of the invention.
FIG. 5 illustrates a flow diagram of an exemplary method for determining a vertical pairing in a plurality of text boxes according to some embodiments of the invention.
FIG. 6 illustrates a flow diagram of an exemplary method for performing connected component correction on the first cells to determine the cells of a table according to embodiments of the present invention.
FIG. 7 shows a schematic of a peripheral curve according to an example of the invention.
FIG. 8 illustrates a flow diagram of a method for determining a row connected domain of a target first cell in accordance with an embodiment of the present invention.
FIG. 9 illustrates a flow diagram of a method for determining a column connected domain of a target first cell in accordance with an embodiment of the present invention.
FIG. 10 illustrates a block diagram of a computing device suitable for implementing embodiments of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In the following description, for the purposes of illustrating various inventive embodiments, certain specific details are set forth in order to provide a thorough understanding of the various inventive embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details. In other instances, well-known devices, structures and techniques associated with this application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.
Throughout the specification and claims, the word "comprise" and variations thereof, such as "comprises" and "comprising," are to be understood as an open, inclusive meaning, i.e., as being interpreted to mean "including, but not limited to," unless the context requires otherwise.
Reference throughout this specification to "one embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in some embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms first, second, third, fourth, etc. used in the description and in the claims are used only to distinguish between various objects for clarity of description and do not limit the size, order, or other attributes of the objects described.
FIG. 1 shows a schematic diagram of a system 1 for implementing a method of processing a table according to an embodiment of the invention. As shown in FIG. 1, system 1 includes a computing device 10, a server 20, and a network 30. Computing device 10 and server 20 may exchange data via network 30. Here, the server 20 may be, for example, a server of a service provider dedicated to providing a table reconstruction service, and the computing device 10 is connected to the server 20 to perform corresponding operations based on commands from the server 20. The computing device 10 may include at least one processor 110 and at least one memory 120 coupled to the at least one processor 110, the memory 120 storing instructions 130 executable by the at least one processor 110, and the instructions 130, when executed by the at least one processor 110, performing at least a portion of the methods 200 and 800 described below. Note that, in this context, computing device 10 may be part of server 20 or may be separate from server 20. The specific structure of computing device 10 or server 20 is described below, for example, in connection with FIG. 10.
FIG. 2 illustrates a flow diagram of a method 200 for processing a table according to some embodiments of the invention. Method 200 may be performed, for example, by computing device 10 or server 20 in system 1 shown in FIG. 1. The method 200 is described below in conjunction with FIGS. 1-9, taking execution in the computing device 10 as an example.
As shown in fig. 2, method 200 includes step 210, where computing device 10 may perform text box detection on the warped image to obtain location data for a plurality of text boxes in the warped image.
In some implementations, computing device 10 may utilize various known or future-developed OCR schemes to detect text boxes in an image. For example, an open source OCR project from Google, Tencent, or the like may be utilized to identify text boxes in an image. The recognition result may include position data of each text box. In one example, the position data for each text box may include vertex coordinates (e.g., four two-dimensional arrays), center point coordinates (e.g., one two-dimensional array), height data (e.g., pixel height of the text box), width data (e.g., pixel width of the text box), and the like for the text box.
In other embodiments, computing device 10 may utilize various object detection models to detect text boxes in images. An object detection model is a machine learning model that detects a specific target object in an image. Depending on the actual application requirements, either a two-stage (2-stage) or a single-stage (1-stage) object detection model may be used.
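The position data described above could be held in a small record such as the following. This is only a sketch: the field names are hypothetical, and deriving the center point, width, and height from the four vertices is an assumption made for compactness (an OCR engine may return these values directly).

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


@dataclass(frozen=True)
class TextBox:
    """Position data of one detected text box (illustrative field names)."""
    vertices: Tuple[Point, Point, Point, Point]  # four corner points
    text: str = ""

    @property
    def center(self) -> Point:
        # Mean of the four vertices.
        xs = [p[0] for p in self.vertices]
        ys = [p[1] for p in self.vertices]
        return (sum(xs) / 4.0, sum(ys) / 4.0)

    @property
    def width(self) -> float:
        # Pixel width of the axis-aligned extent of the box.
        xs = [p[0] for p in self.vertices]
        return max(xs) - min(xs)

    @property
    def height(self) -> float:
        # Pixel height of the axis-aligned extent of the box.
        ys = [p[1] for p in self.vertices]
        return max(ys) - min(ys)


# A slightly skewed box, as might be found in a warped image.
box = TextBox(vertices=((10, 20), (110, 22), (110, 52), (10, 50)), text="WBC")
```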
At step 220, computing device 10 may perform table line detection on the warped image to obtain a plurality of table lines in the warped image.
Here, table line detection may be performed on the distorted image with a table line detection method in the related art, for example with the tools in Java OpenCV: the input image is converted to grayscale, binarized, dilated, and eroded to fill holes in the table lines, yielding a horizontal line binary image and a vertical line binary image, which are then combined to obtain the table line binary image.
Alternatively, the table lines of the distorted image may be detected with a table line detection method based on a convolutional neural network that was specifically designed by the applicant and the inventors of the present application. Specifically, a convolutional neural network model such as ResNet or U-Net may be trained, and the trained model may be used to generate a plurality of predicted binary maps and a plurality of predicted gradient maps corresponding to a plurality of table line types of the target image. The plurality of table line types may include horizontal and visible, vertical and visible, horizontal and invisible, and vertical and invisible. Each of the plurality of predicted binary maps indicates a first predicted rectangular region of a table line of the corresponding table line type. Each of the plurality of predicted gradient maps indicates a gradient value between a table line of the corresponding table line type and a corresponding rectangular edge in a second predicted rectangular region of the table line. Then, a plurality of predicted table line binary maps corresponding to the plurality of table line types may be determined based on the plurality of predicted binary maps and the plurality of predicted gradient maps, and cell coordinates of the table in the target image may be determined via a connected component algorithm based on the plurality of predicted table line binary maps.
By jointly considering the predicted binary maps and the predicted gradient maps, this table line detection method can determine predicted table line binary maps with continuous lines, thereby avoiding the broken-line problem and accurately identifying the table lines in the image.
Next, at step 230, computing device 10 may determine a plurality of first cells through a maximum matching row and column algorithm based on the position data of the plurality of text boxes detected in step 210.
Here, the maximum matching row and column algorithm refers to matching out, from the plurality of text boxes and based on their positions in the warped image, the rows or columns that contain the largest number of paired text boxes in the same row or the same column, and regarding each text box that satisfies this condition as a cell. The cells thus determined may be incomplete or inaccurate and are therefore also referred to herein as pseudo-cells or first cells, to distinguish them from the cells that are subsequently further determined. An exemplary method of determining the first cells through the maximum matching row and column algorithm is described in more detail below in conjunction with FIG. 3.
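The hole-filling part of this conventional pipeline (binarize, dilate, then erode, i.e. a morphological closing) can be sketched in plain Python on a tiny image. This is an illustrative stand-in rather than OpenCV code: all function names are hypothetical, the threshold and kernel half-width are arbitrary, and a real pipeline would normally also isolate long horizontal and vertical runs before combining them.

```python
from typing import List

Grid = List[List[int]]  # binary image: 1 = line pixel, 0 = background


def binarize(gray: Grid, threshold: int = 128) -> Grid:
    """Mark dark pixels (below threshold) as line pixels."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def dilate_h(img: Grid, k: int) -> Grid:
    """Set a pixel if any pixel within k columns of it is set."""
    width = len(img[0])
    return [[int(any(row[max(0, x - k):x + k + 1])) for x in range(width)] for row in img]


def erode_h(img: Grid, k: int) -> Grid:
    """Keep a pixel only if all pixels within k columns of it are set."""
    width = len(img[0])
    return [[int(all(row[max(0, x - k):x + k + 1])) for x in range(width)] for row in img]


def close_h(img: Grid, k: int) -> Grid:
    """Dilate then erode along rows, filling gaps up to 2*k pixels wide."""
    return erode_h(dilate_h(img, k), k)


def transpose(img: Grid) -> Grid:
    return [list(col) for col in zip(*img)]


def close_v(img: Grid, k: int) -> Grid:
    """Same closing along columns, implemented via transposition."""
    return transpose(close_h(transpose(img), k))


def combine(a: Grid, b: Grid) -> Grid:
    """Pixel-wise OR of the horizontal-line and vertical-line images."""
    return [[p | q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# 5x7 grayscale image with one horizontal line (row 2) broken at column 3.
gray = [[255] * 7 for _ in range(5)]
gray[2] = [0, 0, 0, 255, 0, 0, 0]

binary = binarize(gray)
horizontal = close_h(binary, 1)   # the one-pixel hole in row 2 is filled
vertical = close_v(binary, 1)     # no vertical lines here, so nothing is added
table_lines = combine(horizontal, vertical)
```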
At step 240, computing device 10 may curve fit the plurality of first cells determined at step 230 based on the plurality of table lines obtained at step 220 to determine fitted curve segments and extended line data around each first cell.
In the present invention, two scorers and two memories may be provided for each first cell. The two scorers include a perimeter scorer for recording the perimeter line scores of the four perimeter lines of the first cell, and a text box scorer for recording the text box score of the first cell. The two memories include a peripheral line relation memory for recording fitted curve segments of peripheral curves fitted for the plurality of first cells around the first cells, and a peripheral extension line memory for recording the fitted curve segment of a preceding first cell as extension line data of the first cell in the case where the preceding first cell exists.
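One way to picture the two scorers and two memories attached to each first cell is the record below. It is a sketch only: the class and field names are hypothetical, and storing a fitted curve segment as a list of polynomial coefficients is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def _empty_side_scores() -> Dict[str, float]:
    return {"top": 0.0, "bottom": 0.0, "left": 0.0, "right": 0.0}


@dataclass
class FirstCell:
    cell_id: int
    # Scorer 1: scores of the four peripheral lines of the cell.
    peripheral_line_scores: Dict[str, float] = field(default_factory=_empty_side_scores)
    # Scorer 2: score derived from the text box(es) in the cell.
    text_box_score: float = 0.0
    # Memory 1 (peripheral line relation): side -> fitted curve segment.
    fitted_segments: Dict[str, List[float]] = field(default_factory=dict)
    # Memory 2 (peripheral extension line): direction -> segment of the preceding cell.
    extension_lines: Dict[str, List[float]] = field(default_factory=dict)


def inherit_extension(cell: FirstCell, previous: FirstCell, side: str, direction: str) -> None:
    """Record the preceding cell's fitted segment as extended line data of `cell`."""
    if side in previous.fitted_segments:
        cell.extension_lines[direction] = list(previous.fitted_segments[side])


left_cell = FirstCell(cell_id=0)
left_cell.fitted_segments["top"] = [12.0, 0.05]  # hypothetical coefficients
right_cell = FirstCell(cell_id=1)
inherit_extension(right_cell, left_cell, side="top", direction="horizontal")
```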
In step 240, computing device 10 may curve fit the plurality of first cells determined in step 230 according to the table line determined in step 220 to determine a peripheral curve for the plurality of first cells, and record, for each first cell, a fitted curve segment of the peripheral curve around the first cell, e.g., in a peripheral line relationship memory for the first cell.
In one example, computing device 10 may fit the plurality of first cells to determine the peripheral curve of the first cells according to the following fitting function:
[Equation image not reproduced in the source text: fitting function of order k]
Here, the order k of the fitting function may take different values depending on the required fitting accuracy. For example, in one simulation experiment, k = 6 was chosen, and the fitted peripheral curve is shown in FIG. 7.
Furthermore, in step 240, the computing device 10 may also record, for each first cell, extended line data of the first cell. The extended line data refers to a fitted curve segment of a preceding first cell of the first cell and includes horizontally extended line data and vertically extended line data; it may be recorded, for example, in the peripheral extension line memory of the first cell.
At step 250, computing device 10 may perform connected component correction on the plurality of first cells based on the fitted curve segments and extended line data around each first cell determined at step 240 to determine the plurality of cells of the table and the text content in each cell.
Specifically, connected component correction means that the computing device 10 determines the row connected domain and the column connected domain of each first cell according to the fitted curve segments around the first cell recorded in the peripheral line relation memory of the first cell, or the extended line data recorded in the peripheral extension line memory of the first cell, iterating until the maximum row connected domain and the maximum column connected domain are found. Here, the computing device 10 may determine the peripheral line score of each first cell from the position of the first cell and the table lines acquired in step 220, and determine the text box score of the first cell from the positional relationship between the text boxes detected in step 210 and the first cell, thereby determining the total score of each first cell. It may then select the first cell with the highest total score among the plurality of first cells as the target first cell from which to start the connected component correction. In this way, the computing device 10 is able to find the maximum row and column connected domains more quickly.
Computing device 10 may then perform row and column correction on the plurality of first cells using the maximum row connected domain and the maximum column connected domain to determine the exact cells of the table, and may determine the text content in each cell based on the position of the cell and the positions of the text boxes. The method for determining the cells in step 250 is described in detail below with reference to FIG. 6.
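The fitting function itself is not reproduced in this text. Assuming it is an ordinary polynomial of order k, y = a_0 + a_1 x + ... + a_k x^k, fitted by least squares (an assumption, not something the text confirms), the fit could be sketched as follows. The helper names `polyfit` and `polyval` are hypothetical, and the example uses k = 2 on exact data to keep the arithmetic easy to check.

```python
from typing import List, Sequence


def polyfit(xs: Sequence[float], ys: Sequence[float], k: int) -> List[float]:
    """Least-squares coefficients a_0..a_k of y = sum(a_i * x**i).

    Solves the normal equations with Gaussian elimination and partial pivoting.
    Assumes at least k + 1 distinct x values, so the system is non-singular.
    """
    n = k + 1
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the fitted polynomial at x with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical table-line pixels lying exactly on y = 1 + 0.5*x + 0.25*x**2.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 1.75, 3.0, 4.75, 7.0]
coeffs = polyfit(xs, ys, k=2)
```

For higher orders such as k = 6 on pixel-scale coordinates, the normal equations become ill-conditioned, so a production implementation would more likely rescale the coordinates or rely on a numerical library.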
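The choice of the target first cell can be sketched as below. How the peripheral line scores and the text box score are combined is not specified in the text, so a plain sum is assumed here; likewise, breaking ties by picking the top-most and then left-most cell is only one possible use of the location data. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScoredCell:
    name: str
    center: Tuple[float, float]                              # (x, y) of the cell
    peripheral_line_scores: Tuple[float, float, float, float]  # four peripheral lines
    text_box_score: float

    @property
    def total_score(self) -> float:
        # Assumed combination rule: sum of all component scores.
        return sum(self.peripheral_line_scores) + self.text_box_score


def select_target(cells: List[ScoredCell]) -> ScoredCell:
    """Pick the highest-scoring cell; ties go to the top-most, then left-most cell."""
    best = max(cell.total_score for cell in cells)
    tied = [cell for cell in cells if cell.total_score == best]
    return min(tied, key=lambda cell: (cell.center[1], cell.center[0]))


cells = [
    ScoredCell("B", (150.0, 20.0), (1.0, 1.0, 1.0, 1.0), 1.0),
    ScoredCell("A", (50.0, 20.0), (1.0, 1.0, 1.0, 1.0), 1.0),
    ScoredCell("C", (50.0, 60.0), (1.0, 0.5, 1.0, 1.0), 1.0),
]
target = select_target(cells)
```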
With the scheme of method 200, the results of text box detection and table line detection complement each other, and an accurate table is determined through the maximum matching row and column algorithm and the connected component correction algorithm, which is particularly useful for table recognition and extraction in warped images.
FIG. 3 illustrates a flow diagram of an exemplary method 300 for determining the plurality of first cells through the maximum matching row and column algorithm according to embodiments of the present invention.
As shown in fig. 3, method 300 may include step 310, where computing device 10 may arrange the plurality of text boxes acquired in step 210 according to center point coordinates. As previously described, the position data of the text box may include a center point coordinate of each text box, which may be a two-dimensional array indicating a center point abscissa and a center point ordinate, respectively. In step 310, computing device 10 may initially arrange the text boxes according to the coordinates of the center point of the respective text boxes.
At step 320, computing device 10 may determine at least one horizontal pair combination and at least one vertical pair combination of the plurality of text boxes of the warped image, respectively. Each horizontal pair combination comprises one or more horizontal pairs, and each horizontal pair comprises one text box and its next horizontal text box. Likewise, each vertical pair combination comprises one or more vertical pairs, and each vertical pair comprises one text box and its next vertical text box.
Here, a horizontal pair or a vertical pair whose positional relationship between two adjacent text boxes satisfies a predetermined condition is obtained by traversing a plurality of text boxes in a predetermined order (e.g., from left to right, from top to bottom), thereby obtaining one horizontal pair combination or vertical pair combination. For example, the predetermined conditions include for a horizontal pairing, a vertical offset between two text boxes being less than a first predetermined threshold and a center position distance being greater than a second predetermined threshold, and for a vertical pairing, a horizontal offset between two text boxes being less than a third predetermined threshold and a center position distance being greater than a fourth predetermined threshold, as described in more detail below. An exemplary method of determining horizontal and vertical pairings of multiple text boxes is described in more detail below in conjunction with fig. 4 and 5.
Further, in step 320, the predetermined condition may be updated (e.g., the first/third predetermined threshold and/or the second/fourth predetermined threshold is changed) and repeated traversal of the plurality of text boxes may be performed, each traversal may result in one horizontal pair combination and/or vertical pair combination, such that repeated traversal may result in one or more horizontal pair combinations and/or vertical pair combinations. Here, the number of repeated traversals may be designated in advance or determined according to the size of the original image.
Next, at step 330, computing device 10 may select the one of the at least one horizontal pair of combinations determined at step 320 that has the greatest number of horizontal pairs to determine the horizontal vector for the plurality of text boxes and select the one of the at least one vertical pair of combinations determined at step 320 that has the greatest number of vertical pairs to determine the vertical vector for the plurality of text boxes.
One pair in each pair combination includes two text boxes adjacent in the horizontal or vertical direction, and thus by connecting the text boxes included in the respective pairs in the pair combination with the largest number of pairs in sequence, the horizontal vector or the vertical vector of the text boxes can be determined.
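By way of non-limiting illustration, the selection in step 330 may be sketched in Python as follows. The sketch assumes that each pair is represented by the center points of its two text boxes and that the resulting "vector" is a polyline through those centers; the function names and this representation are hypothetical and are not prescribed by the embodiments.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # center point (x, y) of a text box
Pair = Tuple[Point, Point]    # two adjacent text boxes, given by their centers


def pick_largest_combination(combinations: List[List[Pair]]) -> List[Pair]:
    """Return the pair combination that contains the largest number of pairs."""
    # max() raises ValueError on an empty list; at least one combination is assumed.
    return max(combinations, key=len)


def chain_pairs(pairs: List[Pair]) -> List[Point]:
    """Connect the text boxes of the pairs in sequence into a polyline."""
    polyline: List[Point] = []
    for first, second in pairs:
        # Avoid repeating a center shared by two consecutive pairs.
        if not polyline or polyline[-1] != first:
            polyline.append(first)
        polyline.append(second)
    return polyline
```

For two candidate combinations containing one pair and two pairs respectively, the second combination would be chosen and its centers joined in order.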
At step 340, computing device 10 may determine a first direction and a second direction for the plurality of text boxes based on the horizontal vectors and the vertical vectors of the plurality of text boxes determined at step 330, wherein the second direction is perpendicular to the first direction. As described above, the horizontal vector or the vertical vector obtained in step 330 may not be a smooth straight line vector, and thus in step 340, two straight line directions may be further determined according to the horizontal vector and the vertical vector, respectively. Since the table may be a warped table, the two straight line directions may not be exactly perpendicular but only substantially perpendicular.
At step 350, computing device 10 may determine a plurality of first cells based on the location data of the plurality of text boxes and the first direction and the second direction determined at step 340.
Specifically, the first direction and the second direction may be considered as a warping direction of the table, and thus line segments of the first direction and the second direction may be constructed around each text box to generate a cell enclosing the text box. The cells so produced may be incomplete or inaccurate and are therefore also referred to as pseudo cells or first cells in the present invention.
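As a hedged sketch of how line segments in the two directions might enclose a text box, the following Python example builds a four-cornered pseudo cell around a text box center. The half-extents, the optional padding, and the function names are assumptions made for illustration only; the embodiments do not specify how far the segments extend.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def unit(v: Point) -> Point:
    """Normalize a direction vector to unit length."""
    norm = math.hypot(v[0], v[1])
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return (v[0] / norm, v[1] / norm)


def pseudo_cell(center: Point, half_w: float, half_h: float,
                first_dir: Point, second_dir: Point,
                pad: float = 0.0) -> List[Point]:
    """Return four corners of a cell whose sides follow the two directions."""
    d1, d2 = unit(first_dir), unit(second_dir)
    a, b = half_w + pad, half_h + pad   # assumed half-extents along d1 and d2
    cx, cy = center
    corners = []
    for s1, s2 in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append((cx + s1 * a * d1[0] + s2 * b * d2[0],
                        cy + s1 * a * d1[1] + s2 * b * d2[1]))
    return corners
```

When the two directions are not exactly perpendicular, the same construction yields a parallelogram, which is consistent with the cell following the warping direction of the table.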
Since the size and the position relation of the cells in the actual table are unknown, a plurality of horizontal or vertical pairing combinations are obtained by changing the first/third predetermined threshold and/or the second/fourth predetermined threshold, and the combination with the largest number of pairs selected from the plurality of horizontal or vertical pairing combinations is closest to the arrangement of the cells in the actual table, so that the cells in the table can be preliminarily determined by the positions of the text boxes.
As described above, in step 320, one or more horizontal pairs and one or more vertical pairs may be determined by traversing the plurality of text boxes after being arranged by the center point coordinates in step 310, thereby determining corresponding horizontal pair combinations and vertical pair combinations.
FIG. 4 illustrates a flow diagram of an exemplary method 400 for determining a horizontal pairing in a plurality of text boxes according to some embodiments of the invention.
As shown in fig. 4, at step 410, computing device 10 may determine, for a target text box of the plurality of text boxes, whether an upper margin or a lower margin between the target text box and its next horizontal text box is less than a first predetermined threshold.
As described above, a plurality of text boxes are arranged in accordance with the center point coordinates. Thus, in step 410, each text box may be traversed from left to right, top to bottom, with its next text box in the horizontal direction (also referred to herein as the next horizontal text box) and its next text box in the vertical direction (also referred to herein as the next vertical text box) to determine whether the text box and its next horizontal text box or next vertical text box can form a horizontal pairing or a vertical pairing.
Specifically, for a text box (referred to as a target text box), it may be determined whether its next horizontal text box is above or below the target text box. For example, this may be determined by comparing the center point ordinates of the target text box and its next horizontal text box. If the center point ordinate of the next horizontal text box is larger than that of the target text box (assuming that the image coordinates increase from bottom to top), the next horizontal text box is determined to be above the target text box, in which case the distance between the next horizontal text box and the upper boundary of the target text box, i.e., the upper margin, may be determined. Conversely, if the center point ordinate of the next horizontal text box is less than that of the target text box, the next horizontal text box is determined to be below the target text box, in which case the distance between the next horizontal text box and the lower boundary of the target text box, i.e., the lower margin, may be determined.
Computing device 10 may determine whether the upper margin or the lower margin, as determined above, is less than a first predetermined threshold. The first predetermined threshold may be set differently according to the image size, and a different first predetermined threshold is set each time a plurality of text boxes are traversed. In one example, the initial first predetermined threshold may be set to 12 pixels.
If it is determined at step 410 that the upper margin or the lower margin between the next horizontal text box and the target text box is less than the first predetermined threshold, then at step 420, computing device 10 may further determine whether the center position distance between the next horizontal text box and the target text box is greater than a second predetermined threshold.
Specifically, as described above, the position data of the text box acquired in step 210 may include the center point coordinates of each text box, and thus the center position distance of the next horizontal text box and the target text box may be determined based on the center point coordinates of the two text boxes.
Computing device 10 may determine whether the center location distance determined above is greater than a second predetermined threshold. The second predetermined threshold may be set differently according to the image size, and a different second predetermined threshold is set each time a plurality of text boxes are traversed. In one example, the initial second predetermined threshold may be set to 5 pixels.
If it is determined at step 420 that the center position distance between the next horizontal text box and the target text box is greater than the second predetermined threshold, then, at step 430, computing device 10 may organize the target text box and the next horizontal text box into a horizontal pair labeled with a first value. For example, a flag of 2 may be set for a horizontal pair of two text boxes that satisfy both the first predetermined threshold condition and the second predetermined threshold condition described above.
Conversely, if it is determined at step 410 that the upper margin or the lower margin between the next horizontal text box and the target text box is greater than or equal to the first predetermined threshold, or it is determined at step 420 that the center position distance between the next horizontal text box and the target text box is less than or equal to the second predetermined threshold, computing device 10 may, at step 440, organize the target text box and the next horizontal text box into a horizontal pair labeled with a second value. For example, a flag of 1 may be set for such a horizontal pair of two text boxes.
Note that the first value and the second value are set here only for distinguishing the association between the two text boxes constituting the horizontal pair, and in some other embodiments, only the two text boxes meeting the predetermined condition may be organized as the horizontal pair in step 430, while the two text boxes not meeting the predetermined condition are not paired in step 440.
Through the method, the adjacent text boxes can be paired differently according to the position relation between the two horizontally adjacent text boxes, so that all horizontal pairs can be obtained after all the text boxes are traversed, and a horizontal pair combination is generated.
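Steps 410 through 440 may be sketched, purely as an illustrative assumption, in Python as follows. The sketch takes the "next horizontal text box" to be the next element of an already arranged list, measures the upper or lower margin as the distance between the corresponding boundaries of the two text boxes, and uses the example thresholds of 12 and 5 pixels as defaults. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

FLAG_FIRST_VALUE = 2    # both threshold conditions are satisfied
FLAG_SECOND_VALUE = 1   # at least one condition is not satisfied


@dataclass
class TextBox:
    cx: float      # center point abscissa
    cy: float      # center point ordinate (assumed to increase upwards)
    top: float     # ordinate of the upper boundary
    bottom: float  # ordinate of the lower boundary


def vertical_margin(target: TextBox, nxt: TextBox) -> float:
    """Upper margin if nxt lies above target, otherwise lower margin."""
    if nxt.cy > target.cy:
        return abs(nxt.top - target.top)
    return abs(nxt.bottom - target.bottom)


def horizontal_flag(target: TextBox, nxt: TextBox,
                    first_threshold: float = 12.0,
                    second_threshold: float = 5.0) -> int:
    """Steps 410-440: label the pair with the first or the second value."""
    margin = vertical_margin(target, nxt)                            # step 410
    distance = math.hypot(nxt.cx - target.cx, nxt.cy - target.cy)    # step 420
    if margin < first_threshold and distance > second_threshold:
        return FLAG_FIRST_VALUE                                      # step 430
    return FLAG_SECOND_VALUE                                         # step 440


def horizontal_pairs(boxes: List[TextBox]) -> List[Tuple[int, int, int]]:
    """Traverse adjacent boxes and return (index, next index, flag) triples."""
    return [(i, i + 1, horizontal_flag(boxes[i], boxes[i + 1]))
            for i in range(len(boxes) - 1)]
```

The list of triples returned by one traversal corresponds to one horizontal pair combination.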
FIG. 5 illustrates a flow diagram of an exemplary method 500 for determining a vertical pairing in a plurality of text boxes according to some embodiments of the invention.
As shown in fig. 5, at step 510, computing device 10 may determine, for a target text box of the plurality of text boxes, whether a left margin or a right margin between the target text box and its next vertical text box is less than a third predetermined threshold.
As described above, a plurality of text boxes are arranged in accordance with the center point coordinates. Thus, in step 510, each text box may be traversed from left to right, top to bottom, with its next text box in the horizontal direction (also referred to herein as the next horizontal text box) and its next text box in the vertical direction (also referred to herein as the next vertical text box) to determine whether the text box and its next horizontal text box or next vertical text box can form a horizontal pair or a vertical pair.
Specifically, for one text box (referred to as the target text box), it may be determined whether its next vertical text box is to the left or to the right of the target text box. For example, this may be determined by comparing the center point abscissas of the target text box and its next vertical text box. If the center point abscissa of the next vertical text box is larger than that of the target text box (assuming that the image coordinates increase from left to right), the next vertical text box is determined to be to the right of the target text box, in which case the distance between the next vertical text box and the right boundary of the target text box, i.e., the right margin, may be determined. Conversely, if the center point abscissa of the next vertical text box is less than that of the target text box, the next vertical text box is determined to be to the left of the target text box, in which case the distance between the next vertical text box and the left boundary of the target text box, i.e., the left margin, may be determined.
Computing device 10 may determine whether the left or right margin determined above is less than a third predetermined threshold. The third predetermined threshold may be set differently according to the image size, and a different third predetermined threshold is set each time a plurality of text boxes are traversed. In one example, the initial third predetermined threshold may be set to 12 pixels.
If it is determined at step 510 that the left margin or the right margin between the next vertical text box and the target text box is less than the third predetermined threshold, then at step 520 computing device 10 may further determine whether the center position distance between the next vertical text box and the target text box is greater than a fourth predetermined threshold.
Specifically, as described above, the position data of the text box acquired in step 210 may include the center point coordinates of each text box, and thus the center position distance of the next vertical text box and the target text box may be determined based on the center point coordinates of the two text boxes.
Computing device 10 may determine whether the center location distance determined above is greater than a fourth predetermined threshold. The fourth predetermined threshold may be set differently according to the image size, and a different fourth predetermined threshold is set each time a plurality of text boxes are traversed. In one example, the initial fourth predetermined threshold may be set to 5 pixels.
If it is determined at step 520 that the center position distance between the next vertical text box and the target text box is greater than the fourth predetermined threshold, then, at step 530, computing device 10 may organize the target text box and the next vertical text box into a vertical pair labeled with the first value. For example, a flag of 2 may be set for a vertical pair of two text boxes that satisfy both the third predetermined threshold condition and the fourth predetermined threshold condition described above.
Conversely, if it is determined at step 510 that the left margin or the right margin between the next vertical text box and the target text box is greater than or equal to the third predetermined threshold, or it is determined at step 520 that the center position distance between the next vertical text box and the target text box is less than or equal to the fourth predetermined threshold, computing device 10 may, at step 540, organize the target text box and the next vertical text box into a vertical pair labeled with the second value. For example, a flag of 1 may be set for such a vertical pair of two text boxes.
Similarly, the first value and the second value are set here only for distinguishing the association between the two text boxes constituting the vertical pair, and in some other embodiments, only the two text boxes meeting the predetermined condition may be organized as a vertical pair in step 530, while the two text boxes not meeting the predetermined condition are not paired in step 540.
Through the method, the adjacent text boxes can be paired differently according to the position relation between the two vertically adjacent text boxes, so that all vertical pairings can be obtained after all the text boxes are traversed, and a vertical pairing combination is generated.
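The alternative embodiment in which only qualifying text boxes are paired, combined with the repeated traversal under updated thresholds, may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the "next vertical text box" is taken to be the next element of an already arranged list, the left or right margin is measured between the corresponding boundaries of the two text boxes, and the list of threshold settings is supplied by the caller. All names are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    cx: float     # center point abscissa (assumed to increase to the right)
    cy: float     # center point ordinate
    left: float   # abscissa of the left boundary
    right: float  # abscissa of the right boundary


def horizontal_offset(target: Box, nxt: Box) -> float:
    """Right margin if nxt lies to the right of target, otherwise left margin."""
    if nxt.cx > target.cx:
        return abs(nxt.right - target.right)
    return abs(nxt.left - target.left)


def vertical_pairs(boxes: Sequence[Box], third_threshold: float,
                   fourth_threshold: float) -> List[Tuple[int, int]]:
    """One traversal: keep only the pairs meeting both threshold conditions."""
    pairs = []
    for i in range(len(boxes) - 1):
        target, nxt = boxes[i], boxes[i + 1]
        offset = horizontal_offset(target, nxt)                          # step 510
        distance = math.hypot(nxt.cx - target.cx, nxt.cy - target.cy)    # step 520
        if offset < third_threshold and distance > fourth_threshold:
            pairs.append((i, i + 1))                                     # step 530
    return pairs


def best_vertical_combination(boxes: Sequence[Box],
                              settings: Sequence[Tuple[float, float]]
                              ) -> List[Tuple[int, int]]:
    """Repeat the traversal per threshold setting; keep the largest combination."""
    return max((vertical_pairs(boxes, t3, t4) for t3, t4 in settings), key=len)
```

Each threshold setting yields one vertical pair combination, and the combination with the most pairs is retained, in line with the selection described for step 330.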
FIG. 6 sets forth a flow chart illustrating an exemplary method 600 of connected component correction of a first cell to determine a cell of a table according to embodiments of the present invention.
As previously described, computing device 10 may select the first cell from the plurality of first cells that scores the highest to determine the row and column connected components of the first cell and correct the plurality of first cells based on the row and column connected components.
Specifically, at step 610, for each first cell in the plurality of first cells, computing device 10 may determine a peripheral line score for the four peripheral lines of the first cell. To do so, computing device 10 may determine, based on the result of the table line detection of step 220, whether each first cell has a corresponding table line (referred to herein as a peripheral line of the first cell) within a predetermined range in each of its four surrounding directions. If it is determined that a corresponding table line exists within the predetermined range in one direction of a first cell, the first cell may be determined to have a table line in that direction and the peripheral line score of that direction may be labeled as a first score (e.g., 2); otherwise, the peripheral line score of that direction may be labeled as a second score (e.g., 0).
Thus, for each first cell, four peripheral line scores of the first cell in four directions can be determined according to the result of table line detection, which can be represented as a four-dimensional array, and the value of each element in the array is the first score or the second score.
At step 620, computing device 10 may determine a text box score for the first cell based on the location data of the text box.
In particular, computing device 10 may determine whether the center position coordinate of the text box falls within the first cell, and may set the text box score of the first cell to a third score (e.g., 2) if it falls within the first cell, and to a fourth score (e.g., 0) otherwise.
At step 630, computing device 10 may determine a total score for the first cell based on the peripheral line score of the first cell determined at step 610 and the text box score of the first cell determined at step 620. In some embodiments, the peripheral line score and the text box score of the first cell may be added to yield a total score for the first cell. In other embodiments, the peripheral line score and the text box score of the first cell may be weighted differently and summed to obtain the total score for the first cell.
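As an illustrative sketch of steps 610 through 630, the following Python example computes a total score from the four peripheral line scores and the text box score. Summing the four peripheral line scores before adding the text box score, and the particular weight parameters, are assumptions of this sketch; the names are hypothetical.

```python
from typing import Sequence

FIRST_SCORE, SECOND_SCORE = 2, 0    # example peripheral line scores
THIRD_SCORE, FOURTH_SCORE = 2, 0    # example text box scores


def peripheral_scores(has_line: Sequence[bool]) -> list:
    """Map the four per-direction table line findings to scores (step 610)."""
    if len(has_line) != 4:
        raise ValueError("expected one entry per direction (four in total)")
    return [FIRST_SCORE if found else SECOND_SCORE for found in has_line]


def text_box_score(center_inside_cell: bool) -> int:
    """Score whether a text box center falls within the cell (step 620)."""
    return THIRD_SCORE if center_inside_cell else FOURTH_SCORE


def total_score(line_scores: Sequence[float], box_score: float,
                line_weight: float = 1.0, box_weight: float = 1.0) -> float:
    """Weighted sum of the scores (step 630); equal weights give a plain sum."""
    return line_weight * sum(line_scores) + box_weight * box_score
```

With the example scores, a first cell having table lines in three directions and containing a text box center would receive a total score of 8 under equal weights.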
Steps 610 through 630 may be repeated to obtain the total score for all first cells. In one embodiment, the total score may be calculated for each first cell in the order of the first cell's location and the table line intersection.
Specifically, computing device 10 may determine a plurality of primary intersection points based on the plurality of table lines determined in step 220. As previously mentioned, the various types of table lines may be determined by way of a binary map, and these table lines may be divided into table lines in a first direction (e.g., substantially transverse) and a second direction (e.g., substantially longitudinal). The primary intersection of the table lines can be determined from their orientation.
Computing device 10 may determine a starting cell among the first cells based on the primary intersections and the positional relationships of the first cells. Here, the starting cell means the first cell located in the first row and the first column among the first cells.
Computing device 10 may then determine the total score for each of the plurality of first cells in turn, beginning with the starting cell determined above, as described above in steps 610 through 630.
At step 640, computing device 10 may select a first cell from the plurality of first cells having the highest total score for the connected component determination, also referred to herein as the target first cell.
In some cases, there may be more than one first cell of the plurality of first cells having the highest total score.
In this case, in step 640, computing device 10 may determine whether there are at least two first cells with the highest overall score among the plurality of first cells, and if it is determined that there are at least two first cells with the highest overall score, determine the target first cell based on location data of the at least two first cells with the highest overall score. For example, the upper left-most cell of the at least two first cells may be selected as the target first cell.
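The selection of step 640, including the tie-break, may be sketched in Python as follows. The sketch assumes that each first cell carries hypothetical row and column indices and that "upper left-most" means the smallest row index first and then the smallest column index; these are illustrative assumptions only.

```python
from typing import List, Tuple

ScoredCell = Tuple[int, int, float]   # (row index, column index, total score)


def select_target_first_cell(cells: List[ScoredCell]) -> ScoredCell:
    """Pick the highest scoring cell; break ties by upper-left position."""
    if not cells:
        raise ValueError("at least one first cell is required")
    best = max(score for _, _, score in cells)
    candidates = [cell for cell in cells if cell[2] == best]
    # Several cells may share the highest score; prefer the upper left-most one.
    return min(candidates, key=lambda cell: (cell[0], cell[1]))
```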
At step 650, computing device 10 may determine the row and column connected domains of the target first cell based on the fitted curve segments and extended line data around the target first cell.
At step 660, computing device 10 may correct the plurality of first cells based on the row and column connected domains of the target first cell to determine a plurality of cells of the table.
Specifically, computing device 10 may determine whether to row communicate or column communicate the target first cell with its previous first cell in either its horizontal or its vertical direction based on the table line around the target first cell and the relationship between the fitted curve segment and the extended line data.
FIG. 8 illustrates a flow diagram of a method 800 for determining a row connected domain of a target first cell in accordance with an embodiment of the present invention.
As shown in FIG. 8, at step 810, computing device 10 may determine a pixel abscissa x [ m ] and a pixel ordinate y [ m ] of the pixels of the plurality of table lines determined at step 220 in the region of the target first cell.
At step 820, computing device 10 may determine a fitted ordinate y [ m' ] of the pixels based on the pixel abscissas of the pixels and the fitted curve segment or horizontally extending line data of the target first cell.
Specifically, when the target first cell has a fitted curve segment recorded around it in the peripheral line relation memory, the computing device 10 may determine a fitted ordinate y [ m' ] corresponding to the pixel abscissa x [ m ] based on the pixel abscissa of each pixel and the fitted curve segment of the target first cell. On the other hand, if the target first cell does not have a fitted curve segment recorded around it, the computing device 10 may determine whether extension line data is recorded in its peripheral extension line memory. If extension line data is recorded, the computing device 10 may derive a fitted ordinate y [ m' ] corresponding to the pixel abscissa x [ m ] from the horizontal extension line data in the extension line data.
At step 830, computing device 10 may determine whether the average difference between the fitted ordinate y [ m' ] and the pixel ordinate y [ m ] of the pixels is less than a predetermined threshold.
If it is determined at step 830 that the average difference between the fitted ordinate y [ m' ] and the pixel ordinate y [ m ] of the pixels is less than the predetermined threshold, computing device 10 may row-communicate the target first cell with its horizontally previous first cell at step 840.
The method 800 described above is repeated in row-wise directions to continue traversing additional first cells following the target first cell until the target first cell is determined to have no fitted curve segments or horizontally extending line data at step 820 or the average difference between the fitted ordinate y [ m' ] and the pixel ordinate y [ m ] is greater than or equal to the predetermined threshold at step 830. All first cells that are row connected together at this time constitute the row connected field of the target first cell.
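Steps 810 through 840 and the repeated traversal may be sketched, under stated assumptions, in Python as follows. The sketch represents the fitted curve segment or horizontal extension line data of each first cell as a callable mapping a pixel abscissa to a fitted ordinate (or None when neither is recorded), and interprets the "average difference" as a mean absolute difference. Both choices, and all names, are assumptions for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Fit = Callable[[float], float]   # maps a pixel abscissa x[m] to a fitted ordinate y[m']
CellData = Tuple[Sequence[float], Sequence[float], Optional[Fit]]   # (x[m], y[m], fit)


def mean_abs_difference(xs: Sequence[float], ys: Sequence[float], fit: Fit) -> float:
    """Average absolute difference between fitted and pixel ordinates (step 830)."""
    return sum(abs(fit(x) - y) for x, y in zip(xs, ys)) / len(xs)


def row_connected_domain(row: Sequence[CellData], threshold: float) -> List[int]:
    """Traverse first cells in the row direction, starting at the target first cell.

    Returns the indices of the cells that are row connected, stopping at the
    first cell without fit data or with an average difference at or above the
    threshold.
    """
    domain: List[int] = []
    for index, (xs, ys, fit) in enumerate(row):
        if fit is None or len(xs) == 0:
            break                                   # no data available at step 820
        if mean_abs_difference(xs, ys, fit) >= threshold:
            break                                   # condition of step 830 fails
        domain.append(index)                        # step 840: row connect
    return domain
```

A column connected domain could be obtained in the same manner by exchanging the roles of the abscissa and the ordinate.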
FIG. 9 illustrates a flow diagram of a method 900 for determining a column connected domain of a target first cell in accordance with an embodiment of the present invention.
As shown in FIG. 9, at step 910, computing device 10 may determine a pixel abscissa x [ m ] and a pixel ordinate y [ m ] of the pixels of the plurality of table lines determined at step 220 in the region of the target first cell.
At step 920, computing device 10 may determine a fitted abscissa x [ m' ] of the pixels based on the pixel ordinate of the pixels and the fitted curve segment or vertically extended line data of the target first cell.
Specifically, when the target first cell has a fitted curve segment recorded around it in the peripheral line relation memory, the computing device 10 may determine a fitted abscissa x [ m' ] corresponding to the pixel ordinate y [ m ] based on the pixel ordinate of each pixel and the fitted curve segment of the target first cell. On the other hand, if the target first cell does not have a fitted curve segment recorded around it, the computing device 10 may determine whether extension line data is recorded in its peripheral extension line memory. If extension line data is recorded, the computing device 10 may derive a fitted abscissa x [ m' ] corresponding to the pixel ordinate y [ m ] from the vertical extension line data in the extension line data.
At step 930, computing device 10 may determine whether the average difference between the fitted abscissa x [ m' ] and the pixel abscissa x [ m ] of the pixels is less than a predetermined threshold.
If it is determined at step 930 that the average difference between the fitted abscissa x [ m' ] and the pixel abscissa x [ m ] of the pixels is less than the predetermined threshold, computing device 10 may column-communicate the target first cell with its vertically preceding first cell at step 940.
The method 900 is repeated in the column direction to continue traversing additional first cells following the target first cell until it is determined at step 920 that there is no fitted curve segment or vertical extension line data for the target first cell or it is determined at step 930 that the average difference between the fitted abscissa x [ m' ] and the pixel abscissa x [ m ] is greater than or equal to the predetermined threshold. All first cells that are column connected together at this time constitute the column connected domain of the target first cell.
After determining the row and column connected domains, computing device 10 may determine, at step 660, each first cell in the row and column connected domains as a cell of the table.
In some cases, more than one row connected domain or column connected domain may be determined as described above. In this case, the method 600 may further include determining a maximum connected row and a maximum connected column based on the row and column connected domains and determining each first cell in the maximum connected row and the maximum connected column as a cell of the table.
Furthermore, each cell determined according to the method 600 described above may not have a one-to-one correspondence with the text box determined in step 210, but may contain more text boxes.
In this regard, in some embodiments, the method 600 may also determine whether each cell contains one text box or at least two text boxes. For example, whether each cell contains one or at least two text boxes may be determined based on the location information of each cell determined at step 660 and the location data of the text boxes determined at step 210.
If it is determined that the cell contains one text box, computing device 10 may treat the text content in the text box as the text content in the cell; if it is determined that the cell contains at least two text boxes, computing device 10 may merge the text content in the at least two text boxes as the text content in the cell.
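The assignment and merging of text content described above may be sketched in Python as follows. The sketch assumes, for simplicity, axis-aligned rectangular cells, a text box belonging to the cell that contains its center point, and merging by joining the texts with a separator in the order in which the text boxes are supplied. These simplifications and the names used are hypothetical and are not required by the embodiments.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) of a cell
TextItem = Tuple[float, float, str]        # (center x, center y, recognized text)


def cell_texts(cells: Sequence[Rect], boxes: Sequence[TextItem],
               separator: str = " ") -> List[str]:
    """Return the text content of each cell, merging multiple text boxes."""
    collected: List[List[str]] = [[] for _ in cells]
    for cx, cy, text in boxes:
        for index, (x0, y0, x1, y1) in enumerate(cells):
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                collected[index].append(text)
                break   # a text box is assigned to at most one cell
    # One text box: its text is used as is; several: the texts are merged.
    return [separator.join(parts) for parts in collected]
```

A cell containing no text box center yields an empty string in this sketch.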
FIG. 10 illustrates a block diagram of a computing device 1000 suitable for implementing embodiments of the present invention. Computing device 1000 may be, for example, computing device 10 or server 20 as described above.
As shown in fig. 10, computing device 1000 may include one or more Central Processing Units (CPUs) 1010 (only one shown schematically) that may perform various suitable actions and processes in accordance with computer program instructions stored in Read Only Memory (ROM) 1020 or loaded from storage unit 1080 into Random Access Memory (RAM) 1030. In the RAM 1030, various programs and data required for the operation of the computing device 1000 may also be stored. The CPU 1010, ROM 1020, and RAM 1030 are connected to each other via a bus 1040. An input/output (I/O) interface 1050 is also connected to bus 1040.
A number of components in computing device 1000 are connected to I/O interface 1050, including: an input unit 1060 such as a keyboard, a mouse, or the like; an output unit 1070 such as various types of displays, speakers, and the like; a storage unit 1080, such as a magnetic disk, optical disk, or the like; and a communication unit 1090 such as a network card, modem, wireless communication transceiver, or the like. A communication unit 1090 allows computing device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The methods 200 to 600, 800, and 900 described above may be performed, for example, by the CPU 1010 of a computing device 1000, such as computing device 10 or server 20. For example, in some embodiments, the methods 200 to 600, 800, and 900 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1080. In some embodiments, part or all of the computer program may be loaded and/or installed onto computing device 1000 via ROM 1020 and/or communication unit 1090. When the computer program is loaded into RAM 1030 and executed by CPU 1010, one or more of the operations of the methods 200 to 600, 800, and 900 described above may be performed. Further, the communication unit 1090 may support wired or wireless communication functions.
Those skilled in the art will appreciate that the computing device 1000 illustrated in FIG. 10 is merely illustrative. In some embodiments, computing device 10 or server 20 may contain more or fewer components than computing device 1000.
The methods 200 to 600, 800, and 900 for processing tables and the computing device 1000 that may be used as the computing device 10 or the server 20 according to the present invention are described above in conjunction with the drawings. However, it will be understood by those skilled in the art that the steps of the methods 200 to 600, 800, and 900 and their sub-steps are not limited to the order shown in the figures and described above, but may be performed in any other reasonable order. Further, the computing device 1000 need not include all of the components shown in FIG. 10; it may include only those components necessary to perform the functions described in the present invention, and the manner in which these components are connected is not limited to the form shown in the figures.
The present invention may be methods, apparatus, systems and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therein for carrying out aspects of the present invention.
In one or more exemplary designs, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. For example, if implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The units of the apparatus disclosed herein may be implemented using discrete hardware components, or may be integrally implemented on a single hardware component, such as a processor. For example, the various illustrative logical blocks, modules, and circuits described in connection with the invention may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
Those of skill in the art would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both.
The previous description of the invention is provided to enable any person skilled in the art to make or use the invention. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the present invention is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A method of processing a table, comprising:
performing text box detection on a distorted image to acquire position data of a plurality of text boxes in the distorted image;
performing table line detection on the distorted image to acquire a plurality of table lines in the distorted image;
determining a plurality of first cells through a maximum matching row and column algorithm based on the position data of the plurality of text boxes;
curve fitting the plurality of first cells based on the plurality of table lines to determine fitted curve segments and extended line data around each first cell; and
performing connected component correction on the plurality of first cells based on the fitted curve segments and extended line data around each first cell to determine a plurality of cells of the table and text content in each cell,
wherein the position data of the plurality of text boxes comprises vertex coordinates, center point coordinates, width data, and height data of each text box, and wherein determining the plurality of first cells through the maximum matching row and column algorithm comprises:
arranging the plurality of text boxes according to the center point coordinates;
determining at least one horizontal pairing combination and at least one vertical pairing combination of the plurality of text boxes respectively, wherein each horizontal pairing combination comprises one or more horizontal pairings, each horizontal pairing comprises one text box and a next horizontal text box, each vertical pairing combination comprises one or more vertical pairings, and each vertical pairing comprises one text box and a next vertical text box;
selecting one horizontal pairing combination with the largest number of horizontal pairings from the at least one horizontal pairing combination to determine horizontal vectors of the plurality of text boxes, and selecting one vertical pairing combination with the largest number of vertical pairings from the at least one vertical pairing combination to determine vertical vectors of the plurality of text boxes;
determining a first direction and a second direction of the plurality of text boxes based on horizontal vectors and vertical vectors of the plurality of text boxes, wherein the second direction is perpendicular to the first direction; and
determining the plurality of first cells based on the position data of the plurality of text boxes, the first direction, and the second direction.
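The selection step of claim 1 can be illustrated with a minimal sketch. It assumes that each pairing is represented by the center points of its two text boxes and that the direction vector is taken as the mean center-to-center displacement of the selected combination; the claim does not state how the vector is derived, so that averaging, the 90-degree rotation, and all names below are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# (center of a text box, center of its next text box); hypothetical layout.
Pairing = Tuple[Point, Point]


def select_largest_combination(combinations: List[List[Pairing]]) -> List[Pairing]:
    """Return the pairing combination that contains the most pairings."""
    return max(combinations, key=len)


def mean_direction(pairings: List[Pairing]) -> Point:
    """Average the center-to-center displacements (an assumed derivation)."""
    if not pairings:
        raise ValueError("at least one pairing is required")
    dx = sum(b[0] - a[0] for a, b in pairings) / len(pairings)
    dy = sum(b[1] - a[1] for a, b in pairings) / len(pairings)
    return (dx, dy)


def perpendicular(v: Point) -> Point:
    """Rotate a 2D vector by 90 degrees to obtain a perpendicular direction."""
    return (-v[1], v[0])
```

In this sketch the first direction would be `mean_direction(select_largest_combination(...))` for the horizontal pairings, and the second direction its perpendicular.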
2. The method of claim 1, wherein determining at least one horizontal pairing combination and at least one vertical pairing combination of the plurality of text boxes, respectively, comprises:
for a target text box of the plurality of text boxes, determining whether a top-margin distance or a bottom-margin distance between the target text box and its next horizontal text box is less than a first predetermined threshold;
if it is determined that the top-margin distance or the bottom-margin distance between the target text box and the next horizontal text box is less than the first predetermined threshold, determining whether a distance between the center positions of the next horizontal text box and the target text box is greater than a second predetermined threshold;
if it is determined that the distance between the center positions of the next horizontal text box and the target text box is greater than the second predetermined threshold, organizing the target text box and the next horizontal text box into a horizontal pair labeled with a first value; and
organizing the target text box and the next horizontal text box into a horizontal pair labeled with a second value if it is determined that the top-margin distance or the bottom-margin distance between the next horizontal text box and the target text box is greater than or equal to the first predetermined threshold, or that the distance between the center positions of the next horizontal text box and the target text box is less than or equal to the second predetermined threshold.
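The two-threshold test of claim 2 can be sketched as follows. This is only an illustration under stated assumptions: the margin distances are read as the offsets between the top edges and between the bottom edges of the two boxes, the center distance is Euclidean, and the labels 1 and 0 stand in for the unspecified first and second values. The `TextBox` class and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TextBox:
    left: float
    top: float
    width: float
    height: float

    @property
    def bottom(self) -> float:
        return self.top + self.height

    @property
    def center(self) -> Tuple[float, float]:
        return (self.left + self.width / 2, self.top + self.height / 2)


FIRST_VALUE = 1   # assumed label when both conditions hold
SECOND_VALUE = 0  # assumed label otherwise


def label_horizontal_pair(target: TextBox, nxt: TextBox,
                          edge_threshold: float,
                          center_threshold: float) -> int:
    """Label the (target, next horizontal box) pair per the two thresholds."""
    # Assumed reading of the margin distances: offsets between matching edges.
    top_offset = abs(nxt.top - target.top)
    bottom_offset = abs(nxt.bottom - target.bottom)
    edges_close = top_offset < edge_threshold or bottom_offset < edge_threshold

    (tx, ty), (nx, ny) = target.center, nxt.center
    center_distance = math.hypot(nx - tx, ny - ty)

    if edges_close and center_distance > center_threshold:
        return FIRST_VALUE
    return SECOND_VALUE
```

The vertical case of claim 3 would follow the same pattern with left and right edges in place of top and bottom edges.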
3. The method of claim 1, wherein determining at least one horizontal pairing combination and at least one vertical pairing combination of the plurality of text boxes, respectively, comprises:
for a target text box of the plurality of text boxes, determining whether a left-margin distance or a right-margin distance between the target text box and its next vertical text box is less than a first predetermined threshold;
if it is determined that the left-margin distance or the right-margin distance between the target text box and the next vertical text box is less than the first predetermined threshold, determining whether a distance between the center positions of the next vertical text box and the target text box is greater than a second predetermined threshold;
if it is determined that the distance between the center positions of the next vertical text box and the target text box is greater than the second predetermined threshold, organizing the target text box and the next vertical text box into a vertical pair labeled with a first value; and
organizing the target text box and the next vertical text box into a vertical pair labeled with a second value if it is determined that the left-margin distance or the right-margin distance between the next vertical text box and the target text box is greater than or equal to the first predetermined threshold, or that the distance between the center positions of the next vertical text box and the target text box is less than or equal to the second predetermined threshold.
4. The method of claim 1, wherein determining fitted curve segments and extended line data around each first cell comprises:
fitting a peripheral curve of the plurality of first cells based on the table lines around the plurality of first cells and recording, for each first cell, a fitted curve segment of the peripheral curve around the first cell; and
determining, based on the peripheral curves of the plurality of first cells, a fitted curve segment of a previous first cell as the extended line data of each first cell.
5. The method of claim 1, wherein determining the plurality of cells of the table and the text content in each cell comprises:
for each first cell of the plurality of first cells, determining a peripheral line score for the four peripheral lines of the first cell;
determining a text box score for the first cell based on the location data for the text box;
determining a total score for the first cell based on the first cell's peripheral line score and text box score;
selecting a first cell with the highest total score from the plurality of first cells as a target first cell;
determining row and column connected domains of the target first cell based on the fitted curve segment and the extended line data around the target first cell; and
correcting the plurality of first cells based on the row and column connected domains of the target first cell to determine a plurality of cells of the table.
6. The method of claim 5, wherein determining the total score for the first cell based on the first cell's peripheral line score and text box score comprises:
determining a plurality of primary intersection points based on the plurality of table lines;
determining an initial first cell among the plurality of first cells based on the positional relationship between the plurality of primary intersection points and the plurality of first cells; and
determining a total score for each of the plurality of first cells in order, starting with the initial first cell.
7. The method of claim 5, wherein selecting a first cell with the highest total score from the plurality of first cells as a target first cell comprises:
determining whether there are at least two first cells with the highest total score among the plurality of first cells; and
in response to determining that there are at least two first cells with the highest total score among the plurality of first cells, determining the target first cell based on location data of the at least two first cells with the highest total score.
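The tie handling of claim 7 can be sketched as below. The claim only says that the target is chosen "based on location data" when several first cells share the highest total score; the rule used here (top-most, then left-most) and the tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

# (top, left, total_score) for each first cell; hypothetical layout.
ScoredCell = Tuple[float, float, float]


def select_target_cell(cells: List[ScoredCell]) -> ScoredCell:
    """Pick the highest-scoring cell, breaking ties by position."""
    best = max(score for _, _, score in cells)
    tied = [cell for cell in cells if cell[2] == best]
    # Assumed tie-break: smallest top coordinate first, then smallest left.
    return min(tied, key=lambda cell: (cell[0], cell[1]))
```

With a single highest-scoring cell the tie-break has no effect and that cell is returned directly.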
8. The method of claim 5, wherein determining the row connected domain of the target first cell comprises:
determining a pixel abscissa and a pixel ordinate of pixels of the plurality of table lines in the region of the target first cell;
determining a fitted ordinate of the pixel based on the pixel abscissa and fitted curve segment or horizontally extended line data of the target first cell;
determining whether an average difference between the fitted ordinate and the pixel ordinate is less than a predetermined threshold; and
in response to determining that the average difference between the fitted ordinate and the pixel ordinate is less than the predetermined threshold, row-connecting the target first cell with its previous first cell in the horizontal direction.
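The row check of claim 8 can be sketched as follows, assuming the fitted curve segment is a polynomial y = f(x) given by its coefficients and that the "difference" is the absolute difference between fitted and observed ordinates. The polynomial form, the behavior for an empty pixel set, and all names are assumptions, not details taken from the claim.

```python
from typing import List, Sequence, Tuple


def eval_poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    y = 0.0
    for c in coeffs:
        y = y * x + c
    return y


def is_row_connected(pixels: List[Tuple[float, float]],
                     coeffs: Sequence[float],
                     threshold: float) -> bool:
    """Return True if the mean |fitted y - pixel y| is below the threshold."""
    if not pixels:
        # Assumption: with no table-line pixels in the region, do not connect.
        return False
    total = sum(abs(eval_poly(coeffs, x) - y) for x, y in pixels)
    return total / len(pixels) < threshold
```

The column check of claim 9 would mirror this with the roles of the abscissa and ordinate swapped, that is, a curve x = g(y) evaluated at each pixel ordinate.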
9. The method of claim 5, wherein determining the column connected domain of the target first cell comprises:
determining a pixel abscissa and a pixel ordinate of pixels of the plurality of table lines in the region of the target first cell;
determining a fitted abscissa of the pixel based on the pixel ordinate and fitted curve segment or vertically extended line data of the target first cell;
determining whether an average difference between the fitted abscissa and the pixel abscissa is less than a predetermined threshold; and
in response to determining that the average difference between the fitted abscissa and the pixel abscissa is less than the predetermined threshold, column-connecting the target first cell with its previous first cell in the vertical direction.
10. The method of claim 5, wherein determining the plurality of cells and the text content in each cell of the table further comprises:
determining whether the cell contains one text box or at least two text boxes;
in response to determining that the cell contains one text box, taking the text content in the text box as the text content in the cell; and
in response to determining that the cell contains at least two text boxes, merging text content in the at least two text boxes as text content in the cell.
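The merging step of claim 10 can be sketched as follows. The claim does not specify the order or the separator used when merging; this sketch assumes top-to-bottom, then left-to-right ordering by center point and a single space as separator. The tuple layout and function name are hypothetical.

```python
from typing import List, Tuple

# (center_x, center_y, text) for each text box inside a cell; hypothetical layout.
BoxText = Tuple[float, float, str]


def cell_text(boxes: List[BoxText], separator: str = " ") -> str:
    """Return the text of a cell that holds one or more text boxes."""
    if len(boxes) == 1:
        return boxes[0][2]
    # Assumed reading order: by center y, then by center x.
    ordered = sorted(boxes, key=lambda box: (box[1], box[0]))
    return separator.join(text for _, _, text in ordered)
```

For scripts written without spaces, such as Chinese, an empty `separator` would likely be the more natural choice.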
11. A computing device, comprising:
at least one processor; and
at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform the steps of the method of any of claims 1-10.
12. A computer readable storage medium having stored thereon computer program code which, when executed, performs the method of any of claims 1 to 10.
CN202110616829.4A 2021-06-03 2021-06-03 Method of processing table, computing device, and computer-readable storage medium Active CN113065536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110616829.4A CN113065536B (en) 2021-06-03 2021-06-03 Method of processing table, computing device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110616829.4A CN113065536B (en) 2021-06-03 2021-06-03 Method of processing table, computing device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN113065536A CN113065536A (en) 2021-07-02
CN113065536B true CN113065536B (en) 2021-09-14

Family

ID=76568571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110616829.4A Active CN113065536B (en) 2021-06-03 2021-06-03 Method of processing table, computing device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113065536B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4125066B1 (en) * 2021-07-26 2023-12-06 Tata Consultancy Services Limited Method and system for table structure recognition via deep spatial association of words
CN113836878A (en) * 2021-09-02 2021-12-24 北京来也网络科技有限公司 Table generation method and device combining RPA and AI, electronic equipment and storage medium
CN114639107B (en) * 2022-04-21 2023-03-24 北京百度网讯科技有限公司 Table image processing method, apparatus and storage medium
CN115082939B (en) * 2022-05-12 2023-11-17 吉林省吉林祥云信息技术有限公司 System and method for correcting distortion table in image based on arc differentiation
CN116092105B (en) * 2023-04-07 2023-06-16 北京中关村科金技术有限公司 Method and device for analyzing table structure

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933756A (en) * 2019-03-22 2019-06-25 腾讯科技(深圳)有限公司 OCR-based image-to-document conversion method, apparatus, device, and readable storage medium
CN111523541A (en) * 2020-04-21 2020-08-11 上海云从汇临人工智能科技有限公司 Data generation method, system, equipment and medium based on OCR
CN111709339A (en) * 2020-06-09 2020-09-25 北京百度网讯科技有限公司 Bill image recognition method, device, equipment and storage medium
CN112364834A (en) * 2020-12-07 2021-02-12 上海叠念信息科技有限公司 Form identification restoration method based on deep learning and image processing
CN112528863A (en) * 2020-12-14 2021-03-19 中国平安人寿保险股份有限公司 Identification method and device of table structure, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577817B (en) * 2012-07-24 2017-03-01 阿里巴巴集团控股有限公司 Form recognition method and apparatus

Also Published As

Publication number Publication date
CN113065536A (en) 2021-07-02

Similar Documents

Publication Publication Date Title
CN113065536B (en) Method of processing table, computing device, and computer-readable storage medium
CN109117848B (en) Text line character recognition method, device, medium and electronic equipment
US10762376B2 (en) Method and apparatus for detecting text
US10896349B2 (en) Text detection method and apparatus, and storage medium
CN105868758B (en) method and device for detecting text area in image and electronic equipment
KR102435365B1 (en) Certificate recognition method and apparatus, electronic device, computer readable storage medium
Samra et al. Localization of license plate number using dynamic image processing techniques and genetic algorithms
CN111062365B (en) Method, apparatus, chip circuit and computer readable storage medium for recognizing mixed typeset text
US20140023278A1 (en) Feature Extraction And Use With A Probability Density Function (PDF) Divergence Metric
KR20210125955A (en) Information processing methods, information processing devices, electronic equipment and storage media
US9280725B2 (en) Information processing apparatus, information processing method, and non-transitory computer readable medium
CN108182457B (en) Method and apparatus for generating information
CN111680678A (en) Target area identification method, device, equipment and readable storage medium
CN110942473A (en) Moving target tracking detection method based on characteristic point gridding matching
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN111104941B (en) Image direction correction method and device and electronic equipment
CN108960247B (en) Image significance detection method and device and electronic equipment
CN111626250A (en) Line dividing method and device for text image, computer equipment and readable storage medium
CN115082935A (en) Method, apparatus and storage medium for correcting document image
CN113936288A (en) Inclined text direction classification method and device, terminal equipment and readable storage medium
CN108734164A (en) Card, card identification method, picture-book reading robot, and storage device
CN111325194B (en) Character recognition method, device and equipment and storage medium
CN108764344A (en) A kind of method, apparatus and storage device based on limb recognition card
CN115937875A (en) Text recognition method and device, storage medium and terminal
CN112287763A (en) Image processing method, apparatus, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220401

Address after: Room 603, 6/F, Building 9, Guanghua Road, Chaoyang District, Beijing 100020

Patentee after: BEIJING ALLIN TECHNOLOGY CO.,LTD.

Address before: Room 702, 7/F, Building 9, Guanghua Road, Chaoyang District, Beijing 100020

Patentee before: Beijing Ouying Information Technology Co., Ltd.