CN117475459A - Table information processing method and device, electronic equipment and storage medium

Info

Publication number
CN117475459A
Authority
CN
China
Prior art keywords
line
lines
line set
grid
area
Prior art date
Legal status
Granted
Application number
CN202311824952.0A
Other languages
Chinese (zh)
Other versions
CN117475459B (en)
Inventor
李杨
于业达
刘奕晨
Current Assignee
Shanghai Hengsheng Juyuan Data Service Co ltd
Hangzhou Hengsheng Juyuan Information Technology Co ltd
Original Assignee
Shanghai Hengsheng Juyuan Data Service Co ltd
Hangzhou Hengsheng Juyuan Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Hengsheng Juyuan Data Service Co ltd, Hangzhou Hengsheng Juyuan Information Technology Co ltd
Priority to CN202311824952.0A
Publication of CN117475459A
Application granted
Publication of CN117475459B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition → G06V30/41 Analysis of document content → G06V30/413 Classification of content, e.g. text, photographs or tables
    • G06V30/10 Character recognition → G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/40 Document-oriented image-based pattern recognition → G06V30/41 Analysis of document content → G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Character Input (AREA)

Abstract

The application provides a table information processing method and apparatus, an electronic device, and a storage medium. The method includes: performing table detection on a picture to be processed to obtain a table area of the picture and a first table line set of the table area; performing text detection on the table area to obtain a plurality of texts in the table area and the text detection boxes of the texts, and generating a second table line set of the table area according to the text detection boxes and the first table line set; removing table lines from the first table line set and the second table line set according to the confidence of each table line in the two sets to obtain a target table line set of the table area; and reconstructing the target table based on the target table line set and the texts in the table area. A more accurate description of the table lines can thus be obtained, improving the accuracy of table restoration.

Description

Table information processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for processing table information, an electronic device, and a storage medium.
Background
Documents are the most important information carriers in the financial field, but a large number of documents exist as non-editable PDF (Portable Document Format) files or pictures, in which various tables hold key information. How to extract this table information and reproduce it in structured form is therefore a key problem.
In the prior art, the overall idea of table restoration is to first perform table detection to obtain the table lines in the picture, then perform semantic segmentation within the detected table area, and finally carry out structural reconstruction of the table.
However, when performing table restoration based on the prior art, obvious table lines are lacking as references in some scenes, so it is difficult to accurately determine the table ruled lines of the cells. Moreover, the result depends on the quality of the input picture, so the accuracy of table restoration is low.
Disclosure of Invention
The present invention aims to solve the problems in the prior art that table lines are difficult to determine accurately and that the accuracy of table restoration is low.
In order to achieve the above purpose, the technical solution adopted in the present application is as follows:
In a first aspect, the present application provides a table information processing method, where the method includes:
performing table detection on a picture to be processed to obtain a table area of the picture to be processed and a first table line set of the table area;
performing text detection on the table area to obtain a plurality of texts in the table area and text detection boxes of the texts, and generating a second table line set of the table area according to the text detection boxes and the first table line set;
removing table lines from the first table line set and the second table line set according to the confidence of each table line in the two sets to obtain a target table line set of the table area;
and reconstructing based on the target table line set and the plurality of texts in the table area to obtain a target table.
In a second aspect, the present application provides a table information processing apparatus, the apparatus including:
the table detection module is used for carrying out table detection on the picture to be processed to obtain a table area of the picture to be processed and a first table line set of the table area;
The text detection module is used for carrying out text detection on the table area to obtain a plurality of texts in the table area and text detection boxes of the texts, and generating a second table line set of the table area according to the text detection boxes and the first table line set;
the rejection module is used for removing table lines from the first table line set and the second table line set according to the confidence of each table line in the two sets to obtain a target table line set of the table area;
and the reconstruction module is used for reconstructing a target table based on the target table line set and the plurality of texts in the table area.
In a third aspect, the present application provides an electronic device, including: a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the storage medium communicate over the bus, and the processor executes the machine-readable instructions to perform the steps of the table information processing method described above.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the table information processing method described above.
The beneficial effects of the present application are as follows: by performing table detection and text detection on the picture to be processed, as many initially predicted table lines as possible can be obtained, so that the generated table lines are denser and can cover all table lines that actually exist. On this basis, a more accurate description of the table lines can be obtained by removing table lines based on their confidence. For cases of discontinuous table lines, missing table lines, and merged cells within a table, the method improves the accuracy of table line restoration.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and should therefore not be regarded as limiting the scope; other related drawings may be obtained from these drawings without inventive effort by a person skilled in the art.
Fig. 1 shows a flowchart of a table information processing method provided in an embodiment of the present application;
fig. 2 shows an exemplary diagram of a to-be-processed picture according to an embodiment of the present application;
FIG. 3 shows a tabular exemplary diagram including text detection boxes provided by embodiments of the present application;
FIG. 4 is a diagram of an example of a text-detected form provided in an embodiment of the present application;
FIG. 5 illustrates a flowchart for obtaining a first set of table lines provided by an embodiment of the present application;
FIG. 6 illustrates an exemplary diagram of a grid line structure provided by an embodiment of the present application;
FIG. 7 illustrates a flowchart for obtaining a second set of table lines provided by an embodiment of the present application;
FIG. 8 shows a flowchart for performing extension completion according to an embodiment of the present application;
FIG. 9 illustrates a flowchart for obtaining a target set of table lines provided by an embodiment of the present application;
FIG. 10 illustrates a flow chart for determining a second confidence level provided by an embodiment of the present application;
FIG. 11 illustrates a flow chart for determining confidence in grid lines provided by an embodiment of the present application;
fig. 12 is a schematic diagram showing a configuration of a table information processing apparatus provided in an embodiment of the present application;
fig. 13 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. It should be understood that the accompanying drawings in the present application serve only the purposes of illustration and description and are not intended to limit the protection scope of the present application; in addition, the schematic drawings are not drawn to scale. A flowchart, as used in this application, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of a flowchart may be implemented out of order, and that steps without a logical context dependency may be performed in reverse order or concurrently. Moreover, under the guidance of the content of this application, those skilled in the art may add one or more other operations to a flowchart or remove one or more operations from it.
In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in the embodiments of the present application to indicate the presence of the features stated hereinafter, but not to exclude the addition of other features.
The existing table restoration method first performs table detection on a picture containing a table and then reconstructs the structure of the table. In this process, the methods employed in the prior art include the following three types:
The first is table detection based on manually designed rules. The positions and areas of text blocks and characters are used to align and fit table lines, or the internal structure and outer boundary of the table are judged from the intersection points of table lines. Finally, whether a certain block of the document is a table and the probability of its internal cell division are given. However, this method is inflexible when detecting tables containing merged cells.
The second is to construct manual features of the table with machine learning for table detection. Features are constructed from node information of the table, the vertical distances and angles between lines, and so on, and this information is then passed to classifiers such as SVMs (Support Vector Machines) and decision trees. Finally, the table position and the probability of cell division are predicted. However, this approach depends on the quality of the input picture, and where obvious table ruled lines are lacking it is difficult to accurately determine the ruled lines of the cells.
The third is to use deep-learning image detection and segmentation methods for table detection. Table target detection is performed through a CNN (Convolutional Neural Network) or a GNN (Graph Neural Network) to obtain a table region, and semantic segmentation of cell pixels is performed within the detected table region to realize cell division. However, this method also depends on the table lines already present in the picture and on picture quality when detecting the table region: the noise and sharpness of the picture itself, the thickness and integrity of the table lines, and even the background color of the table may affect the results of table detection and cell division. This method therefore still suffers from inaccurate table division and poor table-restoration accuracy.
Based on the above, the present application provides a table information processing scheme: on the basis of detecting the table area, all table lines possibly existing in the table are restored by combining the table-line detection result of the text area, and the table lines are then culled based on their confidence, so that an accurate description of the table lines is obtained. Table line detection and table structure restoration are realized through two reconstructions of the table, improving the accuracy of table division and table restoration.
Next, the table information processing method of the present application is described with reference to fig. 1. The execution subject of the method may be an electronic device. As shown in fig. 1, the method includes:
s101, carrying out table detection on a picture to be processed to obtain a table area of the picture to be processed and a first table line set of the table area.
Alternatively, the picture to be processed may be a picture containing a table or a PDF file containing a table. The picture to be processed includes at least one table, and each table can be detected as one table area; after the picture to be processed is detected, the table area of each table in the picture and the first table line set of each table area can be obtained.
The first table line set can characterize the overall structure of the table and may be the set of line segments, detected in the table region, that make up the table. When storing the line segments that make up the table, the minimum granularity of line-segment division may be such that a segment contains no intersection points other than its endpoints.
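For illustration only, a minimal sketch of how such minimum-granularity segments might be represented; the class and field names are assumptions, not part of the application:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """A table line segment at minimum granularity: no interior intersections."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def is_horizontal(self) -> bool:
        return abs(self.y2 - self.y1) <= abs(self.x2 - self.x1)

# The "first table line set" is then simply a collection of such segments:
first_table_line_set: set[Segment] = set()
```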
For example, assuming that the picture to be processed is as shown in fig. 2, the table area obtained after detecting fig. 2 may be as shown in table 1.
Table 1 Example of the table obtained after the first table detection
When detecting the picture to be processed, table area detection can be performed by combining pixel segmentation with target detection. As an example, a pixel segmentation network such as U-Net (a CNN-based image segmentation network), DeepLab, or PSPNet (Pyramid Scene Parsing Network) may be combined with a target detection algorithm such as Faster R-CNN (Faster Regions with CNN features) or YOLO (You Only Look Once, a real-time target detection algorithm) to perform table area detection, obtaining the table area of the picture to be processed and the first table line set of the table area.
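For illustration only, a rough sketch of such a combined pipeline; `segmentation_model`, `detector`, and `vectorize` are hypothetical stand-ins, not names from this application or from any specific library:

```python
import numpy as np

def detect_table_areas(image: np.ndarray, segmentation_model, detector, vectorize):
    """Hypothetical combination of pixel segmentation and target detection.

    segmentation_model(image) -> HxW float mask of table-line pixels
    detector(image)           -> list of (x1, y1, x2, y2) table regions
    vectorize(mask)           -> list of line segments traced from the mask
    """
    line_mask = segmentation_model(image)   # which pixels look like table lines
    regions = detector(image)               # where the tables are

    results = []
    for (x1, y1, x2, y2) in regions:
        region_mask = line_mask[y1:y2, x1:x2]
        # Segments are traced from the mask and split at every intersection,
        # so each stored segment has the minimum granularity described above.
        segments = vectorize(region_mask)
        results.append({"area": (x1, y1, x2, y2), "first_line_set": segments})
    return results
```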
S102, performing text detection on the table area to obtain a plurality of texts in the table area and text detection boxes of the texts, and generating a second table line set of the table area according to the text detection boxes and the first table line set.
Referring to table 1, in the table obtained after detecting fig. 2, no table ruled lines are detected where the ruled lines in fig. 2 have disappeared or are discontinuous; for example, no ruled line is detected between the "2016" and "2017" columns under the "sales" field. Therefore, if the table lines are detected only through step S101, it is difficult to obtain an accurate detection result when table lines disappear, when they are discontinuous, or when cells are merged.
In the present application, text detection is therefore further performed on the table area, and a second table line set is obtained from the text detection boxes and the first table line set, so that the detection result of the table lines is more accurate.
When text detection is performed on the table area, it can be realized based on a target detection algorithm. As an example, the target detection algorithm may be YOLO, EAST (An Efficient and Accurate Scene Text Detector), CTC (Connectionist Temporal Classification, a deep-learning algorithm for sequence classification problems), or the like.
After text detection, the resulting table may be as shown in fig. 3. Referring to fig. 3, a text detection box is generated for the text of each cell in the table area. The second table line set of the table area can then be obtained by clustering the text detection boxes and combining the result with the first table line set.
The second table line set includes the line segments of the table area obtained after text detection.
For example, after step S102 is performed on table 1, the obtained table lines may be as shown in fig. 4, where dotted line 1 represents the table lines generated from the text detection boxes and the first table line set after text detection. Referring to fig. 4, by clustering the edges of the text detection boxes of the columns "2016", "2017", and "2018", the dotted line 1 between "2016" and "2017" and the dotted line 1 between "2017" and "2018" can be obtained.
S103, removing table lines from the first table line set and the second table line set according to the confidence of each table line in the two sets to obtain a target table line set of the table area.
After the first table line set and the second table line set are obtained, the two sets can be merged, a confidence calculated for every table line in the merged set, and table lines removed from the merged set according to these confidences.
The confidence characterizes how close a table line is to an actual table line: the higher the confidence of a table line, the closer it is to a real table line.
Optionally, the target table line set of the table area includes a plurality of line segments; restoring these line segments according to the complete topology of the table yields the table structure of the table area.
S104, reconstructing based on the target table line set and the plurality of texts in the table area to obtain the target table.
Optionally, the target table line set represents the table structure of the table area. After the table structure is restored based on the target table line set, the table lines in the structure may undergo closure detection and anomaly filtering to obtain the final table detection result; the texts in the table area are then filled into the table structure, so that the structured target table is restored.
If the picture to be processed contains a plurality of tables, the above steps S101-S104 may be performed for each table, enabling each image table to be reconstructed into a structured table.
In the embodiment of the present application, the picture to be processed is detected to obtain its table area and the first table line set of that area; text detection is performed on the table area to obtain a plurality of texts and their text detection boxes, and a second table line set of the table area is generated according to the text detection boxes and the first table line set; table lines are removed from the first and second table line sets according to the confidence of each table line in the two sets to obtain the target table line set of the table area; and the target table is reconstructed based on the target table line set and the texts in the table area.
Through the table detection and text detection of the picture to be processed, as many table lines as possible can be obtained, so that the generated table lines are denser and can cover all table lines that actually exist. On this basis, a more accurate description of the table lines can be obtained by removing table lines based on their confidence. For cases of discontinuous table lines, missing table lines, and merged cells within a table, the method improves the accuracy of table line restoration.
It should be noted that, in order to improve the accuracy of table line detection so that the table dividing lines can cover all table lines that actually exist, denser table lines may be generated in the present application based on the table lines detected in step S101, for example extension lines or completion lines of those table lines, and the generated lines are also added to the first table line set. Specifically, as shown in fig. 5, step S101 includes:
s501, carrying out table detection on the picture to be processed to obtain a table area of the picture to be processed and a plurality of initial table grid lines of the table area.
S502, performing extension completion processing on each initial table grid line according to the table grid line structure of each initial table grid line to obtain extension lines and full complement lines corresponding to each initial table line.
Alternatively, the initial table line may be a table line obtained after the table detection, for example, a line segment forming the table obtained after the table detection in table 1.
The grid line structure may be a structure of crossing points in a table, and an arrow indicates an extending direction of the grid line. As shown in fig. 6, several manifestations of the grid line structure are presented herein. Referring to fig. 6, the grid line structures at the four corners of the outer edges of the table may be "L" shaped, as particularly shown in the a-D structures of fig. 6. The grid line structure on the grid edge line may be in a "T" shape, as shown in particular in the E-H structures of FIG. 6. The grid line structure in the table may be a "cross" shape, specifically as shown in the type I structure in fig. 6, or a "T" shape.
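For illustration only, a small sketch of how these intersection structures might be classified by which directions carry a line; the enum and function names are assumptions:

```python
from enum import Enum

class CrossType(Enum):
    L_SHAPE = "corner of the table's outer edge"   # structures A-D in fig. 6
    T_SHAPE = "point on a table edge line"         # structures E-H in fig. 6
    CROSS = "intersection inside the table"        # structure I in fig. 6

def classify_crosspoint(up: bool, down: bool, left: bool, right: bool) -> CrossType:
    """Classify a genuine intersection by which of the four directions carry a line.

    Assumes the two arms of an L-shaped point are perpendicular, as at a corner.
    """
    arms = sum([up, down, left, right])
    if arms == 2:
        return CrossType.L_SHAPE
    if arms == 3:
        return CrossType.T_SHAPE
    return CrossType.CROSS
```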
According to the table line structure of each initial table line, the line can be completed to the edge of the table area along the arrow direction to obtain the completion line of that initial table line. For an initial table line whose endpoint does not reach the edge of the table area, the line can be extended to the edge of the table area to obtain the extension line of that initial table line.
Referring to fig. 4, the extension line indicated by dotted line 2 in the second row is obtained by extending the initial table line of the second row, obtained after table detection, to the edge of the table area. If a partial discontinuity exists in the identified initial table line of the third row, that line can be supplemented, and the supplemented segment in the third row is the completion line.
S503, combining the plurality of initial table lines, the extension lines, and the completion lines into the first table line set.
If repeated segments exist among the initial table lines, the extension lines, and the completion lines, the duplicates are removed, and the union of the three is taken as the first table line set.
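For illustration, a minimal sketch of steps S502-S503, reusing the Segment sketch above; the Area fields and the folding of interior-gap completion into the same stretch-to-edge step are simplifying assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Area:
    left: float
    top: float
    right: float
    bottom: float

def extend_to_edges(seg: Segment, area: Area) -> Segment:
    """Stretch a segment so that both endpoints reach the table-area edges."""
    if seg.is_horizontal:
        return Segment(area.left, seg.y1, area.right, seg.y2)
    return Segment(seg.x1, area.top, seg.x2, area.bottom)

def build_first_line_set(initial_lines: list[Segment], area: Area) -> set[Segment]:
    lines = set(initial_lines)                   # the detected initial table lines
    for seg in initial_lines:
        lines.add(extend_to_edges(seg, area))    # extension / completion lines
    return lines                                 # union; the set removes duplicates
```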
After the text detection boxes are processed, denser table lines, such as extension lines or completion lines, may likewise be generated based on the table lines detected in step S102 and added to the second table line set, so that the table dividing lines can cover all table lines that actually exist. Specifically, as shown in fig. 7, step S102 includes:
s701, clustering is carried out on each text detection box according to the first table line set, and a plurality of clustered table lines are obtained.
The clustering process may be performed on a border of the text detection box.
In a first implementation manner, the edges of the text detection boxes in the same row can be clustered in the longitudinal direction, the text detection boxes in the same line are clustered in the transverse direction, the clustered line segments are fitted to obtain a plurality of line segments, then the line segments are screened based on a first table line set, the line segments with the distance smaller than a preset distance threshold value from the existing line segments in the first table line set are removed, and the remaining line segments are the clustered table grid lines.
For example, after clustering and fitting the edges of the text detection box in column "2016" in fig. 3, two fitted line segments may be obtained in the longitudinal direction, which are the left line segment and the right line segment of the text detection box in column "2016", and since the distance between the left line segment and the existing line segment in the first table line set is smaller than the preset distance threshold, the left line segment may be removed, and the right line segment may be used as the table line obtained by fitting.
In the second implementation manner, a sub-region containing a plurality of text detection boxes can be determined in a table region based on the first table line set, then the side lines of the text detection boxes in the sub-region are clustered and fitted to obtain a plurality of line segments, and the line segments with the distance smaller than a preset threshold value are removed to obtain a plurality of clustered table grid lines in the sub-region.
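For illustration, a minimal one-dimensional sketch of this clustering and screening; the tolerance values and function names are assumptions:

```python
def cluster_box_edges(boxes, axis="x", tol=5.0):
    """Group nearby text-box edge coordinates and fit one line per group.

    boxes: list of (x1, y1, x2, y2) text detection boxes; tol is an assumed
    pixel tolerance for treating edge coordinates as the same table line.
    """
    coords = sorted(c for (x1, y1, x2, y2) in boxes
                    for c in ((x1, x2) if axis == "x" else (y1, y2)))
    if not coords:
        return []
    clusters, current = [], [coords[0]]
    for c in coords[1:]:
        if c - current[-1] <= tol:
            current.append(c)
        else:
            clusters.append(current)
            current = [c]
    clusters.append(current)
    # The fitted line position is taken here as the mean of each cluster.
    return [sum(cl) / len(cl) for cl in clusters]

def filter_near_existing(fitted, existing, min_dist=8.0):
    """Drop fitted lines too close to a line already in the first table line set."""
    return [f for f in fitted
            if all(abs(f - e) >= min_dist for e in existing)]
```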
S702, performing extension and completion processing on each clustered table line according to its table line structure to obtain the extension lines and completion lines corresponding to each clustered table line.
S703, merging the clustered table lines, the extension lines, and the completion lines into the second table line set.
After the clustered table lines are obtained through text detection, each clustered table line can be completed to the edge of the table area along the arrow direction to obtain its completion line. For a clustered table line whose endpoints do not reach the edge of the table area, the line can be extended to the edge of the table area to obtain its extension line. For the specific process, refer to steps S502-S503, which are not repeated here.
As a second possible embodiment, instead of performing the extension and completion processing separately as in steps S702-S703, the processing may be performed jointly on the detected initial table lines and on the clustered table lines obtained after text detection. Specifically, the process includes:
performing extension and completion processing on the table lines in the first table line set and the table lines in the second table line set to obtain a third table line set;
and removing table lines from the first, second, and third table line sets according to the confidence of each table line in the first table line set, the confidence of each table line in the second table line set, and the confidence of each table line in the third table line set, to obtain the target table line set of the table area.
Optionally, after the first and second table line sets are obtained, extension and completion processing may be performed on the table lines of both sets to obtain a plurality of extended and completed segments; duplicated segments are removed, and all remaining extended and completed segments form the third table line set.
In this method, the first, second, and third table line sets can be merged into one set, the confidence of each table line in the merged set calculated, and the merged set then culled based on these confidences to obtain the target table line set of the table area.
The process of performing extension and completion processing on each table line according to its table line structure, to obtain the corresponding extension lines and completion lines, is further described below. It should be understood that the table lines to be processed mentioned below may be the initial table lines of steps S501-S503 or the clustered table lines of steps S702-S703. As shown in fig. 8, the process includes:
s801, performing extension processing on the to-be-processed table grid lines based on edge position information of the table area to obtain a plurality of extension lines, wherein the to-be-processed table grid lines are initial table grid lines or clustered table grid lines, and the clustered table grid lines are table grid lines in a second table line set.
Alternatively, the table grid lines to be processed may be initial table grid lines or clustered table grid lines satisfying the processing conditions. Specifically, the processing condition may be that there is a discontinuity in the table ruled line, the line segment is partially disappeared, or that both end points of the table ruled line do not reach the edge position of the table area.
S802, determining the intersection points of the to-be-processed table lines, determining the extending directions of the to-be-processed table grid lines according to the table line structures of the intersection points, and performing complement processing on the to-be-processed table lines in the extending directions of the intersection points to obtain a plurality of complement lines.
In the application, the longitudinal to-be-processed grid lines can be extended to the longitudinal edge positions of the table area, and the transverse to-be-processed grid lines of the transverse lines are extended to the transverse edge positions of the table area, so that a plurality of extension lines are obtained.
Optionally, the intersecting points of the table lines include vertices of table lines, intersecting points on table edge lines, and intersecting points in a table, wherein the table line structure of the vertices of the table lines is any one of the structures a-D in fig. 6, the intersecting point table line structure on the table edge lines is any one of the structures E-H in fig. 6, and the table line structure in the table may be any one of the structures E-I in fig. 6. According to the table line structure of each table line, the table line can be complemented to the edge of the table area along the arrow direction, and the complement line of the table line is obtained.
In the embodiment of the application, the extension complement processing is performed on the table lines, so that the generated table lines can more comprehensively cover the actual table lines, and the accuracy of table detection is improved.
The extension and completion processing may introduce some redundant table lines, which then need to be removed. Specifically, as shown in fig. 9, step S103 includes:
s901, combining the first table line set and the second table line set into an intermediate table line set, and determining a first confidence of each table line in the intermediate table line set based on a preset semantic segmentation network.
In the application, repeated table lines in the first table line set and the second table line set can be removed first, and the first table line set and the second table line set after repeated table line removal are combined into an intermediate table line set.
As a first implementation, the intermediate set of table lines may consist of a first set of table lines after the extension process and a second set of table lines after the extension process. It should be understood that, as a second possible implementation manner, after the first table line set and the second table line set are obtained, the table lines in the first table line set and the table lines in the second table line set may also be subjected to extension complement processing, the table lines obtained in the extension complement processing are combined into a third table line set, and the first table line set, the second table line set, and the third table line set are combined into an intermediate table line set.
The preset semantic segmentation network can be a two-classification pixel semantic segmentation network and is used for judging whether the current table grid line is correct or not, and carrying out two-classification probability prediction to obtain the confidence coefficient of each table grid line.
When the semantic segmentation network is trained, its training samples can be generated by manual labeling or by parsing PDF files: the table-line pixels in the table are labeled as positive samples, and all other pixels as negative samples. Because the task is two-class pixel segmentation, the BCE (binary cross-entropy) loss is adopted. With $p_i$ denoting the probability that the class of the i-th sample is the positive (P) class, $1 - p_i$ the probability that it is the negative (N) class, and $y_i \in \{0,1\}$ the label of the i-th sample, the standard loss function can be represented by the following formula (1):

$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right]$ (1)
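For illustration, a minimal training-step sketch using this loss; `seg_net`, `image_patch`, `line_mask`, and `optimizer` are assumed names, and the network is assumed to end in a sigmoid so that its output is a per-pixel probability:

```python
import torch.nn as nn

def bce_training_step(seg_net, image_patch, line_mask, optimizer):
    """One training step of the two-class pixel segmentation network.

    line_mask holds 1 for labeled table-line pixels and 0 otherwise.
    """
    criterion = nn.BCELoss()                  # the standard BCE loss of formula (1)
    pred = seg_net(image_patch)               # per-pixel table-line probability
    loss = criterion(pred, line_mask.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```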
S902, determining a second confidence for each table line in the intermediate table line set according to the position information of each text detection box in the table area and the position information of each table line in the intermediate table line set.
Alternatively, the second confidence of a table line may be determined according to its degree of coincidence with the text detection boxes. The higher the coincidence between a table line and a detection box, the lower the confidence of that line, indicating that it should be removed.
S903, determining the confidence of each table line in the intermediate table line set according to the first confidence and the second confidence of each table line.
Optionally, in the present application, weights for the first confidence and the second confidence may be set by combining experience with actual service requirements, and the confidence of each table line in the intermediate set is obtained by a weighted calculation over its first and second confidences.
It should be noted that, as shown in table 2, the text in a merged cell may be short, for example the "growth rate" field, and its text detection box correspondingly small. After the extension and completion processing of the table lines in the above steps, the longitudinally extending table lines of the second row may therefore not overlap the text detection box of "growth rate". If table lines were removed only through the confidence determined in step S902, a large error would exist, so the semantic segmentation network of step S901 can be trained for this situation, allowing it to yield a more reliable confidence and reducing the confidence error in this case.
Table 2 table example
S904, removing table lines from the intermediate table line set according to the confidence of each table line in the set to obtain the target table line set.
In a first implementation, after the confidences of all table lines in the intermediate table line set are determined, the set can be traversed and the table lines whose confidence is smaller than a preset threshold removed, yielding the target table line set.
In a second implementation, for each table line in the intermediate table line set, the first confidence and the second confidence can be calculated first and combined into the confidence of that line; whether to remove the line is then decided based on the preset threshold. After the current line has been processed, the confidence is calculated for the next line and the removal decision made, until all table lines in the intermediate set have been processed.
As an example, in fig. 4 the cell where "growth rate" is located contains two vertical dotted lines 2. After their confidences are calculated, if both are determined to be smaller than the preset threshold, the two vertical dotted lines can be removed. Likewise, the horizontal dotted line in the cell of the "area" field and the vertical dotted line in the cell of "sales (millions of dollars)" can be removed once their confidences are determined to be smaller than the preset threshold.
The determination of the second confidence of each table line according to the position information of each text detection box in the table area and the position information of each table line in the intermediate table line set is further described below.
In a first implementation, the text detection boxes obtained in step S102 can be traversed; for each box, it is judged whether any table line falls into the box, and if a table line coincides with the area of the box, the second confidence of that table line can be determined according to the degree of overlap between the box and the line.
In a second implementation, the table lines in the intermediate table line set can be traversed; for each table line, it is judged whether any text detection box overlaps the line, and if such a box exists, the second confidence of the table line is determined according to the degree of overlap between the line and the box.
In the second implementation, in order to further reduce the number of traversals and improve processing efficiency, the position information of the table lines and of the text detection boxes within the table area can additionally be exploited. This process is further described below; as shown in fig. 10, it includes:
S1001, determining the target text detection box that overlaps an intermediate table line according to the position information of the intermediate table line and the position information of each text detection box, where the intermediate table line is any table line in the intermediate table line set.
S1002, determining the second confidence of the intermediate table line according to the coincidence ratio between the intermediate table line and the target text detection box.
For each intermediate table line in the intermediate table line set, its position information in the table area can first be determined, the text detection boxes whose distance to the line lies within a preset range are found, and among these the boxes that coincide with the line are taken as the target text detection boxes.
After the target text detection box is determined, the coincidence ratio between the intermediate table line and the target text detection box, that is, the ratio of the length of their overlap to the total length of the line, can be determined, and the second confidence of the line is determined according to this coincidence ratio.
As a possible implementation, the coincidence ratio takes values between 0 and 1. This interval can be divided into several subintervals, each mapped to a value of the second confidence. On this basis, after the coincidence ratio is determined, the value of the second confidence is obtained from the mapping of the subinterval in which the ratio falls.
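For illustration, a minimal sketch of this computation, reusing the Segment sketch above; the boxes are (x1, y1, x2, y2) tuples, and the subinterval boundaries and confidence values are assumptions, not values from the application:

```python
def overlap_ratio(seg, box) -> float:
    """Fraction of the segment's length that lies inside the text box.

    The box is assumed to have already been selected as overlapping the line
    (step S1001), so only the overlap along the segment's own axis is measured.
    """
    bx1, by1, bx2, by2 = box
    if seg.is_horizontal:
        lo, hi = sorted((seg.x1, seg.x2))
        inter = max(0.0, min(hi, bx2) - max(lo, bx1))
    else:
        lo, hi = sorted((seg.y1, seg.y2))
        inter = max(0.0, min(hi, by2) - max(lo, by1))
    return inter / (hi - lo) if hi > lo else 0.0

# Assumed subinterval-to-confidence mapping: the more of the line that runs
# through text, the lower its second confidence.
def second_confidence(ratio: float) -> float:
    if ratio < 0.1:
        return 1.0
    if ratio < 0.3:
        return 0.6
    if ratio < 0.6:
        return 0.3
    return 0.0
```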
The determination of the confidence of each table line from its first confidence and second confidence is further described below. It should be understood that, in a first implementation, the first confidence or the second confidence may be used directly as the confidence of the table line. In a second implementation, the confidence of the table line can be obtained by a weighted calculation over the first and second confidences. As shown in fig. 11, the specific procedure of the second implementation includes:
S1101, determining a first product of the first confidence of an intermediate table line and a preset first weight, and determining a second product of the second confidence of the intermediate table line and a preset second weight, where the first weight characterizes the importance of the first confidence, the second weight characterizes the importance of the second confidence, and the intermediate table line is any table line in the intermediate table line set.
S1102, taking the sum of the first product and the second product as the confidence of the intermediate table line.
The first weight and the second weight can be obtained by statistical analysis of a historical dataset. With Score denoting the confidence of the intermediate table line, k1 the first weight, Sr1 the first confidence, k2 the second weight, and Sr2 the second confidence, Score can be calculated according to the following formula (2):

$\mathrm{Score} = k_1 \cdot Sr_1 + k_2 \cdot Sr_2$ (2)
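For illustration, a minimal sketch of formula (2) together with the threshold-based culling of step S904; the weight and threshold values are assumptions, since the application derives the weights from statistical analysis of a historical dataset:

```python
K1, K2 = 0.6, 0.4   # assumed weights; in practice fitted on a historical dataset

def line_confidence(sr1: float, sr2: float) -> float:
    """Formula (2): Score = k1 * Sr1 + k2 * Sr2."""
    return K1 * sr1 + K2 * sr2

THRESHOLD = 0.5     # assumed rejection threshold

def cull(scored_lines):
    """scored_lines: iterable of (segment, first_confidence, second_confidence)."""
    return [seg for seg, sr1, sr2 in scored_lines
            if line_confidence(sr1, sr2) >= THRESHOLD]
```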
Further, the process of removing table lines from the intermediate table line set according to the confidence of each table line to obtain the target table line set includes:
traversing the intermediate table line set and removing the table lines whose confidence is smaller than a preset threshold to obtain the target table line set.
Taking the first implementation in step S904 as an example, for the table lines in the intermediate table line set, those whose confidence is smaller than the preset threshold may be removed, and the remaining lines, whose confidence is equal to or greater than the threshold, are combined to form the target table line set.
Specifically, the process of reconstructing the target table based on the target table line set and the plurality of texts in the table area includes:
combining the table lines in the target table line set based on a preset business strategy to obtain an initial table structure;
and filling each text into the corresponding position of the initial table structure to obtain the target table.
The preset business strategy may be to combine the table lines in the target table line set into an initial table using regularization together with business experience, such as the perpendicularity of table lines and the minimum spacing ratio between table lines.
After the initial table structure is obtained, each text can be filled into the corresponding cell of the structure according to the position information of the text in the table area, yielding the target table.
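As a purely illustrative sketch of this final step, assuming the kept horizontal and vertical lines have already been snapped to row and column coordinates and that every text-box center falls inside the grid:

```python
def reconstruct_table(h_lines, v_lines, texts):
    """Snap target lines to a grid and fill texts by position.

    h_lines / v_lines: sorted-able y- and x-coordinates of the kept table lines;
    texts: list of (text, cx, cy) with the center of each text detection box.
    """
    rows = sorted(h_lines)
    cols = sorted(v_lines)
    grid = [["" for _ in range(len(cols) - 1)] for _ in range(len(rows) - 1)]
    for text, cx, cy in texts:
        # Locate the cell whose row/column interval contains the text center.
        r = next(i for i in range(len(rows) - 1) if rows[i] <= cy < rows[i + 1])
        c = next(j for j in range(len(cols) - 1) if cols[j] <= cx < cols[j + 1])
        grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid
```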
Furthermore, anomaly detection can be performed on the target table to judge whether the target table is consistent with the table in the picture to be processed.
As an example, after processing the table in the picture to be processed shown in fig. 2, the resulting target table may be as shown in table 3.
Table 3 target table example
Based on the same inventive concept, the embodiment of the present application further provides a table information processing apparatus corresponding to the above table information processing method. Since the principle by which the apparatus solves the problem is similar to that of the table information processing method in the embodiment of the present application, the implementation of the apparatus may refer to the implementation of the method, and repetition is omitted.
Referring to fig. 12, which is a schematic diagram of a table information processing apparatus according to an embodiment of the present application, the apparatus includes: a table detection module 1201, a text detection module 1202, a rejection module 1203, and a reconstruction module 1204.
The table detection module 1201 is configured to perform table detection on a picture to be processed to obtain a table area of the picture to be processed and a first table line set of the table area;
The text detection module 1202 is configured to perform text detection on the table area to obtain a plurality of texts in the table area and text detection boxes of the texts, and generate a second table line set of the table area according to the text detection boxes and the first table line set;
the rejection module 1203 is configured to remove table lines from the first table line set and the second table line set according to the confidence of each table line in the two sets, so as to obtain a target table line set of the table area;
and a reconstruction module 1204, configured to reconstruct the target table based on the target table line set and the plurality of texts in the table region.
Optionally, the table detection module 1201 is specifically configured to:
performing table detection on the picture to be processed to obtain a table area of the picture to be processed and a plurality of initial table lines of the table area;
performing extension and completion processing on each initial table line according to its table line structure to obtain the extension lines and completion lines corresponding to each initial table line;
and combining the plurality of initial table lines, the extension lines, and the completion lines into the first table line set.
Optionally, the text detection module 1202 is specifically configured to:
clustering the text detection boxes according to the first table line set to obtain a plurality of clustered table lines;
performing extension and completion processing on each clustered table line according to its table line structure to obtain the extension lines and completion lines corresponding to each clustered table line;
and merging the clustered table lines, the extension lines, and the completion lines into the second table line set.
Optionally, the apparatus further includes an extension and completion module for:
performing extension and completion processing on the table lines in the first table line set and the table lines in the second table line set to obtain a third table line set;
Optionally, the rejection module 1203 is further configured to remove table lines from the first, second, and third table line sets according to the confidence of each table line in the first table line set, the confidence of each table line in the second table line set, and the confidence of each table line in the third table line set, to obtain the target table line set of the table area.
Optionally, the extension and completion module is further configured to:
performing extension processing on the table lines to be processed based on the edge position information of the table area to obtain a plurality of extension lines, where a table line to be processed is an initial table line or a clustered table line, the clustered table lines being the table lines in the second table line set;
and determining the intersection points of the table lines to be processed, determining the extending direction of each table line according to the table line structure of its intersection points, and performing completion processing on the table lines in the extending directions of the intersection points to obtain a plurality of completion lines.
Optionally, the rejection module 1203 is specifically configured to:
combining the first table line set and the second table line set into an intermediate table line set, and determining a first confidence for each table line in the intermediate table line set based on a preset semantic segmentation network;
determining a second confidence for each table line in the intermediate table line set according to the position information of each text detection box in the table area and the position information of each table line in the intermediate table line set;
determining the confidence of each table line in the intermediate table line set according to the first confidence and the second confidence of each table line;
and removing table lines from the intermediate table line set according to the confidence of each table line in the set to obtain the target table line set.
Optionally, the rejection module 1203 is specifically configured to:
determining the target text detection box that overlaps an intermediate table line according to the position information of the intermediate table line and the position information of each text detection box, where the intermediate table line is any table line in the intermediate table line set;
and determining the second confidence of the intermediate table line according to the coincidence ratio between the intermediate table line and the target text detection box.
Optionally, the rejection module 1203 is specifically configured to:
determining a first product of the first confidence of an intermediate table line and a preset first weight, and determining a second product of the second confidence of the intermediate table line and a preset second weight, where the first weight characterizes the importance of the first confidence, the second weight characterizes the importance of the second confidence, and the intermediate table line is any table line in the intermediate table line set;
and taking the sum of the first product and the second product as the confidence of the intermediate table line.
Optionally, the rejection module 1203 is specifically configured to:
traversing the intermediate table line set and removing the table lines whose confidence is smaller than a preset threshold to obtain the target table line set.
Optionally, the reconstruction module 1204 is specifically configured to:
combining the table lines in the target table line set based on a preset business strategy to obtain an initial table structure;
and filling each text into a corresponding position of the initial table structure to obtain a target table.
For the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the related descriptions in the above method embodiments, which are not repeated in detail here.
In the present application, by performing table detection and text detection on the picture to be processed, denser table lines can be obtained, so that the generated table lines can cover all table lines that actually exist. On this basis, a more accurate description of the table lines can be obtained by removing table lines based on their confidence. For cases of discontinuous table lines, missing table lines, and merged cells within a table, the method improves the accuracy of table line restoration.
The embodiment of the present application further provides an electronic device, whose structure is shown schematically in fig. 13, including: a processor 1301, a memory 1302, and a bus. The memory 1302 stores machine-readable instructions executable by the processor 1301 (for example, the execution instructions corresponding to the table detection module 1201, the text detection module 1202, the rejection module 1203, and the reconstruction module 1204 in the apparatus of fig. 12). When the electronic device is running, the processor 1301 communicates with the memory 1302 through the bus, and the machine-readable instructions, when executed by the processor 1301, perform the table information processing method described above.
The embodiment of the application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the table information processing method described above.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working processes of the system and apparatus described above, reference may be made to the corresponding processes in the method embodiments, which are not described in detail in this application. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative. The division into modules is merely a division by logical function; in actual implementation there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through communication interfaces, as indirect couplings or communication connections between devices or modules, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application.

Claims (13)

1. A table information processing method, characterized by comprising:
performing table detection on a picture to be processed to obtain a table area of the picture to be processed and a first table line set of the table area;
performing text detection on the table area to obtain a plurality of texts in the table area and text detection boxes of the texts, and generating a second table line set of the table area according to the text detection boxes and the first table line set;
performing culling processing on the first table line set and the second table line set according to the confidence of each table line in the first table line set and the second table line set to obtain a target table line set of the table area;
and reconstructing a target table based on the target table line set and the plurality of texts in the table area.
2. The method according to claim 1, wherein performing table detection on the picture to be processed to obtain the table area of the picture to be processed and the first table line set of the table area comprises:
performing table detection on the picture to be processed to obtain the table area of the picture to be processed and a plurality of initial table lines of the table area;
performing extension and completion processing on each initial table line according to the table line structure of each initial table line to obtain an extension line and a completion line corresponding to each initial table line;
and merging the plurality of initial table lines, the extension lines, and the completion lines into the first table line set.
3. The method according to claim 1, wherein generating the second table line set of the table area according to the text detection boxes and the first table line set comprises:
clustering the text detection boxes according to the first table line set to obtain a plurality of clustered table lines;
performing extension and completion processing on each clustered table line according to the table line structure of each clustered table line to obtain an extension line and a completion line corresponding to each clustered table line;
and merging the clustered table lines, the extension lines, and the completion lines into the second table line set.
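By way of illustration only (this sketch is not part of the claims): one plausible form of the clustering in claim 3 groups text detection boxes into horizontal bands by vertical overlap and places a clustered row line midway between adjacent bands, the column case being symmetric. The grouping rule and the min_gap parameter below are assumptions of this sketch.

```python
# Illustrative sketch of one way to cluster text boxes into row lines; the
# grouping rule and min_gap are assumptions, not claim limitations.

def cluster_row_lines(boxes, min_gap=2.0):
    """boxes: (left, top, right, bottom) tuples. Returns y-coordinates of
    horizontal lines placed midway between vertically separated box bands."""
    bands = []  # each band: [top, bottom] of a group of overlapping boxes
    for _, top, _, bottom in sorted(boxes, key=lambda b: b[1]):
        if bands and top <= bands[-1][1]:       # overlaps the current band
            bands[-1][1] = max(bands[-1][1], bottom)
        else:
            bands.append([top, bottom])
    return [(upper[1] + lower[0]) / 2           # midline between bands
            for upper, lower in zip(bands, bands[1:])
            if lower[0] - upper[1] >= min_gap]
```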
4. The method according to claim 1, wherein the method further comprises:
performing extension and completion processing on the table lines in the first table line set and the table lines in the second table line set to obtain a third table line set;
and performing culling processing on the first table line set, the second table line set, and the third table line set according to the confidence of each table line in the first table line set, the confidence of each table line in the second table line set, and the confidence of each table line in the third table line set to obtain the target table line set of the table area.
5. The method according to claim 2, wherein the method further comprises:
performing extension processing on a table line to be processed based on edge position information of the table area to obtain a plurality of extension lines, wherein the table line to be processed is an initial table line or a clustered table line, and the clustered table lines are table lines in the second table line set;
and determining intersection points of the table line to be processed, determining an extension direction of the table line to be processed according to the table line structure at each intersection point, and performing completion processing on the table line to be processed in the extension direction at the intersection point to obtain a plurality of completion lines.
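By way of illustration only (this sketch is not part of the claims): the extension processing of claim 5 can be pictured as stretching a line to the edges of the table area, with completion handled separately by analyzing the line structure at each intersection point. The coordinate conventions below are assumptions, and the junction analysis is summarized only in a comment.

```python
# Illustrative sketch of the extension step; coordinate conventions are
# assumptions, not claim limitations.

def extend_to_edges(line, area):
    """line: (x1, y1, x2, y2) axis-aligned; area: (left, top, right, bottom).
    Returns the line prolonged to span the whole table area."""
    x1, y1, x2, y2 = line
    left, top, right, bottom = area
    if y1 == y2:                       # horizontal: run edge to edge in x
        return (left, y1, right, y2)
    return (x1, top, x2, bottom)       # vertical: run edge to edge in y

# Completion (sketch): at a T-junction where the table line structure
# indicates a stroke should continue past the crossbar, emit a completion
# segment in that extension direction until the next line or the area edge.
```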
6. The method according to claim 1, wherein performing culling processing on the first table line set and the second table line set according to the confidence of each table line in the first table line set and the second table line set to obtain the target table line set of the table area comprises:
merging the first table line set and the second table line set into an intermediate table line set, and determining a first confidence of each table line in the intermediate table line set based on a preset semantic segmentation network;
determining a second confidence of each table line in the intermediate table line set according to the position information of each text detection box in the table area and the position information of each table line in the intermediate table line set;
determining the confidence of each table line in the intermediate table line set according to the first confidence of each table line in the intermediate table line set and the second confidence of each table line;
and culling the table lines in the intermediate table line set according to the confidence of each table line in the intermediate table line set to obtain the target table line set.
7. The method according to claim 6, wherein determining the second confidence of each table line in the intermediate table line set according to the position information of each text detection box in the table area and the position information of each table line in the intermediate table line set comprises:
determining a target text detection box that overlaps an intermediate table line according to the position information of the intermediate table line and the position information of each text detection box, wherein the intermediate table line is any table line in the intermediate table line set;
and determining the second confidence of the intermediate table line according to the overlap ratio between the intermediate table line and the target text detection box.
8. The method according to claim 6, wherein determining the confidence of each table line in the intermediate table line set according to the first confidence of each table line and the second confidence of each table line comprises:
determining a first product of the first confidence of an intermediate table line and a preset first weight, and determining a second product of the second confidence of the intermediate table line and a preset second weight, wherein the first weight characterizes the importance of the first confidence, the second weight characterizes the importance of the second confidence, and the intermediate table line is any table line in the intermediate table line set;
and taking the sum of the first product and the second product as the confidence of the intermediate table line.
9. The method according to claim 6, wherein culling the table lines in the intermediate table line set according to the confidence of each table line in the intermediate table line set to obtain the target table line set comprises:
and traversing the intermediate table line set, and culling the table lines whose confidence is less than a preset threshold to obtain the target table line set.
10. The method according to claim 1, wherein reconstructing the target table based on the target table line set and the plurality of texts in the table area comprises:
combining the table lines in the target table line set based on a preset business strategy to obtain an initial table structure;
and filling each text into a corresponding position of the initial table structure to obtain the target table.
11. A table information processing apparatus, characterized by comprising:
the table detection module is used for carrying out table detection on the picture to be processed to obtain a table area of the picture to be processed and a first table line set of the table area;
the text detection module is used for carrying out text detection on the table area to obtain a plurality of texts in the table area and text detection boxes of the texts, and generating a second table line set of the table area according to the text detection boxes and the first table line set;
the culling module is used for performing culling processing on the first table line set and the second table line set according to the confidence of each table line in the first table line set and the second table line set to obtain a target table line set of the table area;
and the reconstruction module is used for reconstructing a target table based on the target table line set and the plurality of texts in the table area.
12. An electronic device, comprising: a processor, a storage medium, and a bus, wherein the storage medium stores program instructions executable by the processor; when the electronic device is running, the processor communicates with the storage medium via the bus, and the processor executes the program instructions to perform the steps of the table information processing method according to any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, performs the steps of the table information processing method according to any one of claims 1 to 10.
Application CN202311824952.0A, filed 2023-12-28: Table information processing method and device, electronic equipment and storage medium. Status: Active. Granted as CN117475459B.

Priority Applications (1)

Application Number: CN202311824952.0A (granted as CN117475459B)
Priority Date / Filing Date: 2023-12-28
Title: Table information processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number    Publication Date
CN117475459A          2024-01-30
CN117475459B          2024-04-09

Family

ID=89629701

Family Applications (1)

Application Number: CN202311824952.0A (Active; granted as CN117475459B)
Priority Date / Filing Date: 2023-12-28
Title: Table information processing method and device, electronic equipment and storage medium

Country Status (1)

Country: CN (CN117475459B)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190266394A1 (en) * 2018-02-26 2019-08-29 Abc Fintech Co., Ltd. Method and device for parsing table in document image
CN111325110A (en) * 2020-01-22 2020-06-23 平安科技(深圳)有限公司 Form format recovery method and device based on OCR and storage medium
CN111814722A (en) * 2020-07-20 2020-10-23 电子科技大学 Method and device for identifying table in image, electronic equipment and storage medium
WO2023045298A1 (en) * 2021-09-27 2023-03-30 上海合合信息科技股份有限公司 Method and apparatus for detecting table lines in image
CN114239508A (en) * 2021-12-20 2022-03-25 北京金山办公软件股份有限公司 Form restoration method and device, storage medium and electronic equipment
CN115661848A (en) * 2022-07-11 2023-01-31 上海通办信息服务有限公司 Form extraction and identification method and system based on deep learning
CN115600570A (en) * 2022-09-16 2023-01-13 国电南瑞科技股份有限公司(Cn) Power grid equipment early warning limit value table reconstruction method, entry method and device
CN115620325A (en) * 2022-10-18 2023-01-17 北京百度网讯科技有限公司 Table structure restoration method and device, electronic equipment and storage medium
CN115618836A (en) * 2022-12-15 2023-01-17 杭州恒生聚源信息技术有限公司 Wireless table structure restoration method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Eunji Lee et al., "Deep-learning and graph-based approach to table structure recognition", Multimedia Tools and Applications, 2021, pp. 5827-5848 *
Vineetha Borra et al., "Automatic Table Detection, Structure Recognition and Data Extraction from Document Images", International Journal of Innovative Technology and Exploring Engineering, July 2021, pp. 73-79 *
Tan Ting et al., "Form localization and extraction based on graph representation and matching", CAAI Transactions on Intelligent Systems, vol. 14, no. 2, March 2019, pp. 231-238 *

Also Published As

Publication number Publication date
CN117475459B (en) 2024-04-09

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant