CN114463765A - Table information extraction method and device and storage medium - Google Patents

Table information extraction method and device and storage medium

Info

Publication number
CN114463765A
CN114463765A
Authority
CN
China
Prior art keywords
target text
target
region
cell
initial
Prior art date
Legal status
Pending
Application number
CN202210126160.5A
Other languages
Chinese (zh)
Inventor
潘宇
陈琳
吴伟佳
李羽
Current Assignee
Weimin Insurance Agency Co Ltd
Original Assignee
Weimin Insurance Agency Co Ltd
Priority date
Filing date
Publication date
Application filed by Weimin Insurance Agency Co Ltd
Priority to CN202210126160.5A
Publication of CN114463765A
Legal status: Pending

Abstract

The embodiments of this application relate to the field of computer technology and disclose a table information extraction method, apparatus, and storage medium. The method includes: acquiring an initial table of a target image and the correspondence between each target text region of the target image and each cell in the initial table; determining a target cell group in the initial table; if the first sum of the numbers of target text regions corresponding to the cells in row m2 satisfies a first preset condition, merging row m2 and row m1 of the initial table to obtain a target table; and finally, based on the correspondence between each target text region and each cell in the initial table, filling the text data of the target text regions corresponding to the cells of the target table into the target table to obtain table data. With the embodiments of this application, table structuring can be performed on an image containing a non-full-line table, so that the table data of such an image can be extracted.

Description

Table information extraction method and device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for extracting table information, and a storage medium.
Background
The detection and structuring of tables in text-data images, i.e. generating tables from the text data in an image, can be applied in a variety of business scenarios, such as expense bills or insurance policies. Most current table-structuring techniques assume that complete table lines are present; how to structure an image containing a table without complete lines is a technical problem that still needs to be solved.
Disclosure of Invention
The embodiment of the application provides a table information extraction method, a table information extraction device and a storage medium, which can realize table structuring of an image containing a non-full-line table, so that table data of the image containing the non-full-line table is extracted.
In one aspect, an embodiment of the present application provides a table information extraction method, including:
acquiring an initial table of a target image and a corresponding relation between each target text area of the target image and each cell in the initial table, wherein each target text area is obtained by performing text detection processing on the target image;
determining a target cell group in the initial table, wherein the target cell group comprises a first cell and a second cell, the first cell and the second cell are located in the same column, the first cell is located in row m1 of the initial table, the second cell is located in row m2 of the initial table, m1 and m2 are both positive integers, m1 is less than m2, and both the first cell and the second cell have corresponding target text regions;
if the first sum of the numbers of target text regions corresponding to the cells in row m2 satisfies a first preset condition, merging row m2 and row m1 of the initial table to obtain a target table, wherein the target text regions corresponding to the cells in row m1 of the target table include the target text regions corresponding to the corresponding cells in row m1 of the initial table and the target text regions corresponding to the cells in row m2;
and filling text data in the target text areas corresponding to the cells in the target table into the target table based on the corresponding relation between the target text areas and the cells in the initial table to obtain table data.
In one aspect, an embodiment of the present application provides a table information extraction apparatus, where the table information extraction apparatus includes an obtaining unit, a determining unit, a merging unit, and a filling unit, where:
the acquiring unit is configured to acquire an initial table of a target image and a corresponding relationship between each target text region of the target image and each cell in the initial table, where each target text region is obtained by performing text detection processing on the target image;
the determining unit is configured to determine a target cell group in the initial table, wherein the target cell group comprises a first cell and a second cell, the first cell and the second cell are located in the same column, the first cell is located in row m1 of the initial table, the second cell is located in row m2 of the initial table, m1 and m2 are both positive integers, m1 is less than m2, and both the first cell and the second cell have corresponding target text regions;
the merging unit is configured to, if the first sum of the numbers of target text regions corresponding to the cells in row m2 satisfies a first preset condition, merge row m2 and row m1 of the initial table to obtain a target table, wherein the target text regions corresponding to the cells in row m1 of the target table include the target text regions corresponding to the corresponding cells in row m1 of the initial table and the target text regions corresponding to the cells in row m2;
the filling unit is configured to fill, based on the correspondence between each target text region and each cell in the initial table, text data in the target text region corresponding to each cell in the target table into the target table to obtain table data.
In one aspect, an embodiment of the present application provides an electronic device, where the electronic device includes an input interface and an output interface, and further includes:
a processor adapted to implement one or more instructions; and
a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to perform the above table information extraction method.
In one aspect, an embodiment of this application provides a computer storage medium storing computer program instructions which, when executed by a processor, perform the above table information extraction method.
In one aspect, embodiments of this application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium; a processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, and the computer instructions, when executed by the processor, perform the above table information extraction method.
In the embodiments of this application, an initial table of a target image and the correspondence between each target text region of the target image and each cell in the initial table are obtained, and a target cell group is then determined in the initial table, where the target cell group comprises a first cell and a second cell located in the same column, the first cell in row m1 of the initial table and the second cell in row m2. It is then judged whether the first sum of the numbers of target text regions corresponding to the cells in row m2 satisfies a first preset condition; if so, row m2 and row m1 of the initial table are merged to obtain a target table. Finally, based on the correspondence between each target text region and each cell in the initial table, the text data in the target text regions corresponding to the cells of the target table are filled into the target table to obtain table data. Through the obtained initial table and the correspondence between the target text regions and the cells, the initial positions of the target text regions in the initial table can be determined; then, by determining the target cell group and judging whether the first sum satisfies the first preset condition, the cells that should be merged in the initial table are identified, yielding a target table in which target text regions that belong to one logical cell but fell into different rows are merged.
This avoids the situation in which text data that wraps onto a new line is filled into different cells for lack of an explicit table line, leaving the extracted table data in disorder. The cells to which the target text regions belong can thus be determined without relying on explicit table lines such as those of a full-line table, so that table structuring can be performed on an image containing a non-full-line table and the table data of such an image can be extracted.
Drawings
To illustrate the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a table information extraction system provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a table information extraction method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a merged cell provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of another merged cell provided in embodiments of the present application;
FIG. 5 is a schematic diagram of another merged cell provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of another merged cell provided in an embodiment of the present application;
fig. 7 is a schematic flowchart of another table information extraction method provided in the embodiment of the present application;
FIG. 8a is a schematic diagram of an initial arrangement provided by an embodiment of the present application;
FIG. 8b is a schematic diagram of a rearrangement scheme provided by an embodiment of the present application;
FIG. 8c is a schematic illustration of another rearrangement provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of another merged cell provided in an embodiment of the present application;
fig. 10 is a schematic flowchart of another table information extraction method provided in the embodiment of the present application;
fig. 11 is a pixel distribution histogram provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of a sticky text segmentation provided in an embodiment of the present application;
fig. 13 is a schematic flowchart of another table information extraction method provided in the embodiment of the present application;
FIG. 14 is a schematic diagram of another merged cell provided by an embodiment of the present application;
fig. 15 is a schematic flowchart of another table information extraction method provided in the embodiment of the present application;
fig. 16 is a schematic structural diagram of a table information extraction apparatus according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of this application.
In order to perform table structuring on an image containing a non-full-line table and thereby extract its table data, the embodiments of this application provide a table information extraction scheme. First, an initial table of a target image and the correspondence between each target text region of the target image and each cell in the initial table are obtained. A target cell group is then determined in the initial table, where the target cell group comprises a first cell and a second cell located in the same column, the first cell in row m1 of the initial table and the second cell in row m2, with m1 and m2 both positive integers, m1 less than m2, and both cells having corresponding target text regions. Then, if the first sum of the numbers of target text regions corresponding to the cells in row m2 satisfies a first preset condition, row m2 and row m1 of the initial table are merged to obtain a target table, where the target text regions corresponding to the cells in row m1 of the target table include the target text regions corresponding to the corresponding cells in row m1 of the initial table and those corresponding to the cells in row m2. Finally, based on the correspondence between each target text region and each cell in the initial table, the text data in the target text regions corresponding to the cells of the target table are filled into the target table to obtain table data.
The image containing a non-full-line table may be an image containing a half-line table, an image containing a wireless (line-free) table, or an image containing any combination of full-line, half-line, and wireless tables; no limitation is imposed here. A full-line table is a table in which every cell has table lines on all four borders (top, bottom, left, and right); a half-line table is one in which each cell has table lines on at most three of its four borders; and a wireless table is one in which the cells have no table lines on any of the four borders. Optionally, the present scheme is a general table information extraction method: besides table extraction from images containing non-full-line tables, it is also applicable to table extraction from images containing full-line tables, and no limitation is imposed here.
In one embodiment, the table information extraction scheme may be executed by a terminal device, which may include any one or more of a smartphone, tablet, laptop, desktop computer, smart car, and smart wearable device; it may also be an independent server, a cloud server, a server cluster, a distributed system, or the like, without limitation here. The terminal device may take an image shot by the user as the target image, or select an image from an image database on the device as the target image. The terminal device then acquires the initial table of the target image and the correspondence between each target text region of the target image and each cell in the initial table, and determines the target cell group in the initial table. The terminal device then judges whether the first sum of the numbers of target text regions corresponding to the cells in row m2 satisfies the first preset condition; if so, it merges row m2 and row m1 of the initial table to obtain the target table. Finally, the terminal device fills the text data in the target text regions corresponding to the cells of the target table into the target table, based on the correspondence between the target text regions and the cells of the initial table, to obtain the table data. The terminal device may also output the table data so that the user can view or confirm the extracted table data.
Based on the above table information extraction scheme, an embodiment of this application provides a table information extraction system. Fig. 1 is a schematic structural diagram of the table information extraction system provided in an embodiment of this application. The system shown in fig. 1 may include a terminal device 101 and a server 102. The terminal device 101 may include any one or more of a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart car, and a smart wearable device. The server 102 may be an independent physical server, a server cluster or distributed system formed from multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The terminal device 101 and the server 102 may be connected directly or indirectly by wired or wireless communication, which is not limited in this application.
In one embodiment, the user of the terminal device 101 may take a photographed image as the target image, or select an image from an image database on the terminal device 101 as the target image, after which the terminal device 101 uploads the target image to the server 102. The server 102 acquires the initial table of the target image and the correspondence between each target text region of the target image and each cell in the initial table, and then determines the target cell group in the initial table. The server 102 then judges whether the first sum of the numbers of target text regions corresponding to the cells in row m2 satisfies the first preset condition; if so, it merges row m2 and row m1 of the initial table to obtain the target table. The server 102 then fills the text data in the target text regions corresponding to the cells of the target table into the target table, based on the correspondence between the target text regions and the cells of the initial table, to obtain the table data. Finally, the server 102 sends the table data to the terminal device 101, which outputs it so that the user can view and confirm the extracted table data.
Based on the table information extraction scheme and the table information extraction system, the embodiment of the application provides a table information extraction method. Referring to fig. 2, a flow chart of a table information extraction method provided in the embodiment of the present application is schematically illustrated. The table information extraction method shown in fig. 2 may be performed by the server or the terminal device shown in fig. 1. The table information extraction method shown in fig. 2 may include the steps of:
s201, acquiring an initial table of the target image and the corresponding relation between each target text area of the target image and each cell in the initial table.
In the embodiment of the present application, the target image may be a photographed image, an image selected from a plurality of images in an image database, or an image received by a server or a terminal device, which is not limited herein. The method for selecting the image by the terminal device or the server may be that the user selects one or more images among the plurality of images as the target image, or the terminal device or the server arbitrarily selects one or more images among the plurality of images as the target image, which is not limited herein.
In an embodiment of this application, the initial table may include m × n cells, where m is determined from the position information of the target text regions with the smallest and largest ordinates in the target image, n is determined from the position information of the target text regions with the smallest and largest abscissas, and m and n are positive integers: m is the number of rows of the initial table and n is the number of columns.
Specifically, the maximum number of target text regions lying between the smallest-abscissa target text region and the largest-abscissa target text region within the same ordinate range may be detected, and 2 may then be added to that maximum (to count the two endpoint regions themselves), giving n, the number of columns of the initial table.
For example, suppose that in the target image, 4 target text regions lie between the target text region A1 with the smallest abscissa and the target text region A2 with the largest abscissa, all with ordinates in the range of 200-300 pixels, while 3 target text regions lie between the target text region B1 with the smallest abscissa and the target text region B2 with the largest abscissa, all with ordinates in the range of 500-600 pixels. It can then be determined that the maximum number of target text regions lying between a smallest-abscissa region and a largest-abscissa region within the same ordinate range is 4, and hence that n in the initial table is 6. Optionally, m in the initial table is determined in a manner similar to n, which is not repeated here.
Alternatively, the ordinates of all the target text regions in the target image may be determined, target text regions whose ordinates differ by no more than a preset difference may be assigned to the same text-region group, and the number of text-region groups may then be counted to give m in the initial table.
For example, let the preset difference be 20 pixels, and let the ordinate ranges of target text regions 1 to 7 in the target image be 100-200, 110-210, 300-400, 298-390, 396-550, 700-800, and 712-790 pixels, respectively. It can then be determined that target text regions 1 and 2 form one text-region group, target text regions 3 and 4 form another, and target text regions 6 and 7 form another. Although the ordinate range of target text region 5 partially overlaps that of target text region 3, the difference between their ordinates exceeds 20 pixels, so target text region 5 forms a text-region group on its own. It can finally be determined that m in the initial table is 4.
Optionally, n in the initial table is determined in a manner similar to that of m, and is not described herein in detail.
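The ordinate-grouping step above can be sketched as follows. This is a minimal illustration under the stated 20-pixel threshold, not the patent's implementation; the function name and the (top, bottom) region format are assumptions:

```python
def group_rows(regions, max_diff=20):
    """Group target text regions into candidate table rows.

    regions: list of (top, bottom) ordinates of each target text
    region, in pixels. A region joins the current group when its top
    ordinate differs from the group's first region by at most
    max_diff; otherwise it starts a new group. The number of groups
    gives m, the row count of the initial table; the column count n
    can be derived analogously from the abscissas.
    """
    groups = []
    for top, bottom in sorted(regions):
        if groups and top - groups[-1][0][0] <= max_diff:
            groups[-1].append((top, bottom))
        else:
            groups.append([(top, bottom)])
    return groups

# The seven regions from the worked example in the text:
regions = [(100, 200), (110, 210), (300, 400), (298, 390),
           (396, 550), (700, 800), (712, 790)]
m = len(group_rows(regions))  # 4 row groups, matching the example
```

Note that region 5 (top ordinate 396) overlaps region 3 vertically but starts more than 20 pixels below it, so it forms its own group, exactly as in the example.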
In this embodiment of the application, the correspondence between each target text region of the target image and each cell in the initial table indicates the cell of the initial table to which each target text region belongs. That is, if the target text region P is located at row 2, column 1 in the target image, the target text region P corresponds to the cell at row 2, column 1 of the initial table.
Optionally, each target text region is obtained by performing text detection processing on the target image. Specifically, the target image may be processed by Optical Character Recognition (OCR) to obtain the target text regions, or by a trained text detection model. Illustratively, the text detection model may be a deep learning model such as DBNet (a differentiable-binarization segmentation network for text detection), CTPN (a text detection network combining a convolutional neural network and a recurrent neural network), or SegLink (a convolutional neural network that can detect rotated text); different models may be chosen flexibly based on different requirements, without limitation here. Training a text detection model is a technique familiar to those skilled in the art and is not described further here.
Optionally, after text detection processing has yielded all text regions in the target image, table detection processing may be performed on the target image to determine its table region, and the text regions that fall within the table region are then taken as the target text regions. Specifically, the table detection processing may be performed by training a machine learning model such as YOLOv5 (a real-time object detection model); the trained model detects wireless, half-line, and full-line tables in the target image and returns the coordinates of each table, from which the table regions are determined. It can then be determined which table each text region belongs to, so that text contents in different tables are kept apart, which facilitates the subsequent table structuring.
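The filtering just described, keeping only the detected text boxes that fall inside a detected table region, might look like the sketch below. The box format and function name are assumptions for illustration, not the patent's code:

```python
def regions_in_table(text_boxes, table_box):
    """Keep the text boxes whose center falls inside the table region.

    Boxes are (x1, y1, x2, y2) pixel rectangles: text_boxes would come
    from a text detector (e.g. OCR or DBNet) and table_box from a
    table detector (e.g. a YOLOv5-style model), as described above.
    """
    tx1, ty1, tx2, ty2 = table_box
    kept = []
    for x1, y1, x2, y2 in text_boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        if tx1 <= cx <= tx2 and ty1 <= cy <= ty2:
            kept.append((x1, y1, x2, y2))
    return kept
```

Running the same filter once per detected table also assigns each text region to its table, which is what separates text contents of different tables for the subsequent structuring.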
S202, determining a target cell group in the initial table, where the target cell group comprises a first cell and a second cell, the first cell is located in row m1 of the initial table, and the second cell is located in row m2 of the initial table, where m1 and m2 are both positive integers and m1 is less than m2.
In the embodiment of this application, the first cell and the second cell of the target cell group are located in the same column, and both have corresponding target text regions. Specifically, the first cell and the second cell may be two cells of the same column in two adjacent rows of the initial table; or they may lie in two non-adjacent rows, provided that every third cell located between them has no corresponding target text region, where a third cell is a cell in the same column as the first cell whose row lies between row m1 and row m2 of the initial table.
For example, suppose cell W1 of the initial table is at row 5, column 4; cell W2 at row 3, column 3; cell W3 at row 4, column 4; cell W4 at row 2, column 3; cell W5 at row 4, column 3; and cell W6 at row 5, column 3, where all cells except W2 and W5 have corresponding target text regions. Since W1 and W3 are located in two adjacent rows of the same column, W1 and W3 can be determined to be a target cell group. At the same time, W4 and W6 are in the same column, and the cells between them in that column, W2 and W5, have no corresponding target text regions, so W4 and W6 can also be determined to be a target cell group.
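Under the definitions above, a target cell group is a pair of vertically consecutive non-empty cells in one column, either in adjacent rows or separated only by empty cells. A sketch under that reading, with hypothetical names:

```python
def target_cell_groups(nonempty):
    """Find target cell groups in an initial table.

    nonempty: set of (row, col) coordinates of cells that have a
    corresponding target text region. For each column, consecutive
    non-empty rows form a (first cell, second cell) pair; any cells
    between them in that column are empty by construction.
    """
    cols = {}
    for r, c in nonempty:
        cols.setdefault(c, []).append(r)
    groups = []
    for c, rows in cols.items():
        rows.sort()
        for r1, r2 in zip(rows, rows[1:]):
            groups.append(((r1, c), (r2, c)))
    return groups
```

On the W1-W6 example this yields exactly the two groups identified in the text: (W3, W1) in column 4 and (W4, W6) in column 3.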
S203, if the first sum of the numbers of target text regions corresponding to the cells in row m2 satisfies a first preset condition, merging row m2 and row m1 of the initial table to obtain a target table.
S204, based on the corresponding relation between each target text area and each cell in the initial table, filling the text data in the target text area corresponding to each cell in the target table into the target table to obtain table data.
In this embodiment of the application, filling the text data in the target text regions corresponding to the cells of the target table into the target table, based on the correspondence between each target text region and each cell in the initial table, may be done by writing the text data of each target text region into its corresponding cell, producing a table containing the text data and thereby generating the table data. Optionally, the table data may be in a format convenient for viewing and use, such as Excel, or in a format convenient for subsequent service development, such as JSON (JavaScript Object Notation), without limitation here.
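As a sketch of this final filling step, merged rows can be serialized directly to JSON by joining each cell's text fragments; the layout below is illustrative, assuming the cell-as-list-of-fragments representation, and is not the patent's format:

```python
import json

def table_to_json(rows):
    """Serialize table rows to a JSON string.

    rows: list of rows, each a list of cells, each cell a list of text
    fragments (e.g. the pieces of a line-wrapped value after merging).
    A JSON export is easy to consume in downstream services; an Excel
    export could be produced the same way with e.g. openpyxl.
    """
    return json.dumps(
        [["".join(cell) for cell in row] for row in rows],
        ensure_ascii=False)
```

For instance, a merged cell holding the fragments "11111111" and "1111" is written out as the single value "111111111111".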
In this embodiment, the first preset condition may be that the first sum is less than the second sum of the numbers of target text regions corresponding to the cells in row m1. Whether the first sum satisfies the first preset condition may then be determined as follows: obtain the second sum of the numbers of target text regions corresponding to the cells in row m1; if the second sum is greater than the first sum, determine that the first sum satisfies the first preset condition.
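The condition and the merge of step S203 can be sketched together. This is an illustrative reading, assuming each cell is stored as a list of text fragments and counting non-empty cells per row:

```python
def maybe_merge(rows, m1, m2):
    """Merge row m2 into row m1 when the first preset condition holds.

    rows: list of rows; each row is a list of cells; each cell is a
    list of text fragments from its corresponding target text regions.
    Condition, as stated in the text: the number of non-empty cells in
    row m2 (the first sum) is less than that in row m1 (the second
    sum). Returns True if the merge was performed.
    """
    count = lambda r: sum(1 for cell in rows[r] if cell)
    if count(m2) < count(m1):
        for c, cell in enumerate(rows[m2]):
            rows[m1][c].extend(cell)  # append m2's fragments to m1's cells
        del rows[m2]
        return True
    return False
```

With the fig. 3 example (row 2 has 3 non-empty cells, row 3 has 2), the condition holds and row 3's fragments are appended to the matching cells of row 2.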
For example, referring to fig. 3, fig. 3 shows a schematic diagram of merged cells. The initial table acquired for the target image 301, together with the target text regions corresponding to its cells, is shown as the initial table 302 in fig. 3. Because "111111111111" and "xxxxxxxxxxxx" in the target image 301 both wrap onto a new line, "111111111111" is recognized as "11111111" and "1111", and "xxxxxxxxxxxx" is recognized as "xxxxxx" and "xxxxxx", so that the initial table 302 contains 4 rows of cells in total. From the positional relationships between the cells in the initial table 302 and the correspondence between the cells and the target text regions, the 8 target cell groups shown in fig. 3 can be determined, where in each target cell group the first cell is the upper cell and the second cell is the lower cell. The 3rd row of the initial table 302, in which cell 304 of target cell group 4 is located, contains 3 cells: the target text regions corresponding to the first two cells are "1111" and "xxxxxx" respectively, and the last cell has no corresponding target text region, so the first sum of the number of target text regions for the 3rd row is 2. The 2nd row of the initial table 302, in which cell 303 is located, also contains 3 cells, whose corresponding target text regions are "1111", "xxxxxx", and "300.00" respectively, so the second sum for the 2nd row is 3. The second sum is therefore greater than the first sum, so it is determined that the 3rd row of the initial table 302 is merged into the 2nd row, finally obtaining the target table 307; the target text regions corresponding to the cells are indicated in the target table 307 shown in fig. 3.
In one embodiment, the first preset condition may instead be: the first sum is less than or equal to a first preset value. Optionally, the first preset value may be less than or equal to the second sum of the number of target text regions corresponding to the cells in the m1-th row; the first preset value may be set manually or by the system, and is not limited herein. For example, the first preset value may be 1, 5, 8, etc. The way of determining whether the first sum satisfies the first preset condition may then be: if the first sum is less than or equal to the first preset value, determine that the first sum satisfies the first preset condition.
Illustratively, referring to fig. 4, fig. 4 shows a schematic diagram of another merged cell. With the first preset value set to 1, the acquired initial table and the target text regions corresponding to its cells are shown as the initial table 401 in fig. 4. By comparing, for each target cell group in the initial table 401, the first sum of the number of target text regions in the row containing the group's second cell with the first preset value, it can be determined that only the target cell group consisting of the cell at row 4, column 2 and the cell at row 5, column 2 satisfies the first preset condition. Therefore, row 5 is merged into row 4, finally obtaining the target table 402; the target text regions corresponding to the cells are indicated in the target table 402 shown in fig. 4.
In one embodiment, the way of merging the m2-th row and the m1-th row of the initial table to obtain the target table, when the first sum of the number of target text regions corresponding to the cells in the m2-th row satisfies the first preset condition, may be: 1) if the first sum of the number of target text regions corresponding to the cells in the m2-th row satisfies the first preset condition, merge the m2-th row and the m1-th row of the initial table to obtain an updated table; 2) take the updated table as the initial table and trigger execution of determining the target cell group in the initial table again, until the first sum of the number of target text regions corresponding to the cells in the m2-th row no longer satisfies the first preset condition, at which point the initial table to which the cells corresponding to the target text regions that do not satisfy the first preset condition belong is determined as the target table. The target table comprises p × q cells, where p is less than or equal to m, p and q are positive integers, p is the number of rows in the target table, q is the number of columns in the target table, and m is the number of rows in the initial table.
Specifically, when the first sum of the number of target text regions corresponding to the cells in the m2-th row, in which the second cell of the target cell group is located, satisfies the first preset condition, the m1-th row, in which the first cell of the target cell group is located, and the m2-th row, in which the second cell is located, are merged; after the merging of the m1-th row and the m2-th row is completed, an updated table is obtained. Then a new target cell group is determined in the updated table, and it is judged whether any new target cell group satisfies the first preset condition; if so, the merging continues and the table is updated again; if not, the updated table is determined as the target table.
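The iterative merge-and-update loop can be sketched as below. This simplified version uses the threshold form of the first preset condition (first sum less than or equal to 1) and merges a qualifying row into the row directly above it; the full method additionally restricts merging to same-column target cell groups:

```python
def merge_rows(table, threshold=1):
    """Repeatedly merge the m2-th row into the m1-th row above it while the
    m2-th row has at most `threshold` non-empty cells, then return the
    target table (a simplified sketch of the patent's update loop)."""
    table = [list(row) for row in table]
    changed = True
    while changed:
        changed = False
        for m2 in range(1, len(table)):
            filled = sum(1 for cell in table[m2] if cell)
            if 0 < filled <= threshold:
                # Append each overflow fragment to the cell above it.
                table[m2 - 1] = [a + b for a, b in zip(table[m2 - 1], table[m2])]
                del table[m2]
                changed = True
                break  # re-scan the updated table from the top
    return table

target = merge_rows([["11111111", "xxxxxx", "300.00"],
                     ["1111", "", ""],
                     ["aaaa", "bbbb", "100.00"]])
```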
Illustratively, referring to fig. 5, fig. 5 shows a schematic diagram of yet another merged cell. The first preset condition may be set as: the first sum of each target cell group is less than or equal to 1. An initial table 502 of the target image 501 is acquired, where the initial table 502 indicates the target text region corresponding to each cell. The initial table 502 is a table with 7 rows and 3 columns, as shown in table 503; for convenience of description, the cells of table 503 are numbered sequentially as cells 1 to 21. From the positional relationships between the cells in table 503 and the correspondence between each cell and the target text regions, it can be determined that the target cell groups of table 503 include: {1,4}, {4,7}, {7,10}, {10,13}, {13,19}, {2,5}, {5,14}, {14,17}, {17,20}, {3,6}, {6,15}, and {15,21}. Any one target cell group may be taken from the target cell groups of table 503 for judgment; for example, after the target cell group {10,13} is taken, the first sum of the 5th row, in which cell 13 is located, is 3, which does not satisfy the first preset condition. If the target cell group {14,17} is taken, the first sum of the 6th row, in which cell 17 is located, is 1, which satisfies the first preset condition, so it can be determined that the 6th row of table 503 is merged into the 5th row, obtaining the updated table, that is, table 504.
After table 504 is obtained, it can be determined that the target cell groups of table 504 include: {1,4}, {4,7}, {7,10}, {10,(13,16)}, {(13,16),19}, {2,5}, {5,(14,17)}, {(14,17),20}, {3,6}, {6,(15,18)}, and {(15,18),21}. Any one target cell group may be taken from the target cell groups of table 504; for example, after the target cell group {1,4} is taken, the first sum of the 2nd row, in which cell 4 is located, is 3, which does not satisfy the first preset condition. If the target cell group {4,7} is taken, the first sum of the 3rd row, in which cell 7 is located, is 1, which satisfies the first preset condition, so it can be determined that the 3rd row of table 504 is merged into the 2nd row, obtaining the updated table, that is, table 505. Similarly, after table 505 is updated again, table 506 is obtained.
After table 506 is obtained, it can be determined that the target cell groups of table 506 include: {1,(4,7,10)}, {(4,7,10),(13,16)}, {(13,16),19}, {2,(5,8,11)}, {(5,8,11),(14,17)}, {(14,17),20}, {3,(6,9,12)}, {(6,9,12),(15,18)}, and {(15,18),21}. Since no target cell group of table 506 satisfies the first preset condition, table 506 can be determined to be the target table. Finally, based on the correspondence between each target text region and each cell in the initial table, the text data in the target text regions corresponding to the cells of the target table is filled into the target table, obtaining the table data 507. For example, if the cell at row 2, column 1 of table 506 is formed by merging the original cell 4, cell 7, and cell 10, the text data in the target text regions corresponding to the original cell 4, cell 7, and cell 10 is all filled into the cell at row 2, column 1, so the text data in that cell of table 506 can finally be determined to be "11111111222222223333".
In one embodiment, the way of determining the target cell group in the initial table may be: 1) determine at least one cell group in the initial table, where each cell group comprises a first cell and a second cell; 2) for each cell group, obtain the distance, in the target image, between the target text region corresponding to the first cell and the target text region corresponding to the second cell; 3) determine the cell group with the smallest distance as the target cell group.
Optionally, the distance in the target image between the target text region corresponding to the first cell and the target text region corresponding to the second cell may be: the distance between the region center point of the target text region corresponding to the first cell and the region center point of the target text region corresponding to the second cell; or the distance between the lower region border of the target text region corresponding to the first cell and the upper region border of the target text region corresponding to the second cell; or some other distance in the target image between the two target text regions, which is not limited herein. Optionally, the distance may be measured in pixels, or in a unit of length such as millimeters or centimeters, which is not limited herein. Illustratively, the distance may be 100 pixels, 0.8 mm, etc.
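Both candidate distances can be computed directly from region bounding boxes. A sketch, assuming boxes are given as (x1, y1, x2, y2) with the y axis growing downward as in image coordinates:

```python
import math

def center_distance(box_a, box_b):
    """Distance between the two region center points, in pixels."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return math.hypot(bx - ax, by - ay)

def border_gap(upper_box, lower_box):
    """Gap between the lower border of the upper region and the upper
    border of the lower region."""
    return lower_box[1] - upper_box[3]

d_center = center_distance((0, 0, 100, 20), (0, 30, 100, 50))
d_border = border_gap((0, 0, 100, 20), (0, 30, 100, 50))
```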
Specifically, in general, for two cells located in the same column of the initial table, the smaller the distance between their target text regions, the higher the probability that the two regions are actually one piece of wrapped text that was recognized as belonging to two cells and should occupy a single cell. Therefore, to further improve merging efficiency, the cell group with the smallest distance may be determined as the target cell group.
Optionally, when documents such as expense details and insurance reimbursement bills are designed, the text data in the first row or the first column is usually a table name or table header, which does not wrap onto a new line; therefore, in the process of determining the target cell group, cells located in the first row or the first column may be excluded.
Illustratively, referring to fig. 6, fig. 6 shows a schematic diagram of yet another merged cell. The first preset condition is set as: the first sum of the number of target text regions corresponding to the cells in the m2-th row of the initial table, in which the second cell of the target cell group is located, is less than or equal to 1. An initial table 602 of the target image 601 is acquired, where the initial table 602 indicates the target text region corresponding to each cell. The initial table 602 is a table with 5 rows and 3 columns, as shown in table 603; for convenience of description, the cells of table 603 are numbered sequentially as cells 1 to 15. From the positional relationships between the cells in table 603 and the correspondence between each cell and the target text regions, it can be determined that the cell groups of table 603 include: {1,4}, {4,7}, {7,13}, {2,5}, {5,8}, {8,11}, {11,14}, {3,6}, {6,9}, and {9,15}. Then the distance in the target image 601 between the target text region corresponding to the first cell and the target text region corresponding to the second cell of each cell group is obtained, and the cell groups are sorted by distance from small to large; the sorted cell groups are: {8,11}, {11,14}, {1,4}, {2,5}, {3,6}, {4,7}, {5,8}, {6,9}, {7,13}, and {9,15}. Thus the cell group {4,7} with the smallest distance among the cell groups of table 603 can be determined as the target cell group. Since the first sum of the number of target text regions in the 3rd row, in which cell 7 is located, satisfies the first preset condition, it can be determined that the 3rd row of table 603 is merged into the 2nd row, obtaining the target table 604. Finally, the text data in the target text region corresponding to each cell of the target table 604 is filled into that cell, obtaining the table data 605.
Optionally, when the target table is obtained by cyclically updating the initial table, each time an updated table is obtained, the target cell group in the updated table may likewise be determined by comparing the distances between the target text regions corresponding to pairs of cells.
In the embodiment of the application, an initial table of a target image and the correspondence between each target text region of the target image and each cell in the initial table are obtained, and a target cell group is then determined in the initial table, where the target cell group comprises a first cell and a second cell located in the same column, the first cell being in the m1-th row of the initial table and the second cell in the m2-th row. It is then judged whether the first sum of the number of target text regions corresponding to the cells in the m2-th row satisfies the first preset condition; if so, the m2-th row and the m1-th row of the initial table are merged to obtain the target table. Finally, based on the correspondence between each target text region and each cell in the initial table, the text data in the target text regions corresponding to the cells of the target table is filled into the target table to obtain the table data. In the embodiment of the application, the initial positions of the target text regions in the initial table can be determined from the obtained initial table and the correspondence between each target text region of the target image and each cell in the initial table; the target table is then obtained by determining the target cell group in the initial table and judging whether the first sum satisfies the first preset condition, thereby identifying the cells of the initial table that should be merged, so that target text regions which should belong to the same cell but fall into different rows of the target table are merged.
This avoids the situation in which text data that wraps onto a new line is filled into different cells for lack of explicit table lines, scrambling the extracted table data. The cells to which certain target text regions belong can thus be determined without relying on tables with explicit table lines, such as full-line tables, thereby achieving table structuring for images containing non-full-line tables and extracting the table data of such images.
Based on the table information extraction scheme and the table information extraction system, the embodiment of the application provides another table information extraction method. Referring to fig. 7, a schematic flow chart of another table information extraction method provided in the embodiment of the present application is shown. The table information extraction method shown in fig. 7 may be performed by the server or the terminal device shown in fig. 1. The table information extraction method shown in fig. 7 may include the steps of:
S701: acquire the position information of each target text region in the target image. The position information may include the abscissa and the ordinate of each target text region in the target image.
In this embodiment of the present application, before the position information of each target text region in the target image is obtained, rotation correction processing may be performed on the target image. The rotation correction processing may include large-angle correction (90°, 180°, 270°) and small-angle tilt correction (within 45°), so that the position information of target text regions that should lie in the same row or column, as detected in subsequent OCR recognition, does not carry a large error. For example, if the target image is tilted so that the ordinates of 3 target text regions in the same row differ greatly, the 3 target text regions are easily treated as belonging to cells in 3 different rows during subsequent table structuring.
S702: determine an initial table of the target image based on the position information of each target text region.
In this embodiment of the present application, the way of determining the initial table of the target image based on the position information of each target text region may be: 1) arrange the target text regions based on their position information to obtain the arranged target text regions; 2) determine the initial table based on the arranged target text regions, where the initial table may include m × n cells, m being determined from the target text region with the smallest ordinate and the target text region with the largest ordinate among the arranged target text regions, and n being determined from the target text region with the smallest abscissa and the target text region with the largest abscissa.
In an embodiment, the step of arranging the target text regions based on the position information of the target text regions to obtain the arranged target text regions may be:
1) Obtain the position information of the region center point of each target text region in the target image. The position information of a region center point in the target image refers to the abscissa and ordinate of that region center point in the target image.
2) Sort the target text regions based on the position information of their region center points to obtain the sorted target text regions, where the ordinate of the region center point of the x-th target text region among the sorted target text regions is less than or equal to that of the (x+1)-th target text region, x being a positive integer less than the number of target text regions among the sorted target text regions.
Specifically, after the position information of the region center point of each target text region is acquired, the target text regions may be sorted in ascending order of the ordinates of their region center points; if the ordinates of the center points of several target text regions are the same, or differ only slightly, those target text regions are sorted in ascending order of the abscissas of their region center points, finally obtaining the sorted target text regions.
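The sorting rule can be sketched as follows, with region center points as (x, y) tuples; the 10-pixel tolerance matches the example below and is otherwise an assumption:

```python
def sort_regions(centers, y_tol=10):
    """Sort region center points top-to-bottom; centers whose ordinates
    differ by less than y_tol are treated as one row and sorted by abscissa."""
    by_y = sorted(centers, key=lambda p: p[1])
    rows, current = [], [by_y[0]]
    for p in by_y[1:]:
        if abs(p[1] - current[-1][1]) < y_tol:
            current.append(p)           # same row: collect, order by x below
        else:
            rows.append(sorted(current))
            current = [p]
    rows.append(sorted(current))
    return [p for row in rows for p in row]

# Center points of target text regions 1-4 from the example below.
ordered = sort_regions([(100, 100), (200, 105), (400, 98), (300, 100)])
```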
For example, there are 8 target text regions in the target image, where the region center point coordinate of target text region 1 is (100, 100), that of target text region 2 is (200, 105), that of target text region 3 is (400, 98), that of target text region 4 is (300, 100), that of target text region 5 is (200, 305), that of target text region 6 is (100, 302), that of target text region 7 is (300, 299), and that of target text region 8 is (400, 308).
Suppose target text regions whose region center point ordinates differ by less than 10 pixels are treated as having the same ordinate; then the region center point ordinates of target text regions 1 to 4 can be considered the same, as can those of target text regions 5 to 8. Comparing the abscissas of the region center points of target text regions 1 to 4, and those of target text regions 5 to 8, the sorted target text regions are finally obtained as: target text region 1, target text region 2, target text region 4, target text region 3, target text region 6, target text region 5, target text region 7, and target text region 8.
3) Obtain the position information, in the target image, of each region border of the first target text region among the sorted target text regions;
4) arrange the sorted target text regions based on the ordinates of the first and second region borders of the first target text region and the ordinates of the region center points of the target text regions after the first target text region, to obtain the initially arranged target text regions.
specifically, the first region border may be an uppermost border of the target text region in the target image, and the second region border may be a lowermost border of the target text region in the target image.
In addition, the way of arranging the sorted target text regions based on the ordinates of the first and second region borders of the first target text region and the ordinates of the region center points of the target text regions after it may be: determine, for each target text region after the first target text region, a first difference between the ordinate of its center point and that of the first region border, and a second difference between the ordinate of its center point and that of the second region border; arrange the target text regions after the first target text region in ascending order of the first difference; and if the first differences of several target text regions are the same, or differ only slightly, arrange those target text regions in ascending order of their second differences, finally obtaining the arranged target text regions.
In an embodiment, the manner of obtaining the initially arranged target text regions by arranging the sorted target text regions based on the vertical coordinates of the first region border and the second region border of the first target text region and the vertical coordinate of the region center point of each target text region after the first target text region may also be:
traverse each target text region among the sorted target text regions, and determine the target text regions whose region center point ordinate is greater than the ordinate of the first region border of the first target text region and less than the ordinate of the second region border of the first target text region; the determined target text regions and the first target text region are located in the same line of the initially arranged target text regions;
if the ordinate of the center point of the target text region following the determined target text regions among the sorted target text regions is greater than both the ordinate of the first region border and the ordinate of the second region border of the first target text region, determine that this next target text region is located in the line below the first target text region in the initially arranged target text regions;
take this next target text region as the first target text region, and trigger again the determination of the target text regions whose region center point ordinate is greater than the ordinate of the first region border of the first target text region and less than the ordinate of the second region border of the first target text region, so as to obtain the initially arranged target text regions.
The target text regions in the same line of the initially arranged target text regions keep the same order as they had among the sorted target text regions.
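The traversal described in the steps above can be sketched as follows, assuming each sorted region is given as (center_y, top_border_y, bottom_border_y); this representation is hypothetical:

```python
def group_into_lines(sorted_regions):
    """Split sorted regions into lines: a region joins the current line
    when its center ordinate lies strictly between the line anchor's first
    (upper) and second (lower) region borders; otherwise it starts a new
    line and becomes that line's anchor, the new "first target text region"."""
    lines, anchor = [], None
    for region in sorted_regions:
        center_y = region[0]
        if anchor is not None and anchor[1] < center_y < anchor[2]:
            lines[-1].append(region)
        else:
            lines.append([region])
            anchor = region
    return lines

# Two regions on the first line, one on the next (hypothetical coordinates).
lines = group_into_lines([(10, 0, 20), (12, 2, 22), (40, 30, 50)])
```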
For example, referring to fig. 8a, fig. 8a shows a schematic diagram of an initial arrangement. After text detection processing is carried out on the target image 801, target text areas a-k are obtained; then, based on the position information of the center point of each target text region, the target text regions are sorted, and the obtained sorted target text regions are shown as an array G in fig. 8 a. The first target text area a in the array G and the next target text area b of the first target text area a are extracted, and since the ordinate of the area center point 802 of the target text area b is larger than the ordinate of the first area border 803 of the target text area a and smaller than the ordinate of the second area border 804 of the target text area a, it can be determined that the target text area b and the target text area a are located on the same line.
Then, taking the target text region b as the first target text region, at which the next target text region is the target text region c, it is apparent that the ordinate of the region center point 805 of the target text region c is larger than the ordinate of the first region border 806 of the target text region b and larger than the ordinate of the second region border 807 of the target text region b, and therefore, it can be determined that the target text region c is located next to the target text region b. Thus, it can be determined that the target text region located on the 1 st line includes the target text region a and the target text region b, and further, since the target text region a is ranked before the target text region b in the array G, the target text region a is ranked before the target text region b in the 1 st line.
Finally, as shown in fig. 8a, the first target text region and the next target text region are repeatedly determined and compared, and finally 5 lines of the initially arranged target text regions can be determined, wherein the 1 st line includes target text regions a to b, the 2 nd line includes target text regions c to d, the 3 rd line includes target text region f, the 4 th line includes target text regions g to i, and the 5 th line includes target text regions j to k.
5) Rearrange the initially arranged target text regions based on the abscissas of the third and fourth region borders of the first target text region of the initially arranged target text regions and the abscissas of the region center points of the other initially arranged target text regions, to obtain the arranged target text regions.
Specifically, the third region border may be the leftmost border of a target text region in the target image, and the fourth region border may be the rightmost border of that target text region in the target image.
In addition, the mode of rearranging the initially arranged target text regions based on the abscissa of the third region frame and the fourth region frame of the first target text region of the initially arranged target text regions and the abscissa of the center point of the region of the other target text regions in the initially arranged target text regions to obtain the arranged target text regions may be:
determine the target text regions located in the same column among the initially arranged target text regions, where the abscissa of the region center point of each target text region in a column is greater than the abscissa of the third region border of the target text region of that column located in the first line, and less than the abscissa of the fourth region border of that first-line target text region;
determine, based on the abscissas of the region center points of each column of target text regions, the ni-th column in which that column is located among the arranged target text regions, to obtain the arranged target text regions. If a target text region is located in the mi-th line of the initially arranged target text regions, that target text region is located in the mi-th row of the arranged target text regions; mi and ni are positive integers, mi is less than or equal to m, and ni is less than or equal to n. The ni-th column refers to any column in the initial table comprising m × n cells; the mi-th row refers to any row in that initial table.
Specifically, the way of determining the ni-th column of each column of target text regions among the arranged target text regions based on the abscissas of their region center points may be: determine the column index of each column of target text regions in ascending order of the abscissa of the region center point of the first target text region in that column; for example, the column whose first target text region has the smallest center point abscissa is taken as the 1st column of target text regions.
Optionally, the way of determining the ni-th column of each column of target text regions among the arranged target text regions based on the abscissas of their region center points may also be: compute the mean of the abscissas of the region center points of all target text regions in each column, and determine the column index of each column in ascending order of this mean abscissa; for example, the column with the smallest mean abscissa is taken as the 1st column of target text regions.
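Both column-ordering variants reduce to sorting the columns by an abscissa key. A sketch of the mean-abscissa variant, with each column given as a list of (x, y) region center points (an assumed representation):

```python
def order_columns(columns):
    """Return columns in ascending order of the mean abscissa of their
    region center points; the leftmost column becomes the 1st column."""
    return sorted(columns, key=lambda col: sum(x for x, _ in col) / len(col))

cols = order_columns([
    [(300, 10), (310, 40)],  # rightmost column
    [(100, 10), (105, 40)],  # leftmost column
    [(200, 10), (205, 40)],
])
```

The first-region variant would instead use the abscissa of each column's first center point, `col[0][0]`, as the sort key.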
For example, referring to fig. 8b, fig. 8b shows a schematic diagram of a rearrangement mode. After the text detection processing is performed on the target image 808, target text regions a to k may be obtained, where a region center point of each target text region is indicated in a region center of each target text region in a point form, for example, the region center point of the target text region a is a point 817. As can be seen from the target image 808, the abscissa of the center point of the target text region d, f, h is greater than the abscissa of the third region border 812 of the target text region a and less than the abscissa of the fourth region border 811 of the target text region a, so that it can be determined that the target text regions a, d, f, h are located in the same column; similarly, it may be determined that the target text regions b, e, i, k are located in the same column, and that the target text regions c, g, j are located in the same column.
Meanwhile, among the target text regions of the 3 columns that are located in the 1st line, the abscissa of target text region c is the smallest, that of target text region a is the next smallest, and that of target text region b is the largest, so it can be determined that target text regions c, g and j are located in the 1st column, target text regions a, d, f and h in the 2nd column, and target text regions b, e, i and k in the 3rd column. Then, as shown in the structure 815, after determining the row and column in which each target text region is located, it may be determined that the maximum number of rows of target text regions in the target image 808 is 5 and the maximum number of columns is 3; finally, it may be determined that the initial table 816 includes 5 × 3 cells.
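The column-ordering rule just described can be sketched in a few lines (a minimal illustration; representing a column as a list of region center points, and all names, are assumptions rather than the patent's exact data structures):

```python
def order_columns(columns):
    """Order columns of target text regions from left to right.

    Each column is a list of (cx, cy) region center points; columns are
    ranked by the mean abscissa of their center points, so the column
    with the smallest mean abscissa becomes the 1st column.
    """
    def mean_x(column):
        return sum(cx for cx, _ in column) / len(column)
    return sorted(columns, key=mean_x)
```

Feeding it three columns analogous to {c, g, j}, {a, d, f, h} and {b, e, i, k} of fig. 8b returns them ordered by mean abscissa, matching the 1st/2nd/3rd column assignment in the example.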
In an embodiment, the manner of rearranging the initially arranged target text regions based on the abscissas of the third region border and the fourth region border of the first target text region among the initially arranged target text regions and the abscissas of the region center points of the other target text regions, so as to obtain the arranged target text regions, may also be:
traversing the target text regions located in the first line among the initially arranged target text regions, and determining those target text regions whose region-center-point abscissa is greater than the abscissa of the third region border of a target text region located in the first line and less than the abscissa of the fourth region border of that target text region; each target text region so determined is located, among the arranged target text regions, in the same column as the corresponding first-line target text region;
if there exist target text regions that are not in the same column as any target text region in the first line, determining, among them, the line closest to the first line; taking the target text regions not in the same column as any first-line target text region as a new set of initially arranged target text regions, with the target text regions of that closest line serving as the first line of the new set; and triggering again the step of traversing the first-line target text regions and determining the target text regions whose region-center-point abscissa is greater than the abscissa of the third region border of a first-line target text region and less than the abscissa of the fourth region border of that target text region, so as to obtain the arranged target text regions.
Determining, based on the abscissa of the region center point of each column of target text regions, the ni-th column in which each column of target text regions is located among the arranged target text regions, so as to obtain the arranged target text regions.
For example, referring to fig. 8c, fig. 8c shows a schematic diagram of another rearrangement mode. As shown in fig. 8a, the target text regions located in the first line among the initially arranged target text regions may be determined to be target text region a and target text region b. Then, target text regions c, d and e in the second line of the initially arranged target text regions are traversed; in the second line, only the region-center-point abscissa of target text region d is greater than the abscissa of the third region border of a first-line target text region and less than the abscissa of the corresponding fourth region border, so it can be determined that target text region d and target text region a are in the same column; similarly, the target text regions located in the third to fifth lines among the initially arranged target text regions are traversed in turn, and it is finally determined that the target text regions in the same column as target text region a include target text regions d, f and h, and those in the same column as target text region b include target text regions e, i and k.
After the target text regions located in the first line among the initially arranged target text regions have been traversed, target text regions c, g and j are found to remain in the initially arranged target text regions, that is, there exist target text regions that are not in the same column as any target text region located in the first line.
At this time, the new set of initially arranged target text regions is determined to be target text region c, target text region g and target text region j; meanwhile, since target text region c is located in the second line, whose line number is closest to the first line, target text region c is determined as the target text region located in the first line of the new set. Then, by traversing the second to fifth lines, it may be determined that target text region g and target text region j are located in the same column as target text region c.
Further, among the target text regions of the 3 columns that are located in the 1st line, the abscissa of target text region c is the smallest, that of target text region a is the next smallest, and that of target text region b is the largest; thus, the structure 815 is obtained, and finally it can be determined from the structure 815 that the initial table 816 includes 5 × 3 cells.
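The traversal described in this embodiment — matching center abscissas against the first line's border range, then restarting on the leftover regions — can be sketched as follows (a simplified illustration; the dict fields 'x0'/'x1' for the third/fourth region borders and 'cx' for the center abscissa are hypothetical names):

```python
def assign_columns(rows):
    """Group target text regions into columns.

    `rows` lists the regions line by line, top to bottom. Regions whose
    center abscissa falls strictly inside a first-line region's
    (x0, x1) border range join that region's column; regions captured
    by no column form a new initial set whose topmost line supplies the
    next reference regions, and the traversal is triggered again.
    """
    columns = []
    remaining = [list(row) for row in rows if row]
    while remaining:
        first, rest = remaining[0], remaining[1:]
        for ref in first:
            column = [ref]
            for row in rest:
                for region in row:
                    if ref['x0'] < region['cx'] < ref['x1']:
                        column.append(region)
            columns.append(column)
        # leftovers (never captured by any column) form the next pass
        captured = {id(r) for column in columns for r in column}
        remaining = [row for row in
                     ([r for r in row if id(r) not in captured] for row in rest)
                     if row]
    return columns
```

On data shaped like fig. 8c, the first pass yields the columns headed by the first-line regions, and the second pass groups the leftover regions under the leftover region closest to the first line.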
S703, acquiring the corresponding relation between each target text area of the target image and each cell in the initial table.
S704, determining a target cell group in the initial table, where the target cell group includes a first cell and a second cell, the first cell being located in the m1-th row of the initial table and the second cell in the m2-th row of the initial table, where m1 and m2 are both positive integers and m1 is less than m2; the first cell and the second cell of the target cell group are located in the same column, and both have corresponding target text regions.
S705, if the first number sum of the target text regions corresponding to the cells in the m2-th row meets a first preset condition, merging the m2-th row and the m1-th row of the initial table to obtain the target table.
S706, based on the corresponding relation between each target text area and each cell in the initial table, filling text data in the target text area corresponding to each cell in the target table into the target table to obtain table data.
In the embodiment of the present application, if the first number sum of the target text regions corresponding to the cells in the m2-th row meets the first preset condition, the manner of merging the m2-th row and the m1-th row of the initial table to obtain the target table may be: 1) determining a first data length of the text data of the target text region corresponding to the first cell and a second data length of the text data of the target text region corresponding to the second cell; 2) if the first number sum of the target text regions corresponding to the cells in the m2-th row meets the first preset condition and the first data length is greater than the second data length, merging the m2-th row and the m1-th row of the initial table to obtain the target table. Optionally, the condition may instead be that the first data length is greater than or equal to the second data length, or that the first data length is equal to the second data length.
Specifically, for text data in which a line feed occurs, the text data wrapped to the next line is generally the text data that could not fit in the cell on the previous line, so its data length is generally no greater than that of the text data on the previous line. Therefore, in addition to checking that the first number sum meets the first preset condition, it can further be determined whether the first data length is greater than or equal to the second data length, thereby improving the accuracy of the cell-merging decision.
Optionally, the first data length and the second data length may be the physical length of the text data in the target text region, such as 200 pixels or 20 mm, or may be the number of characters of the text data in the target text region, such as 10 characters or 8 characters.
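The two checks — the first number sum and the data-length comparison — combine into a simple merge predicate, sketched below (a hedged illustration: the first preset condition is assumed here, as in the later fig. 9 example, to be "first number sum ≤ 1", and data length is measured in characters; the function and parameter names are hypothetical):

```python
def should_merge(m2_region_counts, first_cell_text, second_cell_text):
    """Decide whether the m2-th row should be merged into the m1-th row.

    m2_region_counts: number of corresponding target text regions for
    each cell of the m2-th row; their sum is the first number sum.
    Merging also requires the text in the m1-th row's cell (first data
    length) to be at least as long as the wrapped text below it (second
    data length), since wrapped-down text rarely exceeds the line above.
    """
    first_number_sum = sum(m2_region_counts)
    return (first_number_sum <= 1
            and len(first_cell_text) >= len(second_cell_text))
```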
In one embodiment, if the first number sum of the target text regions corresponding to the cells in the m2-th row meets the first preset condition, the manner of merging the m2-th row and the m1-th row of the initial table to obtain the target table may also be: 1) performing line detection processing on the target image to obtain at least one line; 2) if no line in the target image is located between the target text region corresponding to the first cell and the target text region corresponding to the second cell, merging the m2-th row and the m1-th row of the initial table to obtain the target table. The at least one line may be a horizontal line or an oblique line, which is not limited here. Optionally, the at least one line obtained in step 1) may be screened according to factors such as the position, length and angle of each line, removing lines that are too short or too inclined, so as to determine at least one target line; it is then judged whether any target line in the target image is located between the target text region corresponding to the first cell and the target text region corresponding to the second cell, and if not, the m2-th row and the m1-th row of the initial table are merged to obtain the target table.
Optionally, the line detection processing on the target image may be performed by first performing binarization processing on the target image, and then detecting the binarized target image through a line detection interface of OpenCV (a cross-platform computer vision and machine learning software library under the Apache 2.0 license), such as the Hough line detection interface, so as to obtain at least one line in the target image. Alternatively, a deep learning model such as a region with CNN features (R-CNN) model may be trained, and the lines in the target image determined by the trained model. Alternatively, the line detection processing may be performed on the target image in other manners, which is not limited here.
Specifically, in the target image, if a line exists between the target text region corresponding to the first cell and the target text region corresponding to the second cell of the target cell group, the two target text regions are unlikely to be a single piece of text that was recognized as two lines due to line wrapping, and therefore should not be merged.
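In image coordinates this reduces to a one-line vertical check, sketched below (a minimal illustration with horizontal lines and axis-aligned regions; y grows downward, and all names are assumptions):

```python
def line_between(line_ys, upper_region, lower_region):
    """Return True if any detected line separates the two regions.

    upper_region / lower_region are (top_y, bottom_y) vertical extents
    of the target text regions corresponding to the first and second
    cells; line_ys are the y coordinates of detected horizontal lines.
    A separating line means the regions belong to different table rows,
    so the corresponding cells should NOT be merged.
    """
    _, upper_bottom = upper_region
    lower_top, _ = lower_region
    return any(upper_bottom < y < lower_top for y in line_ys)
```

In fig. 9 terms: the line 904 lies between regions C and I (no merge) but above both G and H (merge allowed).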
For example, referring to fig. 9, fig. 9 shows a schematic diagram of another merged cell. The first preset condition is set as: the first number sum of the target text regions corresponding to the cells in the m2-th row of the initial table, in which the target cell group is located, is less than or equal to 1. After text detection processing is performed on the target image 901, target text regions A to I can be obtained, and the initial table 902 of the target image 901 can be obtained through steps S701 to S702, where the target text region corresponding to each cell is indicated in the initial table 902. Among the 6 target cell groups of the initial table 902, the target cell groups satisfying the first preset condition include the target cell group composed of target text region G and target text region H, and the target cell group composed of target text region C and target text region I.
Then, line detection processing is performed on the target image 901 to obtain a line 904. By comparing the positions of target text region G, target text region H and the line 904, it may be determined that the line 904 is located above both target text region G and target text region H; meanwhile, by comparing the positions of target text region C, target text region I and the line 904, it may be determined that the line 904 is located between target text region C and target text region I. Therefore, only the cell corresponding to target text region G and the cell corresponding to target text region H can be merged. Finally, the target table may be obtained, and the text data in each target text region is filled into the corresponding cell, so that the table data 903 may be obtained.
Optionally, if the first number sum of the target text regions corresponding to the cells in the m2-th row meets the first preset condition, the manner of merging the m2-th row and the m1-th row of the initial table to obtain the target table may also be: if the first number sum of the target text regions corresponding to the cells in the m2-th row meets the first preset condition, the first data length is greater than the second data length, and no line in the target image is located between the target text region corresponding to the first cell and the target text region corresponding to the second cell, merging the m2-th row and the m1-th row of the initial table to obtain the target table.
In this embodiment of the application, the manner of filling the text data in the target text region corresponding to each cell in the target table into the target table based on the corresponding relationship between each target text region and each cell in the initial table to obtain the table data may refer to the specific implementation manner in step S204, which is not described herein again.
In the embodiment of the application, the position information of each target text region in the target image is obtained, and the initial table of the target image is then determined based on that position information; the corresponding relation between each target text region of the target image and each cell in the initial table is then obtained, and a target cell group is determined in the initial table, where the target cell group includes a first cell and a second cell located in the same column, the first cell being located in the m1-th row of the initial table and the second cell in the m2-th row; it is then judged whether the first number sum of the target text regions corresponding to the cells in the m2-th row meets the first preset condition, and if so, the m2-th row and the m1-th row of the initial table are merged to obtain the target table; finally, the text data in the target text region corresponding to each cell in the target table is filled into the target table based on the corresponding relation between each target text region and each cell in the initial table, so as to obtain the table data. In the embodiment of the application, by judging whether the center-point abscissa and ordinate of each target text region fall within the abscissa and ordinate range of the region border of the preceding target text region, the situation can be effectively avoided in which, due to tilt of the target image, the coordinate information of target text regions actually located in the same column or the same row contains errors and the regions are assigned to different rows and columns, which would cause large errors in the row and column counts of the finally constructed initial table.
Then, the cells to be merged in the initial table are determined by judging whether the first number sum meets the first preset condition, so as to obtain the target table. This avoids the situation in which wrapped text data is filled into different cells because no clear table line exists, leaving the text data in the extracted table disordered; it thus becomes possible, without depending on a table with clear table lines such as a full-line table, to determine in which cells of the table certain target text regions are located, thereby realizing table structuring of an image containing a non-full-line table and extracting the table data of the image containing the non-full-line table.
Based on the above table information extraction scheme and table information extraction system, an embodiment of the present application provides yet another table information extraction method. Fig. 10 is a schematic flowchart of another table information extraction method provided in an embodiment of the present application. The table information extraction method shown in fig. 10 may be executed by the server and the terminal device shown in fig. 1, and may include the following steps:
S1001, the terminal device sends the target image to a server.
In this embodiment of the application, the terminal device may send the target image to the server by wireless communication or wired communication, may send the target image after encryption, or may send the target image by other methods, which is not limited herein.
S1002, the server performs text detection processing on the target image to obtain at least one initial text region.
The specific implementation of performing text detection processing on the target image to obtain at least one initial text region may refer to the specific implementation of performing text detection processing on the target image in step S201 to obtain each target text region, which is not described herein again.
S1003, the server acquires the segmentation position of each initial text region.
In this embodiment of the present application, the segmentation position of each initial text region may be obtained as follows: first, character detection processing is performed on each initial text region to determine the distance between adjacent characters in each initial text region, and if the distance between two adjacent characters is greater than a preset distance, the position between those two adjacent characters is determined as a segmentation position. Optionally, the preset distance may be expressed in pixels or in a unit of length such as millimeters or centimeters, which is not limited here. Optionally, the character detection processing on each initial text region may be performed by detecting the target image with an optical character recognition technique to obtain each character, or by detecting the target image with a deep learning model such as DBNet, CTPN or SegLink to obtain each character, which is not limited here. The process of training a text detection model is a technical means commonly used by those skilled in the art and is not described here again.
In one embodiment, the manner of obtaining the segmentation position of each initial text region may be:
1) after each initial text region is detected, extracting a text region image corresponding to each initial text region;
2) performing binarization processing on each text region image to obtain a processed text region image, for example, setting the pixel value of pixel points identified as belonging to characters in each text region image to 255, and the pixel value of pixel points not belonging to characters to 0;
3) accumulating the pixel values of the pixel points located in the same column in each processed text region image to obtain a pixel distribution histogram, where the abscissa may be the column index of the processed text region image and the ordinate may be the accumulated pixel value of that column of pixel points;
4) if the accumulated pixel values of n consecutive columns in the pixel distribution histogram are all smaller than a preset pixel value, determining the position of those n consecutive columns in the text region image as a segmentation position, where n is a positive integer greater than a second preset value, and both n and the second preset value may be set manually or by the system, such as 10, 20 or 100, which is not limited here.
For example, referring to fig. 11, fig. 11 shows a pixel distribution histogram, where the abscissa is the column index indicating the 1st to 20th columns of pixels of the processed text region image, and the ordinate is the accumulated pixel value of each column. If the preset pixel value is set to 20 and n is set to 3, the position of the 7th to 11th columns of pixel points in the processed text region image may be determined as a segmentation position from fig. 11.
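Steps 3) and 4) can be sketched as follows (a minimal pure-Python illustration of the column-histogram scan; the function name and the run representation are assumptions):

```python
def find_segmentation_positions(binary_image, preset_pixel_value, n):
    """Locate segmentation positions in a binarized text region image.

    binary_image: 2D list of pixel rows, each pixel 255 (character) or
    0 (background). Column pixel values are accumulated into a
    histogram; every run of at least n consecutive columns whose
    accumulated value is below preset_pixel_value is reported as an
    inclusive (start_column, end_column) segmentation position.
    """
    width = len(binary_image[0])
    column_sums = [sum(row[col] for row in binary_image) for col in range(width)]
    positions, start = [], None
    # appended sentinel terminates a run that reaches the right edge
    for col, value in enumerate(column_sums + [preset_pixel_value]):
        if value < preset_pixel_value:
            if start is None:
                start = col
        else:
            if start is not None and col - start >= n:
                positions.append((start, col - 1))
            start = None
    return positions
```

With a 5-column blank gap, threshold 20 and n = 3, the function reports the gap columns as one segmentation position, mirroring the fig. 11 example.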
S1004, the server performs segmentation processing on each initial text region based on the segmentation positions to obtain a first initial text subregion and a second initial text subregion, where the abscissa of the first initial text subregion in the target image is smaller than the abscissa of the second initial text subregion in the target image.
S1005, the server performs character recognition processing on the first initial text subregion to determine a first character in the first initial text subregion, and performs character recognition processing on the second initial text subregion to determine a second character in the second initial text subregion.
In the embodiment of the present application, the fact that the abscissa of the first initial text subregion in the target image is smaller than the abscissa of the second initial text subregion means that, in the target image, the first initial text subregion is located to the left of the second initial text subregion. Specifically, because the common writing convention is to write from left to right, the abscissa of the first initial text subregion in the target image is determined to be less than that of the second initial text subregion; if the writing habit were instead from right to left, the abscissa of the first initial text subregion would be determined to be greater than that of the second initial text subregion. Optionally, the position relationship between the first initial text subregion and the second initial text subregion may change correspondingly with the writing habit, which is not limited here.
In an embodiment of the application, the first character is the last character of the text data of the first initial text subregion, and the second character is the first character of the text data of the second initial text subregion. For a specific implementation of the character recognition processing, reference may be made to the specific implementation of the character recognition processing in step S1003, which is not described here again.
S1006, if the character type of the first character is not a preset type and the character type of the second character is not a preset type, the server determines that the first initial text subregion and the second initial text subregion are different target text regions.
In the embodiment of the present application, the preset type may be Chinese symbolic characters and English symbolic characters. Specifically, in printed fonts, English and numeric characters generally occupy one character width and Chinese characters generally occupy two character widths, while symbols such as the colon, dash and dot commonly used in tickets actually occupy few pixels; as a result, there are often large blank areas before and after Chinese symbolic characters such as "，" and "；" and English symbolic characters such as "." and "!", and these blank areas are easily determined as segmentation positions in step S1003. Therefore, after a segmentation position is obtained, the first character in the first initial text subregion and the second character in the second initial text subregion can be identified, and whether the initial text region actually needs to be segmented at that position is determined by judging whether the character types of the first character and the second character are not the preset types. After wrong segmentation positions are removed, the position information, such as coordinates, of the target text regions and the text data obtained by segmentation at the correct segmentation positions are beneficial to the subsequent tabular extraction of the target image.
For example, referring to fig. 12, fig. 12 is a schematic diagram illustrating text sticky segmentation. After image recognition is performed on the target image 1201, 4 initial text regions can be obtained. It can then be acquired that the segmentation position of the initial text region 1202 is position A and that of the initial text region 1203 is position B. After the initial text region 1202 is segmented at position A and character recognition processing is performed, it may be determined that the first character of the first initial text subregion of the initial text region 1202 is "：" and the second character of the second initial text subregion is "人", where the first character "：" is of the preset type; therefore, the initial text region 1202 does not need to be segmented, and the initial text region 1202 itself is a target text region.
After the initial text region 1203 is segmented at position B and character recognition processing is performed, it may be determined that the first character of the first initial text subregion of the initial text region 1203 is "元" and the second character of the second initial text subregion is "6", neither of which is of the preset type; therefore, the initial text region 1203 needs to be segmented at position B, and the resulting target text regions are shown in the target image 1204.
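The boundary-character check of steps S1005–S1006 can be sketched as follows (a simplified illustration; the particular symbol set is an assumption standing in for the preset Chinese and English symbolic character types):

```python
# Hypothetical preset type: common Chinese and English symbolic characters.
PRESET_SYMBOLS = {"：", "，", "；", "—", ".", ",", ":", ";", "!"}

def keep_segmentation(first_char, second_char, preset=frozenset(PRESET_SYMBOLS)):
    """Keep a candidate segmentation position only if neither boundary
    character is a symbolic character of the preset type.

    first_char is the last character of the first initial text
    subregion; second_char is the first character of the second.
    """
    return first_char not in preset and second_char not in preset
```

Mirroring fig. 12: at position A the boundary character is a full-width colon, so the split is rejected and region 1202 stays whole; at position B neither boundary character is symbolic, so region 1203 is split.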
S1007, the server acquires an initial table of the target image and the corresponding relation between each target text area of the target image and each cell in the initial table.
S1008, the server determines a target cell group in the initial table, where the target cell group includes a first cell and a second cell, the first cell being located in the m1-th row of the initial table and the second cell in the m2-th row of the initial table, where m1 and m2 are both positive integers and m1 is less than m2.
S1009, if the first number sum of the target text regions corresponding to the cells in the m2-th row meets the first preset condition, the server merges the m2-th row and the m1-th row of the initial table to obtain the target table.
S1010, the server fills the text data in the target text area corresponding to each cell in the target table into the target table based on the corresponding relation between each target text area and each cell in the initial table to obtain table data.
For specific implementation of steps S1007 to S1010, reference may be made to the specific implementation of steps S201 to S204, which is not described herein again.
S1011, the server transmits the table data to the terminal device.
In this embodiment of the application, in addition to the table data, prompt information such as an indication that table data extraction succeeded or failed may also be sent to the terminal device, which is not limited here. The manner of sending the table data to the terminal device in step S1011 is a conventional technical means for those skilled in the art and is not described here again.
In the embodiment of the application, after the segmentation positions are obtained, each initial text region is segmented based on a segmentation position to obtain a first initial text subregion and a second initial text subregion; character recognition processing is then performed on the two subregions to determine the first character and the second character; finally, whether the initial text region actually needs to be segmented into the first and second initial text subregions is determined by judging whether the character types of the first character and the second character are not the preset types. Because closely spaced but distinct texts in the target image are easily identified as a single text region during text detection, and such recognition errors easily lead to errors in constructing the subsequent initial table and in merging cells, correctly segmenting the initial text regions can effectively improve the accuracy of table structuring. In addition, the cells that should be merged in the initial table are determined by judging whether the first number sum meets the first preset condition, so as to obtain the target table; this avoids wrapped text data being filled into different cells for lack of a clear table line and leaving the extracted table disordered, and makes it possible, without depending on a table with clear table lines such as a full-line table, to determine in which cells certain target text regions are located, thereby realizing table structuring of an image containing a non-full-line table and extracting the table data of the image containing the non-full-line table.
Based on the above table information extraction scheme and table information extraction system, the embodiment of the present application provides yet another table information extraction method. Referring to fig. 13, a schematic flow chart of another table information extraction method provided in the embodiment of the present application is shown. The table information extraction method shown in fig. 13 may be performed by the server or the terminal device shown in fig. 1. The table information extraction method shown in fig. 13 may include the steps of:
S1301, acquiring an initial table of the target image and the corresponding relation between each target text region of the target image and each cell in the initial table, where each target text region is obtained by performing text detection processing on the target image.
In this embodiment of the present application, the manner of obtaining the initial table of the target image may be: firstly, acquiring position information of each target text area in a target image, wherein the position information can comprise an abscissa and an ordinate of each target text area in the target image; then, based on the position information of the respective target text regions, an initial table of the target image is determined.
Specifically, the manner of determining the initial table of the target image based on the position information of each target text region may be: 1) arranging each target text region based on the position information of each target text region to obtain the arranged target text region; 2) based on the arranged target text regions, an initial table is determined.
Optionally, based on the position information of each target text region, the target text regions are arranged, and a manner of obtaining the arranged target text regions may be:
1) acquiring the position information of the region center point of each target text region in the target image, where the position information of a region center point in the target image refers to the abscissa and the ordinate of that region center point in the target image.
2) sorting the target text regions based on the position information of the region center points of the target text regions to obtain the sorted target text regions, where the abscissa of the region center point of the x-th target text region among the sorted target text regions is less than or equal to the abscissa of the region center point of the (x+1)-th target text region, x being a positive integer less than the number of the sorted target text regions.
Specifically, after the position information of the area center point of each target text area is acquired, the target text areas may be sorted in order from small to large according to the abscissa of the area center point of each target text area; if the abscissa of the center points of the target text regions is the same or the difference between the abscissas of the center points of the target text regions is small, sequencing the target text regions in the target text regions according to the sequence from small to large of the ordinate of the center points of the target text regions, and finally obtaining the sequenced target text regions.
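The sorting rule just described can be sketched as follows; the dictionary-based region representation and the tie-breaking threshold SAME_X_THRESHOLD are illustrative assumptions, not details fixed by this application:

```python
# Sketch of the sorting step above. Regions are assumed to carry their
# center coordinates as cx/cy; the threshold is an assumed value.
SAME_X_THRESHOLD = 5  # centers within this many pixels count as the "same" abscissa

def sort_regions(regions):
    # Bucket the abscissa so that near-equal center abscissas compare equal,
    # then order within a bucket by ordinate (top to bottom).
    return sorted(regions, key=lambda r: (r["cx"] // SAME_X_THRESHOLD, r["cy"]))

regions = [
    {"cx": 102, "cy": 80},  # nearly the same abscissa as the next region, but lower
    {"cx": 100, "cy": 20},
    {"cx": 300, "cy": 50},
]
ordered = sort_regions(regions)
```

With these values, the two left-hand regions are treated as one abscissa bucket and ordered top to bottom, and the right-hand region follows.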
3) Acquiring position information of each region frame of a first target text region in the sequenced target text regions in the target image;
4) arranging the sorted target text regions based on the abscissa of the third region frame and the fourth region frame of the first target text region and the abscissa of the region center point of each target text region behind the first target text region to obtain the initially arranged target text regions;
in an embodiment, the manner of obtaining the initially arranged target text regions by arranging the sorted target text regions based on the abscissa of the third region border and the fourth region border of the first target text region and the abscissa of the region center point of each target text region after the first target text region may further be:
traversing each target text region in the sorted target text regions, and determining the target text regions whose region center point abscissa in the sorted target text regions is larger than the abscissa of the third region border of the first target text region and smaller than the abscissa of the fourth region border of the first target text region, where the determined target text regions and the first target text region are located in the same column in the initially arranged target text regions;
if the abscissa of the region center point of the next target text region after the determined target text regions in the sorted target text regions is larger than both the abscissa of the third region border and the abscissa of the fourth region border of the first target text region, determining that the next target text region is located in the column after that of the first target text region in the initially arranged target text regions;
and taking the next target text region as the first target text region, and triggering again the step of determining the target text regions whose region center point abscissa in the sorted target text regions is larger than the abscissa of the third region border and smaller than the abscissa of the fourth region border of the first target text region, so as to obtain the initially arranged target text regions.
The target text regions located in the same column of the initially arranged target text regions keep the same order as they have in the sorted target text regions.
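The column-assignment pass described above can be illustrated by the following sketch, which walks the sorted regions and opens a new column whenever a region's center abscissa falls beyond the current reference region's right-hand border; the field names (left, right, cx) are assumptions for illustration:

```python
# Hedged sketch of the column-assignment pass: the first region of each
# column acts as the reference whose left/right borders decide membership.
def assign_columns(sorted_regions):
    columns = []
    current = []
    ref = None  # the "first target text region" of the current column
    for r in sorted_regions:
        if ref is not None and ref["left"] < r["cx"] < ref["right"]:
            current.append(r)       # center lies between the reference's borders
        else:
            if current:
                columns.append(current)
            current, ref = [r], r   # this region starts (and references) a new column
    if current:
        columns.append(current)
    return columns

regions = [
    {"left": 0, "right": 50, "cx": 25},
    {"left": 5, "right": 45, "cx": 30},
    {"left": 100, "right": 160, "cx": 130},
]
cols = assign_columns(regions)
```

Here the first two regions share a column because the second region's center falls between the first region's borders, while the third starts a new column.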
5) And rearranging the initially arranged target text regions based on the vertical coordinates of the first region frame and the second region frame of the first target text region of the initially arranged target text regions and the vertical coordinates of the region center points of other target text regions in the initially arranged target text regions to obtain the arranged target text regions.
In an embodiment, the manner of rearranging the initially arranged target text regions based on the vertical coordinates of the first region border and the second region border of the first one of the initially arranged target text regions and the vertical coordinates of the region center points of other target text regions in the initially arranged target text regions to obtain the arranged target text regions may be:
determining, among the initially arranged target text regions, the target text regions located in the same row of the arranged target text regions, wherein the ordinate of the region center point of each target text region in the same row is larger than the ordinate of the first region border, and smaller than the ordinate of the second region border, of the target text region located in the first column of that row;
determining, based on the ordinate of the region center point of each row of target text regions, that the row of target text regions is located in the mi-th row of the arranged target text regions. If a target text region is located in the ni-th column of the initially arranged target text regions, that target text region is located in the ni-th column of the arranged target text regions.
In an embodiment, the rearranging, based on the vertical coordinates of the first region border and the second region border of the first target text region of the initially arranged target text regions and the vertical coordinates of the center points of the regions of other target text regions in the initially arranged target text regions, may further be performed in a manner of:
traversing the target text regions in the first column of the initially arranged target text regions, and determining the target text regions whose region center point ordinate in the initially arranged target text regions is larger than the ordinate of the first region border of the target text region in the first column and smaller than the ordinate of the second region border of the target text region in the first column, where the determined target text regions and the target text region in the first column are located in the same row of the arranged target text regions;
if target text regions exist that are not in the same row as the target text region in the first column, determining among them the target text region whose column is closest to the first column, taking the target text regions that are not in the same row as the target text region in the first column as the new initially arranged target text regions, taking that closest target text region as the new target text region in the first column, and triggering again the step of traversing the target text regions in the first column of the initially arranged target text regions and determining the target text regions whose region center point ordinate is larger than the ordinate of the first region border, and smaller than the ordinate of the second region border, of the target text region in the first column, where the determined target text regions and the target text region in the first column are located in the same row of the arranged target text regions.
Determining, based on the ordinate of the region center point of each row of target text regions, that the row of target text regions is located in the mi-th row of the arranged target text regions, so as to obtain the arranged target text regions.
In one embodiment, the manner of acquiring each target text region of the target image may be: 1) performing text detection processing on the target image to obtain at least one initial text region; 2) acquiring the segmentation position of each initial text region; 3) segmenting each initial text region based on the segmentation position to obtain a third initial text sub-region and a fourth initial text sub-region, wherein the ordinate of the third initial text sub-region in the target image is smaller than the ordinate of the fourth initial text sub-region in the target image; 4) performing character recognition processing on the third initial text sub-region to determine a third character in the third initial text sub-region, and performing character recognition processing on the fourth initial text sub-region to determine a fourth character in the fourth initial text sub-region, wherein the third character is the last character of the text data of the third initial text sub-region, and the fourth character is the first character of the text data of the fourth initial text sub-region; 5) if the character type of the third character is not the preset type and the character type of the fourth character is not the preset type, determining that the third initial text sub-region and the fourth initial text sub-region are different target text regions.
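Step 5) can be illustrated with a small sketch. The choice of "digit" as the preset character type is purely an assumption for illustration — the application does not fix the type here:

```python
# Illustrative sketch of step 5): a two-line detection box is split into
# two target text regions only when the last character of the upper part
# and the first character of the lower part are both NOT of the preset
# type. "Digit" as the preset type is an assumed example.
def is_preset_type(ch):
    # Assumed "preset type": a digit, suggesting a number wrapped mid-token.
    return ch.isdigit()

def split_region(upper_text, lower_text):
    """Return two separate regions, or one region for wrapped text."""
    third_char = upper_text[-1]   # last character of the third (upper) sub-region
    fourth_char = lower_text[0]   # first character of the fourth (lower) sub-region
    if not is_preset_type(third_char) and not is_preset_type(fourth_char):
        return [upper_text, lower_text]   # genuinely different target text regions
    return [upper_text + lower_text]      # wrapped text: keep as one region
```

For example, "2246" followed by "567" stays a single region because the boundary characters are digits, while "name:" followed by "Alice" is kept as two regions.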
The specific implementation of step S1301 is similar to that of steps S201 and S701 and steps S1001 to S1006, to which reference may be made; details are not repeated here.
S1302, a target cell group is determined in the initial table, where the target cell group comprises a first cell and a second cell, the first cell is located in the n1-th column of the initial table, and the second cell is located in the n2-th column of the initial table.
In this embodiment, the first cell and the second cell are located in the same row and both have corresponding target text regions, n1 and n2 are positive integers, and n1 is less than n2. Specifically, the first cell and the second cell of the target cell group may be two cells located in the same row in two adjacent columns of the initial table; the first cell and the second cell of the target cell group may also be located in two non-adjacent columns of the initial table, provided that each third cell located between the first cell and the second cell has no corresponding target text region, where a third cell is a cell that is in the same row as the first cell and whose column in the initial table lies between the n1-th column and the n2-th column.
S1303, if the third sum of the numbers of target text regions corresponding to the cells in the n2-th column meets a second preset condition, the n2-th column and the n1-th column of the initial table are merged to obtain a target table. The target text region corresponding to each cell in the n1-th column of the target table includes the target text region corresponding to the corresponding cell in the n1-th column of the initial table and the target text region corresponding to the corresponding cell in the n2-th column of the initial table.
S1304, based on the correspondence between each target text region and each cell in the initial table, filling the text data in the target text region corresponding to each cell in the target table into the target table, so as to obtain table data.
In this embodiment of the present application, the second preset condition may be: the third sum of the numbers is smaller than the fourth sum of the numbers of target text regions corresponding to the cells in the n1-th column. The manner of determining whether the third sum of the numbers meets the second preset condition may then be: acquiring the fourth sum of the numbers of target text regions corresponding to the cells in the n1-th column; and if the fourth sum of the numbers is larger than the third sum of the numbers, determining that the third sum of the numbers meets the second preset condition.
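This count comparison can be sketched minimally as follows, assuming cells are dictionaries whose "region" entry is None when the cell has no corresponding target text region:

```python
# Sketch of the second preset condition: merge the n2-th column into the
# n1-th column only when the n2-th column holds fewer target text regions.
def count_regions(column_cells):
    # number of cells in the column that have a corresponding target text region
    return sum(1 for cell in column_cells if cell.get("region") is not None)

def should_merge(col_n1, col_n2):
    third_sum = count_regions(col_n2)    # regions in the n2-th column
    fourth_sum = count_regions(col_n1)   # regions in the n1-th column
    return fourth_sum > third_sum

# Values taken from the fig. 14 example below (columns 3 and 4).
col3 = [{"region": "2246"}, {"region": "white"}, {"region": "home"}]
col4 = [{"region": "567"}, {"region": None}, {"region": "student"}]
```

With these columns, should_merge(col3, col4) holds (3 regions versus 2), so the fourth column would be merged into the third.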
For example, referring to fig. 14, fig. 14 shows a schematic diagram of another merged cell. The initial table of the acquired target image 1401 and the target text regions corresponding to the respective cells in the initial table are shown as the initial table 1402 in fig. 14. Because two texts that should each occupy a single cell in the target image 1401 contain a line break, "2246567" is recognized as the two regions "2246" and "567", and the other text is likewise split into two regions, so that the acquired initial table 1402 has 5 columns of cells in total. From the positional relationship between the cells in the initial table 1402 and the correspondence between the cells and the target text regions, the 11 target cell groups shown in fig. 14 can be determined, where in each target cell group the first cell is on the left and the second cell is on the right. The 4th column of the initial table 1402, in which the cell 1404 of target cell group 3 is located, has 3 cells in total: the target text region corresponding to the first cell is "567", the target text region corresponding to the third cell is "student", and the second cell has no corresponding target text region; therefore the third sum of the numbers of target text regions corresponding to the cells in the 4th column is 2. The 3rd column, in which the cell 1403 is located, also has 3 cells, whose corresponding target text regions are "2246", "white", and "home" respectively; therefore the fourth sum of the numbers of target text regions corresponding to the cells in the 3rd column is 3. It can thus be determined that the fourth sum is greater than the third sum, so the 4th column of the initial table 1402 is merged into the 3rd column, finally obtaining the target table 1407; the target text regions corresponding to the respective cells in the target table 1407 are shown in fig. 14.
In one embodiment, the second preset condition may be: the third sum of the numbers is less than or equal to a third preset value. Optionally, the third preset value may be less than or equal to the fourth sum of the numbers of target text regions corresponding to the cells in the n1-th column; the third preset value may be set manually or by the system, which is not limited here. For example, the third preset value may be 1, 5, 8, etc. Alternatively, the manner of determining whether the third sum of the numbers meets the second preset condition may be: if the third sum of the numbers is less than or equal to the third preset value, determining that the third sum of the numbers meets the second preset condition.
In one embodiment, if the third sum of the numbers of target text regions corresponding to the cells in the n2-th column meets the second preset condition, the manner of merging the n2-th column and the n1-th column of the initial table to obtain the target table may be: determining a first data length of the text data of the target text region corresponding to the first cell and a second data length of the text data of the target text region corresponding to the second cell; and if the third sum of the numbers of target text regions corresponding to the cells in the n2-th column meets the second preset condition and the first data length is greater than the second data length, merging the n2-th column and the n1-th column of the initial table to obtain the target table.
In one embodiment, if the third sum of the numbers of target text regions corresponding to the cells in the n2-th column meets the second preset condition, the manner of merging the n2-th column and the n1-th column of the initial table to obtain the target table may be: performing line detection processing on the target image to obtain at least one line; and if the third sum of the numbers of target text regions corresponding to the cells in the n2-th column meets the second preset condition and no line in the target image is located between the target text region corresponding to the first cell and the target text region corresponding to the second cell, merging the n2-th column and the n1-th column of the initial table to obtain the target table. The at least one line may be a vertical line or an oblique line, which is not limited here.
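The line-detection guard above can be sketched geometrically: before merging, check that no detected (near-)vertical line passes between the two cells' text regions. The line and region representations below are illustrative assumptions:

```python
# Hedged sketch of the guard: a line blocks the merge when it lies
# horizontally between the two regions and overlaps them vertically.
def line_between(line, left_region, right_region):
    lx = (line["x1"] + line["x2"]) / 2  # the line's average abscissa
    horizontally_between = left_region["right"] < lx < right_region["left"]
    vertically_overlaps = not (line["y2"] < left_region["top"]
                               or line["y1"] > left_region["bottom"])
    return horizontally_between and vertically_overlaps

def merge_allowed(lines, left_region, right_region):
    return not any(line_between(l, left_region, right_region) for l in lines)

left = {"right": 50, "top": 0, "bottom": 30}
right = {"left": 120, "top": 0, "bottom": 30}
ruled = [{"x1": 80, "x2": 80, "y1": 0, "y2": 100}]
```

With no detected lines the merge proceeds; with the vertical line at x = 80 between the two regions it is suppressed, since the line indicates an explicit column boundary.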
In one embodiment, the manner of determining the target cell group in the initial table may be: 1) determining at least one cell group in the initial table, wherein each cell group comprises a first cell and a second cell; 2) acquiring, for each cell group, the distance in the target image between the target text region corresponding to the first cell and the target text region corresponding to the second cell; 3) determining the cell group with the minimum distance as the target cell group.
In one embodiment, if the third sum of the numbers of target text regions corresponding to the cells in the n2-th column meets the second preset condition, the manner of merging the n2-th column and the n1-th column of the initial table to obtain the target table may also be: if the third sum of the numbers of target text regions corresponding to the cells in the n2-th column meets the second preset condition, merging the n2-th column and the n1-th column of the initial table to obtain an updated table; taking the updated table as the initial table and triggering again the step of determining a target cell group in the initial table, until the third sum of the numbers of target text regions corresponding to the cells in the n2-th column does not meet the second preset condition; then determining the initial table to which the cells corresponding to those target text regions belong as the target table, where the target table comprises p × q cells, p is less than or equal to m, q is less than or equal to n, and p and q are positive integers.
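The iterative variant above can be sketched as a loop that keeps merging qualifying column pairs until no target cell group satisfies the second preset condition. The helpers demo_find and demo_merge are assumed stand-ins for the group-selection and column-merging steps defined earlier; the table is modeled as a list of columns:

```python
# Sketch of the iterative merge: the updated table becomes the new input
# until the second preset condition fails for every candidate group.
def count(col):
    # number of cells in the column that have a corresponding target text region
    return sum(v is not None for v in col)

def iterative_merge(table, find_target_group, merge_columns, condition):
    while True:
        group = find_target_group(table)        # (n1, n2) column indices, or None
        if group is None:
            return table                        # no further groups: this is the target table
        n1, n2 = group
        if not condition(table, n1, n2):        # second preset condition not met
            return table
        table = merge_columns(table, n1, n2)    # the updated table becomes the new input

# Assumed demo helpers: groups are adjacent column pairs, and merging
# concatenates the text of row-aligned cells.
def demo_find(table):
    return (0, 1) if len(table) > 1 else None

def demo_merge(table, n1, n2):
    merged = [(a or "") + (b or "") for a, b in zip(table[n1], table[n2])]
    return table[:n1] + [merged] + table[n2 + 1:]

cols = [["2246", "white", "home"], ["567", None, "student"]]
result = iterative_merge(cols, demo_find, demo_merge,
                         lambda t, n1, n2: count(t[n2]) < count(t[n1]))
```

On the fig. 14-style input, the two columns are merged once and the loop then stops, since a single-column table yields no further group.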
The specific embodiments of steps S1302 to S1304 are the same as or similar to those of steps S202 to S204, steps S702 to S706, and steps S1002 to S1010, to which reference may be made; details are not repeated here.
In the embodiment of the application, the position information of each target text region in the target image is obtained, and the initial table of the target image is then determined based on that position information; the corresponding relation between each target text region of the target image and each cell in the initial table is acquired, and a target cell group is determined in the initial table, where the target cell group includes a first cell and a second cell located in the same row, the first cell is located in the n1-th column of the initial table, and the second cell is located in the n2-th column of the initial table; it is then judged whether the third sum of the numbers of target text regions corresponding to the cells in the n2-th column meets a second preset condition, and if so, the n2-th column and the n1-th column of the initial table are merged to obtain a target table; finally, based on the corresponding relation between each target text region and each cell in the initial table, the text data in the target text region corresponding to each cell in the target table is filled into the target table to obtain table data. In the embodiment of the application, the cells to be merged in the initial table can be determined by judging whether the third sum of the numbers meets the second preset condition, so as to obtain the target table; this avoids the situation in which text data that wraps within a column is filled into different cells for lack of explicit table lines, making the extracted table data disordered. The cells in which certain target text regions are located can thus be determined without relying on a table with explicit table lines such as a full-line table, so that table structuring can be performed on an image containing a non-full-line table and the table data of such an image can be extracted.
Based on the above table information extraction scheme and table information extraction system, the embodiment of the present application provides yet another table information extraction method. Referring to fig. 15, a schematic flow chart of another table information extraction method provided in the embodiment of the present application is shown. The table information extraction method shown in fig. 15 may be performed by the server or the terminal device shown in fig. 1. The table information extraction method shown in fig. 15 may include the steps of:
S1501, acquiring an initial table of the target image, the corresponding relation between each target text region of the target image and each cell in the initial table, and the display mode of the text data in each target text region. Each target text region is obtained by performing text detection processing on the target image.
S1502, determining a target cell group in the initial table based on a display manner of the text data in each target text region, where the target cell group includes a first cell and a second cell, and a position relationship and a display manner of the first cell and the second cell in the initial table are matched, where the second cell is located next to the first cell in the initial table.
In this embodiment of the present application, the matching of the position relationship and the display mode of the first cell and the second cell in the initial table means:
1) When the display mode of the text data in each target text region is horizontal display, the first cell and the second cell are located in the same column, the first cell is located in the m1-th row of the initial table, and the second cell is located in the m2-th row of the initial table. That the second cell is located next to the first cell in the initial table means: m1 is less than m2, and both the first cell and the second cell have corresponding target text regions, where m1 and m2 are positive integers;
2) When the display mode of the text data in each target text region is vertical display, the first cell and the second cell are located in the same row, the first cell is located in the n1-th column of the initial table, and the second cell is located in the n2-th column of the initial table. That the second cell is located next to the first cell in the initial table means: n1 is less than n2, and both the first cell and the second cell have corresponding target text regions, where n1 and n2 are positive integers.
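The matching rule above can be sketched as a small predicate: depending on the display mode, a target cell group pairs cells in the same column across successive rows, or cells in the same row across successive columns. The (row, col) cell coordinates are an illustrative representation:

```python
# Hedged sketch of the position-relationship match between the first and
# second cells of a target cell group under the two display modes.
def cells_match(first, second, display_mode):
    r1, c1 = first
    r2, c2 = second
    if display_mode == "horizontal":
        # same column, second cell in a later row (m1 < m2)
        return c1 == c2 and r1 < r2
    if display_mode == "vertical":
        # same row, second cell in a later column (n1 < n2)
        return r1 == r2 and c1 < c2
    raise ValueError("unknown display mode")
```

This dispatch mirrors steps S1503 to S1505 below: horizontal text triggers row merging, vertical text triggers column merging.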
S1503, the display mode is judged; if the display mode is vertical display, step S1504 is executed; if the display mode is horizontal display, step S1505 is executed;
S1504, if the third sum of the numbers of target text regions corresponding to the cells included in the column in which the second cell is located meets a second preset condition, the column in which the second cell is located and the column in which the first cell is located are merged to obtain a target table;
S1505, if the first sum of the numbers of target text regions corresponding to the cells included in the row in which the second cell is located meets a first preset condition, the row in which the second cell is located and the row in which the first cell is located are merged to obtain a target table.
S1506, based on the corresponding relation between each target text region and each cell in the initial table, the text data in the target text region corresponding to each cell in the target table is filled into the target table to obtain table data.
The specific embodiments of steps S1501 to S1506 are the same as or similar to those of steps S201 to S204, steps S701 to S706, and steps S1002 to S1010, to which reference may be made; details are not repeated here.
In the embodiment of the application, an initial table of a target image, a corresponding relation between each target text area of the target image and each cell in the initial table and a display mode of text data in each target text area are obtained, and then a target cell group is determined in the initial table; if the display mode is vertical display and the sum of the third number of the target text areas corresponding to the cells included in the column of the second cell meets a second preset condition, merging the column of the second cell with the column of the first cell to obtain a target table; if the display mode is horizontal display and the sum of the first number of the target text areas corresponding to the cells included in the line in which the second cell is located meets a first preset condition, merging the line in which the second cell is located with the line in which the first cell is located to obtain a target table; and finally, filling the text data in the target text area corresponding to each cell in the target table into the target table based on the corresponding relation between each target text area and each cell in the initial table to obtain table data. In this embodiment of the present application, it may be determined, by an obtained display mode of text data in each target text region, whether column merging or row merging should be performed on cells in an initial table, and then, target cell groups containing first cells and second cells with different position relationships may be determined according to different display modes, and cells that need to be merged are determined according to a first preset condition or a second preset condition, so as to obtain a target table, where target text regions in two cells, which should originally belong to the same cell but correspond to different rows or columns, in the target table are merged. 
Therefore, the method avoids the situation in which text data that wraps onto a new row or column is filled into different cells for lack of explicit table lines, making the extracted table data disordered; the cells in which certain target text regions are located can thus be determined without relying on a table with explicit table lines such as a full-line table, so that table structuring can be performed on an image containing a non-full-line table and the table data of the image containing the non-full-line table can be extracted.
Based on the above table information extraction method embodiments, the embodiment of the present application provides a table information extraction device. Referring to fig. 16, a schematic structural diagram of a table information extraction device provided in an embodiment of the present application may include an obtaining unit 1601, a determining unit 1602, a merging unit 1603, and a filling unit 1604. The table information extraction device shown in fig. 16 may operate as follows:
the acquiring unit 1601 is configured to acquire an initial table of a target image and a corresponding relationship between each target text region of the target image and each cell in the initial table, where each target text region is obtained by performing text detection processing on the target image;
the determining unit 1602 is configured to determine a target cell group in the initial table, where the cell group includes a first cell and a second cell, the first cell and the second cell are located in the same column, the first cell is located in the m1-th row of the initial table, the second cell is located in the m2-th row of the initial table, m1 and m2 are positive integers, m1 is less than m2, and both the first cell and the second cell have corresponding target text regions;
the merging unit 1603 is configured to, if the first sum of the numbers of target text regions corresponding to the cells in the m2-th row meets a first preset condition, merge the m2-th row and the m1-th row of the initial table to obtain a target table, where the target text region corresponding to each cell in the m1-th row of the target table includes the target text region corresponding to the corresponding cell in the m1-th row of the initial table and the target text region corresponding to the corresponding cell in the m2-th row;
the filling unit 1604 is configured to fill, based on the corresponding relationship between each target text region and each cell in the initial table, text data in the target text region corresponding to each cell in the target table into the target table to obtain table data.
In one embodiment, the merging unit 1603 is further configured to determine a first data length of the text data of the target text region corresponding to the first cell and a second data length of the text data of the target text region corresponding to the second cell; and if the first sum of the numbers of target text regions corresponding to the cells in the m2-th row meets the first preset condition and the first data length is greater than the second data length, merge the m2-th row and the m1-th row of the initial table to obtain the target table.
In an embodiment, the merging unit 1603 is further configured to perform line detection processing on the target image to obtain at least one line; and if the first sum of the numbers of target text regions corresponding to the cells in the m2-th row meets the first preset condition and no line in the target image is located between the target text region corresponding to the first cell and the target text region corresponding to the second cell, merge the m2-th row and the m1-th row of the initial table to obtain the target table.
In one embodiment, the obtaining unit 1601 is further configured to obtain position information of each target text region in the target image; and determining an initial table of the target image based on the position information of each target text region, wherein the position information comprises an abscissa and an ordinate, the initial table comprises m × n cells, m is determined according to the position information of the target text region with the minimum ordinate and the target text region with the maximum ordinate in the target image, n is determined according to the position information of the target text region with the minimum abscissa and the target text region with the maximum abscissa in the target image, and m and n are positive integers.
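The derivation of the initial table size m × n from region extremes can be sketched as follows: the row count follows from the vertical span between the topmost and bottommost region centers, and the column count from the horizontal span between the leftmost and rightmost region centers. Using the average region height and width as the divisors is an assumption for illustration, not a rule stated by this application:

```python
# Hedged sketch: estimate m (rows) and n (columns) of the initial table
# from the extreme region centers, normalized by average region size.
def table_dims(regions):
    min_cy = min(r["cy"] for r in regions)
    max_cy = max(r["cy"] for r in regions)
    min_cx = min(r["cx"] for r in regions)
    max_cx = max(r["cx"] for r in regions)
    avg_h = sum(r["h"] for r in regions) / len(regions)
    avg_w = sum(r["w"] for r in regions) / len(regions)
    m = int((max_cy - min_cy) / avg_h) + 1   # number of rows
    n = int((max_cx - min_cx) / avg_w) + 1   # number of columns
    return m, n

regions = [
    {"cx": 10, "cy": 10, "w": 80, "h": 40},
    {"cx": 90, "cy": 10, "w": 80, "h": 40},
    {"cx": 10, "cy": 50, "w": 80, "h": 40},
]
```

For these three regions the sketch yields a 2 × 2 initial table, one cell of which has no corresponding target text region.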
In one embodiment, the obtaining unit 1601 is further configured to arrange the target text regions based on the position information of the target text regions, so as to obtain arranged target text regions; the initial table is determined based on the arranged target text regions, m is determined based on the target text region of the minimum ordinate and the target text region of the maximum ordinate among the arranged target text regions, and n is determined based on the target text region of the minimum abscissa and the target text region of the maximum abscissa among the arranged target text regions.
In one embodiment, the obtaining unit 1601 is further configured to obtain location information of a region center point of each target text region in the target image; sequencing the target text regions based on the position information of the region center points of the target text regions to obtain sequenced target text regions, wherein the abscissa of the region center point of the xth target text region in the sequenced target text regions is less than or equal to the abscissa of the region center point of the (x + 1) th target text region, and x is a positive integer; acquiring position information of each region frame of a first target text region in the sequenced target text regions in the target image; arranging the sorted target text regions based on the vertical coordinates of the first region frame and the second region frame of the first target text region and the vertical coordinates of the region center points of the target text regions behind the first target text region to obtain the initially arranged target text regions; and rearranging the initially arranged target text regions based on the abscissa of the third region border and the fourth region border of the first target text region of the initially arranged target text regions and the abscissa of the region center points of other target text regions in the initially arranged target text regions to obtain the arranged target text regions.
In one embodiment, the obtaining unit 1601 is further configured to: traverse the sorted target text regions and determine each target text region whose region center point has an ordinate greater than the ordinate of the first region border of the first target text region and smaller than the ordinate of the second region border of the first target text region, where the determined target text regions and the first target text region are located in the same row of the initially arranged target text regions; if the ordinate of the region center point of the next target text region after the determined target text regions is greater than the ordinates of both the first region border and the second region border of the first target text region, determine that the next target text region is located in the row below the first target text region in the initially arranged target text regions; and take that next target text region as the new first target text region and trigger the determining step again, so as to obtain the initially arranged target text regions, where the target text regions in the same row of the initially arranged target text regions keep the same order as the target text regions in the same row of the sorted target text regions.
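The row-grouping rule above can be sketched as follows. The anchor-based traversal and the (x1, y1, x2, y2) box convention are illustrative assumptions, with y1 and y2 standing in for the first and second region borders:

```python
def group_rows(sorted_regions):
    # A region joins the current row when the ordinate of its center point
    # lies strictly between the top (first) and bottom (second) borders of
    # the row's first region; otherwise it opens the next row and becomes
    # that row's first region, mirroring the traversal described above.
    rows, current, anchor = [], [], None
    for box in sorted_regions:
        cy = (box[1] + box[3]) / 2.0
        if anchor is not None and anchor[1] < cy < anchor[3]:
            current.append(box)
        else:
            if current:
                rows.append(current)
            current, anchor = [box], box
    if current:
        rows.append(current)
    return rows
```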
In one embodiment, the obtaining unit 1601 is further configured to: determine, among the initially arranged target text regions, the target text regions located in the same column of the arranged target text regions, where the abscissa of the region center point of each target text region in the same column is greater than the abscissa of the third region border and smaller than the abscissa of the fourth region border of the target text region located in the first row of that column; and determine, based on the abscissa of the region center point of each column of target text regions, that each column of target text regions is located in the ni-th column of the arranged target text regions, so as to obtain the arranged target text regions; where, if a target text region is located in the mi-th row of the initially arranged target text regions, that target text region is located in the mi-th row of the arranged target text regions, mi and ni are positive integers, mi is less than or equal to m, and ni is less than or equal to n.
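A sketch of the column-assignment rule above, anchored on the first row's regions. The data layout (rows of (x1, y1, x2, y2) boxes) and the fallback behavior when a center falls in no column are assumptions:

```python
def assign_columns(rows):
    # Columns are anchored on the first row: a region belongs to column j
    # when its center abscissa lies strictly between the left (third) and
    # right (fourth) borders of the j-th region in the first row.
    header = sorted(rows[0], key=lambda b: (b[0] + b[2]) / 2.0)
    grid = []
    for row in rows:
        cells = [[] for _ in header]
        for box in row:
            cx = (box[0] + box[2]) / 2.0
            for j, h in enumerate(header):
                if h[0] < cx < h[2]:
                    cells[j].append(box)
                    break
        grid.append(cells)
    return grid
```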
In one embodiment, the obtaining unit 1601 is further configured to perform text detection processing on the target image to obtain at least one initial text region; acquiring the segmentation position of each initial text region; segmenting each initial text region based on the segmentation position to obtain a first initial text sub-region and a second initial text sub-region, wherein the abscissa of the first initial text sub-region in the target image is smaller than the abscissa of the second initial text sub-region in the target image; performing character recognition processing on the first initial text subregion, determining a first character in the first initial text subregion, performing character recognition processing on the second initial text subregion, and determining a second character in the second initial text subregion, wherein the first character is the last character of the text data of the first initial text subregion, and the second character is the first character of the text data of the second initial text subregion; and if the character type of the first character is not a preset type and the character type of the second character is not a preset type, determining that the first initial text subregion and the second initial text subregion are different target text regions.
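The boundary-character test above can be sketched as follows. The concrete "preset type" (punctuation and hyphen characters that indicate a continuation) is an assumption; the patent does not fix it here:

```python
# A split region stays split into two target text regions only when
# neither boundary character belongs to the preset type.
PRESET_TYPE = set(",.;:-、，。；：")

def is_two_target_regions(left_text, right_text):
    first = left_text[-1:]   # last character of the first sub-region
    second = right_text[:1]  # first character of the second sub-region
    return first not in PRESET_TYPE and second not in PRESET_TYPE
```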
In an embodiment, the determining unit 1602 is further configured to: determine at least one cell group in the initial table, where each cell group includes a first cell and a second cell located in the same column, the first cell is located in the m1-th row of the initial table, the second cell is located in the m2-th row of the initial table, m1 and m2 are both positive integers, m1 is less than m2, and the first cell and the second cell both have corresponding target text regions; for each cell group, obtain the distance, in the target image, between the target text region corresponding to the first cell and the target text region corresponding to the second cell of that cell group; and determine the cell group with the minimum distance as the target cell group.
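The minimum-distance selection above can be sketched in a few lines. The vertical-gap distance measure is one plausible choice and is an assumption, as are the function names:

```python
def vertical_gap(upper_box, lower_box):
    # One plausible distance measure (an assumption): the vertical gap
    # between the bottom of the upper region and the top of the lower one.
    return max(0.0, lower_box[1] - upper_box[3])

def pick_target_cell_group(groups, distance=vertical_gap):
    # groups: (first_cell_region, second_cell_region) pairs in one column;
    # the group whose two regions are closest in the image is the target.
    return min(groups, key=lambda g: distance(g[0], g[1]))
```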
In one embodiment, the merge unit 1603 is further configured to: if the first sum of the number of target text regions corresponding to the cells in the m2-th row meets a first preset condition, merge the m2-th row and the m1-th row of the initial table to obtain an updated table; take the updated table as the initial table and trigger execution of determining the target cell group in the initial table, until the first sum of the number of target text regions corresponding to the cells in the m2-th row does not meet the first preset condition; and determine the initial table to which the cells corresponding to the target text regions that do not meet the first preset condition belong as the target table, where the target table includes p × q cells, p is less than or equal to m, and p and q are positive integers.
In one embodiment, the obtaining unit 1601 is further configured to obtain a second sum of the number of target text regions corresponding to the cells in the m1-th row, and if the second sum is greater than the first sum, determine that the first sum meets the first preset condition.
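Taken together, the iterative merging and the first preset condition can be sketched in Python. The data layout and the concrete merge policy (appending the lower row's fragments into the row above) are illustrative assumptions:

```python
def merge_wrapped_rows(table, counts):
    # table[i][j]: list of text fragments for cell (i, j); counts[i]:
    # number of target text regions in row i. Per the condition above,
    # row i is treated as a wrapped continuation when the row above holds
    # more regions than it does (second sum greater than first sum).
    i = 1
    while i < len(table):
        if counts[i - 1] > counts[i]:  # first preset condition met
            for j, cell in enumerate(table[i]):
                table[i - 1][j].extend(cell)
            counts[i - 1] += counts[i]
            del table[i]
            del counts[i]
        else:
            i += 1
    return table
```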
According to an embodiment of the present application, the steps involved in the table information extraction methods shown in fig. 2, 7, 10, 13, and 15 may be performed by the units in the table information extraction apparatus shown in fig. 16.
According to another embodiment of the present application, the units in the table information extraction apparatus shown in fig. 16 may be combined, individually or entirely, into one or several other units to form the apparatus, or some of the units may be further split into multiple functionally smaller units to form it; either way, the same operations can be achieved without affecting the technical effect of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the table information extraction apparatus may likewise include other units, and in practical applications these functions may also be realized with the assistance of other units and through the cooperation of multiple units.
According to another embodiment of the present application, the table information extraction apparatus shown in fig. 16 may be constructed, and the table information extraction method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps involved in the respective methods shown in fig. 2, fig. 7, fig. 10, fig. 13, and fig. 15 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, loaded into the above computing device via that medium, and executed therein.
In the embodiments of the present application, an initial table of a target image and the correspondence between each target text region of the target image and each cell in the initial table are obtained, and a target cell group is then determined in the initial table, where the target cell group includes a first cell and a second cell located in the same column, the first cell is located in the m1-th row of the initial table, and the second cell is located in the m2-th row of the initial table. It is then judged whether the first sum of the number of target text regions corresponding to the cells in the m2-th row meets a first preset condition; if so, the m2-th row and the m1-th row of the initial table are merged to obtain a target table. Finally, based on the correspondence between each target text region and each cell in the initial table, the text data in the target text regions corresponding to the cells in the target table is filled into the target table to obtain table data. In the embodiments of the present application, the initial position of each target text region in the initial table can be determined through the obtained initial table and the correspondence between each target text region of the target image and each cell in the initial table; then, by determining the target cell group in the initial table and judging whether the first sum meets the first preset condition, the rows that should be merged in the initial table are determined to obtain the target table, in which the target text regions of two cells that should belong to the same cell but correspond to different rows are merged.
Therefore, the method avoids the situation in which text data that wraps onto a new line is filled into different cells because no explicit table line exists, which would disorder the text data in the extracted table; the cells to which certain target text regions belong can be determined without relying on tables with explicit table lines, such as full-line tables, thereby realizing table structuring for images containing non-full-line tables and extracting the table data of such images.
Based on the above method embodiments and apparatus embodiments, the present application further provides an electronic device. Fig. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device shown in fig. 17 may include at least a processor 1701, an input interface 1702, an output interface 1703, and a computer storage medium 1704, which may be connected by a bus or in another manner.
The computer storage medium 1704 may be located in the memory of the electronic device and is configured to store a computer program comprising program instructions; the processor 1701 is configured to execute the program instructions stored in the computer storage medium 1704. The processor 1701 (or CPU, central processing unit) is the computing core and control core of the electronic device; it is adapted to implement one or more instructions, and specifically to load and execute the one or more instructions so as to implement the table information extraction method flow or the corresponding functions.
An embodiment of the present application further provides a computer storage medium (Memory), which is a Memory device in an electronic device and is used to store programs and data. It is understood that the computer storage medium herein may include a built-in storage medium in the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores an operating system of the terminal. Also stored in the memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by the processor 1701. The computer storage medium may be a Random Access Memory (RAM) memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by the processor 1701 to implement the corresponding steps of the methods described above with respect to the table information extraction method embodiments of fig. 2, 7, and 10, and in particular, one or more instructions stored in a computer storage medium may be loaded and executed by the processor 1701 to implement the following steps:
the processor 1701 obtains an initial table of a target image and a corresponding relation between each target text area of the target image and each cell in the initial table, wherein each target text area is obtained by performing text detection processing on the target image;
the processor 1701 determines a target cell group in the initial table, where the target cell group includes a first cell and a second cell located in the same column, the first cell is located in the m1-th row of the initial table, the second cell is located in the m2-th row of the initial table, m1 and m2 are both positive integers, m1 is less than m2, and the first cell and the second cell both have corresponding target text regions;
if the processor 1701 judges that the first sum of the number of target text regions corresponding to the cells in the m2-th row meets a first preset condition, the m2-th row and the m1-th row of the initial table are merged to obtain a target table, where the target text regions corresponding to the cells in the m1-th row of the target table include the target text regions corresponding to the corresponding cells in the m1-th row of the initial table and the target text regions corresponding to the corresponding cells in the m2-th row;
the processor 1701 fills text data in the target text area corresponding to each cell in the target table into the target table based on the correspondence relationship between each target text area and each cell in the initial table, to obtain table data.
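The final filling step can be sketched as follows. The dictionaries mapping region ids to cells and to recognized text, and the space-joined concatenation for merged cells, are illustrative assumptions:

```python
def fill_table(shape, cell_of_region, text_of_region):
    # shape: (p, q) cells of the target table; cell_of_region maps a
    # region id to its (row, column) in the target table; text_of_region
    # maps a region id to its recognized text. Multiple regions mapped to
    # one cell (merged rows) are concatenated with a space.
    p, q = shape
    table = [["" for _ in range(q)] for _ in range(p)]
    for rid, (r, c) in sorted(cell_of_region.items()):
        table[r][c] = (table[r][c] + " " + text_of_region[rid]).strip()
    return table
```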
In one embodiment, that the processor 1701, if judging that the first sum of the number of target text regions corresponding to the cells in the m2-th row meets a first preset condition, merges the m2-th row and the m1-th row of the initial table to obtain a target table includes: determining a first data length of the text data of the target text region corresponding to the first cell and a second data length of the text data of the target text region corresponding to the second cell; and if it is judged that the first sum of the number of target text regions corresponding to the cells in the m2-th row meets the first preset condition and the first data length is greater than the second data length, merging the m2-th row and the m1-th row of the initial table to obtain the target table.
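The data-length gate in this embodiment can be sketched as a one-line predicate (names are assumptions):

```python
def should_merge(first_text, second_text, first_sum_condition_met):
    # The wrapped remainder of a line is normally shorter than the full
    # line above it, so merging additionally requires the first data
    # length to exceed the second data length.
    return first_sum_condition_met and len(first_text) > len(second_text)
```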
In one embodiment, that the processor 1701, if judging that the first sum of the number of target text regions corresponding to the cells in the m2-th row meets a first preset condition, merges the m2-th row and the m1-th row of the initial table to obtain a target table includes: performing line detection processing on the target image to obtain at least one line; and if the first sum of the number of target text regions corresponding to the cells in the m2-th row meets the first preset condition and no line in the target image is located between the target text region corresponding to the first cell and the target text region corresponding to the second cell, merging the m2-th row and the m1-th row of the initial table to obtain the target table.
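The line-based gate in this embodiment can be sketched as follows; how the horizontal lines are detected (e.g. a Hough transform) is out of scope, so the sketch assumes their ordinates are already available:

```python
def no_separating_line(first_box, second_box, line_ordinates):
    # line_ordinates: ordinates of horizontal lines found by line
    # detection. A detected line between the two regions means they
    # belong to genuinely different rows, so merging is allowed only
    # when no such line exists.
    top, bottom = first_box[3], second_box[1]
    return not any(top < y < bottom for y in line_ordinates)
```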
In one embodiment, the processor 1701 obtains an initial table of target images, including: acquiring position information of each target text area in a target image; and determining an initial table of the target image based on the position information of each target text region, wherein the position information comprises an abscissa and an ordinate, the initial table comprises m × n cells, m is determined according to the position information of the target text region with the minimum ordinate and the target text region with the maximum ordinate in the target image, n is determined according to the position information of the target text region with the minimum abscissa and the target text region with the maximum abscissa in the target image, and m and n are positive integers.
In one embodiment, the processor 1701 determines an initial table for the target image based on the location information for each target text region, including: arranging each target text area based on the position information of each target text area to obtain the arranged target text areas; the initial table is determined based on the arranged target text regions, m is determined based on the target text region of the minimum ordinate and the target text region of the maximum ordinate among the arranged target text regions, and n is determined based on the target text region of the minimum abscissa and the target text region of the maximum abscissa among the arranged target text regions.
In one embodiment, the processor 1701 aligns the respective target text regions based on the position information of the respective target text regions to obtain aligned target text regions, including: acquiring the position information of the area center point of each target text area in the target image; sequencing the target text regions based on the position information of the region center points of the target text regions to obtain sequenced target text regions, wherein the abscissa of the region center point of the xth target text region in the sequenced target text regions is less than or equal to the abscissa of the region center point of the (x + 1) th target text region, and x is a positive integer; acquiring position information of each region frame of a first target text region in the sequenced target text regions in the target image; arranging the target text areas in the sequenced target text areas based on the vertical coordinates of the first area border and the second area border of the first target text area and the vertical coordinates of the area center points of all the target text areas behind the first target text area to obtain the initially arranged target text areas; and rearranging the initially arranged target text regions based on the abscissa of the third region border and the fourth region border of the first target text region of the initially arranged target text regions and the abscissa of the region center points of other target text regions in the initially arranged target text regions to obtain the arranged target text regions.
In one embodiment, that the processor 1701 arranges the sorted target text regions based on the ordinates of the first region border and the second region border of the first target text region and the ordinates of the region center points of the target text regions after the first target text region, to obtain the initially arranged target text regions, includes: traversing the sorted target text regions and determining each target text region whose region center point has an ordinate greater than the ordinate of the first region border of the first target text region and smaller than the ordinate of the second region border of the first target text region, where the determined target text regions and the first target text region are located in the same row of the initially arranged target text regions; if the ordinate of the region center point of the next target text region after the determined target text regions is greater than the ordinates of both the first region border and the second region border of the first target text region, determining that the next target text region is located in the row below the first target text region in the initially arranged target text regions; and taking that next target text region as the new first target text region and triggering the determining step again, so as to obtain the initially arranged target text regions, where the target text regions in the same row of the initially arranged target text regions keep the same order as the target text regions in the same row of the sorted target text regions.
In one embodiment, that the processor 1701 rearranges the initially arranged target text regions based on the abscissas of the third region border and the fourth region border of the first target text region among the initially arranged target text regions and the abscissas of the region center points of the other target text regions, to obtain the arranged target text regions, includes: determining, among the initially arranged target text regions, the target text regions located in the same column, where the abscissa of the region center point of each target text region in the same column is greater than the abscissa of the third region border and smaller than the abscissa of the fourth region border of the target text region located in the first row of that column; and determining, based on the abscissa of the region center point of each column of target text regions, that each column of target text regions is located in the ni-th column of the arranged target text regions, so as to obtain the arranged target text regions; where, if a target text region is located in the mi-th row of the initially arranged target text regions, that target text region is located in the mi-th row of the arranged target text regions, mi and ni are positive integers, mi is less than or equal to m, and ni is less than or equal to n.
In one embodiment, the processor 1701 is further configured to perform a text detection process on the target image to obtain at least one initial text region; acquiring the segmentation position of each initial text region; segmenting each initial text region based on the segmentation position to obtain a first initial text sub-region and a second initial text sub-region, wherein the abscissa of the first initial text sub-region in the target image is smaller than the abscissa of the second initial text sub-region in the target image; performing character recognition processing on the first initial text subregion, determining a first character in the first initial text subregion, performing character recognition processing on the second initial text subregion, and determining a second character in the second initial text subregion, wherein the first character is the last character of the text data of the first initial text subregion, and the second character is the first character of the text data of the second initial text subregion; and if the character type of the first character is not a preset type and the character type of the second character is not a preset type, determining that the first initial text subarea and the second initial text subarea are different target text areas.
In one embodiment, that the processor 1701 determines a target cell group in the initial table includes: determining at least one cell group in the initial table, where each cell group includes a first cell and a second cell located in the same column, the first cell is located in the m1-th row of the initial table, the second cell is located in the m2-th row of the initial table, m1 and m2 are both positive integers, m1 is less than m2, and the first cell and the second cell both have corresponding target text regions; for each cell group, obtaining the distance, in the target image, between the target text region corresponding to the first cell and the target text region corresponding to the second cell of that cell group; and determining the cell group with the minimum distance as the target cell group.
In one embodiment, that the processor 1701, if judging that the first sum of the number of target text regions corresponding to the cells in the m2-th row meets a first preset condition, merges the m2-th row and the m1-th row of the initial table to obtain a target table includes: if the first sum of the number of target text regions corresponding to the cells in the m2-th row meets the first preset condition, merging the m2-th row and the m1-th row of the initial table to obtain an updated table; taking the updated table as the initial table and triggering execution of determining the target cell group in the initial table, until the first sum of the number of target text regions corresponding to the cells in the m2-th row does not meet the first preset condition; and determining the initial table to which the cells corresponding to the target text regions that do not meet the first preset condition belong as the target table, where the target table includes p × q cells, p is less than or equal to m, and p and q are positive integers.
In one embodiment, the processor 1701 is further configured to obtain a second sum of the number of target text regions corresponding to the cells in the m1-th row, and if the second sum is greater than the first sum, determine that the first sum meets the first preset condition.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device executes the method embodiments shown in fig. 2, fig. 7 and fig. 10. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A table information extraction method, characterized by comprising the following steps:
acquiring an initial table of a target image and a corresponding relation between each target text area of the target image and each cell in the initial table, wherein each target text area is obtained by performing text detection processing on the target image;
determining a target cell group in the initial table, the target cell group comprising a first cell and a second cell, the first cell and the second cell being located in the same column, the first cell being located in the m1-th row of the initial table, the second cell being located in the m2-th row of the initial table, m1 and m2 being both positive integers, m1 being less than m2, and the first cell and the second cell having corresponding target text regions;
if a first sum of the number of target text regions corresponding to the cells in the m2-th row meets a first preset condition, merging the m2-th row and the m1-th row of the initial table to obtain a target table, wherein the target text regions corresponding to the cells in the m1-th row of the target table comprise the target text regions corresponding to the corresponding cells in the m1-th row of the initial table and the target text regions corresponding to the corresponding cells in the m2-th row;
and filling text data in the target text areas corresponding to the cells in the target table into the target table based on the corresponding relation between the target text areas and the cells in the initial table to obtain table data.
2. The method of claim 1, wherein, if the first sum of the number of target text regions corresponding to the cells in the m2-th row meets the first preset condition, the merging the m2-th row and the m1-th row of the initial table to obtain the target table comprises:
determining a first data length of the text data of the target text region corresponding to the first cell and a second data length of the text data of the target text region corresponding to the second cell;
if the first sum of the number of target text regions corresponding to the cells in the m2-th row meets the first preset condition and the first data length is greater than the second data length, merging the m2-th row and the m1-th row of the initial table to obtain the target table.
3. The method of claim 1, wherein, if the first sum of the number of target text regions corresponding to the cells in the m2-th row meets the first preset condition, the merging the m2-th row and the m1-th row of the initial table to obtain the target table comprises:
performing line detection processing on the target image to obtain at least one line;
if the first sum of the number of target text regions corresponding to the cells in the m2-th row meets the first preset condition and no line in the target image is located between the target text region corresponding to the first cell and the target text region corresponding to the second cell, merging the m2-th row and the m1-th row of the initial table to obtain the target table.
4. The method of claim 1, wherein obtaining an initial table of target images comprises:
acquiring position information of each target text area in the target image;
and determining an initial table of the target image based on the position information of each target text region, wherein the position information comprises an abscissa and an ordinate, the initial table comprises m × n cells, m is determined according to the position information of the target text region with the minimum ordinate and the target text region with the maximum ordinate in the target image, n is determined according to the position information of the target text region with the minimum abscissa and the target text region with the maximum abscissa in the target image, and m and n are positive integers.
5. The method of claim 4, wherein determining the initial table of the target image based on the location information of the respective target text regions comprises:
arranging the target text regions based on the position information of the target text regions to obtain the arranged target text regions;
and determining the initial table based on the arranged target text regions, wherein m is determined according to the target text region with the minimum vertical coordinate and the target text region with the maximum vertical coordinate in the arranged target text regions, and n is determined according to the target text region with the minimum horizontal coordinate and the target text region with the maximum horizontal coordinate in the arranged target text regions.
6. The method according to claim 5, wherein the arranging the respective target text regions based on the position information of the respective target text regions to obtain arranged target text regions comprises:
acquiring the position information of the area center point of each target text area in the target image;
sequencing the target text regions based on the position information of the region center points of the target text regions to obtain sequenced target text regions, wherein the abscissa of the region center point of the xth target text region in the sequenced target text regions is less than or equal to the abscissa of the region center point of the (x + 1) th target text region, and x is a positive integer;
acquiring position information of each region frame of a first target text region in the sequenced target text regions in the target image;
arranging the sorted target text regions based on the vertical coordinates of the first region frame and the second region frame of the first target text region and the vertical coordinates of the region center points of the target text regions behind the first target text region to obtain the initially arranged target text regions;
and rearranging the initially arranged target text regions based on the abscissa of the third region border and the fourth region border of the first target text region of the initially arranged target text regions and the abscissa of the region center points of other target text regions in the initially arranged target text regions to obtain the arranged target text regions.
7. The method according to claim 6, wherein the arranging the sorted target text regions based on vertical coordinates of a first region border and a second region border of the first target text region and vertical coordinates of a region center point of each target text region after the first target text region to obtain an initially-arranged target text region comprises:
traversing each target text region in the sorted target text regions, and determining the target text region of which the vertical coordinate of the center point of the region in the sorted target text region is larger than the vertical coordinate of the first region border of the first target text region and smaller than the vertical coordinate of the second region border of the first target text region, wherein the determined target text region and the first target text region are located in the same line in the initially arranged target text regions;
if the vertical coordinate of the central point of the next target text region of the determined target text regions in the sorted target text regions is larger than the vertical coordinate of the first region frame of the first target text region and larger than the vertical coordinate of the second region frame of the first target text region, determining that the next target text region is positioned on the next line of the first target text region in the initially arranged target text regions;
taking the next target text region as the first target text region, and triggering execution of the step of determining the target text regions whose region center point ordinate in the sorted target text regions is larger than the ordinate of the first region border of the first target text region and smaller than the ordinate of the second region border of the first target text region, so as to obtain the initially arranged target text regions;
and the target text regions in the same line in the initially arranged target text regions are ranked in the same order as the target text regions in the same line in the ranked target text regions.
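The claim-7 traversal can be sketched roughly as follows; the box representation, the pre-sort by center ordinate, and the strict inequalities are illustrative assumptions, not the claim's exact procedure:

```python
def group_into_rows(sorted_regions):
    """Group x-sorted text regions into rows (claim-7 style traversal).

    Each region is (x_min, y_min, x_max, y_max). A region joins the
    current row when its center-point ordinate lies between the top
    (first border) and bottom (second border) ordinates of the row's
    first region; otherwise it starts the next row.
    """
    # visit regions top-to-bottom so rows come out in reading order
    regions = sorted(sorted_regions, key=lambda r: (r[1] + r[3]) / 2)
    rows = []
    for region in regions:
        cy = (region[1] + region[3]) / 2
        if rows:
            first = rows[-1][0]           # first region of the current row
            if first[1] < cy < first[3]:  # center falls inside its band
                rows[-1].append(region)
                continue
        rows.append([region])             # center below the band: new row
    # within each row, keep the left-to-right order of the x-sorted input
    for row in rows:
        row.sort(key=lambda r: (r[0] + r[2]) / 2)
    return rows
```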
8. The method according to claim 6, wherein the rearranging the initially arranged target text regions based on abscissa of a third region border and a fourth region border of a first one of the initially arranged target text regions and abscissa of a region center point of other target text regions in the initially arranged target text regions to obtain the arranged target text regions comprises:
determining target text regions in the same column in the target text regions after the initial arrangement, wherein the abscissa of the center point of the region of the target text regions in the same column is larger than the abscissa of the border of a third region of the target text regions in the first row in the target text regions in the same column and smaller than the abscissa of the border of a fourth region of the target text regions in the first row in the target text regions in the same column;
determining, based on the abscissa of the region center points of the target text regions in each row, the ni-th column in which each target text region is located in the arranged target text regions, so as to obtain the arranged target text regions;
wherein, if a target text region is located in the mi-th row of the initially arranged target text regions, that target text region is located in the mi-th row of the arranged target text regions, mi and ni are positive integers, mi is less than or equal to m, and ni is less than or equal to n.
9. The method of claim 1, further comprising:
performing text detection processing on the target image to obtain at least one initial text region;
acquiring the segmentation position of each initial text region;
segmenting each initial text region based on the segmentation position to obtain a first initial text sub-region and a second initial text sub-region, wherein the abscissa of the first initial text sub-region in the target image is smaller than the abscissa of the second initial text sub-region in the target image;
performing character recognition processing on the first initial text sub-region, determining a first character in the first initial text sub-region, and performing character recognition processing on the second initial text sub-region, determining a second character in the second initial text sub-region, wherein the first character is a last character of text data of the first initial text sub-region, and the second character is a first character of text data of the second initial text sub-region;
and if the character type of the first character is not a preset type and the character type of the second character is not a preset type, determining that the first initial text subregion and the second initial text subregion are different target text regions.
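A rough sketch of the claim-9 decision, with OCR replaced by plain strings and a caller-supplied predicate standing in for the unspecified "preset" character type (both stand-ins are assumptions for illustration):

```python
def split_region(text, split_index, is_preset_type):
    """Decide whether one detected text run becomes two target regions.

    `text` stands in for the OCR result of one initial text region and
    `split_index` for the segmentation position mapped to a character
    offset. Per the claim-9 condition, the halves stay separate only
    when neither boundary character is of the preset type.
    """
    left, right = text[:split_index], text[split_index:]
    if not left or not right:
        return [text]
    last_char = left[-1]    # last character of the first sub-region
    first_char = right[0]   # first character of the second sub-region
    if not is_preset_type(last_char) and not is_preset_type(first_char):
        return [left, right]  # two different target regions
    return [text]             # boundary looks like one token: keep whole

# Example preset type: a hyphen suggests one word split across the gap.
kept_whole = split_region("Total-Amount", 6, lambda c: c == "-")
```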
10. The method of claim 1, wherein determining a target set of cells in the initial table comprises:
determining at least one cell group in the initial table, each cell group comprising a first cell and a second cell;
acquiring, for each cell group, a distance in the target image between the target text region corresponding to the first cell and the target text region corresponding to the second cell;
and determining the cell group with the minimum distance as the target cell group.
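The minimum-distance selection of claim 10 reduces to a `min` over candidate groups. Here `region_of` and `distance` are assumed interfaces, since the claim does not fix how regions are represented or how the distance is measured:

```python
def pick_target_cell_group(cell_groups, region_of, distance):
    """Pick the cell group whose two text regions are closest (claim 10).

    `cell_groups` is a list of (first_cell, second_cell) pairs, both
    with corresponding text regions; `region_of` maps a cell to its
    region and `distance` measures the gap between two regions in the
    target image.
    """
    return min(
        cell_groups,
        key=lambda group: distance(region_of(group[0]), region_of(group[1])),
    )
```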
11. The method according to any one of claims 1 to 10, wherein, if the first number sum of the target text regions corresponding to the cells in the m2-th row meets a first preset condition, the merging of the m2-th row and the m1-th row of the initial table to obtain a target table comprises:
if the first number sum of the target text regions corresponding to the cells in the m2-th row meets the first preset condition, merging the m2-th row and the m1-th row of the initial table to obtain an updated table;
and taking the updated table as the initial table, and triggering execution of the determination of the target cell group in the initial table until the first number sum of the target text regions corresponding to the cells in the m2-th row does not meet the first preset condition, and determining the initial table to which the cells corresponding to the target text regions that do not meet the first preset condition belong as the target table, wherein the target table comprises p × q cells, p is less than or equal to m, and p and q are positive integers.
12. The method according to any one of claims 1-10, further comprising:
acquiring a second number sum of the target text regions corresponding to the cells in the m1-th row;
and if the second number sum is larger than the first number sum, determining that the first number sum meets the first preset condition.
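The merge loop of claims 11-12 can be sketched as below. Pairing m2 = m1 + 1 (adjacent rows) and the extra "row m2 is not full" check are simplifying assumptions; the patent selects m1/m2 through the target cell group instead:

```python
def merge_rows(table_rows):
    """Iterative row merge, in the spirit of claims 11-12.

    `table_rows` is a list of rows, each row a list of per-cell lists
    of target text regions. A row is merged up into the row above it
    while the first preset condition holds: the m1-th row holds more
    regions than the sparser m2-th continuation row.
    """
    rows = [list(row) for row in table_rows]
    i = 0
    while i + 1 < len(rows):
        first_sum = sum(len(cell) for cell in rows[i + 1])  # m2-th row
        second_sum = sum(len(cell) for cell in rows[i])     # m1-th row
        # first preset condition (claim 12): second sum > first sum;
        # the "row m2 not full" guard below is an added assumption
        if first_sum < len(rows[i + 1]) and second_sum > first_sum:
            # cell-wise merge of the m2-th row into the m1-th row
            rows[i] = [a + b for a, b in zip(rows[i], rows[i + 1])]
            del rows[i + 1]
        else:
            i += 1
    return rows
```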
13. A table information extraction apparatus, characterized in that the table information extraction apparatus includes an acquisition unit, a determination unit, a merging unit, and a filling unit, wherein:
the acquiring unit is configured to acquire an initial table of a target image and a corresponding relationship between each target text region of the target image and each cell in the initial table, where each target text region is obtained by performing text detection processing on the target image;
the determining unit is configured to determine a target cell group in the initial table, wherein the target cell group comprises a first cell and a second cell, the first cell and the second cell are located in the same column, the first cell is located in the m1-th row of the initial table, the second cell is located in the m2-th row of the initial table, m1 and m2 are both positive integers, m1 is less than m2, and both the first cell and the second cell have corresponding target text regions;
the merging unit is configured to, if the first number sum of the target text regions corresponding to the cells in the m2-th row meets a first preset condition, merge the m2-th row and the m1-th row of the initial table to obtain a target table, wherein the target text region corresponding to each cell in the m1-th row of the target table includes the target text region corresponding to the corresponding cell in the m1-th row of the initial table and the target text region corresponding to the corresponding cell in the m2-th row;
the filling unit is configured to fill the text data in the target text area corresponding to each cell in the target table into the target table based on the corresponding relationship between each target text area and each cell in the initial table, so as to obtain table data.
14. A computer storage medium having computer program instructions stored therein, which when executed by a processor, are configured to perform the table information extraction method of any one of claims 1-12.
CN202210126160.5A 2022-02-10 2022-02-10 Table information extraction method and device and storage medium Pending CN114463765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210126160.5A CN114463765A (en) 2022-02-10 2022-02-10 Table information extraction method and device and storage medium


Publications (1)

Publication Number Publication Date
CN114463765A true CN114463765A (en) 2022-05-10

Family

ID=81412833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210126160.5A Pending CN114463765A (en) 2022-02-10 2022-02-10 Table information extraction method and device and storage medium

Country Status (1)

Country Link
CN (1) CN114463765A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943978A (en) * 2022-05-13 2022-08-26 上海弘玑信息技术有限公司 Table reconstruction method and electronic equipment
CN114943978B (en) * 2022-05-13 2023-10-03 上海弘玑信息技术有限公司 Table reconstruction method and electronic equipment
WO2023216745A1 (en) * 2022-05-13 2023-11-16 上海弘玑信息技术有限公司 Table reconstruction method and electronic device
CN116127928A (en) * 2023-04-17 2023-05-16 广东粤港澳大湾区国家纳米科技创新研究院 Table data identification method and device, storage medium and computer equipment
CN116127928B (en) * 2023-04-17 2023-07-07 广东粤港澳大湾区国家纳米科技创新研究院 Table data identification method and device, storage medium and computer equipment

Similar Documents

Publication Publication Date Title
CN111723807B (en) End-to-end deep learning recognition machine for typing characters and handwriting characters
CN111325110B (en) OCR-based table format recovery method, device and storage medium
CN114463765A (en) Table information extraction method and device and storage medium
US20210295114A1 (en) Method and apparatus for extracting structured data from image, and device
US7650035B2 (en) Optical character recognition based on shape clustering and multiple optical character recognition processes
US7646921B2 (en) High resolution replication of document based on shape clustering
CN113785305B (en) Method, device and equipment for detecting inclined characters
US11783610B2 (en) Document structure identification using post-processing error correction
CN108280051B (en) Detection method, device and the equipment of error character in a kind of text data
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
JP7026165B2 (en) Text recognition method and text recognition device, electronic equipment, storage medium
CN112949476B (en) Text relation detection method, device and storage medium based on graph convolution neural network
CN111008559B (en) Typesetting method, typesetting system and typesetting computer equipment for face sheet recognition result
CN110490190A (en) A kind of structured image character recognition method and system
US9323726B1 (en) Optimizing a glyph-based file
CN111400497A (en) Text recognition method and device, storage medium and electronic equipment
KR102122561B1 (en) Method for recognizing characters on document images
CN115618836B (en) Wireless table structure restoration method and device, computer equipment and storage medium
CN114730241A (en) Gesture stroke recognition in touch user interface input
EP2074558A1 (en) Shape clustering in post optical character recognition processing
Edan Cuneiform symbols recognition based on k-means and neural network
EP3832544A1 (en) Visually-aware encodings for characters
CN113486283B (en) Page merging method, device, server, medium and product
CN115880702A (en) Data processing method, device, equipment, program product and storage medium
CN112560849B (en) Neural network algorithm-based grammar segmentation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination