WO2023216745A1 - Table reconstruction method and electronic device - Google Patents

Table reconstruction method and electronic device

Info

Publication number
WO2023216745A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
area
coordinates
abscissa
cell
Prior art date
Application number
PCT/CN2023/084482
Other languages
English (en)
Chinese (zh)
Inventor
王伟印
张晓程
Original Assignee
上海弘玑信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海弘玑信息技术有限公司
Publication of WO2023216745A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/146 - Aligning or centring of the image pick-up or image-field
    • G06V30/147 - Determination of region of interest
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/414 - Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Definitions

  • the present application relates to the field of data processing technology, specifically, to a table reconstruction method and electronic equipment.
  • the purpose of the embodiments of the present application is to provide a table reconstruction method and electronic device, so as to reduce the deviation of the reconstructed table when reconstructing the table in the table image.
  • a method for table reconstruction including:
  • the text area recognition result contains the regional text content and regional location information of the text area
  • based on the regional location information, determine the coordinates of each table row and each table column of the target table
  • based on the regional location information, add the regional text content to the blank table to obtain the target table.
  • the table is reconstructed through the regional text content and regional position information of each text area identified in the image to be processed.
  • both framed tables and frameless tables in the image to be processed can be reconstructed, improving the accuracy of table reconstruction.
  • the area position information includes the area vertex coordinates of the text area.
  • performing text recognition on the image to be processed and obtaining the text area recognition result of the image to be processed includes: performing text detection on the image to be processed to obtain multiple area vertex coordinates of the text area, the area vertex coordinates being the coordinates of the vertices of the text area; and performing text recognition on the text area to obtain the area text content.
  • text recognition is performed on the image to be processed, and the area vertex coordinates of each text area and the text content of the area are determined, so that the location and content of each text area can be accurately identified.
  • the regional vertex coordinates include the regional vertex abscissa and the regional vertex ordinate.
  • determining the table row coordinates and the table column coordinates of the target table includes: determining the maximum ordinate and the minimum ordinate among the area vertex ordinates; determining the maximum abscissa and the minimum abscissa among the area vertex abscissas; determining, according to the area position information, the first area number of each ordinate and the second area number of each abscissa;
  • the first area number is the number of text areas containing a certain ordinate
  • the second area number is the number of text areas containing a certain abscissa.
  • determine the coordinates of each table row based on the maximum ordinate, the minimum ordinate, and the first area numbers; determine the coordinates of each table column based on the maximum abscissa, the minimum abscissa, and the second area numbers.
  • the row coordinates and column coordinates of each table are determined through the location of each text area, and the rows and columns of the frameless table can be identified, which improves the accuracy of table reconstruction.
  • determining the coordinates of each table row based on the maximum ordinate, the minimum ordinate, and the number of first areas includes: determining the trough ordinate based on each ordinate and its corresponding number of first areas, where the number of first areas of the trough ordinate is not higher than the number of first areas of the adjacent ordinates of the trough ordinate, and the adjacent ordinates of the trough ordinate are the previous ordinate and the next ordinate of the trough ordinate; and obtaining the coordinates of each table row according to the maximum ordinate, the minimum ordinate, and the trough ordinate.
  • the table row coordinates are determined based on the number of first areas of the text areas traversed by the horizontal lines where the vertical coordinates are located.
  • determining the coordinates of each table column according to the maximum abscissa, the minimum abscissa, and the number of second areas includes: determining the trough abscissa according to each abscissa and its corresponding number of second areas, where the number of second areas of the trough abscissa is not higher than the number of second areas of the adjacent abscissas of the trough abscissa, and the adjacent abscissas of the trough abscissa are the previous abscissa and the next abscissa of the trough abscissa; and obtaining the coordinates of each table column according to the maximum abscissa, the minimum abscissa, and the trough abscissa.
  • the table column coordinates are determined based on the number of second areas, that is, the number of text areas that the vertical line at each abscissa passes through.
  • the method further includes: determining the cell position information of the cells in the blank table based on each table row coordinate and each table column coordinate; determining, according to the area position information and the cell position information, the target cell covered by the text area, where the text area contains coordinate points that are located in the target cell and do not coincide with the boundary of the target cell; and, if it is determined that the text area covers multiple target cells, merging the target cells.
  • multiple target cells covered by the text area are merged according to the area where the cell is located and the area where the text area is located, thereby solving the problem of how to set merged cells that exist in the table.
  • adding the regional text content to the blank table according to the regional location information to obtain the target table includes performing the following steps for each cell in the blank table: if it is determined, based on the regional location information, that the cell contains only one text area, the regional text content in that text area is added to the cell; if it is determined, based on the regional location information, that the cell contains at least two text areas, the regional text content in the text areas contained in the cell is sorted according to the regional location information, and the sorted regional text content is added to the cell.
  • the regional text content of multiple text areas in the same cell is first sorted, and then the sorted regional text content is added to the cell, thus solving the problem of how to add the regional text content of multiple text areas to the same cell.
  • determining that a cell contains only one text area based on the area location information includes: determining the center point coordinates of the text area based on the area location information, the center point coordinates being the coordinates of the center point of the text area; and, if it is determined according to the cell position information of the cell that the center point coordinates of only one text area are located in the cell, determining that the cell contains only one text area.
  • the text area contained in the cell can be quickly determined based on the coordinates of the center point of the text area and the area where the cell is located.
  • adding the regional text content in the text area contained in a cell to the cell includes: setting the attribute value of the text attribute of the cell to the regional text content in the text area contained in the cell.
  • determining that a cell contains at least two text areas according to the area position information includes: if it is determined that there are multiple text areas, determining the center point coordinates of each text area according to the area position information of each text area; and, if it is determined according to the cell position information of the cell that the center point coordinates of at least two text areas are located in the cell, determining that the cell contains at least two text areas.
  • multiple text areas contained in a cell can be quickly determined based on the coordinates of the center point of the text area and the area where the cell is located.
  • the center point coordinates include the center point abscissa and the center point ordinate. Sorting the regional text content in each text area contained in a cell includes: sorting the regional text content in each text area contained in the cell in descending order of the center point ordinate; and sorting the regional text content of text areas with the same center point ordinate in ascending order of the center point abscissa.
  • each text area in the same cell is sorted according to the abscissa coordinate and the ordinate coordinate of the center point of each text area.
  • adding the sorted regional text content to a cell includes: setting the attribute value of a text attribute of a cell to the sorted regional text content.
  • a table reconstruction device including:
  • the recognition unit is used to perform text recognition on the image to be processed and obtain the text area recognition result of the image to be processed.
  • the text area recognition result contains the regional text content and regional position information of the text area;
  • the determination unit is used to determine the coordinates of each table row and each table column of the target table based on the regional position information.
  • the generating unit is used to generate a blank table based on the row coordinates and column coordinates of each table;
  • the obtaining unit is used to add the regional text content to the blank table based on the regional location information to obtain the target table.
  • the regional position information includes the regional vertex coordinates of the text area
  • the recognition unit is used to: perform text detection on the image to be processed, and obtain multiple regional vertex coordinates of the text area, where the regional vertex coordinates are the coordinates of the vertices of the text area; and perform text recognition on the text area to obtain the text content of the area.
  • the regional vertex coordinates include the regional vertex abscissa and the regional vertex ordinate
  • the determination unit is used to: determine the maximum ordinate and the minimum ordinate among the regional vertex ordinates; determine the maximum abscissa and the minimum abscissa among the regional vertex abscissas; determine, according to the regional position information, the first area number of each ordinate and the second area number of each abscissa;
  • the first area number is the number of text areas containing a certain ordinate
  • the second area number is the number of text areas containing a certain abscissa; determine the coordinates of each table row based on the maximum ordinate, the minimum ordinate, and the first area numbers; determine the coordinates of each table column based on the maximum abscissa, the minimum abscissa, and the second area numbers.
  • the determination unit is configured to: determine the trough ordinate based on each ordinate and its corresponding number of first areas, where the number of first areas of the trough ordinate is not higher than the number of first areas of the adjacent ordinates of the trough ordinate, and the adjacent ordinates of the trough ordinate are the previous ordinate and the next ordinate of the trough ordinate; and obtain the coordinates of each table row according to the maximum ordinate, the minimum ordinate, and the trough ordinate.
  • the determination unit is configured to: determine the trough abscissa according to each abscissa and its corresponding number of second areas, where the number of second areas of the trough abscissa is not higher than the number of second areas of the adjacent abscissas of the trough abscissa, and the adjacent abscissas of the trough abscissa are the previous abscissa and the next abscissa of the trough abscissa; and obtain the coordinates of each table column according to the maximum abscissa, the minimum abscissa, and the trough abscissa.
  • the generation unit is also used to: determine the cell position information of the cells in the blank table based on each table row coordinate and each table column coordinate; determine the target cell covered by the text area based on the area position information and the cell position information, where the text area contains coordinate points that are located in the target cell and do not coincide with the boundary of the target cell; and, if it is determined that the text area covers multiple target cells, merge the target cells.
  • the obtaining unit is used to perform the following steps for each cell in the blank table: if it is determined, based on the area position information, that the cell contains only one text area, the regional text content in that text area is added to the cell; if it is determined, based on the area position information, that the cell contains at least two text areas, the regional text content in the text areas contained in the cell is sorted according to the area position information, and the sorted regional text content is added to the cell.
  • the obtaining unit is used to: determine the center point coordinates of the text area based on the area position information, the center point coordinates being the coordinates of the center point of the text area; and, if it is determined according to the cell position information of a cell that the center point coordinates of only one text area are located within the cell, determine that the cell contains only one text area.
  • the obtaining unit is used to: set the attribute value of the text attribute of a cell to the regional text content in the text area contained in the cell.
  • the obtaining unit is configured to: if it is determined that there are multiple text areas, determine the center point coordinates of each text area according to the area position information of each text area, the center point coordinates being the coordinates of the center point of the text area; and, if it is determined according to the cell position information of a cell that the center point coordinates of at least two text areas are located in the cell, determine that the cell contains at least two text areas.
  • the center point coordinates include the center point abscissa and the center point ordinate
  • the obtaining unit is used to: sort the regional text content in each text area contained in a cell in descending order of the center point ordinate; and sort the regional text content of text areas with the same center point ordinate in ascending order of the center point abscissa.
  • the obtaining unit is used to: set the attribute value of the text attribute of a cell to the sorted regional text content.
  • an electronic device including a processor and a memory.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, the steps of the method provided in any of the above optional implementations of table reconstruction are executed.
  • a computer-readable storage medium on which a computer program is stored.
  • the steps of the method provided in any of the above optional implementations of table reconstruction are executed.
  • a computer program product is provided.
  • the computer program product When the computer program product is run on a computer, it causes the computer to perform the steps of the method provided in any of the above optional implementations of table reconstruction.
  • Figure 1 is a flow chart of a table reconstruction method provided by an embodiment of the present application.
  • Figure 2 is an example diagram of a user attribute table image provided by an embodiment of the present application.
  • Figure 3 is a specific flow chart of a method for reconstructing a user attribute table provided by an embodiment of the present application.
  • Figure 4 is a first example diagram of a first curve provided by the embodiment of the present application.
  • Figure 5 is a first example diagram of a second curve provided by the embodiment of the present application.
  • Figure 6 is an example diagram of merging table images provided by the embodiment of the present application.
  • Figure 7 is a specific flow chart of a method for merging table reconstruction provided by an embodiment of the present application.
  • Figure 8 is a second example diagram of a first curve provided by the embodiment of the present application.
  • Figure 9 is a second example diagram of a second curve provided by the embodiment of the present application.
  • Figure 10 is a structural block diagram of a table reconstruction device provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of an electronic device in an embodiment of the present application.
  • Terminal device: It can be a mobile terminal, a fixed terminal, or a portable terminal, such as a mobile phone, a site, a unit, a device, a multimedia computer, a multimedia tablet, an Internet node, a communicator, a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet computer, a personal communication system device, a personal navigation device, a personal digital assistant, an audio/video player, a digital camera/camcorder, a positioning device, a television receiver, a radio broadcast receiver, an e-book device, a gaming device, or any combination thereof, including accessories and peripherals for these devices or any combination thereof. It is also foreseeable that the terminal device can support any type of user-oriented interface (such as wearable devices), etc.
  • Server: It can be an independent physical server, or a server cluster or distributed system composed of multiple physical servers. It can also be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
  • OCR: Optical Character Recognition.
  • Embodiments of the present application provide a table reconstruction method and electronic device.
  • the execution subject is an electronic device.
  • the electronic device may be a server or a terminal device.
  • FIG. 1 is a flow chart of a table reconstruction method provided by an embodiment of the present application.
  • the specific implementation process of the method is as follows:
  • Step 100: Perform text recognition on the image to be processed, and obtain the text area recognition result of the image to be processed.
  • when performing step 100, the following steps can be adopted:
  • S1001: Perform text detection on the image to be processed and obtain multiple area vertex coordinates of the text area.
  • text detection is performed on the image to be processed to obtain the regional vertex coordinates of one or more rectangular text areas.
  • the coordinates of the area vertices are the coordinates of the vertices of the text area.
  • the coordinates of the regional vertex include the horizontal coordinate of the regional vertex and the vertical coordinate of the regional vertex. Since the text area is a rectangle, the number of area vertex coordinates of each text area is 4.
  • the text area can also be in other shapes, such as a quadrilateral, which is not limited here.
  • S1002: Perform text recognition on the text area and obtain the text content of the area.
  • OCR technology is used to perform text recognition on the text area to obtain regional text content in the text area.
  • the regional text content is the information recognized in the text area.
  • Regional text content can include at least one of the following: text, formulas, and dates.
  • the regional text content can also be other types of information, which is not limited here.
  • S1003: Based on the regional vertex coordinates and regional text content of the text area, obtain the text area recognition result of the image to be processed.
  • the text area recognition result includes the area text content and area location information of the text area.
  • the area position information includes the area vertex coordinates of the text area.
  • the region vertex coordinates are used as the region position information in the text area recognition result.
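  • As an illustration of what one entry of the text area recognition result might look like, the following minimal sketch stores the four area vertex coordinates and the regional text content. The class name `TextArea`, its field names, and the helper methods are hypothetical and do not come from the application; the center point is assumed here to be the midpoint of the area's bounding box.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (abscissa x, ordinate y)


@dataclass
class TextArea:
    """One text area of the recognition result (hypothetical structure)."""
    vertices: List[Point]  # area vertex coordinates, four for a rectangle
    text: str              # regional text content returned by text recognition

    def x_interval(self) -> Tuple[float, float]:
        # Abscissa interval: min and max abscissa over the area vertices.
        xs = [x for x, _ in self.vertices]
        return min(xs), max(xs)

    def y_interval(self) -> Tuple[float, float]:
        # Ordinate interval: min and max ordinate over the area vertices.
        ys = [y for _, y in self.vertices]
        return min(ys), max(ys)

    def center(self) -> Point:
        # Assumption: the center point is the midpoint of the bounding box.
        (x0, x1), (y0, y1) = self.x_interval(), self.y_interval()
        return (x0 + x1) / 2, (y0 + y1) / 2


area = TextArea(vertices=[(10, 20), (110, 20), (110, 50), (10, 50)], text="Name")
```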
  • Step 101: Determine the coordinates of each table row and each table column of the target table based on the area position information in the text area recognition result.
  • when performing step 101, the following steps can be taken:
  • S1011: Determine the maximum ordinate and the minimum ordinate among the ordinates of the vertices of each region.
  • the maximum ordinate and the minimum ordinate among the ordinates of all area vertices of each text area are determined.
  • S1012: Determine the maximum abscissa and the minimum abscissa among the abscissas of the vertices of each area.
  • the maximum abscissa and the minimum abscissa of the abscissas of all area vertices of each text area are determined.
  • the boundary of the target table can be determined by the maximum ordinate, minimum ordinate, maximum abscissa, and minimum abscissa.
  • S1013: Determine the number of first regions for each ordinate and the number of second regions for each abscissa according to the region location information.
  • the first area quantity is the number of text areas containing a certain vertical coordinate
  • the second area quantity is the number of text areas containing a certain abscissa.
  • the text area contains a certain vertical coordinate, which means that the vertical coordinate of the coordinate point in the text area is the above-mentioned certain vertical coordinate.
  • the text area contains a certain abscissa, which means that the abscissa of the coordinate point in the text area is the above-mentioned abscissa.
  • the abscissa interval and ordinate interval of the text area are determined based on the regional position information of the text area. If it is determined that a certain ordinate is located within the ordinate interval of the text area, it is determined that the text area contains the ordinate; if it is determined that a certain abscissa is located in the abscissa interval of the text area, it is determined that the text area contains the abscissa.
  • the abscissa interval of a certain text area is determined based on the maximum and minimum values of the abscissa coordinates among the four area vertex coordinates of the text area.
  • the vertical coordinate interval of a text area is determined based on the maximum and minimum values of the vertical coordinate among the four regional vertex coordinates of the text area.
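  • The counting in S1011 to S1013 can be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be integer pixel positions sampled at every integer between the minimum and the maximum, and the function name `area_counts` is hypothetical. The same function serves for ordinates (first area numbers) and abscissas (second area numbers).

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive (min, max) coordinate interval of one text area


def area_counts(intervals: List[Interval]) -> Dict[int, int]:
    """For every integer coordinate between the global minimum and maximum,
    count the text areas whose interval contains that coordinate."""
    lo = min(a for a, _ in intervals)   # minimum ordinate (or abscissa)
    hi = max(b for _, b in intervals)   # maximum ordinate (or abscissa)
    return {
        c: sum(1 for a, b in intervals if a <= c <= b)
        for c in range(lo, hi + 1)
    }


# Ordinate intervals of three text areas: two on one text line, one on the next.
y_intervals = [(0, 2), (0, 2), (5, 7)]
first_area_numbers = area_counts(y_intervals)
```

  • In this example the ordinates 3 and 4 are contained in no text area, which is the kind of gap that the later steps look for when placing a row line.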
  • S1014: Determine the coordinates of each table row of the target table based on the maximum ordinate, the minimum ordinate, and the number of first areas.
  • based on each ordinate and its corresponding first area number, the trough ordinate that meets the trough ordinate condition is determined, and each table row coordinate is obtained based on the maximum ordinate, the minimum ordinate, and the trough ordinate.
  • the condition of the trough ordinate is: the number of first areas of the trough ordinate is not higher than the number of first areas of the adjacent ordinates of the trough ordinate, where the adjacent ordinates of the trough ordinate are the previous ordinate and the next ordinate of the trough ordinate.
  • obtaining the coordinates of each table row based on the maximum ordinate, the minimum ordinate, and the trough ordinate includes: using the maximum ordinate and the minimum ordinate as table row coordinates; and generating trough ordinate intervals based on the trough ordinates.
  • each ordinate within a trough ordinate interval is a trough ordinate. The trough ordinate intervals that include neither the maximum ordinate nor the minimum ordinate are selected; among the selected trough ordinate intervals, if it is determined that a trough ordinate interval contains only one trough ordinate, that trough ordinate is used as a table row coordinate.
  • if it is determined that a trough ordinate interval contains multiple trough ordinates, one trough ordinate is selected from the trough ordinate interval as a table row coordinate.
  • a trough ordinate can be randomly selected from the trough ordinate interval. In actual applications, the specific method of selection can be set according to the actual application scenario, and is not limited here.
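  • A possible sketch of S1014 is given below; it is not the application's implementation. Two assumptions are made and should be read as such: (1) a flat run of equal counts is treated as a trough interval only when the counts on both sides of the run are higher (or the run touches the boundary), which is a stricter reading than the "not higher than" condition and avoids flagging the flat interior of a text line; (2) the midpoint of each interval is chosen, whereas the text allows any selection, including a random one. The function names are hypothetical, and coordinates are assumed to be consecutive integers. Applied to abscissas and second area numbers, the same sketch yields column coordinates.

```python
from typing import Dict, List, Tuple


def trough_intervals(counts: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (start, end) coordinate intervals of flat runs that are local minima."""
    coords = sorted(counts)
    # Split the coordinates into runs of equal count (index ranges into coords).
    runs: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(coords) + 1):
        if i == len(coords) or counts[coords[i]] != counts[coords[start]]:
            runs.append((start, i - 1))
            start = i
    result = []
    for k, (s, e) in enumerate(runs):
        level = counts[coords[s]]
        # A run is a trough if each neighbouring run (when present) is higher.
        left_ok = k == 0 or counts[coords[runs[k - 1][1]]] > level
        right_ok = k == len(runs) - 1 or counts[coords[runs[k + 1][0]]] > level
        if left_ok and right_ok:
            result.append((coords[s], coords[e]))
    return result


def line_coordinates(counts: Dict[int, int]) -> List[int]:
    """Minimum, one pick per interior trough interval, and maximum coordinate."""
    coords = sorted(counts)
    lo, hi = coords[0], coords[-1]
    # Keep only intervals containing neither the minimum nor the maximum.
    inner = [(a, b) for a, b in trough_intervals(counts) if a > lo and b < hi]
    picks = [(a + b) // 2 for a, b in inner]  # assumption: take the midpoint
    return [lo] + picks + [hi]


# First area numbers for three text lines separated by two empty gaps.
counts = dict(zip(range(13), [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]))
row_coords = line_coordinates(counts)
```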
  • S1015: Determine the coordinates of each table column based on the maximum abscissa, the minimum abscissa, and the number of second areas.
  • based on each abscissa and its corresponding number of second areas, the trough abscissa that meets the trough abscissa condition is determined, and the coordinates of each table column are obtained based on the maximum abscissa, the minimum abscissa, and the trough abscissa.
  • the condition of the trough abscissa is: the number of second areas of the trough abscissa is not higher than the number of second areas of the adjacent abscissas of the trough abscissa, where the adjacent abscissas of the trough abscissa are the previous abscissa and the next abscissa of the trough abscissa.
  • the coordinates of each table column are obtained based on the maximum abscissa, the minimum abscissa, and the trough abscissa, including:
  • using the maximum abscissa and the minimum abscissa as table column coordinates; generating trough abscissa intervals based on the trough abscissas, and selecting the trough abscissa intervals that include neither the maximum abscissa nor the minimum abscissa; among the selected trough abscissa intervals, if it is determined that a trough abscissa interval contains only one trough abscissa, that trough abscissa is used as a table column coordinate; if it is determined that a trough abscissa interval contains multiple trough abscissas, one trough abscissa is selected from the trough abscissa interval as a table column coordinate. In actual applications, the specific method of selection can be set according to the actual application scenario, and is not limited here.
  • Step 102: Generate a blank table based on each table row coordinate and each table column coordinate.
  • the target cell covered by the text area meets the following conditions: there is a coordinate point in the text area that is located within the target cell and does not coincide with the boundary of the target cell.
  • cell C is determined to be the target cell covered by text area A.
  • S1021: Determine the cell position information of the cells in the blank table based on each table row coordinate and each table column coordinate.
  • the cells in the blank table are rectangular, and the cell position information of the cell includes four cell vertex coordinates of the cell.
  • the cell vertex coordinates are the coordinates of the cell's vertex.
  • the cell vertex coordinates include the cell vertex ordinate and the cell vertex abscissa.
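  • The cell position information of S1021 can be derived from the table row and column coordinates roughly as follows. This is a sketch only: the names `build_cells` and `cell_vertices` and the `(x_min, y_min, x_max, y_max)` representation are assumptions made for illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), hypothetical layout


def build_cells(row_coords: List[int], col_coords: List[int]) -> List[List[Cell]]:
    """Blank grid: one rectangular cell between each pair of adjacent lines."""
    rows, cols = sorted(row_coords), sorted(col_coords)
    return [
        [(cols[j], rows[i], cols[j + 1], rows[i + 1]) for j in range(len(cols) - 1)]
        for i in range(len(rows) - 1)
    ]


def cell_vertices(cell: Cell) -> List[Tuple[int, int]]:
    """Four cell vertex coordinates as (abscissa, ordinate) pairs."""
    x0, y0, x1, y1 = cell
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


# Four row coordinates and three column coordinates give a 3 x 2 blank table.
grid = build_cells([0, 3, 8, 12], [0, 50, 120])
```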
  • S1022: Determine the target cell covered by the text area based on the area position information and the cell position information.
  • if there are multiple text areas, then for a target text area among them (the target text area being any one of the text areas), if it is determined that the target text area covers multiple target cells, the target cells covered by the target text area are merged.
  • Step 103: According to the region position information, add the regional text content in the text region recognition result to the blank table to obtain the target table.
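  • The coverage condition (a point of the text area lies inside the cell and not on its boundary) and the merge of S1022 can be sketched as below. The rectangle layout `(x_min, y_min, x_max, y_max)` and the function names are assumptions; the merged cell is represented here simply as the bounding rectangle of the covered cells, which is one possible way to record a merge.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def covers(area: Rect, cell: Rect) -> bool:
    """True if some point of the text area lies strictly inside the cell."""
    ax0, ay0, ax1, ay1 = area
    cx0, cy0, cx1, cy1 = cell
    # Strict inequalities: touching only the cell boundary does not count.
    return ax0 < cx1 and ax1 > cx0 and ay0 < cy1 and ay1 > cy0


def merge_covered(area: Rect, cells: List[Rect]) -> List[Rect]:
    """Replace the cells covered by one text area with a single merged cell."""
    targets = [c for c in cells if covers(area, c)]
    rest = [c for c in cells if not covers(area, c)]
    if len(targets) > 1:
        merged = (
            min(c[0] for c in targets), min(c[1] for c in targets),
            max(c[2] for c in targets), max(c[3] for c in targets),
        )
        targets = [merged]
    return rest + targets


cells = [(0, 0, 50, 10), (50, 0, 100, 10), (0, 10, 50, 20), (50, 10, 100, 20)]
wide_area = (10, 2, 90, 8)  # spans both cells of the first row
merged_cells = merge_covered(wide_area, cells)
```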
  • a cell contains a text area, which means that all coordinate points in the text area are located in the cell, that is, the coverage area of the cell is not smaller than the text area.
  • determining that a cell contains only one text area based on the area location information may include: determining the center point coordinates of the text area based on the area location information; and, if it is determined according to the cell location information of the cell that the center point coordinates of only one text area are located in the cell, determining that the cell contains only one text area.
  • the center point coordinates are the coordinates of the center point of the text area.
  • the coordinates of the center point include the abscissa of the center point and the ordinate of the center point. Because the multiple cells covered by a text area have already been merged in step 102, a text area can only be located in one cell. Therefore, the center point coordinates of the text area alone are enough to determine the cell where the text area is located.
  • adding the regional text content in the text area contained in a cell to the cell may include: setting the attribute value of the text attribute of the cell to the regional text content in the text area contained in the cell.
  • determining that a cell contains at least two text areas based on area location information may include: if it is determined that there are multiple text areas, determining the center point coordinates of each text area based on the area location information of each text area; and, if it is determined according to the cell position information of the cell that the center point coordinates of at least two text areas are located in the cell, determining that the cell contains at least two text areas.
  • sorting the regional text content in each text area contained in a cell may include: sorting the regional text content in each text area contained in the cell in ascending order of the center point ordinate; and sorting the regional text content of text areas with the same center point ordinate in ascending order of the center point abscissa.
  • adding the sorted regional text content to a cell may include: setting the attribute value of a text attribute of a cell to the sorted regional text content.
  • the text content in each area can be sorted according to the actual application scenario, and there is no restriction here.
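  • The ordering rule above (center point ordinate first, then center point abscissa) can be sketched as a single sort on a tuple key. The `(content, center_x, center_y)` layout, the function name `cell_text`, and the space separator are assumptions made for illustration; the text does not specify how the sorted contents are joined.

```python
from typing import List, Tuple

# Hypothetical layout for one text area: (content, center_x, center_y)
Area = Tuple[str, float, float]


def cell_text(areas: List[Area], separator: str = " ") -> str:
    """Join the contents of the areas in one cell.

    Areas are ordered by ascending center ordinate; ties are broken by
    ascending center abscissa.
    """
    ordered = sorted(areas, key=lambda a: (a[2], a[1]))
    return separator.join(content for content, _, _ in ordered)
```

  • Since other orderings are allowed depending on the application scenario, the sort key is the only part that would need to change.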
  • the target table corresponding to the image to be processed can be generated.
  • Text recognition technology is used to obtain the position and content of each text area in each image to be processed, which reduces the development workload. Both framed and frameless tables in the image to be processed can be reconstructed, which improves the accuracy of table reconstruction. Merged cells, and cells that contain text content from multiple areas, can also be identified accurately, further improving the accuracy of table reconstruction.
  • FIG. 2 is an example of a user attribute table image.
  • Figure 2 shows a user attribute table image containing multiple user attribute information.
  • the user attribute table image is an image to be processed that requires table reconstruction.
  • Refer to Figure 3, which is a specific flow chart of a method for reconstructing a user attribute table. The method shown in Figure 3 is used to reconstruct the user attribute table in the user attribute table image shown in Figure 2.
  • The specific implementation process of this method is as follows:
  • Step 300 Perform text recognition on the user attribute table image, and obtain the text area recognition result of the user attribute table image.
  • OCR technology is used to perform text detection and text recognition on the user attribute table image shown in Figure 2, and the regional text content and regional location information of 34 text areas are obtained.
  • the area position information includes the vertex coordinates of each area of the text area, that is, the vertex coordinates of the first area, the vertex coordinates of the second area, the vertex coordinates of the third area, and the vertex coordinates of the fourth area.
  • The output format of each recognized regional text content and its regional location information is {'content': 'regional text content', 'location': [first area vertex coordinates, second area vertex coordinates, third area vertex coordinates, fourth area vertex coordinates]}.
  • the text area recognition results of the user attribute table image in Figure 2 are:
  • Step 301 Determine the maximum ordinate and the minimum ordinate among the ordinates of the vertices of each region in the text area recognition result.
  • Step 302 Determine the maximum abscissa and the minimum abscissa of the abscissas of the vertices of each region in the text area recognition result.
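  • Steps 301 and 302 amount to taking the extremes over all area vertices. The sketch below assumes each recognition result is a dictionary with a `'location'` list of `(x, y)` vertex pairs, following the output format described above; the function name `table_bounds` is hypothetical.

```python
from typing import Dict, List, Tuple


def table_bounds(results: List[Dict]) -> Tuple[float, float, float, float]:
    """Return (min_x, max_x, min_y, max_y) over the vertices of all text areas."""
    xs = [x for r in results for x, _ in r["location"]]
    ys = [y for r in results for _, y in r["location"]]
    return min(xs), max(xs), min(ys), max(ys)
```

  • The minimum and maximum ordinates serve as the outer table row coordinates, and the minimum and maximum abscissas serve as the outer table column coordinates.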
  • Step 303 Determine the number of first regions for each ordinate and the number of second regions for each abscissa according to the region position information in the text region recognition result.
  • For each target ordinate in [table_bottom, table_top], i.e. [21, 268] (the target ordinate is any ordinate in [table_bottom, table_top]), determine the number of text areas that the target ordinate line passes through (the target ordinate line is the line on which every coordinate point has the target ordinate as its ordinate); this number is the first area number of the target ordinate.
  • For each target abscissa in [table_left, table_right] (the target abscissa is any abscissa in [table_left, table_right]), determine the number of text areas that the target abscissa line passes through (the target abscissa line is the line on which every coordinate point has the target abscissa as its abscissa); this number is the second area number of the target abscissa.
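  • The counting in Step 303 can be sketched as follows for the ordinate direction; the abscissa direction is symmetric. The sketch assumes integer pixel coordinates, treats each text area by its vertical extent (lowest to highest vertex ordinate), and counts an area when the sampled ordinate lies inside that extent, boundaries included. The name `first_area_numbers` and the `step` sampling parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def first_area_numbers(
    areas: List[List[Point]], bottom: int, top: int, step: int = 1
) -> Dict[int, int]:
    """Map each sampled ordinate in [bottom, top] to the number of text areas it crosses."""
    # Vertical extent (lowest, highest ordinate) of every text area.
    spans = [(min(y for _, y in v), max(y for _, y in v)) for v in areas]
    return {
        y: sum(1 for lo, hi in spans if lo <= y <= hi)
        for y in range(bottom, top + 1, step)
    }
```

  • Plotting the returned counts against the ordinate gives a curve of the kind referred to as the first curve; gaps between text rows show up as low points.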
  • Step 304 Determine the coordinates of each table row of the target table based on the maximum ordinate, the minimum ordinate, and the number of first areas.
  • 21 and 268 are determined as table row coordinates. The first curve shown in Figure 4 is generated from the first area number of each ordinate, and from the first curve in Figure 4 multiple table row coordinates are determined to be [67, 119, 172, 223, 259].
  • each ordinate may also be discontinuous (the ordinate is usually obtained by sampling), which is not limited here.
  • Step 305 Determine the coordinates of each table column based on the maximum abscissa, the minimum abscissa, and the number of second areas.
  • FIG. 5 is an example of the second curve.
  • 37 and 499 are determined as table column coordinates. The second curve shown in Figure 5 is generated from the second area number of each abscissa, and from the second curve in Figure 5 multiple table column coordinates are determined to be, in order, [139, 325, 456, 621].
  • each abscissa may also be discontinuous (the abscissa is usually obtained by sampling), which is not limited here.
  • Step 306 Generate a blank table based on the row coordinates of each table and the coordinates of each table column.
  • Step 307 According to the area location information, add the area text content in the text area recognition result to the blank form to obtain the target form.
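  • Generating the blank table from the row and column coordinates (Step 306) can be sketched as pairing consecutive coordinates into cell rectangles. The `(left, bottom, right, top)` cell layout and the name `blank_table` are assumptions for illustration; the text only states that the blank table is generated from the two coordinate lists.

```python
from typing import List, Tuple

# Hypothetical cell layout: (left, bottom, right, top)
Cell = Tuple[float, float, float, float]


def blank_table(row_coords: List[float], col_coords: List[float]) -> List[List[Cell]]:
    """Build a grid of empty cells bounded by consecutive row and column coordinates."""
    rows = sorted(row_coords)
    cols = sorted(col_coords)
    return [
        [(cols[j], rows[i], cols[j + 1], rows[i + 1]) for j in range(len(cols) - 1)]
        for i in range(len(rows) - 1)
    ]
```

  • With n row coordinates and m column coordinates this yields (n - 1) × (m - 1) cells, each carrying the cell position information needed for the later containment tests.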
  • FIG. 6 is an example of a merged table image.
  • The merged table image is an image to be processed that requires table reconstruction.
  • Refer to Figure 7, which is a specific flow chart of a method for reconstructing a merged table. The method shown in Figure 7 is used to reconstruct the merged table in the merged table image shown in Figure 6.
  • the specific implementation process of this method is as follows:
  • Step 700 Perform text recognition on the merged table image to obtain the text area recognition result of the merged table image.
  • the area position information includes the vertex coordinates of each area of the text area, that is, the vertex coordinates of the first area, the vertex coordinates of the second area, the vertex coordinates of the third area, and the vertex coordinates of the fourth area.
  • The output format of each recognized regional text content and its regional location information is {'content': 'regional text content', 'location': [first area vertex coordinates, second area vertex coordinates, third area vertex coordinates, fourth area vertex coordinates]}.
  • the text area recognition results of the merged table image in Figure 6 are:
  • Step 701 Determine the maximum ordinate and the minimum ordinate among the ordinates of the vertices of each region in the text area recognition result.
  • Step 702 Determine the maximum abscissa and the minimum abscissa of the abscissas of the vertices of each region in the text area recognition result.
  • Step 703 Determine the number of first regions for each ordinate and the number of second regions for each abscissa according to the region position information in the text region recognition result.
  • Step 704 Determine the coordinates of each table row of the target table based on the maximum ordinate, the minimum ordinate, and the number of first areas.
  • FIG. 8 is only used to illustrate the corresponding relationship between each ordinate and the number of first regions.
  • 84 and 250 are determined as table row coordinates. The first curve shown in Figure 8 is generated from the first area number of each ordinate, and from the first curve in Figure 8 multiple table row coordinates are determined to be [137, 195].
  • Step 705 Determine the coordinates of each table column based on the maximum abscissa, the minimum abscissa, and the number of second areas.
  • FIG. 9 is only used to illustrate the corresponding relationship between each abscissa and the number of second regions.
  • 213 and 728 are determined as table column coordinates. The second curve shown in Figure 9 is generated from the second area number of each abscissa, and from the second curve in Figure 9 multiple table column coordinates are determined to be, in sequence, [437, 586].
  • Step 706 Generate a blank table based on the row coordinates of each table and the coordinates of each table column.
  • Step 707 According to the region position information, add the regional text content in the text region recognition result to the blank table to obtain the target table.
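  • As a worked illustration using the coordinates of this example, the row coordinates 84, 137, 195, 250 and the column coordinates 213, 437, 586, 728 bound a 3 × 3 grid. The sketch below maps a point, such as a text area's center point, to a (row, column) index with a binary search. The function name `locate` and the use of `bisect` are choices made for this illustration, not part of the described method; row index 0 is simply the band with the lowest ordinates, and whether that is the visual top of the table depends on the image coordinate convention.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

ROW_COORDS = [84, 137, 195, 250]   # outer and inner row coordinates of this example
COL_COORDS = [213, 437, 586, 728]  # outer and inner column coordinates of this example


def locate(
    point: Tuple[float, float], row_coords: List[float], col_coords: List[float]
) -> Optional[Tuple[int, int]]:
    """Return the (row, column) index of the cell containing the point, or None if outside."""
    x, y = point
    if not (col_coords[0] <= x <= col_coords[-1] and row_coords[0] <= y <= row_coords[-1]):
        return None
    # bisect_right - 1 gives the band whose lower bound is <= the value;
    # the min() keeps points on the outermost boundary in the last band.
    row = min(bisect_right(row_coords, y) - 1, len(row_coords) - 2)
    col = min(bisect_right(col_coords, x) - 1, len(col_coords) - 2)
    return row, col
```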
  • The embodiment of the present application also provides a device for table reconstruction. Because the device solves the problem on a principle similar to that of the table reconstruction method, its implementation can refer to the implementation of the method, and repeated details are omitted.
  • Refer to FIG. 10, which is a schematic structural diagram of a table reconstruction device provided by an embodiment of the present application. The device includes:
  • the recognition unit 1001 is used to perform text recognition on the image to be processed, and obtain the text area recognition result of the image to be processed.
  • the text area recognition result includes the regional text content and regional location information of the text area;
  • the determination unit 1002 is used to determine the table row coordinates and the table column coordinates of the target table according to the regional location information
  • the generation unit 1003 is used to generate a blank table based on the row coordinates of each table and the coordinates of each table column;
  • the obtaining unit 1004 is used to add the regional text content to the blank table according to the region location information to obtain the target table.
  • the area location information includes the area vertex coordinates of the text area
  • The recognition unit 1001 is used to: perform text detection on the image to be processed to obtain multiple area vertex coordinates of the text area, where the area vertex coordinates are the coordinates of the vertices of the text area; and perform text recognition on the text area to obtain the regional text content.
  • the regional vertex coordinates include the regional vertex abscissa and the regional vertex ordinate.
  • The determining unit 1002 is used to: determine the maximum ordinate and the minimum ordinate among the area vertex ordinates; determine the maximum abscissa and the minimum abscissa among the area vertex abscissas; determine, according to the area location information, the first area number of each ordinate and the second area number of each abscissa, where the first area number is the number of text areas containing a given ordinate and the second area number is the number of text areas containing a given abscissa; determine the table row coordinates based on the maximum ordinate, the minimum ordinate, and the first area numbers; and determine the table column coordinates based on the maximum abscissa, the minimum abscissa, and the second area numbers.
  • The determining unit 1002 is configured to: determine the trough ordinates based on each ordinate and its corresponding first area number, where the first area number of a trough ordinate is not higher than the first area numbers of its adjacent ordinates, and the adjacent ordinates of a trough ordinate are the ordinate before it and the ordinate after it; and obtain the table row coordinates according to the maximum ordinate, the minimum ordinate, and the trough ordinates.
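  • Trough detection on the count curve can be sketched as a local-minimum search. Note that this sketch is stricter than the wording above: taken literally, "not higher than its neighbors" would mark every point of a flat run as a trough, so the sketch assumes that a flat run counts as one trough only when the counts on both sides of the run are strictly higher, and it reports the middle of the run. The names `trough_coordinates` and `table_row_coordinates` are hypothetical.

```python
from typing import Dict, List


def trough_coordinates(counts: Dict[int, int]) -> List[int]:
    """Return one coordinate per local minimum of the count curve, end points excluded."""
    coords = sorted(counts)
    values = [counts[c] for c in coords]
    troughs = []
    i = 1
    while i < len(values) - 1:
        # Extend j to the end of the run of equal values that starts at i.
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        left_higher = values[i - 1] > values[i]
        right_higher = j + 1 < len(values) and values[j + 1] > values[i]
        if left_higher and right_higher:
            troughs.append(coords[(i + j) // 2])  # middle of the flat run
        i = j + 1
    return troughs


def table_row_coordinates(min_y: int, max_y: int, counts: Dict[int, int]) -> List[int]:
    """Combine the minimum ordinate, the trough ordinates, and the maximum ordinate."""
    return [min_y] + trough_coordinates(counts) + [max_y]
```

  • The same two functions apply unchanged to abscissas and second area numbers to obtain the table column coordinates.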
  • The determining unit 1002 is configured to: determine the trough abscissas based on each abscissa and its corresponding second area number, where the second area number of a trough abscissa is not higher than the second area numbers of its adjacent abscissas, and the adjacent abscissas of a trough abscissa are the abscissa before it and the abscissa after it; and obtain the table column coordinates according to the maximum abscissa, the minimum abscissa, and the trough abscissas.
  • The generation unit 1003 is also used to: determine the cell position information of the cells in the blank table according to the table row coordinates and the table column coordinates; determine, according to the area position information and the cell position information, the target cells covered by the text area, where a target cell is a cell for which some coordinate point of the text area lies within the cell and does not coincide with its boundary; and, if it is determined that the text area covers multiple target cells, merge those target cells.
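  • The covering test and merge can be sketched with axis-aligned rectangles. A text area covers a cell when the two rectangles overlap with strict inequalities, which corresponds to some point of the text area lying inside the cell and off its boundary. The `(left, bottom, right, top)` layout and the names `covered_cells` and `merged_bounds` are assumptions for illustration, and representing a merged cell by its bounding box is one possible choice.

```python
from typing import List, Tuple

# Hypothetical rectangle layout: (left, bottom, right, top)
Box = Tuple[float, float, float, float]


def covered_cells(area: Box, cells: List[Box]) -> List[int]:
    """Return the indices of cells whose interior overlaps the text area."""
    a_left, a_bottom, a_right, a_top = area
    return [
        i
        for i, (left, bottom, right, top) in enumerate(cells)
        if a_left < right and a_right > left and a_bottom < top and a_top > bottom
    ]


def merged_bounds(cells: List[Box], indices: List[int]) -> Box:
    """Return the bounding box of the selected cells, used here as the merged cell."""
    picked = [cells[i] for i in indices]
    return (
        min(c[0] for c in picked),
        min(c[1] for c in picked),
        max(c[2] for c in picked),
        max(c[3] for c in picked),
    )
```

  • A text area that only touches a cell's edge is not counted as covering it, so cells are merged only when text genuinely spans them.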
  • The obtaining unit 1004 is used to perform the following steps for each cell in the blank table: if it is determined according to the area position information that the cell contains only one text area, add the regional text content of that text area to the cell; if it is determined according to the area position information that the cell contains at least two text areas, sort the regional text content of the text areas contained in the cell according to the area position information and add the sorted regional text content to the cell.
  • The obtaining unit 1004 is used to: determine the center point coordinates of the text area based on the area position information, where the center point coordinates are the coordinates of the center point of the text area; and, if it is determined from the cell position information of a cell that the center point coordinates of only one text area are located in the cell, determine that the cell contains only one text area.
  • the obtaining unit 1004 is used to: set the attribute value of the text attribute of a cell to the regional text content in the text area contained in the cell.
  • The obtaining unit 1004 is configured to: if there are multiple text areas, determine the center point coordinates of each text area according to its area position information, where the center point coordinates are the coordinates of the center point of the text area; and, if it is determined from the cell position information of a cell that the center point coordinates of at least two text areas are located in the cell, determine that the cell contains at least two text areas.
  • the center point coordinates include the center point abscissa and the center point ordinate.
  • The obtaining unit 1004 is used to: sort the regional text content of the text areas contained in a cell in descending order of the center point ordinate; and sort the regional text content of text areas that share the same center point ordinate in ascending order of the center point abscissa.
  • the obtaining unit 1004 is used to set the attribute value of the text attribute of a cell to the sorted regional text content.
  • Figure 11 shows a schematic structural diagram of an electronic device 1100.
  • the electronic device 1100 includes a processor 1110 and a memory 1120 .
  • it may also include a power supply 1130 , a display unit 1140 , and an input unit 1150 .
  • The processor 1110 is the control center of the electronic device 1100. It uses various interfaces and lines to connect the components, and performs the functions of the electronic device 1100 by running or executing software programs and/or data stored in the memory 1120, thereby monitoring the electronic device 1100 as a whole.
  • the processor 1110 executes each step in the above embodiment when calling the computer program stored in the memory 1120.
  • the processor 1110 may include one or more processing units; preferably, the processor 1110 may integrate an application processor and a modem processor, where the application processor mainly processes operating systems, user interfaces, applications, etc., The modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 1110.
  • the processor and memory can be implemented on a single chip, and in some embodiments, they can also be implemented on separate chips.
  • the memory 1120 may mainly include a program storage area and a data storage area, where the program storage area may store operating systems, various applications, etc.; the storage data area may store data created according to the use of the electronic device 1100 , etc.
  • the memory 1120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
  • the electronic device 1100 also includes a power supply 1130 (such as a battery) that supplies power to various components.
  • the power supply can be logically connected to the processor 1110 through a power management system, thereby managing functions such as charging, discharging, and power consumption through the power management system.
  • The display unit 1140 may be used to display information input by the user or information provided to the user, as well as the various menus of the electronic device 1100. In the embodiment of the present invention, it is mainly used to display the display interface of each application in the electronic device 1100 and objects such as text and pictures shown in the display interface.
  • the display unit 1140 may include a display panel 1141.
  • the display panel 1141 can be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), etc.
  • the input unit 1150 may be used to receive information such as numbers or characters input by the user.
  • the input unit 1150 may include a touch panel 1151 and other input devices 1152.
  • The touch panel 1151, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 1151 with a finger, a stylus, or any other suitable object or accessory).
  • The touch panel 1151 can detect the user's touch operation and the signals it produces, convert these signals into contact point coordinates, send them to the processor 1110, and receive and execute commands sent by the processor 1110.
  • the touch panel 1151 can be implemented using various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • Other input devices 1152 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, power on/off keys, etc.), trackball, mouse, joystick, etc.
  • the touch panel 1151 can cover the display panel 1141.
  • When the touch panel 1151 detects a touch operation on or near it, the operation is passed to the processor 1110 to determine the type of the touch event, and the processor 1110 then provides corresponding visual output on the display panel 1141 according to the type of the touch event.
  • Although the touch panel 1151 and the display panel 1141 are used as two independent components to implement the input and output functions of the electronic device 1100, in some implementations the touch panel 1151 and the display panel 1141 can be integrated to implement the input and output functions of the electronic device 1100.
  • The electronic device 1100 may also include one or more sensors, such as a pressure sensor, a gravity acceleration sensor, and a proximity light sensor. Depending on the needs of a specific application, the electronic device 1100 may also include other components such as cameras. Because these components are not key components used in the embodiments of this application, they are not shown in Figure 11 and are not described in detail.
  • FIG. 11 is only an example of an electronic device and does not constitute a limitation on the electronic device. It may include more or fewer components than shown in the figure, or some components may be combined, or different components may be used.
  • A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the communication device can perform each step in the above embodiment.
  • each of the above parts is divided into modules (or units) according to their functions and described separately.
  • the functions of each module (or unit) can be implemented in the same or multiple software or hardware.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that causes a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Character Input (AREA)

Abstract

The present application belongs to the technical field of data processing and discloses a table reconstruction method and an electronic device. The method comprises: performing text recognition on an image to be processed to obtain a text area recognition result of the image, the text area recognition result comprising the regional text content and regional location information of a text area; determining each table row coordinate and each table column coordinate of a target table according to the regional location information; generating a blank table according to each table row coordinate and each table column coordinate; and adding the regional text content to the blank table according to the regional location information to obtain the target table. In this way, a framed or frameless table in an image to be processed can be reconstructed, which improves the accuracy and the range of application of table reconstruction.
PCT/CN2023/084482 2022-05-13 2023-03-28 Table reconstruction method and electronic device WO2023216745A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210523453.7 2022-05-13
CN202210523453.7A CN114943978B (zh) 2022-05-13 2022-05-13 Table reconstruction method and electronic device

Publications (1)

Publication Number Publication Date
WO2023216745A1 true WO2023216745A1 (fr) 2023-11-16

Family

ID=82906729

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084482 WO2023216745A1 (fr) 2022-05-13 2023-03-28 Table reconstruction method and electronic device

Country Status (2)

Country Link
CN (1) CN114943978B (fr)
WO (1) WO2023216745A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943978B (zh) * 2022-05-13 2023-10-03 上海弘玑信息技术有限公司 Table reconstruction method and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190294399A1 (en) * 2018-03-26 2019-09-26 Abc Fintech Co., Ltd. Method and device for parsing tables in pdf document
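CN112396048A (zh) * 2020-11-17 2021-02-23 中国平安人寿保险股份有限公司 Picture information extraction method and apparatus, computer device, and storage medium
CN114463765A (zh) * 2022-02-10 2022-05-10 微民保险代理有限公司 Table information extraction method, apparatus, and storage medium
CN114943978A (zh) * 2022-05-13 2022-08-26 上海弘玑信息技术有限公司 Table reconstruction method and electronic device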

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
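JPH09330377A (ja) * 1996-06-10 1997-12-22 Hitachi Ltd Handwritten character recognition device and handwritten character recognition method
CN110334585B (zh) * 2019-05-22 2023-10-24 平安科技(深圳)有限公司 Table recognition method, apparatus, computer device, and storage medium
CN111985465A (zh) * 2020-08-17 2020-11-24 中移(杭州)信息技术有限公司 Text recognition method, apparatus, device, and storage medium
CN113239227B (zh) * 2021-06-02 2023-11-17 泰康保险集团股份有限公司 Image data structuring method and apparatus, electronic device, and computer-readable medium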

Also Published As

Publication number Publication date
CN114943978A (zh) 2022-08-26
CN114943978B (zh) 2023-10-03

Similar Documents

Publication Publication Date Title
WO2018072663A1 (fr) Procédé et dispositif de traitement de données, procédé et système d'apprentissage de classificateur, et support de stockage
US7760189B2 (en) Touchpad diagonal scrolling
US8427438B2 (en) Virtual input tools
US20160342779A1 (en) System and method for universal user interface configurations
TWI611338B (zh) 縮放螢幕畫面的方法、電子裝置及電腦程式產品
US20130132874A1 (en) Automatically arranging of icons on a user interface
US8972891B2 (en) Method for handling objects representing annotations on an interactive input system and interactive input system executing the method
CN103135884A (zh) 以圈选方式进行检索的输入方法、系统及其装置
EP3493112B1 (fr) Procédé de traitement d'image, dispositif informatique et support d'informations lisible par ordinateur
US9588678B2 (en) Method of operating electronic handwriting and electronic device for supporting the same
US10445417B2 (en) Entry of values into multiple fields of a form using touch screens
US8938123B2 (en) Electronic device and handwritten document search method
EP3175375A1 (fr) Interrogation basée sur une image et permettant d'identifier des objets dans des documents
US9529526B2 (en) Information processing method and information processing device
WO2023216745A1 (fr) Procédé de reconstruction de table et dispositif électronique
CN116168038B (zh) 一种图像翻拍检测的方法、装置、电子设备及存储介质
US20160275095A1 (en) Electronic device, method and storage medium
WO2020000970A1 (fr) Procédé et appareil d'identification d'intérêt d'utilisateur, et dispositif terminal et support d'informations
US20150134641A1 (en) Electronic device and method for processing clip of electronic document
US20130346893A1 (en) Electronic device and method for editing document using the electronic device
CN107291367B (zh) 一种橡皮擦的使用方法及装置
US20160026613A1 (en) Processing image to identify object for insertion into document
US20180336173A1 (en) Augmenting digital ink strokes
CN107402673A (zh) 一种全局搜索方法、终端及计算机可读存储介质
CN111221917A (zh) 智能分区存储方法、装置及计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802521

Country of ref document: EP

Kind code of ref document: A1