CN111639637A - Table identification method and device, electronic equipment and storage medium - Google Patents

Table identification method and device, electronic equipment and storage medium

Info

Publication number
CN111639637A
CN111639637A
Authority
CN
China
Prior art keywords
cell
row
information
column
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010477731.0A
Other languages
Chinese (zh)
Other versions
CN111639637B (en)
Inventor
韩光耀
庞敏辉
谢国斌
李丹青
曲福
姜泽青
冯博豪
杨舰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010477731.0A priority Critical patent/CN111639637B/en
Publication of CN111639637A publication Critical patent/CN111639637A/en
Application granted granted Critical
Publication of CN111639637B publication Critical patent/CN111639637B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a table identification method and device, electronic equipment and a storage medium, and relates to the fields of cloud computing and image recognition. The specific implementation scheme is as follows: acquiring the position information of each cell in the table image; gridding the table image according to the position information of each cell to obtain the row and column information of each grid; and obtaining the row and column information of each cell by using the row and column information of the grids included in each cell. The embodiments of the application facilitate extracting the structural information of the table and achieve complete identification of the table information.

Description

Table identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing, and more particularly to the field of image recognition.
Background
In daily work and life, a form is an important information carrier. People extract or record important information by operating on the table. In general, many form files are images and cannot be edited. Therefore, the related art proposes to identify a table in an image to extract key information therein.
Currently, table recognition generally identifies the positions of table lines or cells in the table using an image segmentation technique, such as a threshold segmentation technique or a deep-learning segmentation technique.
Disclosure of Invention
The application provides a table identification method and device, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided a table identifying method including:
acquiring position information of each cell in the form image;
gridding the table image according to the position information of each cell to obtain row and column information of each grid;
and obtaining the row and column information of each cell by utilizing the row and column information of the grids included by each cell.
According to another aspect of the present application, there is provided a form recognition apparatus including:
the acquisition module is used for acquiring the position information of each cell in the form image;
the gridding module is used for gridding the table image according to the position information of each cell to obtain the row and column information of each grid;
and the row and column module is used for obtaining the row and column information of each cell by utilizing the row and column information of the grids included by each cell.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method provided by any of the embodiments of the present application.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided by any of the embodiments of the present application.
According to the embodiments of the application, after the table image is gridded, the grid can describe the structural relationship of each cell in a complex table, so the row and column information of each cell is obtained from the row and column information of the grids. This facilitates extracting the structural information of the table and achieves complete identification of the table information.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2A is a schematic illustration of a form image in a first embodiment of the present application;
FIG. 2B is a schematic representation of gridding in the first embodiment of the present application;
FIG. 2C is a schematic representation of gridding in a first embodiment of the present application;
FIG. 3 is a schematic diagram according to a second embodiment of the present application;
FIG. 4 is a diagram illustrating row and column information of cells according to a second embodiment of the present application;
FIG. 5 is a schematic diagram of a relationship between cells and grids in a second embodiment of the present application;
FIG. 6 is a schematic illustration according to a third embodiment of the present application;
FIG. 7 is a schematic diagram of a U-net network in a third embodiment of the present application;
FIG. 8 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 9 is a diagram of an application scenario in which embodiments of the present application may be implemented;
FIG. 10 is a schematic diagram of an example of an application of the present application;
fig. 11A is a schematic diagram of a table image before hough transform in an application example of the present application;
fig. 11B is a schematic diagram of a table image after hough transform in an application example of the present application;
FIG. 12 is a schematic view of a fifth embodiment of the present application;
FIG. 13 is a schematic illustration of a sixth embodiment of the present application;
fig. 14 is a block diagram of an electronic device for implementing a table identification method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates a table identification method according to an embodiment of the present application. As shown in fig. 1, the method includes:
step S11, obtaining the position information of each cell in the form image;
step S12, gridding the table image according to the position information of each cell to obtain the row and column information of each grid;
step S13 is to obtain the row/column information of each cell by using the row/column information of the mesh included in each cell.
According to the embodiments of the application, after the table image is gridded, the grid can describe the structural relationship of each cell in a complex table, so the row and column information of each cell is obtained from the row and column information of the grids. This facilitates extracting the structural information of the table and achieves complete identification of the table information.
There are various embodiments for obtaining the position information of each cell in the form image.
For example, the table lines of the table image are first segmented from the table image using a deep learning network such as a U-net network or an FCN (Fully Convolutional Network). Then, the position information of each cell in the table image is determined using OpenCV (Open Source Computer Vision Library) and the segmented table lines. Because the cells are identified based on the table lines, they can still be accurately identified even if the image suffers from quality problems such as backlight or faint ink, which helps to accurately identify tables in images with complex backgrounds.
For another example, an outer frame of the table image is segmented from the table image by using a deep learning network, a threshold segmentation algorithm, a watershed algorithm or an edge detection algorithm, and then, by using OpenCV and the segmented table outer frame, a table line is detected and position information of each cell in the table image is determined.
Illustratively, the location information of a cell includes the coordinates of the cell, such as the upper-left coordinates (x1, y1) and the lower-right coordinates (x2, y2). Alternatively, the location information of a cell includes the coordinates and size of the cell, such as the upper-left coordinates (x1, y1) together with the cell's length and width.
In one example, the edge lines of each cell can be extended according to the cell's coordinates, and both ends of each extended line are connected to the table's outer frame or to the border of the table image, thereby gridding the table image.
For example, gridding the table image of fig. 2A yields the grid image of fig. 2B. Each grid is bounded by two adjacent horizontal lines and two adjacent vertical lines, such as grid E in fig. 2B.
In one example, if the differences between the coordinates of several cells are smaller than a threshold, a single grid line connected at both ends to the table frame or to the border of the table image can be derived from those nearly identical coordinates, and the table image is then gridded using the grid lines obtained in this way.
For example, as shown in FIG. 2C, if the upper-left abscissas (i.e., the abscissas of the left vertical edges) of cells G, H and I are very close, the cells are likely in the same column, and the average, median, or a reference value of these similar coordinates is taken as the abscissa of a single grid line L (see the dashed line in FIG. 2C). This avoids producing excessive redundant grids in the gridded image because of poor image quality or errors in acquiring the position information; a minimal code sketch of this coordinate merging follows below.
The threshold value in this example may be set according to the size of each cell in the table image, or the size of the entire table.
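As a rough illustration of this coordinate-merging idea (a sketch, not code from the patent; the function names cluster_coords and grid_lines_from_cells and the pixel threshold are assumptions), the following Python function collects the edge coordinates of all cells, groups coordinates that differ by less than a threshold, and replaces each group by its mean to obtain the grid-line positions:

```python
import numpy as np

def cluster_coords(coords, threshold=5):
    """Group nearly equal coordinates and replace each group by its mean, so
    small localisation errors do not produce redundant grid lines."""
    coords = sorted(coords)
    groups, current = [], [coords[0]]
    for c in coords[1:]:
        if c - current[-1] < threshold:
            current.append(c)
        else:
            groups.append(current)
            current = [c]
    groups.append(current)
    return [float(np.mean(g)) for g in groups]

def grid_lines_from_cells(cells, threshold=5):
    """cells: list of (x1, y1, x2, y2) boxes. Returns the x positions of the
    vertical grid lines and the y positions of the horizontal grid lines."""
    xs = [c[0] for c in cells] + [c[2] for c in cells]
    ys = [c[1] for c in cells] + [c[3] for c in cells]
    return cluster_coords(xs, threshold), cluster_coords(ys, threshold)
```

The resulting x and y positions define the grid lines used to grid the table image as in fig. 2B and fig. 2C.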
As an exemplary implementation manner, referring to fig. 3, in step S13, obtaining the row and column information of each cell by using the row and column information of the grid included in each cell may include:
step S131, when a cell includes at least two grids, obtaining the row and column serial numbers of the cell by using the row and column serial numbers of the grid located at the upper-left corner among the grids included in the cell.
In this exemplary embodiment, after the table image is gridded, cells spanning multiple rows or columns are identified according to the number of grids they include, and their row and column serial numbers are obtained from the row and column serial numbers of those grids. The structural information of cross-row and cross-column cells is thus extracted accurately, which helps restore the table image into an editable table document and achieves complete identification of the table information.
As an exemplary implementation manner, referring to fig. 3, in step S13, obtaining the row and column information of each cell by using the row and column information of the grid included in each cell may further include:
step S132, the row and column number of the grids at the lower right corner and the row and column number of the grids at the upper left corner in the grids included in the cell are used to obtain the number of the rows and columns across the cell.
In this exemplary embodiment, the numbers of rows and columns spanned by a cross-row or cross-column cell are determined using the row and column information of the grids. The structural information of such cells is extracted accurately, which helps restore the table image into an editable table document and achieves complete identification of the table information.
As an exemplary implementation manner, referring to fig. 3, in step S13, obtaining the row and column information of each cell by using the row and column information of the grid included in each cell may further include:
step S133, determining the row and column number of the grid included in the cell as the row and column number of the cell when the cell includes one grid.
In this exemplary embodiment, the row and column serial numbers of a cell that does not span rows or columns are determined from the row and column information of the grid. The structural information of the cell is extracted accurately, which helps restore the table image into an editable table document and achieves complete identification of the table information.
For example, as shown in FIG. 4, cell C comprises four grids, and therefore cell C spans rows and columns. Among the grids included in cell C, the grid at the upper-left corner is the grid at row 2 and column 5, so cell C can be denoted as the cell at row 2, column 5. The grid at the lower-right corner is the grid at row 3 and column 6; the difference in row serial numbers between the lower-right grid and the upper-left grid is 1, and the difference in column serial numbers is 1, so cell C spans 1 + 1 = 2 rows and 2 columns.
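A minimal sketch of steps S131 and S132, assuming the sorted grid-line positions produced above (the one-pixel tolerance and the function name cell_row_col_info are assumptions): the cell's row and column serial numbers come from the grid containing its top-left corner, and its spans from the grid containing its bottom-right corner.

```python
import bisect

def cell_row_col_info(cell, x_lines, y_lines):
    """cell: (x1, y1, x2, y2). x_lines / y_lines: sorted grid-line positions.
    Returns (row, col, row_span, col_span), 1-based."""
    x1, y1, x2, y2 = cell
    # Grid row/column containing the top-left corner (step S131).
    col = bisect.bisect_right(x_lines, x1 + 1)      # +1 tolerates small coordinate noise
    row = bisect.bisect_right(y_lines, y1 + 1)
    # Grid row/column containing the bottom-right corner (step S132).
    col_end = bisect.bisect_left(x_lines, x2 - 1)
    row_end = bisect.bisect_left(y_lines, y2 - 1)
    return row, col, row_end - row + 1, col_end - col + 1
```

For cell C in fig. 4 this returns row 2, column 5, with spans of 2 rows and 2 columns.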
In an embodiment of the present application, the area of the portion of a grid that lies inside a cell may be calculated, and when that area is greater than a threshold, for example greater than 50% or 70% of the grid's area, the cell is determined to include that grid.
For example, as shown in fig. 5, when the grid lines deviate from the coordinates of the cells, the area of the portion of each grid lying inside cell D is calculated. It can be determined that the portion of grid 1 inside cell D is greater than 50% of grid 1's area, while the portion of grid 9 inside cell D is less than 50% of grid 9's area. Thus, cell D includes grid 1 but does not include grid 9.
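The area test above reduces to a simple rectangle-intersection check; a small sketch follows (the 50% default ratio is one of the thresholds mentioned above, and the function name cell_contains_grid is an assumption):

```python
def cell_contains_grid(cell, grid, ratio=0.5):
    """Return True when the part of `grid` lying inside `cell` covers more
    than `ratio` of the grid's own area; both are (x1, y1, x2, y2) boxes."""
    cx1, cy1, cx2, cy2 = cell
    gx1, gy1, gx2, gy2 = grid
    inter_w = max(0.0, min(cx2, gx2) - max(cx1, gx1))
    inter_h = max(0.0, min(cy2, gy2) - max(cy1, gy1))
    grid_area = (gx2 - gx1) * (gy2 - gy1)
    return grid_area > 0 and (inter_w * inter_h) / grid_area > ratio
```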
The embodiment of the present application further provides an exemplary implementation in which a U-net network is used to obtain the location information of each cell in the table image, where the skip-connection structure of the U-net network is improved. Referring to fig. 6, acquiring the location information of each cell in the table image may include:
step S21, inputting the tabular image into a U-net network, wherein the U-net network comprises a pooling part and an up-sampling part, the pooling part comprises a convolution layer and a pooling layer, and the up-sampling part comprises a convolution layer and an up-sampling layer;
step S22, performing convolution and pooling on the table image by utilizing a pooling part, wherein the pooling part is used for outputting first characteristic information;
step S23, performing upsampling on the first characteristic information to obtain second characteristic information;
step S24, performing convolution and upsampling on the feature information output by the pooling part by using the upsampling part, wherein an upsampling layer of the upsampling part is used for outputting third feature information, and a convolution layer of the upsampling part is used for performing feature fusion on the first feature information, the second feature information and the third feature information;
and step S25, obtaining the position information of each cell in the form image by using the fused feature information.
This exemplary embodiment improves the skip-connection structure of the U-net network: the non-bottom-layer feature maps are upsampled and added into the U-net network for feature fusion, so that the recognition target in the table image, such as the table's outer frame or table lines, can be segmented more accurately, improving the accuracy of table recognition.
Fig. 7 illustrates the feature information obtained in the above steps, taking a table image 100 of 256 × 256 × 1 as an example, where 256 × 256 × 1 indicates that the size of the table image 100 is 256 × 256 and the number of channels is 1. Referring to fig. 7, the U-net network includes a pooling part 61 and an upsampling part 62, where pooling may also be referred to as down-sampling.
In the pooling part of the U-net network, the table image 100 is processed by multiple convolutional layers and multiple pooling layers. The convolution layers convolve the input feature information (see conv in fig. 7) and output first feature information 11, 12, 13, and 14. The pooling layers pool the first feature information (see pooling in fig. 7), and the pooled feature information is input to the next convolutional layer. After alternating convolution (conv) and pooling, the pooling part outputs feature information 15.
Then, the feature information 16 obtained by convolving feature information 15 is input to the upsampling part of the U-net network. The upsampling part processes the convolved feature information 16 with multiple convolution layers and multiple upsampling layers, where each upsampling layer upsamples the feature information (see Up Sampling in fig. 7) and outputs third feature information. The U-net network adopts a skip-connection structure: after an upsampling layer outputs third feature information, the convolution layer connected to that upsampling layer can crop the first feature information to the same size as the third feature information, fuse the third feature information with the first feature information, and then perform convolution. For example, in fig. 7, after feature information 16 is upsampled, the resulting third feature information is fused with first feature information 14 to obtain feature information 31.
In the embodiment of the present application, the first feature information 12, 13, and 14 may also be upsampled to obtain the corresponding second feature information 21, 22, and 23. When an upsampling layer outputs third feature information, the information used for feature fusion includes that third feature information together with the first and second feature information of the same size. Fourth feature information 41, 42, and 43 are obtained by fusing these three kinds of feature information. For example, the first feature information 14 is upsampled to obtain second feature information 23; after feature information 32 of size 32 × 32 is upsampled, the resulting third feature information of size 64 × 64 is fused with the first feature information 13 and the second feature information 23 of size 64 × 64 to obtain fourth feature information 41.
Each convolution layer then convolves (conv) the fused fourth feature information, and the feature information it outputs is passed to the next upsampling layer. Through alternating upsampling (Up Sampling) and convolution (conv), the upsampling part 62 outputs feature information 200 with the same size as the table image 100.
Using the feature information 200, the location information of the cells in the table image can be obtained. For example, convolving the feature information 200 yields an image 300 with a single channel, in which the table's outer frame or table lines are well segmented. With OpenCV, the position information of each cell can then be obtained from the image 300.
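To make the modified skip connection concrete, the following PyTorch sketch shows a two-level U-Net whose outermost decoder stage fuses three tensors: the encoder feature at that resolution (first feature information), an upsampled deeper encoder feature (second feature information), and the upsampled decoder feature (third feature information). This is a hedged illustration, not the exact network of fig. 7 (which has more levels and specific channel counts); the names SmallUNet, conv_block, and base are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions; batch normalization is added as suggested in stage two below.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)            # first feature info, full resolution
        self.enc2 = conv_block(base, base * 2)         # first feature info, 1/2 resolution
        self.bottom = conv_block(base * 2, base * 4)
        self.dec2 = conv_block(base * 4 + base * 2, base * 2)
        # Outermost decoder fuses: decoder feature + enc1 + upsampled enc2 (the added second feature).
        self.dec1 = conv_block(base * 2 + base + base * 2, base)
        self.head = nn.Conv2d(base, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(F.max_pool2d(e1, 2))
        b = self.bottom(F.max_pool2d(e2, 2))
        d2 = self.dec2(torch.cat([F.interpolate(b, scale_factor=2), e2], dim=1))
        e2_up = F.interpolate(e2, scale_factor=2)      # second feature information
        d1 = self.dec1(torch.cat([F.interpolate(d2, scale_factor=2), e1, e2_up], dim=1))
        return torch.sigmoid(self.head(d1))            # probability map of table lines / outer frame

# mask = SmallUNet()(torch.randn(1, 1, 256, 256))      # -> (1, 1, 256, 256)
```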
As an exemplary implementation manner, referring to fig. 8, a table identification method provided in an embodiment of the present application may further include:
step S31, determining an included angle between the form image and a horizontal line by using a classifier;
in step S32, when the angle between the form image and the horizontal line is not the reference angle, the form image is rotated so that the angle between the form image and the horizontal line is the reference angle.
Illustratively, the reference angle is 0, which is advantageous for correcting the tilted form image.
For example, the table image is input to a VGG16 four-class classifier, which determines whether the table image is at an angle of 0°, 90°, 180°, or 270° to the horizontal line. Based on that angle, the table image can then be rotated to an angle of 0° to the horizontal.
According to this exemplary embodiment, the table image can be corrected to the upright orientation, which prevents wrong table structure information from being acquired because the table is not upright and improves the accuracy of table recognition.
The following describes the effects of the embodiments of the present application with specific application examples:
referring to fig. 9, a view of an application scenario of the embodiment of the present application is shown. In this application example, the Web server provides a form identification service. The Web server may receive an image uploaded by a user, encode the Base64 of the image into a form recognition service. And then outputting a character string in a json format through four-stage processing, wherein the character string stores the structural information of the table. The Web server can display the table in the image in the form of editable document such as Excel or Word in the front page by using the character string.
With reference to fig. 10, the form recognition service processes the table image in four stages: image preprocessing, table detection, character recognition, and table structuring.
Stage one: image preprocessing
1. Inclination correction
In general, whether an image is obtained by scanning or photographing, it is difficult to guarantee that it is perfectly upright. In this step, the table image is corrected by the Hough transform so that the horizontal lines of the table become horizontal or its vertical lines become vertical. For example, the table image of fig. 11A is rectified into the table image of fig. 11B.
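A minimal OpenCV sketch of this tilt-correction step (an illustration under assumed thresholds; the function name deskew_by_hough is not from the patent): it estimates the skew of near-horizontal table lines with the probabilistic Hough transform and rotates the image by the median skew angle.

```python
import cv2
import numpy as np

def deskew_by_hough(img_bgr):
    """Estimate the dominant skew of near-horizontal lines with the Hough
    transform and rotate the image so that those lines become horizontal."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=gray.shape[1] // 3, maxLineGap=10)
    if lines is None:
        return img_bgr
    angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
              for x1, y1, x2, y2 in lines[:, 0]]
    angles = [a for a in angles if abs(a) < 45]          # keep near-horizontal lines only
    if not angles:
        return img_bgr
    skew = float(np.median(angles))
    h, w = img_bgr.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), skew, 1.0)
    return cv2.warpAffine(img_bgr, M, (w, h), flags=cv2.INTER_LINEAR,
                          borderValue=(255, 255, 255))
```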
2. Direction detection
After correction by the Hough transform, the included angle between the table image and the horizontal line is 0°, 90°, 180°, or 270°. Using a VGG16 four-class classifier, the table image is classified into one of these four angles, and the image is then rotated according to the determined angle so that it is upright, that is, at an included angle of 0° with the horizontal line.
In practice, a large number of images, for example 1000 images, can be randomly rotated to one of the four angles. 70% of the images were used as a training set, 20% of the images were used as a validation set, and 10% of the images were used as a test set. Through training and verification, the accuracy of the direction detection of the images in the test set can reach 93%.
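A hedged sketch of such a four-way orientation classifier, built on torchvision's VGG16 with its final fully connected layer replaced by a 4-class head; the preprocessing, the sign of the corrective rotation, and whether pretrained weights are used are assumptions that depend on how the classifier is trained.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

ANGLES = [0, 90, 180, 270]

def build_orientation_classifier():
    net = models.vgg16(weights=None)                 # pretrained weights could be used instead
    net.classifier[6] = nn.Linear(4096, len(ANGLES)) # replace the 1000-class head with 4 classes
    return net

preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def detect_and_correct(img: Image.Image, net) -> Image.Image:
    """Predict which of the four angles the table image is at and rotate it back."""
    net.eval()
    with torch.no_grad():
        logits = net(preprocess(img.convert("RGB")).unsqueeze(0))
    angle = ANGLES[int(logits.argmax(dim=1))]
    # PIL rotates counter-clockwise; flip the sign if the training labels use the opposite convention.
    return img.rotate(-angle, expand=True)
```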
3. Seal detection and seal removal.
In the financial field, financial statement images containing red seals are often processed. For such table images, the red channel of the image can be separated, pixels whose red channel value exceeds a threshold are identified as stamp pixels, and the red channel value of those pixels is then set to 0 to remove the stamp from the image.
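The description above thresholds the separated red channel; a threshold alone would also trigger on white paper, whose red value is high, so the sketch below additionally requires the red channel to clearly dominate green and blue before zeroing it. The thresholds and the function name remove_red_stamp are assumptions.

```python
import numpy as np

def remove_red_stamp(img_bgr, red_threshold=150, dominance=50):
    """Identify stamp pixels (high red value that clearly exceeds blue/green)
    and set their red channel to 0, as described above. img_bgr is a BGR array."""
    out = img_bgr.copy()
    b = out[:, :, 0].astype(int)
    g = out[:, :, 1].astype(int)
    r = out[:, :, 2].astype(int)
    stamp = (r > red_threshold) & (r - np.maximum(b, g) > dominance)
    out[stamp, 2] = 0
    return out
```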
Stage two: table detection
1. Form line detection
The table lines are detected with the improved U-net network. The U-net network converges quickly: the positions of the table lines are roughly estimated through the pooling and upsampling parts and then progressively refined, so the table lines are detected well.
In practice, batch normalization (Batch Normalization) may be added after each convolutional layer in the U-net network. As the network gets deeper, the distribution of each layer's feature values gradually shifts toward the two ends of the activation function's output interval (the saturation interval of the activation function), which easily causes vanishing gradients. Batch Normalization re-adjusts the distribution of each layer's feature values to a standard normal distribution, so that the feature values fall in the range where the activation function is sensitive to its input; even a small change in the input then produces a noticeable change in the loss function, which avoids vanishing gradients and also accelerates convergence.
In addition, the exemplary implementation of the embodiment of the present application may also be adopted, that is, the non-bottom-layer feature maps are upsampled to obtain the second feature information, which is added into the U-net network for feature fusion. Improving the skip-connection structure of the U-net network in this way can improve the detection effect to a certain extent.
For example, table lines are detected in the table image, yielding an image that contains only the table lines of fig. 2A.
2. Cell identification
Cell detection is performed by using OpenCV based on the table lines detected in the table line detection step, and the position information of each cell is obtained.
In this stage, the position information of the cells is acquired in two steps: the table lines are first detected with a deep-learning image segmentation network (U-net), and the position information of the cells is then identified based on the table lines, so the cells can be accurately identified even when the image has quality problems such as backlight or faint ink.
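A minimal sketch of this cell-identification step with OpenCV (the size filters and the function name cells_from_line_mask are assumptions): inverting the binary table-line mask turns each cell interior into a white connected region whose bounding box gives the cell's position information.

```python
import cv2

def cells_from_line_mask(line_mask):
    """line_mask: binary uint8 image from the U-net, table lines white (255) on black.
    Returns cell boxes as (x1, y1, x2, y2)."""
    inv = cv2.bitwise_not(line_mask)
    contours, _ = cv2.findContours(inv, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    h_img, w_img = line_mask.shape[:2]
    cells = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Drop tiny specks and the large background region outside the table.
        if 10 < w < 0.95 * w_img and 10 < h < 0.95 * h_img:
            cells.append((x, y, x + w, y + h))
    return cells
```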
Stage three: character recognition
The characters in the form image and their position information are recognized by OCR (Optical Character Recognition).
Stage four: table structuring
1. Header identification
The position information of the first-row cells and of the first-column cells is extracted, the characters at the corresponding positions are read, and the top header and side header information of each cell is obtained.
2. Row and column information acquisition
Through the flow shown in fig. 3, the row and column serial numbers and the numbers of rows and columns spanned by each cell are obtained. With this information, the table image can be accurately converted into an editable table document, for example as sketched below.
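For illustration only (assuming the openpyxl library and a hypothetical per-cell dictionary with row, col, row_span, col_span, and text keys produced by the steps above), the recovered structure can be written to an editable sheet, merging the cells that span multiple rows or columns:

```python
from openpyxl import Workbook

def cells_to_xlsx(cells, path="table.xlsx"):
    """cells: iterable of dicts such as
    {"row": 2, "col": 5, "row_span": 2, "col_span": 2, "text": "..."}."""
    wb = Workbook()
    ws = wb.active
    for c in cells:
        ws.cell(row=c["row"], column=c["col"], value=c["text"])
        if c["row_span"] > 1 or c["col_span"] > 1:
            ws.merge_cells(start_row=c["row"], start_column=c["col"],
                           end_row=c["row"] + c["row_span"] - 1,
                           end_column=c["col"] + c["col_span"] - 1)
    wb.save(path)
```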
According to the embodiments of the application, after the table image is gridded, the grid can describe the structural relationship of each cell in a complex table, so the row and column information of each cell is obtained from the row and column information of the grids. This facilitates extracting the structural information of the table and achieves complete identification of the table information.
The application also provides a form recognition device. Referring to fig. 12, the table recognition apparatus 200 includes:
an obtaining module 210, configured to obtain location information of each cell in the form image;
the gridding module 220 is configured to gridd the form image according to the position information of each cell to obtain row and column information of each grid;
the row-column module 230 is configured to obtain the row-column information of each cell by using the row-column information of the grids included in each cell.
Illustratively, referring to FIG. 13, rank module 230 includes:
the first sequence sub-module 231 is configured to, when a cell includes at least two grids, obtain a row-column sequence of the cell by using a row-column sequence of a grid located at an upper left corner in the grids included in the cell.
Illustratively, referring to FIG. 13, the rank module 230 further includes:
the quantity submodule 232 is configured to obtain the number of rows and columns crossing the cells by using the row and column serial numbers of the grids at the lower right corner and the row and column serial numbers of the grids at the upper left corner in the grids included in the cells.
Illustratively, referring to FIG. 13, rank module 230 includes:
the second sequence sub-module 233 is configured to determine that the row and column sequence number of the grid included in the cell is determined as the row and column sequence number of the cell if the cell includes a grid.
Illustratively, referring to fig. 13, the obtaining module 210 includes:
an input sub-module 211 for inputting the tabular image into a U-net network, the U-net network including a pooling portion and an upsampling portion, the pooling portion including a convolutional layer and a pooling layer, the upsampling portion including a convolutional layer and an upsampling layer;
a pooling sub-module 212 for convolving and pooling the table image with pooled portions, wherein the pooled portion convolution layer is used for outputting first feature information;
a first upsampling submodule 213, configured to upsample the first feature information to obtain second feature information;
a second upsampling submodule 214, configured to perform convolution and upsampling on the feature information output by the pooling part by using an upsampling part, where an upsampling layer of the upsampling part is configured to output third feature information, and a convolution layer of the upsampling part is configured to perform feature fusion on the first feature information, the second feature information, and the third feature information;
and the information acquisition submodule 215 is configured to obtain the position information of each cell in the form image by using the fused feature information.
Illustratively, referring to fig. 13, the table identifying apparatus 200 further includes:
a classification module 310, configured to determine an included angle between the form image and the horizontal line by using a classifier;
and a rotating module 320, configured to rotate the table image to make the included angle between the table image and the horizontal line be a reference angle, when the included angle between the table image and the horizontal line is not a reference angle.
The form recognition device provided by the embodiment of the application can realize the form recognition method provided by any embodiment of the application, and has corresponding beneficial effects.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 14 is a block diagram of an electronic device for the table identification method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 14, the electronic apparatus includes: one or more processors 901, a memory 902, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, if desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 14 illustrates an example with one processor 901.
Memory 902 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the table identification method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the table identification method provided herein.
The memory 902, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the table identification method in the embodiments of the present application (e.g., the obtaining module 210, the gridding module 220, and the rank module 230 shown in fig. 12). The processor 901 executes various functional applications of the server and data processing, i.e., implements the table identification method in the above method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 902.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device of the table recognition method, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include a memory remotely located from the processor 901, and these remote memories may be connected to the electronic device of the table recognition method through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the form recognition method may further include: an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or other means, and fig. 14 illustrates an example of connection by a bus.
The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the form recognition method, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the embodiments of the application, after the table image is gridded, the grid can describe the structural relationship of each cell in a complex table, so the row and column information of each cell is obtained from the row and column information of the grids. This facilitates extracting the structural information of the table and achieves complete identification of the table information.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present invention is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A form identification method, comprising:
acquiring position information of each cell in the form image;
gridding the table image according to the position information of each cell to obtain row and column information of each grid;
and obtaining the row and column information of each cell by utilizing the row and column information of the grids included by each cell.
2. The form recognition method of claim 1, wherein obtaining the row and column information of each cell by using the row and column information of the grid included in each cell comprises:
and under the condition that the cell comprises at least two grids, obtaining the row and column serial numbers of the cells by utilizing the row and column serial numbers of the grids positioned at the upper left corner in the grids comprised by the cell.
3. The form identification method of claim 2, further comprising:
and obtaining the row-column crossing quantity of the cells by utilizing the row-column sequence numbers of the grids positioned at the lower right corner and the row-column sequence numbers of the grids positioned at the upper left corner in the grids included by the cells.
4. The form recognition method of claim 1, wherein obtaining the row and column information of each cell by using the row and column information of the grid included in each cell comprises:
and under the condition that the cell comprises one grid, determining the row and column serial number of the grid comprised by the cell as the row and column serial number of the cell.
5. The form recognition method of any one of claims 1 to 4, wherein obtaining location information for each cell in the form image comprises:
inputting the tabular image into a U-net network, the U-net network comprising a pooled portion and an upsampled portion, the pooled portion comprising a convolutional layer and a pooled layer, the upsampled portion comprising a convolutional layer and an upsampled layer;
convolving and pooling the table image with the pooling part, wherein the pooling part convolution layer is used for outputting first feature information;
the first characteristic information is up-sampled to obtain second characteristic information;
performing convolution and upsampling on the feature information output by the pooling part by using the upsampling part, wherein an upsampling layer of the upsampling part is used for outputting third feature information, and a convolution layer of the upsampling part is used for performing feature fusion on the first feature information, the second feature information and the third feature information;
and obtaining the position information of each cell in the form image by using the fused feature information.
6. The form identification method of any of claims 1 to 4, further comprising:
determining an included angle between the form image and a horizontal line by using a classifier;
and under the condition that the included angle between the table image and the horizontal line is not the reference angle, rotating the table image to enable the included angle between the table image and the horizontal line to be the reference angle.
7. A form recognition apparatus comprising:
the acquisition module is used for acquiring the position information of each cell in the form image;
the gridding module is used for gridding the table image according to the position information of each cell to obtain the row and column information of each grid;
and the row and column module is used for obtaining the row and column information of each cell by utilizing the row and column information of the grids included by each cell.
8. The form recognition apparatus of claim 7, wherein the rank module comprises:
and the first sequence number submodule is used for obtaining the row and column sequence numbers of the cells by utilizing the row and column sequence numbers of the grids positioned at the upper left corner in the grids included in the cells under the condition that the cells include at least two grids.
9. The form identification apparatus of claim 8, wherein the rank module further comprises:
and the quantity submodule is used for obtaining the row-column crossing quantity of the cells by utilizing the row-column serial numbers of the grids positioned at the lower right corner and the row-column serial numbers of the grids positioned at the upper left corner in the grids included by the cells.
10. The form recognition apparatus of claim 7, wherein the rank module comprises:
and the second sequence number submodule is used for determining that the row and column sequence number of the grid included by the cell is determined as the row and column sequence number of the cell if the cell includes one grid.
11. The form recognition apparatus of any one of claims 7 to 10, wherein the obtaining module comprises:
an input sub-module for inputting the tabular image into a U-net network, the U-net network comprising a pooled portion and an upsampled portion, the pooled portion comprising a convolutional layer and a pooled layer, the upsampled portion comprising a convolutional layer and an upsampled layer;
the pooling submodule is used for performing convolution and pooling on the table image by using the pooling part, wherein the pooling part convolution layer is used for outputting first characteristic information;
the first up-sampling sub-module is used for up-sampling the first characteristic information to obtain second characteristic information;
a second upsampling sub-module, configured to perform convolution and upsampling on the feature information output by the pooling part by using the upsampling part, where an upsampling layer of the upsampling part is configured to output third feature information, and a convolution layer of the upsampling part is configured to perform feature fusion on the first feature information, the second feature information, and the third feature information;
and the information acquisition submodule is used for acquiring the position information of each cell in the form image by using the fused feature information.
12. The form recognition apparatus according to any one of claims 7 to 10, further comprising:
the classification module is used for determining an included angle between the form image and the horizontal line by using the classifier;
and the rotating module is used for rotating the table image so as to enable the included angle between the table image and the horizontal line to be the reference angle under the condition that the included angle between the table image and the horizontal line is not the reference angle.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202010477731.0A 2020-05-29 2020-05-29 Table identification method, apparatus, electronic device and storage medium Active CN111639637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010477731.0A CN111639637B (en) 2020-05-29 2020-05-29 Table identification method, apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010477731.0A CN111639637B (en) 2020-05-29 2020-05-29 Table identification method, apparatus, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN111639637A true CN111639637A (en) 2020-09-08
CN111639637B CN111639637B (en) 2023-08-15

Family

ID=72331625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010477731.0A Active CN111639637B (en) 2020-05-29 2020-05-29 Table identification method, apparatus, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN111639637B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241730A (en) * 2020-11-21 2021-01-19 杭州投知信息技术有限公司 Form extraction method and system based on machine learning
CN113139457A (en) * 2021-04-21 2021-07-20 浙江康旭科技有限公司 Image table extraction method based on CRNN
CN113221519A (en) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for processing tabular data
CN113378789A (en) * 2021-07-08 2021-09-10 京东数科海益信息科技有限公司 Cell position detection method and device and electronic equipment
CN113609906A (en) * 2021-06-30 2021-11-05 南京信息工程大学 Document-oriented table information extraction method
CN113705576A (en) * 2021-11-01 2021-11-26 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment
CN114359938B (en) * 2022-01-07 2023-09-29 北京有竹居网络技术有限公司 Form identification method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156761A (en) * 2016-08-10 2016-11-23 北京交通大学 The image form detection of facing moving terminal shooting and recognition methods
CN109961008A (en) * 2019-02-13 2019-07-02 平安科技(深圳)有限公司 Form analysis method, medium and computer equipment based on text location identification
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
US20200089946A1 (en) * 2018-06-11 2020-03-19 Innoplexus Ag System and method for extracting tabular data from electronic document

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156761A (en) * 2016-08-10 2016-11-23 北京交通大学 The image form detection of facing moving terminal shooting and recognition methods
US20200089946A1 (en) * 2018-06-11 2020-03-19 Innoplexus Ag System and method for extracting tabular data from electronic document
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
CN109961008A (en) * 2019-02-13 2019-07-02 平安科技(深圳)有限公司 Form analysis method, medium and computer equipment based on text location identification
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241730A (en) * 2020-11-21 2021-01-19 杭州投知信息技术有限公司 Form extraction method and system based on machine learning
CN113139457A (en) * 2021-04-21 2021-07-20 浙江康旭科技有限公司 Image table extraction method based on CRNN
CN113221519A (en) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for processing tabular data
CN113221519B (en) * 2021-05-18 2024-03-29 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for processing form data
CN113609906A (en) * 2021-06-30 2021-11-05 南京信息工程大学 Document-oriented table information extraction method
CN113378789A (en) * 2021-07-08 2021-09-10 京东数科海益信息科技有限公司 Cell position detection method and device and electronic equipment
WO2023279847A1 (en) * 2021-07-08 2023-01-12 京东科技信息技术有限公司 Cell position detection method and apparatus, and electronic device
CN113378789B (en) * 2021-07-08 2023-09-26 京东科技信息技术有限公司 Cell position detection method and device and electronic equipment
CN113705576A (en) * 2021-11-01 2021-11-26 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment
CN113705576B (en) * 2021-11-01 2022-03-25 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment
CN114359938B (en) * 2022-01-07 2023-09-29 北京有竹居网络技术有限公司 Form identification method and device

Also Published As

Publication number Publication date
CN111639637B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN111639637B (en) Table identification method, apparatus, electronic device and storage medium
CN111753727B (en) Method, apparatus, device and readable storage medium for extracting structured information
CN113657390B (en) Training method of text detection model and text detection method, device and equipment
CN111695553A (en) Form recognition method, device, equipment and medium
WO2020140698A1 (en) Table data acquisition method and apparatus, and server
CN111784663B (en) Method and device for detecting parts, electronic equipment and storage medium
WO2020063314A1 (en) Character segmentation identification method and apparatus, electronic device, and storage medium
CN113221743B (en) Table analysis method, apparatus, electronic device and storage medium
CN111626027B (en) Table structure restoration method, device, equipment, system and readable storage medium
US20220189083A1 (en) Training method for character generation model, character generation method, apparatus, and medium
CN111783645A (en) Character recognition method and device, electronic equipment and computer readable storage medium
US20220180043A1 (en) Training method for character generation model, character generation method, apparatus and storage medium
CN112508003A (en) Character recognition processing method and device
JP2022536320A (en) Object identification method and device, electronic device and storage medium
CN111709428A (en) Method and device for identifying key point positions in image, electronic equipment and medium
CN111523292B (en) Method and device for acquiring image information
CN113326766A (en) Training method and device of text detection model and text detection method and device
CN115187995B (en) Document correction method, device, electronic equipment and storage medium
CN116863017A (en) Image processing method, network model training method, device, equipment and medium
CN111444834A (en) Image text line detection method, device, equipment and storage medium
CN113657398B (en) Image recognition method and device
CN113221742B (en) Video split screen line determining method, device, electronic equipment, medium and program product
CN112558810B (en) Method, apparatus, device and storage medium for detecting fingertip position
CN113887394A (en) Image processing method, device, equipment and storage medium
CN111783780B (en) Image processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant