CN111639637A - Table identification method and device, electronic equipment and storage medium - Google Patents

Table identification method and device, electronic equipment and storage medium

Info

Publication number
CN111639637A
CN111639637A
Authority
CN
China
Prior art keywords
cell
row
information
column
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010477731.0A
Other languages
Chinese (zh)
Other versions
CN111639637B (en)
Inventor
韩光耀
庞敏辉
谢国斌
李丹青
曲福
姜泽青
冯博豪
杨舰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010477731.0A priority Critical patent/CN111639637B/en
Publication of CN111639637A publication Critical patent/CN111639637A/en
Application granted granted Critical
Publication of CN111639637B publication Critical patent/CN111639637B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a table identification method and device, electronic equipment and a storage medium, and relates to the fields of cloud computing and image recognition. The specific implementation scheme is as follows: acquiring the position information of each cell in the table image; gridding the table image according to the position information of each cell to obtain the row and column information of each grid; and obtaining the row and column information of each cell by using the row and column information of the grids included in each cell. The embodiments of the application facilitate extracting the structural information of the table and achieve complete identification of the table information.

Description

Table identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing, and more particularly to the field of image recognition.
Background
In daily work and life, a form is an important information carrier. People extract or record important information by operating on the table. In general, many form files are images and cannot be edited. Therefore, the related art proposes to identify a table in an image to extract key information therein.
Currently, table recognition generally identifies the positions of table lines or cells in the table using an image segmentation technique, such as a threshold segmentation technique or a deep-learning segmentation technique.
Disclosure of Invention
The application provides a table identification method and device, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided a table identifying method including:
acquiring position information of each cell in the form image;
gridding the table image according to the position information of each cell to obtain row and column information of each grid;
and obtaining the row and column information of each cell by utilizing the row and column information of the grids included by each cell.
According to another aspect of the present application, there is provided a form recognition apparatus including:
the acquisition module is used for acquiring the position information of each cell in the form image;
the gridding module is used for gridding the table image according to the position information of each cell to obtain the row and column information of each grid;
and the row and column module is used for obtaining the row and column information of each cell by utilizing the row and column information of the grids included by each cell.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method provided by any of the embodiments of the present application.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided by any of the embodiments of the present application.
According to the embodiments of the application, after the table image is gridded, the grid can describe the structural relationship of each cell in a complex table, so the row and column information of each cell is obtained from the row and column information of the grids. This facilitates extracting the structural information of the table and achieves complete identification of the table information.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2A is a schematic illustration of a form image in a first embodiment of the present application;
FIG. 2B is a schematic representation of gridding in the first embodiment of the present application;
FIG. 2C is a schematic representation of gridding in a first embodiment of the present application;
FIG. 3 is a schematic diagram according to a second embodiment of the present application;
FIG. 4 is a diagram illustrating row and column information of cells according to a second embodiment of the present application;
FIG. 5 is a schematic diagram of a relationship between cells and grids in a second embodiment of the present application;
FIG. 6 is a schematic illustration according to a third embodiment of the present application;
FIG. 7 is a schematic diagram of a U-net network in a third embodiment of the present application;
FIG. 8 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 9 is a diagram of an application scenario in which embodiments of the present application may be implemented;
FIG. 10 is a schematic diagram of an example of an application of the present application;
fig. 11A is a schematic diagram of a table image before hough transform in an application example of the present application;
fig. 11B is a schematic diagram of a table image after hough transform in an application example of the present application;
FIG. 12 is a schematic view of a fifth embodiment of the present application;
FIG. 13 is a schematic illustration of a sixth embodiment of the present application;
fig. 14 is a block diagram of an electronic device for implementing a table identification method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates a table identification method according to an embodiment of the present application. As shown in fig. 1, the method includes:
step S11, obtaining the position information of each cell in the form image;
step S12, gridding the table image according to the position information of each cell to obtain the row and column information of each grid;
step S13 is to obtain the row/column information of each cell by using the row/column information of the mesh included in each cell.
According to the embodiments of the application, after the table image is gridded, the grid can describe the structural relationship of each cell in a complex table, so the row and column information of each cell is obtained from the row and column information of the grids. This facilitates extracting the structural information of the table and achieves complete identification of the table information.
There are various embodiments for obtaining the position information of each cell in the form image.
For example, the table lines of the table image are first segmented from the table image using a deep learning network such as a U-net network or an FCN (Fully Convolutional Network). Then, the position information of each cell in the table image is determined using OpenCV (Open Source Computer Vision Library) and the segmented table lines. Because the cells are identified based on the table lines, they can still be accurately identified even if the image suffers from quality problems such as backlight or faint ink, which helps to accurately identify tables in images with complex backgrounds.
For another example, an outer frame of the table image is segmented from the table image by using a deep learning network, a threshold segmentation algorithm, a watershed algorithm or an edge detection algorithm, and then, by using OpenCV and the segmented table outer frame, a table line is detected and position information of each cell in the table image is determined.
Illustratively, the location information of a cell includes the coordinates of the cell, such as the upper-left coordinates (x1, y1) and the lower-right coordinates (x2, y2). Alternatively, the location information of a cell includes the coordinates and size of the cell, such as the upper-left coordinates (x1, y1) together with the cell's length and width.
In one example, the edge lines of each cell can be extended according to the cell's coordinates, and both ends of each extended line are connected to the table's outer frame or to the border of the table image, thereby gridding the table image.
For example, gridding the table image of fig. 2A yields the grid image of fig. 2B. Each grid is bounded by two adjacent horizontal lines and two adjacent vertical lines, such as grid E in fig. 2B.
In one example, if the differences between the coordinates of several cells are smaller than a threshold, a single grid line connected at both ends to the table frame or to the border of the table image can be derived from those nearly identical coordinates, and the table image is then gridded using the grid lines obtained in this way.
For example, as shown in FIG. 2C, if the upper-left abscissas (i.e., the abscissas of the left vertical edges) of cells G, H and I are very close, the cells are likely in the same column, and the average, median, or a reference value of these similar coordinates is taken as the abscissa of a single grid line L (see the dashed line in FIG. 2C). This avoids producing excessive redundant grids in the gridded image because of poor image quality or errors in acquiring the position information; a minimal code sketch of this coordinate merging follows below.
The threshold value in this example may be set according to the size of each cell in the table image, or the size of the entire table.
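As a rough illustration of this coordinate-merging idea (a sketch, not code from the patent; the function names cluster_coords and grid_lines_from_cells and the pixel threshold are assumptions), the following Python function collects the edge coordinates of all cells, groups coordinates that differ by less than a threshold, and replaces each group by its mean to obtain the grid-line positions:

```python
import numpy as np

def cluster_coords(coords, threshold=5):
    """Group nearly equal coordinates and replace each group by its mean, so
    small localisation errors do not produce redundant grid lines."""
    coords = sorted(coords)
    groups, current = [], [coords[0]]
    for c in coords[1:]:
        if c - current[-1] < threshold:
            current.append(c)
        else:
            groups.append(current)
            current = [c]
    groups.append(current)
    return [float(np.mean(g)) for g in groups]

def grid_lines_from_cells(cells, threshold=5):
    """cells: list of (x1, y1, x2, y2) boxes. Returns the x positions of the
    vertical grid lines and the y positions of the horizontal grid lines."""
    xs = [c[0] for c in cells] + [c[2] for c in cells]
    ys = [c[1] for c in cells] + [c[3] for c in cells]
    return cluster_coords(xs, threshold), cluster_coords(ys, threshold)
```

The resulting x and y positions define the grid lines used to grid the table image as in fig. 2B and fig. 2C.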
As an exemplary implementation manner, referring to fig. 3, in step S13, obtaining the row and column information of each cell by using the row and column information of the grid included in each cell may include:
step S131, when a cell includes at least two grids, obtaining the row and column serial numbers of the cell by using the row and column serial numbers of the grid located at the upper-left corner among the grids included in the cell.
In this exemplary embodiment, after the table image is gridded, cells spanning multiple rows or columns are identified according to the number of grids they include, and their row and column serial numbers are obtained from the row and column serial numbers of those grids. The structural information of cross-row and cross-column cells is thus extracted accurately, which helps restore the table image into an editable table document and achieves complete identification of the table information.
As an exemplary implementation manner, referring to fig. 3, in step S13, obtaining the row and column information of each cell by using the row and column information of the grid included in each cell may further include:
step S132, the row and column number of the grids at the lower right corner and the row and column number of the grids at the upper left corner in the grids included in the cell are used to obtain the number of the rows and columns across the cell.
In this exemplary embodiment, the numbers of rows and columns spanned by a cross-row or cross-column cell are determined using the row and column information of the grids. The structural information of such cells is extracted accurately, which helps restore the table image into an editable table document and achieves complete identification of the table information.
As an exemplary implementation manner, referring to fig. 3, in step S13, obtaining the row and column information of each cell by using the row and column information of the grid included in each cell may further include:
step S133, determining the row and column number of the grid included in the cell as the row and column number of the cell when the cell includes one grid.
In this exemplary embodiment, the row and column serial numbers of a cell that does not span rows or columns are determined from the row and column information of the grid. The structural information of the cell is extracted accurately, which helps restore the table image into an editable table document and achieves complete identification of the table information.
For example, as shown in FIG. 4, cell C comprises four grids, and therefore cell C spans rows and columns. Among the grids included in cell C, the grid at the upper-left corner is the grid at row 2 and column 5, so cell C can be denoted as the cell at row 2, column 5. The grid at the lower-right corner is the grid at row 3 and column 6; the difference in row serial numbers between the lower-right grid and the upper-left grid is 1, and the difference in column serial numbers is 1, so cell C spans 1 + 1 = 2 rows and 2 columns.
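A minimal sketch of steps S131 and S132, assuming the sorted grid-line positions produced above (the one-pixel tolerance and the function name cell_row_col_info are assumptions): the cell's row and column serial numbers come from the grid containing its top-left corner, and its spans from the grid containing its bottom-right corner.

```python
import bisect

def cell_row_col_info(cell, x_lines, y_lines):
    """cell: (x1, y1, x2, y2). x_lines / y_lines: sorted grid-line positions.
    Returns (row, col, row_span, col_span), 1-based."""
    x1, y1, x2, y2 = cell
    # Grid row/column containing the top-left corner (step S131).
    col = bisect.bisect_right(x_lines, x1 + 1)      # +1 tolerates small coordinate noise
    row = bisect.bisect_right(y_lines, y1 + 1)
    # Grid row/column containing the bottom-right corner (step S132).
    col_end = bisect.bisect_left(x_lines, x2 - 1)
    row_end = bisect.bisect_left(y_lines, y2 - 1)
    return row, col, row_end - row + 1, col_end - col + 1
```

For cell C in fig. 4 this returns row 2, column 5, with spans of 2 rows and 2 columns.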
In an embodiment of the present application, the area of the portion of a grid that lies inside a cell may be calculated, and when that area is greater than a threshold, for example greater than 50% or 70% of the grid's area, the cell is determined to include that grid.
For example, as shown in fig. 5, when the grid lines deviate from the coordinates of the cells, the area of the portion of each grid lying inside cell D is calculated. It can be determined that the portion of grid 1 inside cell D is greater than 50% of grid 1's area, while the portion of grid 9 inside cell D is less than 50% of grid 9's area. Thus, cell D includes grid 1 but does not include grid 9.
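The area test above reduces to a simple rectangle-intersection check; a small sketch follows (the 50% default ratio is one of the thresholds mentioned above, and the function name cell_contains_grid is an assumption):

```python
def cell_contains_grid(cell, grid, ratio=0.5):
    """Return True when the part of `grid` lying inside `cell` covers more
    than `ratio` of the grid's own area; both are (x1, y1, x2, y2) boxes."""
    cx1, cy1, cx2, cy2 = cell
    gx1, gy1, gx2, gy2 = grid
    inter_w = max(0.0, min(cx2, gx2) - max(cx1, gx1))
    inter_h = max(0.0, min(cy2, gy2) - max(cy1, gy1))
    grid_area = (gx2 - gx1) * (gy2 - gy1)
    return grid_area > 0 and (inter_w * inter_h) / grid_area > ratio
```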
The embodiment of the present application further provides an exemplary implementation in which a U-net network is used to obtain the location information of each cell in the table image, where the skip-connection structure of the U-net network is improved. Referring to fig. 6, acquiring the location information of each cell in the table image may include:
step S21, inputting the tabular image into a U-net network, wherein the U-net network comprises a pooling part and an up-sampling part, the pooling part comprises a convolution layer and a pooling layer, and the up-sampling part comprises a convolution layer and an up-sampling layer;
step S22, performing convolution and pooling on the table image by utilizing a pooling part, wherein the pooling part is used for outputting first characteristic information;
step S23, performing upsampling on the first characteristic information to obtain second characteristic information;
step S24, performing convolution and upsampling on the feature information output by the pooling part by using the upsampling part, wherein an upsampling layer of the upsampling part is used for outputting third feature information, and a convolution layer of the upsampling part is used for performing feature fusion on the first feature information, the second feature information and the third feature information;
and step S25, obtaining the position information of each cell in the form image by using the fused feature information.
This exemplary embodiment improves the skip-connection structure of the U-net network: the non-bottom-layer feature maps are upsampled and added into the U-net network for feature fusion, so that the recognition target in the table image, such as the table's outer frame or table lines, can be segmented more accurately, improving the accuracy of table recognition.
Fig. 7 illustrates the feature information obtained in the above steps, taking a table image 100 of 256 × 256 × 1 as an example, where 256 × 256 × 1 indicates that the size of the table image 100 is 256 × 256 and the number of channels is 1. Referring to fig. 7, the U-net network includes a pooling part 61 and an upsampling part 62, where pooling may also be referred to as down-sampling.
In the pooling part of the U-net network, the table image 100 is processed by multiple convolutional layers and multiple pooling layers. The convolution layers convolve the input feature information (see conv in fig. 7) and output first feature information 11, 12, 13, and 14. The pooling layers pool the first feature information (see pooling in fig. 7), and the pooled feature information is input to the next convolutional layer. After alternating convolution (conv) and pooling, the pooling part outputs feature information 15.
Then, the feature information 16 obtained by convolving feature information 15 is input to the upsampling part of the U-net network. The upsampling part processes the convolved feature information 16 with multiple convolution layers and multiple upsampling layers, where each upsampling layer upsamples the feature information (see Up Sampling in fig. 7) and outputs third feature information. The U-net network adopts a skip-connection structure: after an upsampling layer outputs third feature information, the convolution layer connected to that upsampling layer can crop the first feature information to the same size as the third feature information, fuse the third feature information with the first feature information, and then perform convolution. For example, in fig. 7, after feature information 16 is upsampled, the resulting third feature information is fused with first feature information 14 to obtain feature information 31.
In the embodiment of the present application, the first feature information 12, 13, and 14 may also be upsampled to obtain the corresponding second feature information 21, 22, and 23. When an upsampling layer outputs third feature information, the information used for feature fusion includes that third feature information together with the first and second feature information of the same size. Fourth feature information 41, 42, and 43 are obtained by fusing these three kinds of feature information. For example, the first feature information 14 is upsampled to obtain second feature information 23; after feature information 32 of size 32 × 32 is upsampled, the resulting third feature information of size 64 × 64 is fused with the first feature information 13 and the second feature information 23 of size 64 × 64 to obtain fourth feature information 41.
Each convolution layer then convolves (conv) the fused fourth feature information, and the feature information it outputs is passed to the next upsampling layer. Through alternating upsampling (Up Sampling) and convolution (conv), the upsampling part 62 outputs feature information 200 with the same size as the table image 100.
Using the feature information 200, the location information of the cells in the table image can be obtained. For example, convolving the feature information 200 yields an image 300 with a single channel, in which the table's outer frame or table lines are well segmented. With OpenCV, the position information of each cell can then be obtained from the image 300.
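To make the modified skip connection concrete, the following PyTorch sketch shows a two-level U-Net whose outermost decoder stage fuses three tensors: the encoder feature at that resolution (first feature information), an upsampled deeper encoder feature (second feature information), and the upsampled decoder feature (third feature information). This is a hedged illustration, not the exact network of fig. 7 (which has more levels and specific channel counts); the names SmallUNet, conv_block, and base are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions; batch normalization is added as suggested in stage two below.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)            # first feature info, full resolution
        self.enc2 = conv_block(base, base * 2)         # first feature info, 1/2 resolution
        self.bottom = conv_block(base * 2, base * 4)
        self.dec2 = conv_block(base * 4 + base * 2, base * 2)
        # Outermost decoder fuses: decoder feature + enc1 + upsampled enc2 (the added second feature).
        self.dec1 = conv_block(base * 2 + base + base * 2, base)
        self.head = nn.Conv2d(base, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(F.max_pool2d(e1, 2))
        b = self.bottom(F.max_pool2d(e2, 2))
        d2 = self.dec2(torch.cat([F.interpolate(b, scale_factor=2), e2], dim=1))
        e2_up = F.interpolate(e2, scale_factor=2)      # second feature information
        d1 = self.dec1(torch.cat([F.interpolate(d2, scale_factor=2), e1, e2_up], dim=1))
        return torch.sigmoid(self.head(d1))            # probability map of table lines / outer frame

# mask = SmallUNet()(torch.randn(1, 1, 256, 256))      # -> (1, 1, 256, 256)
```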
As an exemplary implementation manner, referring to fig. 8, a table identification method provided in an embodiment of the present application may further include:
step S31, determining an included angle between the form image and a horizontal line by using a classifier;
in step S32, when the angle between the form image and the horizontal line is not the reference angle, the form image is rotated so that the angle between the form image and the horizontal line is the reference angle.
Illustratively, the reference angle is 0, which is advantageous for correcting the tilted form image.
For example, the table image is input to a VGG16 four-class classifier, which determines whether the table image is at an angle of 0°, 90°, 180°, or 270° to the horizontal line. Based on that angle, the table image can then be rotated to an angle of 0° to the horizontal.
According to this exemplary embodiment, the table image can be corrected to the upright orientation, which prevents wrong table structure information from being acquired because the table is not upright and improves the accuracy of table recognition.
The following describes the effects of the embodiments of the present application with specific application examples:
referring to fig. 9, a view of an application scenario of the embodiment of the present application is shown. In this application example, the Web server provides a form identification service. The Web server may receive an image uploaded by a user, encode the Base64 of the image into a form recognition service. And then outputting a character string in a json format through four-stage processing, wherein the character string stores the structural information of the table. The Web server can display the table in the image in the form of editable document such as Excel or Word in the front page by using the character string.
With reference to fig. 10, the form recognition service processes the table image in four stages: image preprocessing, table detection, character recognition, and table structuring.
Stage one: image preprocessing
1. Inclination correction
In general, whether an image is obtained by scanning or photographing, it is difficult to guarantee that it is perfectly upright. In this step, the table image is corrected by the Hough transform so that the horizontal lines of the table become horizontal or its vertical lines become vertical. For example, the table image of fig. 11A is rectified into the table image of fig. 11B.
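A minimal OpenCV sketch of this tilt-correction step (an illustration under assumed thresholds; the function name deskew_by_hough is not from the patent): it estimates the skew of near-horizontal table lines with the probabilistic Hough transform and rotates the image by the median skew angle.

```python
import cv2
import numpy as np

def deskew_by_hough(img_bgr):
    """Estimate the dominant skew of near-horizontal lines with the Hough
    transform and rotate the image so that those lines become horizontal."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=gray.shape[1] // 3, maxLineGap=10)
    if lines is None:
        return img_bgr
    angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
              for x1, y1, x2, y2 in lines[:, 0]]
    angles = [a for a in angles if abs(a) < 45]          # keep near-horizontal lines only
    if not angles:
        return img_bgr
    skew = float(np.median(angles))
    h, w = img_bgr.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), skew, 1.0)
    return cv2.warpAffine(img_bgr, M, (w, h), flags=cv2.INTER_LINEAR,
                          borderValue=(255, 255, 255))
```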
2. Direction detection
After correction by the Hough transform, the included angle between the table image and the horizontal line is 0°, 90°, 180°, or 270°. Using a VGG16 four-class classifier, the table image is classified into one of these four angles, and the image is then rotated according to the determined angle so that it is upright, that is, at an included angle of 0° with the horizontal line.
In practice, a large number of images, for example 1000 images, can be randomly rotated to one of the four angles. 70% of the images were used as a training set, 20% of the images were used as a validation set, and 10% of the images were used as a test set. Through training and verification, the accuracy of the direction detection of the images in the test set can reach 93%.
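A hedged sketch of such a four-way orientation classifier, built on torchvision's VGG16 with its final fully connected layer replaced by a 4-class head; the preprocessing, the sign of the corrective rotation, and whether pretrained weights are used are assumptions that depend on how the classifier is trained.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

ANGLES = [0, 90, 180, 270]

def build_orientation_classifier():
    net = models.vgg16(weights=None)                 # pretrained weights could be used instead
    net.classifier[6] = nn.Linear(4096, len(ANGLES)) # replace the 1000-class head with 4 classes
    return net

preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def detect_and_correct(img: Image.Image, net) -> Image.Image:
    """Predict which of the four angles the table image is at and rotate it back."""
    net.eval()
    with torch.no_grad():
        logits = net(preprocess(img.convert("RGB")).unsqueeze(0))
    angle = ANGLES[int(logits.argmax(dim=1))]
    # PIL rotates counter-clockwise; flip the sign if the training labels use the opposite convention.
    return img.rotate(-angle, expand=True)
```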
3. Seal detection and seal removal.
In the financial field, financial statement images containing red seals are often processed. For such table images, the red channel of the image can be separated, pixels whose red channel value exceeds a threshold are identified as stamp pixels, and the red channel value of those pixels is then set to 0 to remove the stamp from the image.
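The description above thresholds the separated red channel; a threshold alone would also trigger on white paper, whose red value is high, so the sketch below additionally requires the red channel to clearly dominate green and blue before zeroing it. The thresholds and the function name remove_red_stamp are assumptions.

```python
import numpy as np

def remove_red_stamp(img_bgr, red_threshold=150, dominance=50):
    """Identify stamp pixels (high red value that clearly exceeds blue/green)
    and set their red channel to 0, as described above. img_bgr is a BGR array."""
    out = img_bgr.copy()
    b = out[:, :, 0].astype(int)
    g = out[:, :, 1].astype(int)
    r = out[:, :, 2].astype(int)
    stamp = (r > red_threshold) & (r - np.maximum(b, g) > dominance)
    out[stamp, 2] = 0
    return out
```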
Stage two: table detection
1. Form line detection
The table lines are detected with the improved U-net network. The U-net network converges quickly: the positions of the table lines are roughly estimated through the pooling and upsampling parts and then progressively refined, so the table lines are detected well.
In practice, batch normalization (Batch Normalization) may be added after each convolutional layer in the U-net network. As the network gets deeper, the distribution of each layer's feature values gradually shifts toward the two ends of the activation function's output interval (the saturation interval of the activation function), which easily causes vanishing gradients. Batch Normalization re-adjusts the distribution of each layer's feature values to a standard normal distribution, so that the feature values fall in the range where the activation function is sensitive to its input; even a small change in the input then produces a noticeable change in the loss function, which avoids vanishing gradients and also accelerates convergence.
In addition, the exemplary implementation of the embodiment of the present application may also be adopted, that is, the non-bottom-layer feature maps are upsampled to obtain the second feature information, which is added into the U-net network for feature fusion. Improving the skip-connection structure of the U-net network in this way can improve the detection effect to a certain extent.
For example, table lines are detected in the table image, yielding an image that contains only the table lines of fig. 2A.
2. Cell identification
Cell detection is performed by using OpenCV based on the table lines detected in the table line detection step, and the position information of each cell is obtained.
In this stage, the position information of the cells is acquired in two steps: the table lines are first detected with a deep-learning image segmentation network (U-net), and the position information of the cells is then identified based on the table lines, so the cells can be accurately identified even when the image has quality problems such as backlight or faint ink.
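A minimal sketch of this cell-identification step with OpenCV (the size filters and the function name cells_from_line_mask are assumptions): inverting the binary table-line mask turns each cell interior into a white connected region whose bounding box gives the cell's position information.

```python
import cv2

def cells_from_line_mask(line_mask):
    """line_mask: binary uint8 image from the U-net, table lines white (255) on black.
    Returns cell boxes as (x1, y1, x2, y2)."""
    inv = cv2.bitwise_not(line_mask)
    contours, _ = cv2.findContours(inv, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    h_img, w_img = line_mask.shape[:2]
    cells = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Drop tiny specks and the large background region outside the table.
        if 10 < w < 0.95 * w_img and 10 < h < 0.95 * h_img:
            cells.append((x, y, x + w, y + h))
    return cells
```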
Stage three: character recognition
The characters in the form image and their position information are recognized by OCR (Optical Character Recognition).
Stage four: table structuring
1. Header identification
The position information of the first-row cells and of the first-column cells is extracted, the characters at the corresponding positions are read, and the top header and side header information of each cell is obtained.
2. Row and column information acquisition
Through the flow shown in fig. 3, the row and column serial numbers and the numbers of rows and columns spanned by each cell are obtained. With this information, the table image can be accurately converted into an editable table document, for example as sketched below.
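For illustration only (assuming the openpyxl library and a hypothetical per-cell dictionary with row, col, row_span, col_span, and text keys produced by the steps above), the recovered structure can be written to an editable sheet, merging the cells that span multiple rows or columns:

```python
from openpyxl import Workbook

def cells_to_xlsx(cells, path="table.xlsx"):
    """cells: iterable of dicts such as
    {"row": 2, "col": 5, "row_span": 2, "col_span": 2, "text": "..."}."""
    wb = Workbook()
    ws = wb.active
    for c in cells:
        ws.cell(row=c["row"], column=c["col"], value=c["text"])
        if c["row_span"] > 1 or c["col_span"] > 1:
            ws.merge_cells(start_row=c["row"], start_column=c["col"],
                           end_row=c["row"] + c["row_span"] - 1,
                           end_column=c["col"] + c["col_span"] - 1)
    wb.save(path)
```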
According to the embodiments of the application, after the table image is gridded, the grid can describe the structural relationship of each cell in a complex table, so the row and column information of each cell is obtained from the row and column information of the grids. This facilitates extracting the structural information of the table and achieves complete identification of the table information.
The application also provides a form recognition device. Referring to fig. 12, the table recognition apparatus 200 includes:
an obtaining module 210, configured to obtain location information of each cell in the form image;
the gridding module 220 is configured to gridd the form image according to the position information of each cell to obtain row and column information of each grid;
the row-column module 230 is configured to obtain the row-column information of each cell by using the row-column information of the grids included in each cell.
Illustratively, referring to FIG. 13, rank module 230 includes:
the first sequence sub-module 231 is configured to, when a cell includes at least two grids, obtain a row-column sequence of the cell by using a row-column sequence of a grid located at an upper left corner in the grids included in the cell.
Illustratively, referring to FIG. 13, the rank module 230 further includes:
the quantity submodule 232 is configured to obtain the number of rows and columns crossing the cells by using the row and column serial numbers of the grids at the lower right corner and the row and column serial numbers of the grids at the upper left corner in the grids included in the cells.
Illustratively, referring to FIG. 13, rank module 230 includes:
the second sequence sub-module 233 is configured to determine that the row and column sequence number of the grid included in the cell is determined as the row and column sequence number of the cell if the cell includes a grid.
Illustratively, referring to fig. 13, the obtaining module 210 includes:
an input sub-module 211 for inputting the tabular image into a U-net network, the U-net network including a pooling portion and an upsampling portion, the pooling portion including a convolutional layer and a pooling layer, the upsampling portion including a convolutional layer and an upsampling layer;
a pooling sub-module 212 for convolving and pooling the table image with pooled portions, wherein the pooled portion convolution layer is used for outputting first feature information;
a first upsampling submodule 213, configured to upsample the first feature information to obtain second feature information;
a second upsampling submodule 214, configured to perform convolution and upsampling on the feature information output by the pooling part by using an upsampling part, where an upsampling layer of the upsampling part is configured to output third feature information, and a convolution layer of the upsampling part is configured to perform feature fusion on the first feature information, the second feature information, and the third feature information;
and the information acquisition submodule 215 is configured to obtain the position information of each cell in the form image by using the fused feature information.
Illustratively, referring to fig. 13, the table identifying apparatus 200 further includes:
a classification module 310, configured to determine an included angle between the form image and the horizontal line by using a classifier;
and a rotating module 320, configured to rotate the table image to make the included angle between the table image and the horizontal line be a reference angle, when the included angle between the table image and the horizontal line is not a reference angle.
The form recognition device provided by the embodiment of the application can realize the form recognition method provided by any embodiment of the application, and has corresponding beneficial effects.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 14 is a block diagram of an electronic device for the table identification method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 14, the electronic apparatus includes: one or more processors 901, a memory 902, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, if desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 14 illustrates an example with one processor 901.
Memory 902 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the table identification method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the table identification method provided herein.
The memory 902, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the table identification method in the embodiments of the present application (e.g., the obtaining module 210, the gridding module 220, and the rank module 230 shown in fig. 12). The processor 901 executes various functional applications of the server and data processing, i.e., implements the table identification method in the above method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 902.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device of the table recognition method, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include a memory remotely located from the processor 901, and these remote memories may be connected to the electronic device of the table recognition method through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the form recognition method may further include: an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or other means, and fig. 14 illustrates an example of connection by a bus.
The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the form recognition method, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the embodiments of the application, after the table image is gridded, the grid can describe the structural relationship of each cell in a complex table, so the row and column information of each cell is obtained from the row and column information of the grids. This facilitates extracting the structural information of the table and achieves complete identification of the table information.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present invention is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A form identification method, comprising:
acquiring position information of each cell in the form image;
gridding the table image according to the position information of each cell to obtain row and column information of each grid;
and obtaining the row and column information of each cell by utilizing the row and column information of the grids included by each cell.
2. The form recognition method of claim 1, wherein obtaining the row and column information of each cell by using the row and column information of the grid included in each cell comprises:
and under the condition that the cell comprises at least two grids, obtaining the row and column serial numbers of the cells by utilizing the row and column serial numbers of the grids positioned at the upper left corner in the grids comprised by the cell.
3. The form identification method of claim 2, further comprising:
and obtaining the row-column crossing quantity of the cells by utilizing the row-column sequence numbers of the grids positioned at the lower right corner and the row-column sequence numbers of the grids positioned at the upper left corner in the grids included by the cells.
4. The form recognition method of claim 1, wherein obtaining the row and column information of each cell by using the row and column information of the grid included in each cell comprises:
and under the condition that the cell comprises one grid, determining the row and column serial number of the grid comprised by the cell as the row and column serial number of the cell.
5. The form recognition method of any one of claims 1 to 4, wherein obtaining location information for each cell in the form image comprises:
inputting the tabular image into a U-net network, the U-net network comprising a pooled portion and an upsampled portion, the pooled portion comprising a convolutional layer and a pooled layer, the upsampled portion comprising a convolutional layer and an upsampled layer;
convolving and pooling the table image with the pooling part, wherein the pooling part convolution layer is used for outputting first feature information;
the first characteristic information is up-sampled to obtain second characteristic information;
performing convolution and upsampling on the feature information output by the pooling part by using the upsampling part, wherein an upsampling layer of the upsampling part is used for outputting third feature information, and a convolution layer of the upsampling part is used for performing feature fusion on the first feature information, the second feature information and the third feature information;
and obtaining the position information of each cell in the form image by using the fused feature information.
6. The form identification method of any of claims 1 to 4, further comprising:
determining an included angle between the form image and a horizontal line by using a classifier;
and under the condition that the included angle between the table image and the horizontal line is not the reference angle, rotating the table image to enable the included angle between the table image and the horizontal line to be the reference angle.
7. A form recognition apparatus comprising:
the acquisition module is used for acquiring the position information of each cell in the form image;
the gridding module is used for gridding the table image according to the position information of each cell to obtain the row and column information of each grid;
and the row and column module is used for obtaining the row and column information of each cell by utilizing the row and column information of the grids included by each cell.
8. The form recognition apparatus of claim 7, wherein the rank module comprises:
and the first sequence number submodule is used for obtaining the row and column sequence numbers of the cells by utilizing the row and column sequence numbers of the grids positioned at the upper left corner in the grids included in the cells under the condition that the cells include at least two grids.
9. The form identification apparatus of claim 8, wherein the rank module further comprises:
and the quantity submodule is used for obtaining the row-column crossing quantity of the cells by utilizing the row-column serial numbers of the grids positioned at the lower right corner and the row-column serial numbers of the grids positioned at the upper left corner in the grids included by the cells.
10. The form recognition apparatus of claim 7, wherein the rank module comprises:
and the second sequence number submodule is used for determining that the row and column sequence number of the grid included by the cell is determined as the row and column sequence number of the cell if the cell includes one grid.
11. The form recognition apparatus of any one of claims 7 to 10, wherein the obtaining module comprises:
an input sub-module for inputting the tabular image into a U-net network, the U-net network comprising a pooled portion and an upsampled portion, the pooled portion comprising a convolutional layer and a pooled layer, the upsampled portion comprising a convolutional layer and an upsampled layer;
the pooling submodule is used for performing convolution and pooling on the table image by using the pooling part, wherein the pooling part convolution layer is used for outputting first characteristic information;
the first up-sampling sub-module is used for up-sampling the first characteristic information to obtain second characteristic information;
a second upsampling sub-module, configured to perform convolution and upsampling on the feature information output by the pooling part by using the upsampling part, where an upsampling layer of the upsampling part is configured to output third feature information, and a convolution layer of the upsampling part is configured to perform feature fusion on the first feature information, the second feature information, and the third feature information;
and the information acquisition submodule is used for acquiring the position information of each cell in the form image by using the fused feature information.
12. The form recognition apparatus according to any one of claims 7 to 10, further comprising:
the classification module is used for determining an included angle between the form image and the horizontal line by using the classifier;
and the rotating module is used for rotating the table image so as to enable the included angle between the table image and the horizontal line to be the reference angle under the condition that the included angle between the table image and the horizontal line is not the reference angle.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202010477731.0A 2020-05-29 2020-05-29 Table identification method, apparatus, electronic device and storage medium Active CN111639637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010477731.0A CN111639637B (en) 2020-05-29 2020-05-29 Table identification method, apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010477731.0A CN111639637B (en) 2020-05-29 2020-05-29 Table identification method, apparatus, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN111639637A true CN111639637A (en) 2020-09-08
CN111639637B CN111639637B (en) 2023-08-15

Family

ID=72331625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010477731.0A Active CN111639637B (en) 2020-05-29 2020-05-29 Table identification method, apparatus, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN111639637B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241730A (en) * 2020-11-21 2021-01-19 杭州投知信息技术有限公司 Form extraction method and system based on machine learning
CN113139457A (en) * 2021-04-21 2021-07-20 浙江康旭科技有限公司 Image table extraction method based on CRNN
CN113221519A (en) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for processing tabular data
CN113378789A (en) * 2021-07-08 2021-09-10 京东数科海益信息科技有限公司 Cell position detection method and device and electronic equipment
CN113609906A (en) * 2021-06-30 2021-11-05 南京信息工程大学 Document-oriented table information extraction method
CN113705576A (en) * 2021-11-01 2021-11-26 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment
CN114359938B (en) * 2022-01-07 2023-09-29 北京有竹居网络技术有限公司 Form identification method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156761A (en) * 2016-08-10 2016-11-23 北京交通大学 The image form detection of facing moving terminal shooting and recognition methods
CN109961008A (en) * 2019-02-13 2019-07-02 平安科技(深圳)有限公司 Form analysis method, medium and computer equipment based on text location identification
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
US20200089946A1 (en) * 2018-06-11 2020-03-19 Innoplexus Ag System and method for extracting tabular data from electronic document

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156761A (en) * 2016-08-10 2016-11-23 北京交通大学 The image form detection of facing moving terminal shooting and recognition methods
US20200089946A1 (en) * 2018-06-11 2020-03-19 Innoplexus Ag System and method for extracting tabular data from electronic document
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
CN109961008A (en) * 2019-02-13 2019-07-02 平安科技(深圳)有限公司 Form analysis method, medium and computer equipment based on text location identification
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241730A (en) * 2020-11-21 2021-01-19 杭州投知信息技术有限公司 Form extraction method and system based on machine learning
CN113139457A (en) * 2021-04-21 2021-07-20 浙江康旭科技有限公司 Image table extraction method based on CRNN
CN113221519A (en) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for processing tabular data
CN113221519B (en) * 2021-05-18 2024-03-29 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for processing form data
CN113609906A (en) * 2021-06-30 2021-11-05 南京信息工程大学 Document-oriented table information extraction method
CN113378789A (en) * 2021-07-08 2021-09-10 京东数科海益信息科技有限公司 Cell position detection method and device and electronic equipment
WO2023279847A1 (en) * 2021-07-08 2023-01-12 京东科技信息技术有限公司 Cell position detection method and apparatus, and electronic device
CN113378789B (en) * 2021-07-08 2023-09-26 京东科技信息技术有限公司 Cell position detection method and device and electronic equipment
CN113705576A (en) * 2021-11-01 2021-11-26 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment
CN113705576B (en) * 2021-11-01 2022-03-25 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment
CN114359938B (en) * 2022-01-07 2023-09-29 北京有竹居网络技术有限公司 Form identification method and device

Also Published As

Publication number Publication date
CN111639637B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN111639637B (en) Table identification method, apparatus, electronic device and storage medium
CN111753727B (en) Method, apparatus, device and readable storage medium for extracting structured information
CN113657390B (en) Training method of text detection model and text detection method, device and equipment
CN111695553A (en) Form recognition method, device, equipment and medium
WO2020140698A1 (en) Table data acquisition method and apparatus, and server
CN111784663B (en) Method and device for detecting parts, electronic equipment and storage medium
WO2020063314A1 (en) Character segmentation identification method and apparatus, electronic device, and storage medium
CN113221743B (en) Table analysis method, apparatus, electronic device and storage medium
CN111626027B (en) Table structure restoration method, device, equipment, system and readable storage medium
US20220189083A1 (en) Training method for character generation model, character generation method, apparatus, and medium
CN111783645A (en) Character recognition method and device, electronic equipment and computer readable storage medium
US20220180043A1 (en) Training method for character generation model, character generation method, apparatus and storage medium
CN112508003A (en) Character recognition processing method and device
JP2022536320A (en) Object identification method and device, electronic device and storage medium
CN111709428A (en) Method and device for identifying key point positions in image, electronic equipment and medium
CN111523292B (en) Method and device for acquiring image information
CN113326766A (en) Training method and device of text detection model and text detection method and device
CN115187995B (en) Document correction method, device, electronic equipment and storage medium
CN116863017A (en) Image processing method, network model training method, device, equipment and medium
CN111444834A (en) Image text line detection method, device, equipment and storage medium
CN113657398B (en) Image recognition method and device
CN113221742B (en) Video split screen line determining method, device, electronic equipment, medium and program product
CN112558810B (en) Method, apparatus, device and storage medium for detecting fingertip position
CN113887394A (en) Image processing method, device, equipment and storage medium
CN111783780B (en) Image processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant