CN113269153A - Form identification method and device - Google Patents

Form identification method and device

Info

Publication number
CN113269153A
CN113269153A
Authority
CN
China
Prior art keywords
image
line
information
cell
coordinate information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110715112.5A
Other languages
Chinese (zh)
Other versions
CN113269153B (en)
Inventor
罗奥升
褚正全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electronic System Technology Co ltd
Original Assignee
China Electronic System Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electronic System Technology Co ltd filed Critical China Electronic System Technology Co ltd
Priority to CN202110715112.5A priority Critical patent/CN113269153B/en
Publication of CN113269153A publication Critical patent/CN113269153A/en
Application granted granted Critical
Publication of CN113269153B publication Critical patent/CN113269153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a form identification method, which comprises the following steps: acquiring an image to be recognized; inputting the image to be recognized into a trained table detection model to obtain table coordinate information of the table image in the image to be recognized; inputting the image to be recognized into a trained table line detection model to obtain table line position information of the table image in the image to be recognized; determining the position information and the text information of each cell in the table image according to the table coordinate information and the table line position information; and generating table information corresponding to the table image according to the position information and the text information of each cell in the table image. In this way, the application can extract table information from an image automatically, without manually extracting the table from the image. This avoids the information extraction errors caused by operation mistakes during manual table extraction, as well as its low efficiency and high cost in time and labor, improves the efficiency and accuracy of extracting table information from images, and thereby improves the user experience.

Description

Form identification method and device
Technical Field
The present application relates to the field of image processing, and in particular, to a method and an apparatus for identifying a table.
Background
Documents are an integral part of modern offices, and forms, as a common component of them, contain a great deal of key, condensed information. However, because table styles vary widely, manually extracting and identifying table information from a picture is time-consuming and labor-intensive, and redundant text and noise in a picture document strongly affect traditional table extraction algorithms. Therefore, a new table identification method is needed.
Disclosure of Invention
The application provides a form identification method, so that the efficiency and the accuracy of extracting form information from an image can be improved, and further the user experience is improved.
In a first aspect, the present application provides a table identification method, including:
acquiring an image to be identified, wherein the image to be identified comprises a form image;
inputting the image to be recognized into a trained table detection model to obtain table coordinate information of the table image in the image to be recognized;
inputting the image to be recognized into a trained table line detection model to obtain table line position information of the table image in the image to be recognized;
determining the position information and the text information of each cell in the table image according to the table coordinate information and the table line position information;
and generating table information corresponding to the table image according to the position information and the text information of each cell in the table image.
Optionally, the table detection model is a YOLO model; the table detection model is trained based on the correspondence between sample images that comprise sample table images and the position information of the sample table images in the sample images; and/or the table coordinate information of the table image in the image to be recognized comprises the coordinate information of the four endpoints of the table image in the image to be recognized, or the coordinate information of two diagonal endpoints.
Optionally, the table line detection model is a UNet model; the table line detection model is trained based on the correspondence between sample images that comprise sample table images and the coordinate information, in the sample images, of the pixel points on each table line of the sample table images; and/or the table line position information of the table image in the image to be recognized comprises the coordinate information of the pixel points on each table line of the table image in the image to be recognized.
Optionally, an output item of the form line detection model is a two-channel image corresponding to the image to be recognized, where the two-channel image corresponding to the image to be recognized includes form line position information of the form image.
Optionally, the determining, according to the table coordinate information and the table line position information, the position information and the text information of each cell in the table image includes:
determining the end point coordinate information of each cell in the table image according to the table coordinate information and the table line position information;
and determining the position information and the text information of each cell in the table image according to the endpoint coordinate information of each cell in the table image.
Optionally, the determining, according to the table coordinate information and the table line position information, the endpoint coordinate information of each cell in the table image includes:
determining the form image according to the form coordinate information;
determining the endpoint coordinate information and the type of each table line in the table image according to the table line position information; wherein the form line types include horizontal lines and vertical lines;
determining an intersection point set of the table image according to the endpoint coordinate information of each table line in the table image and the type of the table line; the intersection set of the table image comprises coordinate information of all intersections of each horizontal line table line and each vertical line table line in the table image;
and determining the endpoint coordinate information of each cell in the table image according to the intersection point set of the table image.
Optionally, the determining, according to the table line position information, the endpoint coordinate information and the table line type of each table line in the table image includes:
for each table line in the table image, determining a minimum circumscribed rectangle corresponding to the table line according to the eight-connected region of the pixel points on the table line; determining the table line type of the table line according to the length and the width of the minimum circumscribed rectangle corresponding to the table line; and determining the endpoint coordinate information of the table line according to the table line type of the table line.
Optionally, the coordinate information of the intersection point includes an abscissa and an ordinate; determining the endpoint coordinate information of each cell in the form image according to the intersection point set of the form image, wherein the determining comprises the following steps:
regarding each intersection in the intersection set of the form image, taking the intersection as a target intersection; if a first intersection, a second intersection and a third intersection corresponding to the target intersection exist in the intersection set, the target intersection together with its corresponding first intersection, second intersection and third intersection determine the endpoint coordinate information of one cell in the form image;
the first intersection is an intersection that is on the same horizontal table line as the target intersection, is adjacent to the target intersection and has the same ordinate as the target intersection; the second intersection is an intersection that is on the same vertical table line as the target intersection, is adjacent to the target intersection and has the same abscissa as the target intersection; the third intersection is an intersection that is on the same vertical table line as the first intersection, is adjacent to the first intersection and has the same abscissa as the first intersection, and is on the same horizontal table line as the second intersection, is adjacent to the second intersection and has the same ordinate as the second intersection.
Optionally, the determining, according to the endpoint coordinate information of each cell in the form image, the position information and the text information of each cell in the form image includes:
for each cell in the table image, determining the position information of the cell in the table image according to the endpoint coordinate information of the cell; determining an image area of the cell according to the position information of the cell; and performing character recognition on the image area of the cell to obtain text information corresponding to the cell.
In a second aspect, the present application provides a form recognition apparatus, the apparatus comprising:
a first acquisition unit, used for acquiring an image to be identified, wherein the image to be identified comprises a form image;
the second acquisition unit is used for inputting the image to be recognized into the trained table detection model to obtain table coordinate information of the table image in the image to be recognized;
a third obtaining unit, configured to input the image to be recognized into a trained table line detection model, so as to obtain table line position information of a table image in the image to be recognized;
the information determining unit is used for determining the position information and the text information of each cell in the table image according to the table coordinate information and the table line position information;
and the table generating unit is used for generating the table information corresponding to the table image according to the position information and the text information of each cell in the table image.
Optionally, the table detection model is a YOLO model; the table detection model is trained based on the correspondence between sample images that comprise sample table images and the position information of the sample table images in the sample images; and/or the table coordinate information of the table image in the image to be recognized comprises the coordinate information of the four endpoints of the table image in the image to be recognized, or the coordinate information of two diagonal endpoints.
Optionally, the table line detection model is a UNet model; the table line detection model is trained based on the correspondence between sample images that comprise sample table images and the coordinate information, in the sample images, of the pixel points on each table line of the sample table images; and/or the table line position information of the table image in the image to be recognized comprises the coordinate information of the pixel points on each table line of the table image in the image to be recognized.
Optionally, an output item of the form line detection model is a two-channel image corresponding to the image to be recognized, where the two-channel image corresponding to the image to be recognized includes form line position information of the form image.
Optionally, the information determining unit is specifically configured to:
determining the end point coordinate information of each cell in the table image according to the table coordinate information and the table line position information;
and determining the position information and the text information of each cell in the table image according to the endpoint coordinate information of each cell in the table image.
Optionally, the information determining unit is specifically configured to:
determining the form image according to the form coordinate information;
determining the endpoint coordinate information and the type of each table line in the table image according to the table line position information; wherein the form line types include horizontal lines and vertical lines;
determining an intersection point set of the table image according to the endpoint coordinate information of each table line in the table image and the type of the table line; the intersection set of the table image comprises coordinate information of all intersections of each horizontal line table line and each vertical line table line in the table image;
and determining the endpoint coordinate information of each cell in the table image according to the intersection point set of the table image.
Optionally, the information determining unit is specifically configured to:
for each table line in the table image, determining a minimum circumscribed rectangle corresponding to the table line according to the eight-connected region of the pixel points on the table line; determining the table line type of the table line according to the length and the width of the minimum circumscribed rectangle corresponding to the table line; and determining the endpoint coordinate information of the table line according to the table line type of the table line.
Optionally, the coordinate information of the intersection point includes an abscissa and an ordinate; the information determining unit is specifically configured to:
regarding each intersection in the intersection set of the form image, taking the intersection as a target intersection; if a first intersection, a second intersection and a third intersection corresponding to the target intersection exist in the intersection set, the target intersection together with its corresponding first intersection, second intersection and third intersection determine the endpoint coordinate information of one cell in the form image;
the first intersection is an intersection that is on the same horizontal table line as the target intersection, is adjacent to the target intersection and has the same ordinate as the target intersection; the second intersection is an intersection that is on the same vertical table line as the target intersection, is adjacent to the target intersection and has the same abscissa as the target intersection; the third intersection is an intersection that is on the same vertical table line as the first intersection, is adjacent to the first intersection and has the same abscissa as the first intersection, and is on the same horizontal table line as the second intersection, is adjacent to the second intersection and has the same ordinate as the second intersection.
Optionally, the information determining unit is specifically configured to:
for each cell in the table image, determining the position information of the cell in the table image according to the endpoint coordinate information of the cell; determining an image area of the cell according to the position information of the cell; and performing character recognition on the image area of the cell to obtain text information corresponding to the cell.
In a third aspect, the present application provides a readable medium comprising executable instructions, which when executed by a processor of an electronic device, perform the method according to any of the first aspect.
In a fourth aspect, the present application provides an electronic device comprising a processor and a memory storing execution instructions, wherein when the processor executes the execution instructions stored in the memory, the processor performs the method according to any one of the first aspect.
According to the technical scheme, a form identification method is provided. In this embodiment, an image to be identified may be obtained first, where the image to be identified comprises a form image; then the image to be recognized may be input into a trained table detection model to obtain table coordinate information of the table image in the image to be recognized; then the image to be recognized may be input into a trained table line detection model to obtain table line position information of the table image in the image to be recognized; then the position information and the text information of each cell in the table image may be determined according to the table coordinate information and the table line position information; and finally, table information corresponding to the table image may be generated according to the position information and the text information of each cell in the table image. In this way, the present application can generate the table information corresponding to the table image, that is, extract the table from the image, based on the position information and the text information of each cell in the table image. Therefore, the table coordinate information and the table line position information can be obtained using the table detection model and the table line detection model, the position information and the text information of each cell in the table image can be identified from them, and the table information corresponding to the table image can then be generated, i.e., the table is extracted from the image. In other words, the table information can be extracted from the image automatically, without manually extracting the table from the image as in the prior art. This avoids the information extraction errors caused by operation mistakes during manual table extraction, as well as its low efficiency and high cost in time and labor, and thus improves the efficiency and accuracy of extracting table information from the image and, in turn, the user experience.
Further effects of the above-mentioned unconventional preferred modes will be described below in conjunction with specific embodiments.
Drawings
In order to more clearly illustrate the embodiments or prior art solutions of the present application, the drawings needed for describing the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and that other drawings can be obtained by those skilled in the art without inventive exercise.
Fig. 1 is a schematic flowchart of a table identification method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a table identification apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following embodiments and accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Documents are currently an integral part of modern offices, and forms, as a common component of them, contain a great deal of key, condensed information. However, because table styles vary widely, manually extracting and identifying table information from a picture is time-consuming and labor-intensive, and redundant text and noise in a picture document strongly affect traditional table extraction algorithms. A general table extraction algorithm comprises two steps: table position detection and table structure parsing (recognition). Broadly, existing algorithms fall into the following categories: 1. Algorithms based on a predefined table structure. This approach matches tables in documents by designing as many table templates as possible; its obvious disadvantage is that it depends too heavily on the preset templates, and even anomalous tables are forced into known template classes. 2. Heuristic algorithms, which locate and parse tables in a document by setting a series of rules, such as the characteristics of word arrangement in tables and the spatial distribution of the text. Such algorithms can work to some extent on simple tables, but they are limited by the complexity of the rules, and their recognition and parsing of special tables is unsatisfactory. Therefore, a new table identification method is needed.
In the embodiment, an image to be recognized may be obtained first, where the image to be recognized includes a form image; then the image to be recognized may be input into a trained table detection model to obtain table coordinate information of the table image in the image to be recognized; then the image to be recognized may be input into a trained table line detection model to obtain table line position information of the table image in the image to be recognized; then the position information and the text information of each cell in the table image may be determined according to the table coordinate information and the table line position information; and finally, table information corresponding to the table image may be generated according to the position information and the text information of each cell in the table image. In this way, the present application can generate the table information corresponding to the table image, that is, extract the table from the image, based on the position information and the text information of each cell in the table image. Therefore, the table coordinate information and the table line position information can be obtained using the table detection model and the table line detection model, the position information and the text information of each cell in the table image can be identified from them, and the table information corresponding to the table image can then be generated, i.e., the table is extracted from the image. In other words, the table information can be extracted from the image automatically, without manually extracting the table from the image as in the prior art. This avoids the information extraction errors caused by operation mistakes during manual table extraction, as well as its low efficiency and high cost in time and labor, and thus improves the efficiency and accuracy of extracting table information from the image and, in turn, the user experience.
It should be noted that the embodiment of the present application may be applied to an electronic device (such as a mobile phone, a tablet, a computer, etc.) or a server. In addition to the above-mentioned embodiments, other embodiments are also possible, and are not limited herein.
Various non-limiting embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a table identification method in an embodiment of the present application is shown, and in this embodiment, the method may include the following steps:
s101: and acquiring an image to be identified.
In this embodiment, the image to be recognized may be obtained first, for example, the image may be obtained by a user through a terminal, or may be stored in a preset storage device in advance. It should be noted that the image to be recognized may be understood as an image from which table information needs to be extracted, where the image to be recognized may include a table image.
S102: inputting the image to be recognized into the trained table detection model to obtain table coordinate information of the table image in the image to be recognized.
In this embodiment, after the image to be recognized is acquired, the image to be recognized may be input into a trained table detection model, so as to obtain table coordinate information of the table image in the image to be recognized. The table coordinate information of the table image in the image to be recognized may be understood as the coordinate information of the table image within the image to be recognized; for example, it may include the coordinate information of the four endpoints of the table image in the image to be recognized (the upper left, lower left, lower right and upper right endpoints) or the coordinate information of two diagonal endpoints (such as the upper left and lower right endpoints, or the upper right and lower left endpoints).
In this embodiment, the table detection model is trained based on the correspondence between sample images that comprise sample table images and the position information of the sample table images in the sample images. It can be understood that the table detection model is trained on a preset sample training set, where the preset sample training set includes a plurality of groups of training samples, and each group of training samples includes a sample image comprising a sample table image and the position information of the sample table image in that sample image (i.e., the table coordinate information of the sample table image in the sample image, such as the coordinate information of its four endpoints (the upper left, lower left, lower right and upper right endpoints) or the coordinate information of two diagonal endpoints (such as the upper left and lower right endpoints, or the upper right and lower left endpoints)). It should be noted that the preset sample training set of the table detection model may be in XML format, where each group of training samples may further include basic information such as the name, location and width and height of the corresponding picture and the position information of all tables in the picture, which is generally the coordinates of the upper left corner of each target box together with the width and height values of the target box.
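As a purely illustrative sketch of how one such group of XML training samples might be read, the snippet below parses a hypothetical annotation file; the element names (filename, table, x, y, width, height) are assumptions, since the text above only states that each sample records the picture's basic information and, for every table, the upper-left corner and the width and height of the target box.
```python
# Illustrative only: the patent does not specify the exact XML schema, so this sketch
# assumes a hypothetical layout with <filename> and one <table> element per table,
# each holding the top-left corner and the width/height of the target box.
import xml.etree.ElementTree as ET

def load_table_annotation(xml_path):
    """Parse one hypothetical training-sample annotation into (filename, [boxes])."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    boxes = []
    for table in root.findall("table"):
        x = int(table.findtext("x"))        # top-left corner of the target box
        y = int(table.findtext("y"))
        w = int(table.findtext("width"))    # width/height values of the target box
        h = int(table.findtext("height"))
        boxes.append((x, y, w, h))
    return filename, boxes
```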
The table detection model may be a YOLO model. In one implementation, the table detection model is built on a YOLO model using the TensorFlow/Keras deep learning framework. The model takes the image to be recognized as input and extracts high-level, abstract features of the image through a DarkNet convolutional neural network in the table detection model; then, based on these features and a set of preset anchor boxes, a classifier in the table detection model outputs the coordinate corrections and target probability values of the prediction boxes, and a Non-Maximum Suppression algorithm removes duplicate boxes to obtain the final prediction result, namely the table coordinate information of the table image in the image to be recognized.
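As a minimal sketch of the post-processing step described above (not the patent's actual code), the snippet below keeps the final table boxes with TensorFlow's non-maximum suppression, assuming the candidate boxes and objectness scores have already been decoded from the YOLO head; the thresholds and variable names are assumptions.
```python
# Sketch: final box selection via NMS; decoding of the raw YOLO output is assumed done.
import tensorflow as tf

def select_table_boxes(boxes, scores, score_thr=0.5, iou_thr=0.45, max_tables=20):
    """boxes: (N, 4) tensor of [y1, x1, y2, x2]; scores: (N,) objectness values."""
    keep = tf.image.non_max_suppression(
        boxes, scores, max_output_size=max_tables,
        iou_threshold=iou_thr, score_threshold=score_thr)
    return tf.gather(boxes, keep)  # table coordinate information in the input image

# usage (shapes only): final_boxes = select_table_boxes(pred_boxes, pred_scores)
```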
As an example, the table detection model may be trained with TensorFlow/Keras: it is trained for 140 epochs with an initial learning rate of 0.001 and the SGD optimizer, and the loss function is consistent with the original YOLO model, comprising the loss on the center point and width of the detection box together with the objectness and category prediction losses; it finally reaches 95% mAP on the validation set. The segmentation model is trained for 100 epochs with an initial learning rate of 0.0001, the Adam optimizer and binary_crossentropy as the loss function, and its training and validation accuracies at convergence are 92.3% and 91.1%, respectively.
S103: inputting the image to be recognized into a trained table line detection model to obtain table line position information of the table image in the image to be recognized.
In this embodiment, after the image to be recognized is acquired, the image to be recognized may be input into a trained table line detection model, so as to obtain table line position information of the table image in the image to be recognized. In this embodiment, the table line position information of the table image in the image to be recognized may be understood as coordinate information of each table line in the table image in the image to be recognized, for example, the table line position information of the table image in the image to be recognized may include coordinate information of pixel points on each table line in the table image in the image to be recognized, that is, coordinate information of each pixel point on each table line in the image to be recognized.
In this embodiment, the table line detection model may be trained based on the correspondence between sample images that comprise sample table images and the coordinate information, in those sample images, of the pixel points on each table line of the sample table images. It can be understood that the table line detection model is trained on a preset sample training set, where the preset sample training set includes a plurality of groups of training samples, and each group of training samples includes a sample image comprising a sample table image and the coordinate information of the pixel points on each table line of that sample table image in the sample image. It should be noted that the preset sample training set of the table line segmentation model is generally in JSON format and may further include base64-encoded data of the corresponding image and the coordinate information of the two end points of every frame line of the table in the image. The base64-encoded image data is generated automatically by the annotation software; if the original picture exists, the base64 data is not needed and the original picture can be read directly, and if the original picture does not exist, the picture can be recovered solely from this field (the base64-encoded data), which serves as a backup and facilitates reading and network transmission.
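A rough sketch of reading one JSON label of the kind described above is given below; the field names ("imageData" for the base64 picture and "lines" for the end-point pairs) are assumptions, since the text only states that the label holds base64-encoded image data and the coordinates of the two end points of every table frame line.
```python
# Sketch under assumed field names; not the annotation software's actual schema.
import base64, json
import numpy as np
import cv2

def load_line_sample(json_path):
    with open(json_path, "r", encoding="utf-8") as f:
        label = json.load(f)
    # Decode the embedded picture so the original file is not required.
    img_bytes = base64.b64decode(label["imageData"])
    image = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)
    # Each entry is ((x1, y1), (x2, y2)) for one table frame line.
    lines = [tuple(map(tuple, ln)) for ln in label["lines"]]
    return image, lines
```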
In one implementation, the output of the table line detection model is a two-channel image corresponding to the image to be recognized, where this two-channel image contains the table line position information of the table image. For example, the pixel points of the first channel are the pixel points on all horizontal table lines and the pixel points of the second channel are the pixel points on all vertical table lines; in this case the table line position information of the table image consists of the coordinate information of the pixel points in the two channels of the two-channel image. It should be noted that in this embodiment the size of the two-channel image is identical to the size of the image to be recognized. The table line detection model is a UNet model implemented with the TensorFlow/Keras deep learning framework and can roughly be divided into an encoding part and a decoding part: the encoding part extracts features of the input image through a multi-layer convolutional neural network and gradually reduces the resolution of the feature map, while the decoding part gradually restores the resolution of the feature map through upsampling and convolution and decodes the features, and finally a convolution with a 1×1 kernel performs the pixel-by-pixel classification. The output of the table line detection model is a two-channel image of the same size as the input; the vector at each pixel position in this image has two elements, whose values represent the probabilities that the pixel at that position belongs to the two classes, namely horizontal table line and vertical table line. In other words, the value at each pixel position of the output (two-channel) image is a vector with two elements, each between 0 and 1 (for example, elements of 0.1 and 0.9 mean the pixel has a probability of 0.1 of belonging to a horizontal table line and a probability of 0.9 of belonging to a vertical table line). Representing the probability distribution as two values (one-hot style) makes it convenient to compute the loss function during training.
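As a small sketch of how the two-channel output described above might be turned into two binary masks (one per table-line type), assuming the channel order given in the example and a 0.5 probability threshold:
```python
# Sketch: split the (H, W, 2) probability map into horizontal/vertical line masks.
# The 0.5 threshold is an assumption, not a value stated in the patent.
import numpy as np

def split_line_masks(two_channel_pred, thr=0.5):
    """two_channel_pred: (H, W, 2) array; channel 0 = horizontal, channel 1 = vertical."""
    horizontal = (two_channel_pred[..., 0] > thr).astype(np.uint8) * 255
    vertical = (two_channel_pred[..., 1] > thr).astype(np.uint8) * 255
    return horizontal, vertical

# usage: h_mask, v_mask = split_line_masks(line_model.predict(image[None])[0])
```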
The table line detection model can be trained with TensorFlow/Keras; it can be trained for 100 epochs with an initial learning rate of 0.0001, the Adam optimizer and binary_crossentropy as the loss function, and the training and validation accuracies at final convergence are 92.3% and 91.1%, respectively.
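The following sketch mirrors the training configuration quoted above (Adam, initial learning rate 0.0001, binary_crossentropy, 100 epochs) on a deliberately small UNet-style encoder/decoder; the layer sizes, input resolution and batch size are illustrative assumptions, not the patent's actual network.
```python
# Sketch of the training setup; the architecture below is a toy stand-in for UNet.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_tiny_unet(input_shape=(512, 512, 3)):
    """A deliberately small UNet-style encoder/decoder; layer sizes are illustrative."""
    inp = layers.Input(shape=input_shape)
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    p1 = layers.MaxPooling2D()(c1)                       # encode: halve resolution
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    u1 = layers.UpSampling2D()(c2)                       # decode: restore resolution
    u1 = layers.Concatenate()([u1, c1])                  # skip connection
    c3 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)
    out = layers.Conv2D(2, 1, activation="sigmoid")(c3)  # 1x1 conv, two-channel output
    return models.Model(inp, out)

line_model = build_tiny_unet()
line_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                   loss="binary_crossentropy", metrics=["accuracy"])
# line_model.fit(train_images, train_masks, epochs=100, batch_size=8,
#                validation_data=(val_images, val_masks))   # data loading not shown
```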
S104: determining the position information and the text information of each cell in the table image according to the table coordinate information and the table line position information.
In this embodiment, the end point coordinate information of each cell in the table image, for example, four end point coordinate information (i.e., coordinate information of four end points at the top left corner, the bottom left corner, the top right corner, and the bottom right corner) of each cell in the table image, may be determined according to the table coordinate information and the table line position information.
As an example, the form image may be determined according to the form coordinate information, and it may be understood that an image area corresponding to the form image is determined in the image to be recognized or the two-channel image according to the form coordinate information.
Then, according to the table line position information, the endpoint coordinate information and the table line type of each table line in the table image can be determined, where the table line types include horizontal lines and vertical lines. Specifically, binary images of the two types of table lines in the table image can be obtained from the two-channel image. For the binary image of each type of table line, the minimum circumscribed rectangle corresponding to each table line can be determined from the eight-connected region of its pixel points; that is, for each table line in the table image, the minimum circumscribed rectangle corresponding to that table line is determined according to the eight-connected region formed by its pixel points in the binary image. For example, since the binary image marks all pixel positions belonging to that type of table line, in order to obtain more accurate frame line information, all eight-connected regions in the binary image are first extracted using OpenCV, frame lines whose line width is below a threshold are then screened out, and the minimum circumscribed rectangle of each remaining frame line is calculated. Then, the type of each table line can be determined from the length and width of its minimum circumscribed rectangle. It can be understood that the binary image output by the model marks all pixel points belonging to frame lines, but owing to the limited accuracy of the model, isolated noise points or obviously wrong branch lines often appear, so a threshold is set here to remove predicted lines (or points) that are clearly too short, too thin or deviated. Because a table line has a width, occupies pixels (the line in the theoretical label has no width, only the coordinates of its two end points) and is not uniform, its minimum circumscribed rectangle, i.e. the rectangle circumscribing all pixel points of the table line, needs to be calculated; whether the table line is vertical or horizontal can then be judged from the length and width of this rectangle, for example, the table line is a vertical line when the length is greater than the width and a horizontal line when the length is less than the width. Finally, the endpoint coordinate information of each table line is determined according to its table line type: once the minimum circumscribed rectangle is determined, two of its end points can be taken as the coordinates of the left and right points (for a horizontal line) or the upper and lower points (for a vertical line) of the table line (i.e., the frame line).
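A sketch of this post-processing under stated assumptions is shown below: it runs OpenCV 8-connected component analysis on one binary line mask, drops components that are too small, and uses the axis-aligned bounding rectangle returned in the component statistics as a stand-in for the minimum circumscribed rectangle to decide the line type and its two end points; the area threshold and the orientation rule (taller than wide means vertical) are assumptions.
```python
# Sketch: `mask` is one binary image (0/255, uint8) of one table-line type.
import cv2
import numpy as np

def extract_lines(mask, min_area=30):
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    lines = []
    for i in range(1, n):                      # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:                    # remove isolated noise / tiny branches
            continue
        if h > w:                              # taller than wide -> vertical line
            lines.append(("vertical", (x + w // 2, y), (x + w // 2, y + h)))
        else:                                  # otherwise -> horizontal line
            lines.append(("horizontal", (x, y + h // 2), (x + w, y + h // 2)))
    return lines  # (type, end point 1, end point 2) per table line
```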
Then, the intersection point set of the table image can be determined according to the endpoint coordinate information of each table line in the table image and the type of the table line. The intersection set of the table image includes coordinate information of all intersections of each horizontal line table line and each vertical line table line in the table image, that is, includes coordinate information of the intersection of each horizontal line table line and each vertical line table line in the image to be identified.
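A minimal sketch of building the intersection set, assuming each horizontal line is stored as (y, x1, x2) and each vertical line as (x, y1, y2), with a small pixel tolerance to absorb segmentation error (both the data layout and the tolerance are assumptions):
```python
# Sketch: a horizontal line and a vertical line intersect at (x, y) when each
# coordinate falls within the span of the other line.
def intersection_set(horizontals, verticals, tol=3):
    """horizontals: list of (y, x1, x2); verticals: list of (x, y1, y2)."""
    points = set()
    for y, hx1, hx2 in horizontals:
        for x, vy1, vy2 in verticals:
            if hx1 - tol <= x <= hx2 + tol and vy1 - tol <= y <= vy2 + tol:
                points.add((x, y))
    return sorted(points, key=lambda p: (p[1], p[0]))  # top-to-bottom, left-to-right
```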
Then, the endpoint coordinate information of each cell in the table image can be determined according to the intersection set of the table image. The coordinate information of an intersection includes an abscissa and an ordinate. Specifically, for each intersection in the intersection set of the table image, the intersection is taken as a target intersection; if a first intersection, a second intersection and a third intersection corresponding to the target intersection exist in the intersection set, the target intersection together with its corresponding first, second and third intersections determine the endpoint coordinate information of one cell in the table image. The first intersection is on the same horizontal table line as the target intersection, is adjacent to it and has the same ordinate as the target intersection; the second intersection is on the same vertical table line as the target intersection, is adjacent to it and has the same abscissa as the target intersection; the third intersection is on the same vertical table line as the first intersection, is adjacent to it and has the same abscissa as the first intersection, and is at the same time on the same horizontal table line as the second intersection, is adjacent to it and has the same ordinate as the second intersection.
For example, after the horizontal table lines and the vertical table lines are obtained, the intersections of all horizontal table lines with all vertical table lines are extracted, and all intersections in the intersection set are sorted from left to right and from top to bottom. Each point in the intersection set is then taken in turn as a candidate upper-left starting point (denoted point_lt), and the coordinates of the other three end points of the cell are searched for. Because the intersection set is ordered, in the x direction it is only necessary to judge whether the point immediately after the starting point in the intersection set (denoted point_rt) has the same vertical coordinate as the starting point and is connected to it by a straight line; if this condition is met, a cell with the starting point as its upper-left end point exists and its upper-right end point is point_rt, otherwise no such cell exists. In the y direction, all points (1-10 in the figure) that have the same horizontal coordinate as the starting point and are connected to it by straight lines are first recorded; each of them is taken in turn as the lower-left end point of the cell (denoted point_ld) and, combined with the upper-right end point point_rt, yields the coordinates of a candidate lower-right end point (denoted point_rd). If point_rd is in the intersection set and is connected by straight lines (11 in the figure) to point_ld and point_rt respectively, then point_rd is the lower-right end point of the cell. At this point the four end points of the cell are determined; all cells can be obtained by a single traversal of the intersection set, and the time complexity of the whole algorithm flow is O(n³).
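The snippet below is a simplified sketch of this cell-construction pass, not the patent's code: it assumes the intersection set is already sorted top-to-bottom and left-to-right, relies on a caller-supplied connected(p, q) helper that tells whether two intersections lie on one common, unbroken table line, and keeps only the first (smallest) valid cell for each starting point.
```python
# Sketch: build cells from intersections; `connected` is an assumed helper.
def build_cells(points, connected):
    """points: sorted list of (x, y) intersections; returns cells as 4-point tuples."""
    point_set = set(points)
    cells = []
    for lt in points:                                    # candidate upper-left corner
        same_row = sorted(p for p in point_set if p[1] == lt[1] and p[0] > lt[0])
        if not same_row:
            continue
        rt = same_row[0]                                 # nearest point to the right
        if not connected(lt, rt):
            continue
        same_col = sorted(p for p in point_set if p[0] == lt[0] and p[1] > lt[1])
        for ld in same_col:                              # candidate lower-left corner
            if not connected(lt, ld):
                continue
            rd = (rt[0], ld[1])                          # implied lower-right corner
            if rd in point_set and connected(ld, rd) and connected(rt, rd):
                cells.append((lt, rt, ld, rd))
                break                                    # keep the smallest valid cell
    return cells
```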
After the endpoint coordinate information of each cell in the table image is determined, the position information and the text information of each cell in the table image may be determined from it. Specifically, for each cell in the table image, the position information of the cell in the table image may be determined from the endpoint coordinate information of the cell, that is, the positions of the pixel points of each edge of the cell in the table image. Then, the image area of the cell can be determined from the position information of the cell, that is, the image area occupied by the cell in the image to be recognized. Then, the text information corresponding to the cell may be obtained by performing character recognition, for example OCR, on the image area of the cell. It should be noted that the position information of each cell in the table image may also include the merging and nesting relationships between cells. It can be understood that, because the endpoint coordinates of each cell are known, only a simple judgment is needed between the character detection boxes recognized by OCR and the endpoint coordinates: if a text detection box is located inside a cell, its character information is assigned to that cell, and the merging and nesting of cells can be judged from the width, height and position information of the cells, so that an Excel table can finally be generated from the parsed characters and table structure. An OCR text detection box is generally a quadrilateral close to a rectangle that wraps the characters, and a cell also has the coordinates of four end points; when the coordinates of the four end points of the detection box all lie within the four coordinates of the cell, the text box can be considered to be located inside the cell, that is, the text is located inside the cell.
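A small sketch of the text-to-cell assignment rule described above follows; the OCR result format (a list of box corner points plus the recognized text) and the cell dictionary fields are assumptions about the surrounding code.
```python
# Sketch: an OCR box belongs to a cell when all its corners lie inside the cell.
def assign_text_to_cells(cells, ocr_results):
    """cells: list of dicts with x1, y1, x2, y2; ocr_results: [(box_pts, text), ...]."""
    def inside(pt, cell):
        return cell["x1"] <= pt[0] <= cell["x2"] and cell["y1"] <= pt[1] <= cell["y2"]

    for cell in cells:
        cell["text"] = ""
    for box_pts, text in ocr_results:
        for cell in cells:
            if all(inside(pt, cell) for pt in box_pts):
                cell["text"] = (cell["text"] + " " + text).strip()
                break
    return cells
```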
S105: generating table information corresponding to the table image according to the position information and the text information of each cell in the table image.
After the position information and the text information of each cell in the form image are obtained, the form information corresponding to the form image can be generated according to the position information and the text information of each cell in the form image. That is, an Excel form can be generated according to analysis of characters and a form structure (that is, position information and text information of each cell in a form image), for example, a form frame is drawn according to position information of each cell in the form image (such as end point coordinate information of each cell, merging and nesting conditions between cells, and the like), and then text information corresponding to each cell in the form frame is filled into the form frame, so that the Excel form can be obtained. At this point, the conversion process from the table picture to the Excel table is completed, that is, the table structure is finally analyzed.
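As an illustration of this final step, the sketch below writes parsed cells into an Excel sheet with openpyxl (the patent does not name a specific library); it assumes each cell already carries its grid position, a row/column span for merged cells, and its text.
```python
# Sketch only: field names (row, col, rowspan, colspan, text) are assumptions.
from openpyxl import Workbook

def cells_to_excel(cells, path="table.xlsx"):
    wb = Workbook()
    ws = wb.active
    for c in cells:
        ws.cell(row=c["row"], column=c["col"], value=c.get("text", ""))
        if c.get("rowspan", 1) > 1 or c.get("colspan", 1) > 1:   # merged/nested cells
            ws.merge_cells(start_row=c["row"], start_column=c["col"],
                           end_row=c["row"] + c["rowspan"] - 1,
                           end_column=c["col"] + c["colspan"] - 1)
    wb.save(path)
```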
In this embodiment, an image to be recognized may be obtained first, where the image to be recognized includes a form image; then the image to be recognized may be input into a trained table detection model to obtain table coordinate information of the table image in the image to be recognized; then the image to be recognized may be input into a trained table line detection model to obtain table line position information of the table image in the image to be recognized; then the position information and the text information of each cell in the table image may be determined according to the table coordinate information and the table line position information; and finally, table information corresponding to the table image may be generated according to the position information and the text information of each cell in the table image. In this way, the present application can generate the table information corresponding to the table image, that is, extract the table from the image, based on the position information and the text information of each cell in the table image. Therefore, the table coordinate information and the table line position information can be obtained using the table detection model and the table line detection model, the position information and the text information of each cell in the table image can be identified from them, and the table information corresponding to the table image can then be generated, i.e., the table is extracted from the image. In other words, the table information can be extracted from the image automatically, without manually extracting the table from the image as in the prior art. This avoids the information extraction errors caused by operation mistakes during manual table extraction, as well as its low efficiency and high cost in time and labor, and thus improves the efficiency and accuracy of extracting table information from the image and, in turn, the user experience.
It is emphasized that the present application is mainly based on the combination of a deep learning model and conventional image processing, and aims to solve the problem of extracting multiple types of tables from images and documents. The main technical scheme comprises the following parts. First, for the problem that the positions of tables in images and documents are uncertain, the method and the device use a YOLO-based object detection model which, through custom training, detects all table targets in images and documents, and the detected targets are cropped from the images for subsequent recognition. Then, for the problem of recognizing the table frame, an improved UNet-based image segmentation model is used; after training on a custom data set, the model can accurately segment all frame lines in the table detected in the previous step, and image post-processing then determines all horizontal and vertical frame lines in the table. Finally, for the problem that table cell styles are complex, the method provides a cell division algorithm based on the horizontal and vertical frame lines: the position, height, width and other information of the table cells are determined from the intersections and end points of the frame lines, the corresponding character information is then gathered into the corresponding cells by combining the OCR character detection results, and the parsing of the table structure is completed. With this scheme, the extraction of multiple types of tables (such as three-line tables) from natural images and actual documents can be completed accurately; in the testing stage, a table detection accuracy higher than 95% and a frame line recognition accuracy of 93.4% were achieved on the TableBank data set and a custom data set, and the result is presented as an Excel table.
Fig. 2 shows a specific embodiment of a table identification apparatus according to the present application. The apparatus of this embodiment is a physical apparatus for executing the method of the above embodiment. The technical solution is essentially the same as the above embodiments, and the apparatus in this embodiment includes:
a first acquiring unit 201, configured to acquire an image to be recognized, where the image to be recognized includes a form image;
a second obtaining unit 202, configured to input the image to be recognized into a trained table detection model, so as to obtain table coordinate information of a table image in the image to be recognized;
a third obtaining unit 203, configured to input the image to be recognized into a trained table line detection model, so as to obtain table line position information of a table image in the image to be recognized;
an information determining unit 204, configured to determine, according to the table coordinate information and the table line position information, position information and text information of each cell in the table image;
a table generating unit 205, configured to generate table information corresponding to the table image according to the position information and the text information of each cell in the table image.
Optionally, the table detection model is a YOLO model; the table detection model is trained based on the correspondence between sample images that comprise sample table images and the position information of the sample table images in the sample images; and/or the table coordinate information of the table image in the image to be recognized comprises the coordinate information of the four endpoints of the table image in the image to be recognized, or the coordinate information of two diagonal endpoints.
Optionally, the table line detection model is a UNet model; the table line detection model is trained based on the correspondence between sample images that comprise sample table images and the coordinate information, in the sample images, of the pixel points on each table line of the sample table images; and/or the table line position information of the table image in the image to be recognized comprises the coordinate information of the pixel points on each table line of the table image in the image to be recognized.
Optionally, an output item of the form line detection model is a two-channel image corresponding to the image to be recognized, where the two-channel image includes the form line position information of the form image, which may be understood as including the position information of each type of form line.
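One plausible way to consume such a two-channel output is sketched below. The channel assignment (channel 0 for horizontal lines, channel 1 for vertical lines) and the probability threshold are assumptions made for illustration; the application only states that the two-channel image carries the form line position information.

```python
# Extract per-type line pixel coordinates from a two-channel segmentation output.
import numpy as np

def line_pixels(mask, threshold=0.5):
    """mask: float array of shape (H, W, 2) holding per-pixel line probabilities."""
    horizontal = np.argwhere(mask[:, :, 0] > threshold)  # (row, col) pairs on horizontal lines
    vertical = np.argwhere(mask[:, :, 1] > threshold)    # (row, col) pairs on vertical lines
    return horizontal, vertical
```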
Optionally, the information determining unit 204 is specifically configured to:
determining the end point coordinate information of each cell in the table image according to the table coordinate information and the table line position information;
and determining the position information and the text information of each cell in the table image according to the endpoint coordinate information of each cell in the table image.
Optionally, the information determining unit 204 is specifically configured to:
determining the form image according to the form coordinate information;
determining the endpoint coordinate information and the type of each table line in the table image according to the table line position information; wherein the form line types include horizontal lines and vertical lines;
determining an intersection point set of the table image according to the endpoint coordinate information and the table line type of each table line in the table image; wherein the intersection point set of the table image comprises coordinate information of all intersections between the horizontal table lines and the vertical table lines in the table image;
and determining the endpoint coordinate information of each cell in the table image according to the intersection point set of the table image.
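A minimal sketch of the intersection-set construction is given below. It assumes each horizontal table line has already been reduced to (y, x_start, x_end) and each vertical line to (x, y_start, y_end), and uses a small tolerance so that lines which stop just short of one another are still treated as intersecting; these representations are illustrative, not prescribed by the application.

```python
# Build the intersection set from typed table lines.
def intersection_set(horizontal_lines, vertical_lines, tol=3):
    """horizontal_lines: (y, x_start, x_end) triples; vertical_lines: (x, y_start, y_end) triples."""
    points = set()
    for y, hx0, hx1 in horizontal_lines:
        for x, vy0, vy1 in vertical_lines:
            # The lines cross if the vertical line's x falls on the horizontal span
            # and the horizontal line's y falls on the vertical span.
            if hx0 - tol <= x <= hx1 + tol and vy0 - tol <= y <= vy1 + tol:
                points.add((x, y))
    return points
```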
Optionally, the information determining unit 204 is specifically configured to:
for each table line in the table image: determining the minimum circumscribed rectangle corresponding to the table line according to the eight-connected region of the pixel points on the table line; determining the table line type according to the length and the width of the minimum circumscribed rectangle corresponding to the table line; and determining the endpoint coordinate information of the table line according to its table line type.
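In OpenCV terms, the step above can be approximated with connected-component analysis, as in the following hedged sketch: each eight-connected component of the binary line mask is taken as one table line, its bounding rectangle decides whether the line is horizontal or vertical, and the endpoints are read off that rectangle. The exact post-processing used by the application may differ.

```python
# Classify each table line as horizontal or vertical from its eight-connected region.
import cv2

def classify_lines(line_mask):
    """line_mask: uint8 binary image in which table-line pixels are 255."""
    lines = []
    n, _, stats, _ = cv2.connectedComponentsWithStats(line_mask, connectivity=8)
    for i in range(1, n):  # label 0 is the background
        x = stats[i, cv2.CC_STAT_LEFT]
        y = stats[i, cv2.CC_STAT_TOP]
        w = stats[i, cv2.CC_STAT_WIDTH]
        h = stats[i, cv2.CC_STAT_HEIGHT]
        if w >= h:  # wider than tall: treat as a horizontal line
            lines.append(("horizontal", (x, y + h // 2), (x + w, y + h // 2)))
        else:       # taller than wide: treat as a vertical line
            lines.append(("vertical", (x + w // 2, y), (x + w // 2, y + h)))
    return lines
```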
Optionally, the coordinate information of the intersection point includes an abscissa and an ordinate; the information determining unit 204 is specifically configured to:
regarding each intersection in the intersection set of the form image as a target intersection: if a first intersection, a second intersection and a third intersection corresponding to the target intersection exist in the intersection set, taking the target intersection together with its corresponding first, second and third intersections as the endpoint coordinate information of one cell in the form image.
the first intersection point is an intersection point which is on the same horizontal line table line with the target intersection point, is adjacent to the target intersection point and has the same horizontal coordinate with the target intersection point; the second intersection point is an intersection point which is on the same vertical line table line with the target intersection point, is adjacent to the target intersection point and has the same ordinate as the target intersection point; the third intersection point is an intersection point which is on the same vertical line table line with the first intersection point, is adjacent to the first intersection point, has the same ordinate as the first intersection point, is on the same horizontal line table line with the second intersection point, is adjacent to the second intersection point, and has the same abscissa as the second intersection point.
Optionally, the information determining unit 204 is specifically configured to:
for each cell in the table image, determining the position information of the cell in the table image according to the endpoint coordinate information of the cell; determining an image area of the cell according to the position information of the cell; and performing character recognition on the image area of the cell to obtain text information corresponding to the cell.
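The per-cell OCR step might look like the following hedged sketch: the cell's image area is cropped from the table image using its diagonal corners and handed to an OCR engine. pytesseract is used here only as an example engine; the application does not prescribe a particular OCR implementation.

```python
# Crop a cell region from the table image and recognize its text.
import pytesseract

def ocr_cell(table_img, cell):
    """cell: ((x1, y1), (x2, y2)) diagonal corners of the cell inside table_img."""
    (x1, y1), (x2, y2) = cell
    crop = table_img[y1:y2, x1:x2]
    return pytesseract.image_to_string(crop).strip()
```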
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. On the hardware level, the electronic device comprises a processor and, optionally, an internal bus, a network interface and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in Fig. 3, but this does not indicate only one bus or one type of bus.
The memory is used for storing execution instructions, in particular a computer program capable of being executed. The memory may include both an internal memory and a non-volatile memory, and provides the execution instructions and data to the processor.
In a possible implementation manner, the processor reads the corresponding execution instructions from the non-volatile memory into the internal memory and then runs them; the corresponding execution instructions may also be obtained from other equipment, so as to form the table identification apparatus on a logic level. The processor executes the execution instructions stored in the memory, and thereby implements the table identification method provided in any embodiment of the present application.
The method performed by the table identification apparatus according to the embodiment shown in Fig. 1 of the present application may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or an EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The embodiment of the present application further provides a readable storage medium storing execution instructions; when the stored execution instructions are executed by a processor of an electronic device, the electronic device can be caused to execute the table identification method provided in any embodiment of the present application, and in particular the table identification method described above.
The electronic device described in the foregoing embodiments may be a computer.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method of form recognition, the method comprising:
acquiring an image to be identified, wherein the image to be identified comprises a form image;
inputting the image to be recognized into a trained table detection model to obtain table coordinate information of the table image in the image to be recognized;
inputting the image to be recognized into a trained table line detection model to obtain table line position information of the table image in the image to be recognized;
determining the position information and the text information of each cell in the table image according to the table coordinate information and the table line position information;
and generating table information corresponding to the table image according to the position information and the text information of each cell in the table image.
2. The method of claim 1, wherein the table detection model is a YOLO model; the table detection model is obtained by training based on sample images comprising sample table images and the correspondence between each sample image and the position information of the sample table image therein; and/or the table coordinate information of the table image in the image to be recognized comprises coordinate information of four endpoints, or coordinate information of two diagonal endpoints, of the table image in the image to be recognized.
3. The method of claim 1, wherein the table line detection model is a UNet model; the table line detection model is obtained by training based on sample images comprising sample table images and the correspondence between each sample image and the coordinate information, in the sample image, of the pixel points on each table line of the sample table image; and/or the table line position information of the table image in the image to be recognized comprises coordinate information of the pixel points on each table line of the table image in the image to be recognized.
4. The method of claim 3, wherein an output item of the form line detection model is a two-channel image corresponding to the image to be recognized, wherein the two-channel image corresponding to the image to be recognized comprises form line position information of the form image.
5. The method of any of claims 1-4, wherein determining the location information and text information for each cell in the tabular image from the table coordinate information and the table line location information comprises:
determining the end point coordinate information of each cell in the table image according to the table coordinate information and the table line position information;
and determining the position information and the text information of each cell in the table image according to the endpoint coordinate information of each cell in the table image.
6. The method of claim 5, wherein determining endpoint coordinate information for each cell in the tabular image from the table coordinate information and the table line position information comprises:
determining the form image according to the form coordinate information;
determining the endpoint coordinate information and the type of each table line in the table image according to the table line position information; wherein the form line types include horizontal lines and vertical lines;
determining an intersection point set of the table image according to the endpoint coordinate information and the table line type of each table line in the table image; wherein the intersection point set of the table image comprises coordinate information of all intersections between the horizontal table lines and the vertical table lines in the table image;
and determining the endpoint coordinate information of each cell in the table image according to the intersection point set of the table image.
7. The method of claim 6, wherein determining endpoint coordinate information and form line type for each form line in the form image based on the form line location information comprises:
for each table line in the table image: determining the minimum circumscribed rectangle corresponding to the table line according to the eight-connected region of the pixel points on the table line; determining the table line type according to the length and the width of the minimum circumscribed rectangle corresponding to the table line; and determining the endpoint coordinate information of the table line according to its table line type.
8. The method of claim 6, wherein the coordinate information of the intersection point comprises an abscissa and an ordinate; determining the endpoint coordinate information of each cell in the form image according to the intersection point set of the form image, wherein the determining comprises the following steps:
regarding each intersection in the intersection set of the form image as a target intersection: if a first intersection, a second intersection and a third intersection corresponding to the target intersection exist in the intersection set, taking the target intersection together with its corresponding first, second and third intersections as the endpoint coordinate information of one cell in the form image;
the first intersection point is an intersection point which is on the same horizontal table line as the target intersection point, is adjacent to the target intersection point, and has the same ordinate as the target intersection point; the second intersection point is an intersection point which is on the same vertical table line as the target intersection point, is adjacent to the target intersection point, and has the same abscissa as the target intersection point; the third intersection point is an intersection point which is on the same vertical table line as the first intersection point, is adjacent to the first intersection point, and has the same abscissa as the first intersection point, and which is also on the same horizontal table line as the second intersection point, is adjacent to the second intersection point, and has the same ordinate as the second intersection point.
9. The method of claim 5, wherein determining the position information and the text information of each cell in the table image according to the endpoint coordinate information of each cell in the table image comprises:
for each cell in the table image, determining the position information of the cell in the table image according to the endpoint coordinate information of the cell; determining an image area of the cell according to the position information of the cell; and performing character recognition on the image area of the cell to obtain text information corresponding to the cell.
10. A form recognition apparatus, the apparatus comprising:
the device comprises a first acquisition unit, a second acquisition unit and a processing unit, wherein the first acquisition unit is used for acquiring an image to be identified, and the image to be identified comprises a form image;
the second acquisition unit is used for inputting the image to be recognized into the trained table detection model to obtain table coordinate information of the table image in the image to be recognized;
a third obtaining unit, configured to input the image to be recognized into a trained table line detection model, so as to obtain table line position information of a table image in the image to be recognized;
the information determining unit is used for determining the position information and the text information of each cell in the table image according to the table coordinate information and the table line position information;
and the table generating unit is used for generating the table information corresponding to the table image according to the position information and the text information of each cell in the table image.
CN202110715112.5A 2021-06-26 2021-06-26 Form identification method and device Active CN113269153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110715112.5A CN113269153B (en) 2021-06-26 2021-06-26 Form identification method and device

Publications (2)

Publication Number Publication Date
CN113269153A true CN113269153A (en) 2021-08-17
CN113269153B CN113269153B (en) 2024-03-19

Family

ID=77236039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110715112.5A Active CN113269153B (en) 2021-06-26 2021-06-26 Form identification method and device

Country Status (1)

Country Link
CN (1) CN113269153B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936286A (en) * 2021-11-29 2022-01-14 中国平安人寿保险股份有限公司 Image text recognition method and device, computer equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130290376A1 (en) * 2012-04-27 2013-10-31 Beijing Founder Apabi Technology Ltd. Methods and apparatus for identifying tables in digital files
CN112818812A (en) * 2018-12-13 2021-05-18 北京金山数字娱乐科技有限公司 Method and device for identifying table information in image, electronic equipment and storage medium
CN110334585A (en) * 2019-05-22 2019-10-15 平安科技(深圳)有限公司 Table recognition method, apparatus, computer equipment and storage medium
WO2020232872A1 (en) * 2019-05-22 2020-11-26 平安科技(深圳)有限公司 Table recognition method and apparatus, computer device, and storage medium
CN110390269A (en) * 2019-06-26 2019-10-29 平安科技(深圳)有限公司 PDF document table extracting method, device, equipment and computer readable storage medium
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
US20210110192A1 (en) * 2019-10-11 2021-04-15 Samsung Electronics Co., Ltd. Electronic device, method and non-transitory storage medium for optical character recognition
CN112528813A (en) * 2020-12-03 2021-03-19 上海云从企业发展有限公司 Table recognition method, device and computer readable storage medium

Also Published As

Publication number Publication date
CN113269153B (en) 2024-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant