CN112818813A - Method and device for identifying table information in image, electronic equipment and storage medium


Info

Publication number
CN112818813A
Authority
CN
China
Prior art keywords
image
line
text
determining
lines
Legal status
Pending
Application number
CN202110112628.0A
Other languages
Chinese (zh)
Inventor
郑磊波
王洪伟
刘天悦
Current Assignee
Chengdu Kingsoft Interactive Entertainment Technology Co ltd
Beijing Kingsoft Software Co Ltd
Original Assignee
Chengdu Kingsoft Interactive Entertainment Technology Co ltd
Beijing Kingsoft Software Co Ltd
Application filed by Chengdu Kingsoft Interactive Entertainment Technology Co ltd and Beijing Kingsoft Software Co Ltd
Priority to CN202110112628.0A
Publication of CN112818813A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Abstract

The embodiment of the invention provides a method and a device for identifying table information in an image, electronic equipment and a storage medium, wherein the method comprises the following steps: receiving a target image with a table; determining a form image containing a form from the target image; performing text line detection on the form image, and determining the position of a text line in the form image; and identifying the table image according to the position of the text line to obtain table information of the table image, wherein the table information comprises character information and table structure information. Because the recognized table information includes both the character information and the table structure information, rather than only the character content in the table, the diversity of table recognition results in the image is improved, which facilitates further processing such as subsequent table recovery.

Description

Method and device for identifying table information in image, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for identifying table information in an image, an electronic device, and a storage medium.
Background
In the field of image processing, images often include tables; in order to obtain the content of a table in an image, the image containing the table needs to be identified.
The current process of identifying the table in the image generally comprises the following steps: firstly, extracting horizontal lines and vertical lines in an image, and judging that no table exists in an area if no horizontal line or no vertical line exists; and if the horizontal lines and the vertical lines exist, determining the position of the table in the image by adopting a region growing method, and further performing text recognition on the table in the image according to the position of the table in the image to obtain the character content in the table in the image.
In the above process of identifying the table in the image, the obtained identification result is only the text content in the table, which carries little information and is not conducive to further processing such as subsequent table recovery.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for identifying table information in an image, so as to improve the diversity of table identification results in the image and facilitate subsequent further processing. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for identifying table information in an image, where the method includes:
receiving a target image with a table;
determining a form image containing a form from the target image;
performing text line detection on the form image, and determining the position of a text line in the form image;
removing table lines of the table image;
according to the position of the text line, segmenting a text image from the form image with the form lines removed;
identifying the segmented text image to obtain character information of the form image;
removing characters in the form image based on the position of the text line in the form image;
carrying out binarization processing on the table image with the characters removed and carrying out negation processing on pixel values to obtain an intermediate image;
carrying out corrosion treatment on the intermediate image to obtain a corrosion image, and carrying out expansion treatment on the corrosion image to obtain an expansion image;
performing horizontal and vertical table line separation processing on the expansion image to obtain a horizontal line image and a vertical line image;
performing union processing on the horizontal line image and the vertical line image to obtain a table line image;
performing intersection processing on the horizontal line image and the vertical line image to obtain an intersection point image;
determining the number of intersection points in the table image after the characters are removed according to the intersection point image;
determining the number of closed cells in the table image after the characters are removed according to the table line image;
determining the number of cells of the table according to the number of the intersection points of the table lines;
determining whether a table line of the table image is complete based on the number of closed cells and the number of cells;
if the table line of the table image is incomplete, completing the table line of the table image;
and carrying out table identification on the table image with the complete table lines to obtain the table structure information of the table image.
Optionally, the step of determining whether the table line of the table image is complete based on the number of closed cells and the number of cells includes:
judging whether the number of the closed cells is equal to the number of the cells or not;
if the number of the closed cells is equal to the number of the cells, determining that the table line of the table image is complete;
and if the number of the closed cells is not equal to the number of the cells, determining that the table lines of the table image are incomplete.
Optionally, the step of recognizing the segmented text image to obtain the text information of the table includes:
performing character recognition on the segmented text image to obtain a character recognition result of the form image;
performing semantic analysis on the character recognition result to obtain semantics corresponding to each text line;
classifying the character recognition results according to the corresponding semantics of each text line to obtain the corresponding category of each character recognition result;
and storing the character recognition result according to the category corresponding to the character recognition result to obtain the character information of the form image.
Optionally, the step of determining a table image containing a table from the target image includes:
inputting the target image into a deep learning model which is trained in advance to obtain a target position of a table in the target image;
judging whether a table area corresponding to the target position is distorted or not according to the target position;
if yes, affine transformation processing is carried out on the table area, and a table image corresponding to the target image is obtained.
Optionally, the step of performing text line detection on the form image and determining the position of the text line in the form image includes:
and detecting text lines of the tabular image by using a pixel link algorithm, and determining the positions of the text lines in the tabular image.
Optionally, the positions of the text lines in the form image include positions of all the text lines in the form image;
the position of the text line is the vertex coordinate of the minimum circumscribed rectangle of the text line, and the vertex coordinate is the coordinates of four vertexes of the minimum circumscribed rectangle, or the vertex coordinate is the coordinate of the diagonal vertex of the minimum circumscribed rectangle.
Optionally, the step of removing the table lines of the table image includes:
filling the color of the form line of the form image as the background color of the form image.
Optionally, the step of removing characters in the form image based on the position of the text line in the form image includes:
and filling a rectangular area corresponding to the position of the text line in the tabular image into the background color of the tabular image.
Optionally, the step of determining the number of intersection points and the number of closed cells in the form image after the character removal includes:
and detecting the number of closed cells and the number of intersections of the table lines in the table image after the characters are removed by using a findContours algorithm.
Optionally, the step of performing binarization processing on the table image with the characters removed and performing negation processing on pixel values to obtain an intermediate image includes:
carrying out binarization processing on the table image with the characters removed by using an adaptiveThreshold algorithm;
and performing negation processing on the pixel values of the table image after the binarization processing to obtain an intermediate image.
Optionally, the deep learning model includes a correspondence between the table image and the table vertex coordinates;
the step of inputting the target image into a deep learning model trained in advance to obtain the target position of the table in the target image comprises the following steps:
and inputting the target image into a deep learning model which is trained in advance to obtain table vertex coordinates of a table in the target image.
Optionally, the training mode of the deep learning model includes:
obtaining a form image sample and an initial deep learning model;
marking the position of a table area in the table image sample;
inputting the marked form image sample into the initial deep learning model, and training the initial deep learning model;
and when the accuracy of the output result of the initial deep learning model reaches a preset value or the training iteration times of the form image sample reach preset times, stopping training to obtain the deep learning model.
In a second aspect, an embodiment of the present invention provides an apparatus for identifying table information in an image, where the apparatus includes:
a target image receiving module for receiving a target image having a table;
a table image determining module for determining a table image containing a table from the target image;
the text line position determining module is used for detecting the text lines of the form image and determining the positions of the text lines in the form image;
the table line removing module is used for removing the table lines of the table image;
an information identification module comprising:
the image segmentation unit is used for segmenting a text image from the table image with the table lines removed according to the positions of the text lines;
the character recognition unit is used for recognizing the segmented text image to obtain character information of the form image;
the character removing unit is used for removing characters in the form image based on the position of the text line in the form image;
a first quantity determination unit comprising:
a binarization processing subunit, configured to perform binarization processing on the table image from which the characters are removed and perform negation processing on pixel values to obtain an intermediate image;
an image corrosion subunit, configured to perform corrosion treatment on the intermediate image to obtain a corrosion image;
an image expansion subunit, configured to perform expansion processing on the corrosion image to obtain an expansion image;
a table line separating subunit, configured to perform horizontal and vertical table line separation processing on the expansion image to obtain a horizontal line image and a vertical line image;
a table line image determining subunit, configured to perform union processing on the horizontal line image and the vertical line image to obtain a table line image;
an intersection point image determining subunit, configured to perform intersection processing on the horizontal line image and the vertical line image to obtain an intersection point image;
an intersection point number determining subunit, configured to determine the number of intersection points in the table image after the characters are removed according to the intersection point image;
a cell number determining subunit, configured to determine the number of closed cells in the table image after the characters are removed according to the table line image;
a second number determination unit for determining the number of cells of the table according to the number of intersections of the table lines;
a table line determination unit configured to determine whether a table line of the table image is complete based on the number of closed cells and the number of cells;
a table line completion unit configured to complete the table line of the table image if the table line of the table image is incomplete;
and the table identification unit is used for carrying out table identification on the table image with complete table lines to obtain the table structure information of the table image.
Optionally, the table line determining unit includes:
the quantity judging unit is used for judging whether the quantity of the closed cells is equal to the quantity of the cells or not;
a first table line determining unit, configured to determine that the table line of the table image is complete if the number of the closed cells is equal to the number of the cells;
a second table line determining unit, configured to determine that the table line of the table image is incomplete if the number of closed cells is not equal to the number of cells.
Optionally, the character recognition unit includes:
a character recognition subunit, configured to perform character recognition on the segmented text image to obtain a character recognition result of the form image;
the semantic analysis subunit is used for performing semantic analysis on the character recognition result to obtain the semantics corresponding to each text line;
the classification subunit is used for classifying the character recognition results according to the semantics corresponding to the text lines to obtain the category corresponding to each character recognition result;
and the recognition result storage subunit is used for storing the character recognition result according to the category corresponding to the character recognition result to obtain the character information of the form image.
Optionally, the form image determining module includes:
the target position determining unit is used for inputting the target image into a deep learning model which is trained in advance to obtain a target position of a table in the target image;
the distortion judging unit is used for judging whether the table area corresponding to the target position is distorted or not according to the target position;
and the table image determining unit is used for performing affine transformation processing on the table area if the table area corresponding to the target position is distorted, so as to obtain a table image corresponding to the target image.
Optionally, the text line position determining module includes:
and the text line position determining unit is used for detecting the text lines of the tabular image by using a pixel link algorithm and determining the positions of the text lines in the tabular image.
Optionally, the positions of the text lines in the form image include positions of all the text lines in the form image;
the position of the text line is the vertex coordinate of the minimum circumscribed rectangle of the text line, and the vertex coordinate is the coordinates of four vertexes of the minimum circumscribed rectangle, or the vertex coordinate is the coordinate of the diagonal vertex of the minimum circumscribed rectangle.
Optionally, the table line removing module includes:
a table line removing unit for filling a color of a table line of the table image as a background color of the table image.
Optionally, the character removing unit includes:
and the character removing subunit is used for filling the rectangular area corresponding to the position of the text line in the tabular image as the background color of the tabular image.
Optionally, the first number determining unit includes:
and detecting the number of closed cells and the number of intersections of the table lines in the table image after the characters are removed by using a findContours algorithm.
Optionally, the binarization processing subunit includes:
a binarization subunit, configured to perform binarization processing on the table image with the characters removed by using an adaptiveThreshold algorithm;
and the negation subunit is used for negating the pixel values of the table image after the binarization processing to obtain an intermediate image.
Optionally, the deep learning model includes a correspondence between the table image and the table vertex coordinates;
the target position determining unit is specifically configured to input the target image into a deep learning model which is trained in advance, and obtain table vertex coordinates of a table in the target image.
Optionally, the deep learning model is obtained by training through a model training module;
the model training module is used for acquiring a form image sample and an initial deep learning model; marking the position of a table area in the table image sample;
inputting the marked form image sample into the initial deep learning model, and training the initial deep learning model; and when the accuracy of the output result of the initial deep learning model reaches a preset value or the training iteration times of the form image sample reach preset times, stopping training to obtain the deep learning model.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and a processor for implementing any of the above-described steps of the method for identifying table information in an image when executing a program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method for identifying table information in an image according to any one of the above-mentioned steps is implemented.
In the scheme provided by the embodiment of the invention, the electronic equipment can first receive a target image with a table, then determine the table image containing the table from the target image, then perform text line detection on the table image to determine the position of a text line in the table image, and further recognize the table image according to the position of the text line to obtain the table information of the table image, wherein the table information comprises character information and table structure information. Because the recognized table information includes both the character information and the table structure information, rather than only the character content in the table, the diversity of table recognition results in the image is improved, which facilitates further processing such as subsequent table recovery.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for identifying table information in an image according to an embodiment of the present invention;
FIG. 2(a) is a diagram illustrating a manual frame according to an embodiment of the present invention;
FIG. 2(b) is a diagram of another manual frame provided in the embodiment of the present invention;
FIG. 3 is a schematic illustration of the location of a text line in a form image based on the embodiment shown in FIG. 1;
FIG. 4 is a flowchart illustrating a specific step S104 in the embodiment shown in FIG. 1;
FIG. 5 is a flowchart showing a specific example of step S403 in the embodiment shown in FIG. 4;
FIG. 6 is a schematic diagram of intersections of the form lines based on the embodiment shown in FIG. 5;
FIG. 7 is a flowchart illustrating a specific step S502 in the embodiment shown in FIG. 5;
FIG. 8(a) is a schematic diagram of a form image based on the embodiment shown in FIG. 1;
FIG. 8(b) is a schematic diagram of an intermediate image based on the embodiment shown in FIG. 1;
FIG. 8(c) is a schematic diagram of a horizontal line image based on the embodiment shown in FIG. 1;
FIG. 8(d) is a schematic illustration of a vertical line image based on the embodiment shown in FIG. 1;
FIG. 8(e) is a diagram of a table line image according to the embodiment shown in FIG. 1;
FIG. 8(f) is a schematic diagram of a cross-point image based on the embodiment shown in FIG. 1;
FIG. 9 is a flowchart illustrating a specific step S104 in the embodiment shown in FIG. 1;
FIG. 10 is a flowchart of a training method of the deep learning model according to the embodiment shown in FIG. 1;
fig. 11 is a schematic structural diagram of an apparatus for identifying table information in an image according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to improve the accuracy of table identification in an image, the embodiment of the invention provides a method and a device for identifying table information in an image, an electronic device and a computer-readable storage medium.
The following first describes a method for identifying table information in an image according to an embodiment of the present invention.
The method for identifying table information in an image provided by the embodiment of the present invention can be applied to any electronic device that needs to identify table information in an image, for example, a computer, a mobile phone, a smart watch, and the like, which is not specifically limited herein. For convenience of description, these devices are hereinafter collectively referred to as the electronic device.
As shown in fig. 1, a method for identifying table information in an image, the method comprising the steps of:
s101, receiving a target image with a table;
s102, determining a form image containing a form from the target image;
s103, performing text line detection on the form image, and determining the position of a text line in the form image;
and S104, identifying the form image according to the position of the text line to obtain form information of the form image.
The table information comprises character information and table structure information.
As can be seen, in the scheme provided in the embodiment of the present invention, the electronic device may first receive a target image with a table, then determine a table image containing the table from the target image, perform text line detection on the table image to determine the position of a text line in the table image, and further identify the table image according to the position of the text line to obtain table information of the table image, where the table information includes text information and table structure information. Because the recognized table information includes both the text information and the table structure information, rather than only the character content in the table, the diversity of table recognition results in the image is improved, which facilitates further processing such as subsequent table recovery.
In step S101, the electronic device may receive a target image with a form, where the target image is an image whose form information needs to be identified. The electronic device may retrieve a locally stored image with a table as the target image, or receive an image with a table sent by another electronic device as the target image. It is of course also possible to acquire the image with the form as the target image through its own image acquisition device, for example a built-in camera. All of these are reasonable, and no specific limitation is made herein.
When the electronic device obtains the target image through its own image acquisition device, a manual selection frame may be displayed on the display screen, for example as shown in fig. 2(a) and fig. 2(b), and the user may change the shape of the manual selection frame by dragging it; the frame may be rectangular, trapezoidal, triangular, or the like. The image acquisition device then captures the region enclosed by the manual selection frame to obtain the target image.
After obtaining the target image, the electronic device may determine a form image including the form from the target image in order to identify the form in the target image. The electronic device can determine a form image containing a form in the target image by using a deep learning model, image detection, and the like. For clarity of the scheme and clarity of layout, the manner of determining the form image containing the form from the target image will be described later. After obtaining the form image, the electronic device may perform text line detection on the form image and determine the position of the text line in the form image, that is, execute step S103. In an embodiment, the electronic device may perform text line detection on the form image by using a pixel link algorithm, which is not specifically illustrated or limited herein.
In order to improve the accuracy of text line recognition and adapt to the actual application scenario better, the deep learning model used in the pixel link algorithm may be adaptively adjusted, for example, parameters, a loss function, and the like of the deep learning model are adjusted, and a specific adjustment manner may be a related manner in the field of the deep learning model, which is not specifically limited and described herein.
The positions of the text lines in the table image are the positions of all the text lines in the table, and each position can be represented by the vertex coordinates of the minimum circumscribed rectangle of the text line, either the coordinates of all four vertices or the coordinates of two diagonal vertices. For example, as shown in fig. 3, the position may be given by the coordinates of points 301 to 304, by the coordinates of points 301 and 303, or by the coordinates of points 302 and 304. Fig. 3 shows only the position of the text line corresponding to "age" by way of example; the positions of the other text lines are not shown.
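As an illustration, both representations can be computed with standard OpenCV primitives. A minimal sketch, assuming the text line detector yields a binary pixel mask of the line (the function name and mask format are assumptions, not part of the patent):

```python
import cv2
import numpy as np

def text_line_position(mask: np.ndarray):
    """Return the four vertices and the two diagonal vertices of the
    minimum circumscribed rectangle of the white pixels in `mask`."""
    ys, xs = np.nonzero(mask)
    points = np.column_stack([xs, ys]).astype(np.float32)
    rect = cv2.minAreaRect(points)          # minimum (rotated) bounding rect
    four_vertices = cv2.boxPoints(rect)     # coordinates of the four vertices
    x, y, w, h = cv2.boundingRect(points)   # axis-aligned alternative
    diagonal_vertices = ((x, y), (x + w, y + h))
    return four_vertices, diagonal_vertices
```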
Next, in step S104, the electronic device may recognize the table image according to the position of the text line, and further obtain the table information of the table image. The table information may include text information and table structure information.
The table structure information may include information such as the number of rows and columns, cell merging information, cell border information, cell filling colors, and cell widths. The text information may include information such as text content, type, font size, color, etc., and is not particularly limited herein.
As an implementation manner of the embodiment of the present invention, before the step of identifying the form image according to the position of the text line and obtaining the form information of the form image, the method may further include: removing all form lines of the form image.
In order to remove the influence of the table lines on character recognition, the electronic device may remove the table lines of the table image; in one embodiment, all the table lines of the table image may be removed, so that character recognition is not influenced by the table lines.
As an embodiment, the electronic device may fill the color of the form lines with the background color of the form image for the purpose of removing the form lines. For example, if the background color of the form image is white and the form lines and characters therein are black, the electronic device may fill all the form lines with white, leaving only black characters.
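A minimal sketch of this fill-based removal, under two assumptions not stated in the patent: a binary mask of the line pixels is available (for instance, the table line image described later), and the background color can be estimated as the per-channel median of a 3-channel image:

```python
import cv2
import numpy as np

def remove_form_lines(image: np.ndarray, line_mask: np.ndarray) -> np.ndarray:
    no_lines = image.copy()
    # Estimate the background color as the per-channel median pixel value.
    background = np.median(image.reshape(-1, image.shape[-1]), axis=0)
    # Grow the mask slightly so anti-aliased line edges are covered too.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    grown = cv2.dilate(line_mask, kernel)
    no_lines[grown > 0] = background
    return no_lines
```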
Correspondingly, as shown in fig. 4, the step of identifying the form image according to the position of the text line to obtain the form information of the form image may include:
s401, according to the position of the text line, segmenting a text image from the form image with the form lines removed;
for character recognition, the electronic device may segment the text image from the form image with the form lines removed according to the positions of the text lines detected by the text line detection. For example, if the text line is located at the coordinates (5, 7.5) and (35, 15) of the diagonal vertices of the rectangle, the electronic device can segment the rectangular region from the table to obtain a text image.
The electronic equipment divides the rectangular areas corresponding to all texts in the form image according to the positions of the text lines, so that all text images corresponding to the form image can be obtained. Since the form line of the form image at this time is already removed, the form line is not divided into text images even when the form line and the character are very close to each other.
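A minimal sketch of the segmentation, reusing the example diagonal vertices (5, 7.5) and (35, 15) above; fractional coordinates are rounded outward so no text pixels are lost (the variable names are illustrative):

```python
import math

def crop_text_line(image, top_left, bottom_right):
    (x1, y1), (x2, y2) = top_left, bottom_right
    return image[math.floor(y1):math.ceil(y2), math.floor(x1):math.ceil(x2)]

# `form_without_lines` stands for the form image after the form lines
# have been removed (hypothetical variable).
text_image = crop_text_line(form_without_lines, (5, 7.5), (35, 15))
```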
S402, carrying out character recognition on the segmented text image to obtain character information of the form image;
further, the electronic device performs character recognition on the segmented text image, so that character information of the form image can be obtained.
S403, determining whether the table lines of the table image are complete, and if the table lines of the table image are incomplete, executing S404;
in order to make the obtained table structure information more accurate, the electronic device may determine whether the table lines of the table image are complete. In one embodiment, the electronic device may determine whether the table lines of the table image are complete by detecting the number of closed cells in the table image, and the specific implementation will be described later.
If the form line of the form image is complete, then execution may continue to step S405.
S404, completing the table lines of the table image;
if the form line of the form image is not complete, the electronic device may perform step S404, i.e., complete the form line of the form image, and then perform step S405;
s405, performing table identification on the table image with the complete table lines to obtain the table structure information of the table image.
The electronic device can perform table identification on the table image with the complete table lines, and further obtain the table structure information of the table image. After the table structure information is obtained, the table structure information may be stored in order to obtain a table by performing a subsequent recovery process.
Therefore, in this embodiment, the electronic device may segment the form image without all the form lines into the text image, so that the segmented text image does not include the form lines, and further, the obtained text information is more accurate. Meanwhile, the incomplete table image of the table line can be subjected to completion processing, and accurate table structure information can be obtained according to the complete table image of the table line.
As an implementation manner of the embodiment of the present invention, as shown in fig. 5, the step of determining whether the table line of the table image is complete may include:
s501, removing characters in the form image based on the position of the text line in the form image;
Having determined the position of the text line in the form image, the electronic device can remove the characters in the form image according to that position. So as not to influence the subsequent determination of the number of closed cells and the number of intersection points of the form lines, the electronic device can remove all the characters, that is, retain only the form lines of the form.
In one embodiment, the electronic device may fill a rectangular region corresponding to the position of the text line in the form image with a background color of the form image, so as to achieve the purpose of removing the character. For example, if the background color of the form image is white and the form lines and the characters therein are black, the electronic device may fill all the characters with white, leaving only the black form lines.
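A minimal sketch of this character-removal step, assuming a white background and text line positions given as diagonal vertices (the box format is an assumption):

```python
import cv2
import numpy as np

def remove_characters(image: np.ndarray, text_boxes) -> np.ndarray:
    """text_boxes: iterable of ((x1, y1), (x2, y2)) diagonal vertices."""
    lines_only = image.copy()
    for (x1, y1), (x2, y2) in text_boxes:
        # thickness=-1 fills the rectangle with the given (background) color.
        cv2.rectangle(lines_only, (int(x1), int(y1)), (int(x2), int(y2)),
                      color=(255, 255, 255), thickness=-1)
    return lines_only
```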
S502, determining the number of intersection points and the number of closed cells in the table image after the characters are removed;
further, the electronic device can determine the number of closed cells in the form image after the character has been removed, as well as the number of intersections of the form lines. As an embodiment, the electronic device may use a findContours algorithm to detect the number of closed cells and the number of intersections of the form lines in the form image after the character is removed.
The intersection of the table lines is an intersection formed by two table lines, for example, as shown in fig. 6, a table with 2 rows and 3 columns is shown in fig. 6, where the points 610 are intersections of the table lines and there are 12 points.
S503, determining the number of cells of the table according to the number of the intersection points of the table lines;
Having determined the number of intersection points of the table lines in the table image, the electronic device can determine the number of cells of the table according to that number.
For example, if the number of intersections of the table lines is 30, the table may be determined to be a 4-row, 5-column table or a 5-row, 4-column table; in either case, the number of cells of the table is 20.
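Note that the count alone does not always fix the grid shape uniquely; a sketch that resolves this by clustering the intersection coordinates, assuming the centroid of each detected intersection is available (an assumption, since the patent only states the count is used):

```python
def count_cells(centroids, tol=5):
    """centroids: (x, y) centers of the detected table-line intersections."""
    def distinct(values):
        values = sorted(values)
        groups = 1
        for a, b in zip(values, values[1:]):
            if b - a > tol:   # a gap beyond tolerance starts a new grid line
                groups += 1
        return groups
    cols = distinct(x for x, _ in centroids)   # number of vertical lines
    rows = distinct(y for _, y in centroids)   # number of horizontal lines
    return (cols - 1) * (rows - 1)             # e.g. 6 x 5 points -> 20 cells
```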
Further, the electronic device may determine whether the form lines of the form image are complete based on the number of closed cells and the number of cells. Specifically, the method may include steps S504 to S506.
S504, judging whether the number of the closed cells is equal to the number of the cells, and executing a step S505 if the number of the closed cells is equal to the number of the cells; if the number of the closed cells is not equal to the number of the cells, executing step S506;
Next, the electronic device may determine whether the number of closed cells is equal to the determined number of cells. If the two are equal, all the cells in the form image are closed, meaning the form lines of the form image are complete with no missing lines; step S505 may then be performed, i.e., it is determined that the form lines of the form image are complete.
If the number of closed cells is not equal to the number of cells, which indicates that not all cells in the table image are closed, that is, the table lines of the table in the table image are incomplete and there are missing lines, step S506 may be performed, i.e., it is determined that the table lines in the table image are incomplete.
For example, if the number of closed cells is 28 and the number of cells determined in step S503 is 30, it indicates that 2 cells of the table in the table image are not closed, and the table line of the table in the table image is incomplete.
S505, determining that the table line of the table image is complete;
s506, determining that the table lines of the table image are incomplete.
It can be seen that, in this embodiment, the electronic device may remove the characters in the form image based on the positions of the text lines in the form image, determine the number of closed cells and the number of intersections of the form lines in the form image after the characters are removed, further determine the number of cells of the form according to the number of intersections of the form lines, then determine whether the number of closed cells is equal to the number of cells, if so, determine that the form lines of the form image are complete, and if not, determine that the form lines of the form image are incomplete. Therefore, whether the table line of the table image is complete or not can be accurately determined, and the accuracy of subsequent table content identification is improved.
As an implementation manner of the embodiment of the present invention, as shown in fig. 7, the step of determining the number of intersection points and the number of closed cells in the table image after removing the character may include:
s701, performing binarization processing on the table image with the characters removed and performing negation processing on pixel values to obtain an intermediate image;
in an embodiment, the electronic device may perform binarization processing on the table image with the characters removed by using an adaptive threshold algorithm, and then the electronic device may perform negation processing on pixel values of the table image after the binarization processing to obtain an intermediate image.
For example, the intermediate image shown in fig. 8(b) is obtained by removing the characters from the table image shown in fig. 8(a), binarizing the result, and inverting the pixel values. The characters and table lines in the table image are both black; after binarization and pixel-value inversion, the table lines in the intermediate image are white and the rest of the intermediate image is black.
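A minimal OpenCV sketch of step S701, assuming a grayscale input with dark lines on a light background (the file name and parameter values are illustrative):

```python
import cv2

# Grayscale table image with the characters already removed (hypothetical file).
gray = cv2.imread("table_no_chars.png", cv2.IMREAD_GRAYSCALE)
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 15, 2)  # binarization
intermediate = cv2.bitwise_not(binary)  # inversion: lines white, rest black
```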
S702, carrying out corrosion treatment on the intermediate image to obtain a corrosion image;
Next, since some characters may be close to, or overlap, the table lines, the intermediate image may contain some pixel points that do not belong to the table lines, for example the white dots in fig. 8(b). Therefore, in order to determine the number of intersection points in the table image more accurately, the electronic device may apply corrosion treatment to the intermediate image, so as to obtain a corrosion image.
Corrosion (erosion) and expansion (dilation) are morphological operations that essentially change the shape of objects in an image. They generally operate on a binarized image, either to connect adjacent elements or to separate them into independent elements, and they act on the white portions of the image.
The corrosion treatment takes a local minimum in a small area of the image. Because the intermediate image is a binary image whose pixel values are only 0 and 255, if any pixel value in a small area is 0, all the pixel points in that area become 0; therefore, when the intermediate image undergoes corrosion treatment, the residual pixel points left by characters far from the form lines are corroded away.
S703, performing expansion processing on the corrosion image to obtain an expansion image;
Next, the electronic device may perform expansion processing on the corrosion image, thereby obtaining an expansion image. The expansion processing takes a local maximum within a small region of the image. Because the intermediate image is a binary image whose pixel values are only 0 and 255, if any pixel value in a small region is 255, all the pixel points in that region become 255; as a result, the residual pixel points left by characters close to a form line are merged into the form line by the expansion processing.
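Continuing the sketch above, steps S702 and S703 map directly onto OpenCV's erode and dilate operations (the kernel size is an assumption; it must stay smaller than the line thickness or the lines themselves would be corroded away):

```python
import cv2

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
corroded = cv2.erode(intermediate, kernel)   # local minimum: drops stray dots
expanded = cv2.dilate(corroded, kernel)      # local maximum: restores line width
```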
S704, performing horizontal and vertical table line separation processing on the expansion image to obtain a horizontal line image and a vertical line image;
after obtaining the expansion image, the electronic device can perform horizontal and vertical table line separation processing on the expansion image to obtain a horizontal line image and a vertical line image. Since the erosion and expansion processes have been performed, only the table lines are present in the horizontal line image and the vertical line image.
For example, subjecting the intermediate image shown in fig. 8(b) to corrosion and expansion processing and then to horizontal and vertical table line separation processing yields the horizontal line image and the vertical line image shown in fig. 8(c) and fig. 8(d), respectively.
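One common way to realize step S704 (an assumption; the patent does not name a method) is morphological opening with long, thin structuring elements, which keeps only pixel runs at least as long as the kernel:

```python
import cv2

h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))  # length is tuned
v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))

horizontal = cv2.dilate(cv2.erode(expanded, h_kernel), h_kernel)  # horizontal lines
vertical = cv2.dilate(cv2.erode(expanded, v_kernel), v_kernel)    # vertical lines
```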
S705, performing union set processing on the horizontal line image and the vertical line image to obtain a table line image;
Furthermore, the electronic device may perform union processing on the horizontal line image and the vertical line image, so as to obtain a table line image. For example, taking the union of the horizontal line image in fig. 8(c) and the vertical line image in fig. 8(d) yields the table line image in fig. 8(e).
S706, performing intersection taking processing on the horizontal line image and the vertical line image to obtain an intersection point image;
The electronic device may further perform intersection processing on the horizontal line image and the vertical line image, so as to obtain an intersection point image. For example, taking the intersection of the horizontal line image in fig. 8(c) and the vertical line image in fig. 8(d) yields the intersection point image in fig. 8(f).
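On binary images, the union and intersection of steps S705 and S706 reduce to per-pixel bitwise operations; continuing the sketch:

```python
import cv2

table_lines = cv2.bitwise_or(horizontal, vertical)     # union: the full grid
intersections = cv2.bitwise_and(horizontal, vertical)  # intersection: crossings
```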
S707, determining the number of intersection points in the table image after the characters are removed according to the intersection point image;
after obtaining the intersection point image, the electronic device can determine the number of intersection points in the table image after the character is removed. For example, if the intersection point image is as shown in fig. 8(f), the number of intersection points can be determined to be 56.
And S708, determining the number of closed cells in the table image after the characters are removed according to the table line image.
After obtaining the table line image, the electronic device may determine the number of closed cells in the table image after the characters are removed. For example, if the table line image is as shown in fig. 8(e), the number of closed cells in the table image can be determined to be 42.
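Both counts can be obtained with the findContours algorithm mentioned in the patent; a sketch continuing from the images above (treating holes enclosed by the white grid as closed cells is an assumption about the detection convention):

```python
import cv2

# Each white blob in the intersection image is one crossing point.
blobs, _ = cv2.findContours(intersections, cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)
num_intersections = len(blobs)

# With RETR_CCOMP, contours that have a parent are holes enclosed by
# the white grid, i.e. the closed cells.
contours, hierarchy = cv2.findContours(table_lines, cv2.RETR_CCOMP,
                                       cv2.CHAIN_APPROX_SIMPLE)
num_closed_cells = sum(1 for h in hierarchy[0] if h[3] != -1)
```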
Therefore, in this embodiment, the electronic device may perform binarization processing on the table image from which the characters are removed and perform negation processing on the pixel values to obtain an intermediate image, then apply corrosion and expansion processing, and perform horizontal and vertical table line separation processing to obtain a horizontal line image and a vertical line image. Because the obtained horizontal line image and vertical line image are free of residual character pixel points, the number of intersection points and the number of closed cells determined subsequently are more accurate.
As an implementation manner of the embodiment of the present invention, in order to facilitate subsequent query and recover table content, as shown in fig. 9, the step of identifying the segmented text image to obtain the text information of the table may include:
s901, carrying out character recognition on the segmented text image to obtain a character recognition result of the form image;
the electronic device can perform character recognition on the segmented text image, and then obtains a character recognition result of the form image. For the specific implementation manner of the character recognition, any character recognition manner in the field of character recognition in the image may be adopted as long as the character content in the text image can be recognized, and no specific limitation and description are made herein.
S902, performing semantic analysis on the character recognition result to obtain the corresponding semantics of each text line;
after the character recognition result is obtained, the electronic device may perform semantic analysis on the character recognition result to obtain the semantics corresponding to each text line in order to perform structured storage on the character recognition result. The specific implementation manner of performing semantic analysis on the character recognition result may be any semantic analysis manner in the field of semantic analysis, and is not specifically limited or described herein.
S903, classifying the character recognition results according to the corresponding semantics of each text line to obtain the corresponding category of each character recognition result;
Furthermore, the electronic device can classify the character recognition results according to the semantics corresponding to each text line to obtain the category corresponding to each character recognition result. For example, if the character recognition results are "name", "zhang san", "li si", "age", "25 years" and "28 years", the semantics corresponding to "zhang san" and "li si" are both names of people, and the semantics corresponding to "25 years" and "28 years" are both ages of people; the electronic device may then classify "zhang san", "li si" and "name" into the name category, and classify "25 years", "28 years" and "age" into the age category.
And S904, storing the character recognition result according to the category corresponding to the character recognition result to obtain the character information of the form image.
Having obtained the category corresponding to each character recognition result, the electronic device can store the character recognition results according to their categories to obtain the character information of the table image.
In one embodiment, the electronic device may store the character recognition results in a structured manner as key-value pairs in JSON (JavaScript Object Notation) format. In the above example, the electronic device may store "name" as a key, with "zhang san" and "li si" as the values corresponding to the key "name"; similarly, "age" is stored as a key, with "25 years" and "28 years" as the values corresponding to the key "age".
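A minimal sketch of this structured storage using the example above (a plain Python dict serialized with the standard json module):

```python
import json

character_info = {
    "name": ["zhang san", "li si"],   # values stored under the key "name"
    "age": ["25 years", "28 years"],  # values stored under the key "age"
}
print(json.dumps(character_info, ensure_ascii=False, indent=2))
```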
In order to more intuitively display the table in the table image, the electronic device may also store the table image with the complete table line or the table image with the completed table line.
The electronic equipment can also store information such as the type, font size, color and the like of the characters in the form image and the form structure information, so that the form can be conveniently recovered by utilizing the character information and the form structure information in the follow-up process.
As can be seen, in this embodiment, the electronic device may perform semantic analysis on the character recognition result to obtain semantics corresponding to each text line, classify the character recognition result according to the semantics corresponding to each text line, and store the character recognition result according to the classification result. The table image with the complete table lines, the table image with the table lines completed, the table structure information, and the like may also be stored. Therefore, when the user checks the information corresponding to the form image, the finished form image and the form content can be checked, the method is more visual and convenient, the user experience is improved, and the form can be conveniently recovered by subsequently utilizing the character information and the form structure information.
As an implementation manner of the embodiment of the present invention, the step of determining the table image including the table from the target image may include:
inputting the target image into a deep learning model which is trained in advance to obtain a target position of a table in the target image; judging whether a table area corresponding to the target position is distorted or not according to the target position; if yes, affine transformation processing is carried out on the table area, and a table image corresponding to the target image is obtained.
In order to determine the table position in the acquired target image to identify the table, the electronic device may determine the target position of the table in the target image through a deep learning model trained in advance. The deep learning model is obtained by training an initial deep learning model based on a table image sample acquired in advance, and the position of a table in a target image, namely the target position, can be obtained through the deep learning model.
The deep learning model may be a convolutional neural network or the like, and the specific structure of the deep learning model is not specifically limited in the present invention, as long as the deep learning model capable of obtaining the position of the table in the table image can be obtained through training. The initial parameters of the initial deep learning model may be randomly set, and are not particularly limited herein. For clarity of the scheme and clarity of layout, the training mode of the deep learning model will be described in the following.
After the target position of the table in the target image is determined, the electronic device can determine the table area in the target image according to the target position. For example, the target position is four vertices of a table in the target image, and then the table area in the target image is the area determined by the four vertices.
And the electronic equipment can judge whether the table area corresponding to the target position is distorted, if not, the table area is not processed, and the image corresponding to the table area is the table image. Wherein the electronic device may determine whether the table region is distorted according to the coordinates of the target location, for example, if the coordinates of the target location indicate that the table region is a parallelogram, the table region may be determined to be distorted; if the coordinates of the target location indicate that the table region is a rectangle, then it can be determined that the table region is not distorted.
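A minimal sketch of such a distortion check, under the assumption that the target position is given as four vertices in (top-left, top-right, bottom-right, bottom-left) order and that "not distorted" means an axis-aligned rectangle within a small pixel tolerance:

```python
def is_distorted(vertices, tol=3):
    """vertices: ((x0, y0), (x1, y1), (x2, y2), (x3, y3)), tl-tr-br-bl order."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = vertices
    top_level = abs(y0 - y1) <= tol       # top edge horizontal
    bottom_level = abs(y3 - y2) <= tol    # bottom edge horizontal
    left_plumb = abs(x0 - x3) <= tol      # left edge vertical
    right_plumb = abs(x1 - x2) <= tol     # right edge vertical
    return not (top_level and bottom_level and left_plumb and right_plumb)
```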
If the table area is distorted, the electronic device can perform affine transformation processing on the determined table area to obtain a table image corresponding to the target image. In many practical situations, the table in the target image acquired by the electronic device is distorted, and in order to still accurately identify the table content in such a situation, the electronic device may perform affine transformation on the table region to obtain the table image corresponding to the target image.
It is understood that the table is generally rectangular, but in the case of image distortion and the like, the table area in the target image may not be rectangular, but may be trapezoidal and the like, and the electronic device may perform affine transformation on the table area to obtain a table image corresponding to the target image, where the table image is a table image after distortion correction.
The specific implementation manner of performing affine transformation processing on the form area may be any affine transformation processing manner, as long as the form image can be subjected to distortion correction. For example, assuming that the target position is the table vertex coordinates in the target image and those coordinates indicate that the table area is a trapezoid, the electronic device may determine the four vertex coordinates of the corresponding rectangle, and further determine the transformation matrix between the two sets of vertex coordinates; according to this matrix, the distorted table area can be transformed to obtain the table image corresponding to the target image.
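A sketch of such a correction. The patent describes this as an affine transformation; rectifying an arbitrary four-vertex quadrilateral is sketched here with OpenCV's perspective warp, which subsumes the affine case (the vertex ordering is an assumption):

```python
import cv2
import numpy as np

def rectify_table(image, vertices):
    """vertices: four table corners in (tl, tr, br, bl) order."""
    src = np.float32(vertices)
    w = int(max(np.linalg.norm(src[1] - src[0]), np.linalg.norm(src[2] - src[3])))
    h = int(max(np.linalg.norm(src[3] - src[0]), np.linalg.norm(src[2] - src[1])))
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    M = cv2.getPerspectiveTransform(src, dst)   # maps the quad onto a rectangle
    return cv2.warpPerspective(image, M, (w, h))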
As an implementation manner of the embodiment of the present invention, the deep learning model may include a correspondence relationship between the table image and the table vertex coordinates. In this case, the step of inputting the target image into a deep learning model trained in advance to obtain the target position of the table in the target image may include:
and inputting the target image into a deep learning model which is trained in advance to obtain table vertex coordinates of a table in the target image.
In this embodiment, the deep learning model may include a correspondence relationship between a table image and table vertex coordinates, where the table vertex coordinates are four vertex coordinates of the table, and the four vertex coordinates determine an area in the image where the table is located.
Because the deep learning model can determine the vertex coordinates of the table area in the image according to the corresponding relation between the table image and the table vertex coordinates, the target image is input into the deep learning model trained in advance, the deep learning model can process the target image, and then the table vertex coordinates, namely the table vertex coordinates of the table in the target image, are output.
Therefore, in this embodiment, the electronic device may input the target image into the pre-trained deep learning model to obtain the table vertex coordinates of the table in the target image. The table vertex coordinates, and hence the specific region of the table in the target image, can thus be determined accurately, which can further improve the accuracy of subsequent table content identification.
As an implementation manner of the embodiment of the present invention, as shown in fig. 10, the training manner of the deep learning model may include:
s1001, obtaining a form image sample and an initial deep learning model;
To obtain the deep learning model, table image samples and an initial deep learning model may be acquired first. The initial deep learning model may be established in advance or obtained from another electronic device; either is reasonable.
A table image sample is an image that includes a table; it may contain only the table, or it may also contain content other than the table, such as drawings, characters outside the table, and numbers. Multiple table image samples are used, and the specific number can be determined according to actual conditions.
S1002, marking the position of a table area in the table image sample;
After the table image samples are obtained, the position of the table region in each sample may be marked. In one embodiment, the four vertex coordinates of the table region may be used as the position of the table region.
S1003, inputting the marked form image sample into the initial deep learning model, and training the initial deep learning model;
After the positions of the table regions in the table image samples are marked, the marked samples can be input into the initial deep learning model for training. During training, the initial deep learning model continuously learns the correspondence between table image features and table region positions, and continuously adjusts its parameters.
The initial deep learning model may be trained with any common method, such as a gradient descent algorithm, which is not specifically limited herein.
And S1004, when the accuracy of the output result of the initial deep learning model reaches a preset value or the training iteration times of the form image sample reach preset times, stopping training to obtain the deep learning model.
When the accuracy of the output of the initial deep learning model reaches a preset value, or the number of training iterations over the table image samples reaches a preset number, the initial deep learning model can process various images containing tables and obtain accurate table region positions. Training may then be stopped, yielding the deep learning model described above.
The preset value may be determined according to the required accuracy of the deep learning model's output, and may be, for example, 90%, 95%, or 98%. The preset number of iterations can likewise be determined according to the required accuracy: if the accuracy requirement is high, the preset number may be large, for example, 50,000, 80,000, or 100,000 iterations; if the accuracy requirement is low, the preset number may be small, for example, 10,000, 20,000, or 30,000 iterations.
Therefore, in this embodiment, the electronic device may obtain table image samples and an initial deep learning model, mark the position of the table region in each sample, input the marked samples into the initial deep learning model for training, and stop training when the accuracy of the output reaches the preset value or the number of training iterations reaches the preset number, thereby obtaining the deep learning model. In this way, a deep learning model capable of accurately determining the position of the table region in an image is obtained, which can further improve the accuracy of table information identification.
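To make the training flow of steps S1001-S1004 concrete, here is a schematic sketch in Python with PyTorch; the model, the data loader, and the accuracy metric are assumptions, since the embodiment does not fix a network architecture:

```python
import torch

def train_table_detector(model, loader, evaluate_accuracy,
                         target_acc=0.95, max_iters=100000):
    # `model`, `loader` and `evaluate_accuracy` are hypothetical: a
    # network regressing four table-vertex coordinates, a loader over
    # marked table image samples, and a validation accuracy metric
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent, as suggested above
    loss_fn = torch.nn.SmoothL1Loss()
    iters = 0
    while iters < max_iters:
        for images, vertex_labels in loader:
            loss = loss_fn(model(images), vertex_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            iters += 1
            # stop on the preset accuracy or the preset iteration count
            if iters >= max_iters or evaluate_accuracy(model) >= target_acc:
                return model
    return model
```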
Corresponding to the identification method of the table information in the image, the embodiment of the invention also provides an identification device of the table information in the image.
The following describes an apparatus for identifying table information in an image according to an embodiment of the present invention.
As shown in fig. 11, an apparatus for recognizing a table in an image, the apparatus comprising:
a target image receiving module 1110 for receiving a target image having a table;
a table image determining module 1120, configured to determine a table image containing a table from the target image;
a text line position determining module 1130, configured to perform text line detection on the form image, and determine a position of a text line in the form image;
the information identifying module 1140 is configured to identify the table image according to the position of the text line, so as to obtain table information of the table image.
The table information comprises character information and table structure information.
As can be seen, in the scheme provided in the embodiment of the present invention, the electronic device may first receive a target image with a table, then determine a table image containing the table from the target image, perform text line detection on the table image to determine the positions of the text lines in the table image, and then identify the table image according to the positions of the text lines to obtain the table information of the table image, where the table information includes character information and table structure information. Because the recognized table information includes both the character information and the table structure information, not only is the character content in the table obtained, but the diversity of table recognition results is also improved, which facilitates subsequent processing such as table recovery.
As an implementation manner of the embodiment of the present invention, the apparatus may further include:
a table line removing module (not shown in fig. 11) configured to remove a table line of the table image before the table image is identified according to the position of the text line to obtain table information of the table image;
the information recognition module 1140 may include:
an image segmentation unit (not shown in fig. 11) for segmenting a text image from the form image from which the form lines are removed, according to the positions of the text lines;
a character recognition unit (not shown in fig. 11) configured to perform character recognition on the segmented text image to obtain character information of the form image;
a table line determination unit (not shown in fig. 11) for determining whether a table line of the table image is complete;
a table line completion unit (not shown in fig. 11) configured to complete a table line of the table image if the table line of the table image is incomplete;
and a table identifying unit (not shown in fig. 11) configured to perform table identification on the table image with complete table lines to obtain table structure information of the table image.
As an implementation manner of the embodiment of the present invention, the table line determining unit may include:
a character removal unit (not shown in fig. 11) for removing characters in the form image based on the position of the text line in the form image;
a first number determination unit (not shown in fig. 11) for determining the number of intersection points and the number of closed cells in the form image after the character removal;
a second number determination unit (not shown in fig. 11) for determining the number of cells of the table from the number of intersections of the table lines;
a number judgment unit (not shown in fig. 11) for judging whether the number of the closed cells is equal to the number of the cells;
a first table line determination unit (not shown in fig. 11) for determining that the table line of the table image is complete if the number of the closed cells is equal to the number of the cells;
a second table line determining unit (not shown in fig. 11) for determining that the table line of the table image is incomplete if the number of closed cells is not equal to the number of cells.
As an implementation manner of the embodiment of the present invention, the first number determining unit may include:
a binarization processing subunit (not shown in fig. 11) configured to perform binarization processing on the table image with the characters removed and perform inversion processing on pixel values to obtain an intermediate image;
an image erosion subunit (not shown in fig. 11) configured to perform erosion processing on the intermediate image to obtain an eroded image;
an image expansion subunit (not shown in fig. 11) configured to perform expansion processing on the erosion image to obtain an expanded image;
a table line dividing subunit (not shown in fig. 11) configured to perform horizontal and vertical table line dividing processing on the expanded image to obtain a horizontal line image and a vertical line image;
a table line image determining subunit (not shown in fig. 11) configured to perform union processing on the horizontal line image and the vertical line image to obtain a table line image;
an intersection point image determining subunit (not shown in fig. 11) configured to perform intersection processing on the horizontal line image and the vertical line image to obtain an intersection point image;
an intersection number determining subunit (not shown in fig. 11) configured to determine, from the intersection image, the number of intersections in the character-removed table image;
the cell number determination subunit (not shown in fig. 11) is configured to determine, according to the table line image, the number of closed cells in the table image from which the character is removed.
As an implementation manner of the embodiment of the present invention, the character recognition unit may include:
a character recognition subunit (not shown in fig. 11) configured to perform character recognition on the segmented text image to obtain a character recognition result of the form image;
a semantic analysis subunit (not shown in fig. 11) configured to perform semantic analysis on the character recognition result to obtain the semantics corresponding to each text line;
a classification subunit (not shown in fig. 11) configured to classify the character recognition results according to the semantics corresponding to the text lines, so as to obtain a category corresponding to each character recognition result;
and an identification result storage subunit (not shown in fig. 11) configured to store the character identification result according to the category corresponding to the character identification result, so as to obtain character information of the form image.
As an implementation manner of the embodiment of the present invention, the table image determining module 1120 may include:
a target position determining unit (not shown in fig. 11) configured to input the target image into a deep learning model trained in advance, so as to obtain a target position of a table in the target image;
a distortion determination unit (not shown in fig. 11) configured to determine whether a table region corresponding to the target position is distorted according to the target position;
and a table image determining unit (not shown in fig. 11) configured to, if the table area corresponding to the target position is distorted, perform affine transformation on the table area to obtain a table image corresponding to the target image.
As an implementation manner of the embodiment of the present invention, the text line position determining module includes:
and the text line position determining unit is used for detecting the text lines of the tabular image by using a pixel link algorithm and determining the positions of the text lines in the tabular image.
As an implementation manner of the embodiment of the present invention, the positions of the text lines in the form image include positions of all the text lines in the form image;
the position of the text line is the vertex coordinate of the minimum circumscribed rectangle of the text line, and the vertex coordinate is the coordinates of four vertexes of the minimum circumscribed rectangle, or the vertex coordinate is the coordinate of the diagonal vertex of the minimum circumscribed rectangle.
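For illustration, the sketch below derives both vertex representations from the minimum circumscribed rectangle of a single text line; `text_mask`, a binary mask of one detected text line, is an assumed input, and the OpenCV 4 return convention of findContours is used:

```python
import cv2

def text_line_position(text_mask):
    # `text_mask` is a hypothetical binary mask of one detected text
    # line (e.g. produced by the text line detection step)
    contours, _ = cv2.findContours(text_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(contours[0])
    four_vertices = ((x, y), (x + w, y), (x + w, y + h), (x, y + h))
    diagonal_vertices = ((x, y), (x + w, y + h))  # two diagonal vertices suffice
    return four_vertices, diagonal_vertices
```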
As an implementation manner of the embodiment of the present invention, the table line removing module includes:
a table line removing unit for filling a color of a table line of the table image as a background color of the table image.
As an implementation manner of the embodiment of the present invention, the character removing unit includes:
and the character removing subunit is used for filling the rectangular area corresponding to the position of the text line in the tabular image as the background color of the tabular image.
As an implementation manner of the embodiment of the present invention, the first number determining unit is specifically configured to detect, by using a findContours algorithm, the number of closed cells and the number of intersections of the table lines in the table image after the characters are removed.
As an implementation manner of the embodiment of the present invention, the binarization processing subunit includes:
a binarization subunit, configured to perform binarization processing on the table image with the characters removed by using an adaptiveThreshold algorithm;
and the negation subunit is used for negating the pixel values of the table image after the binarization processing to obtain an intermediate image.
As an implementation manner of the embodiment of the present invention, the deep learning model includes a correspondence between a table image and table vertex coordinates;
the target position determining unit is specifically configured to input the target image into a deep learning model which is trained in advance, and obtain table vertex coordinates of a table in the target image.
As an implementation manner of the embodiment of the present invention, the deep learning model is obtained by training a model training module;
the model training module is used for acquiring a form image sample and an initial deep learning model; marking the position of a table area in the table image sample;
inputting the marked form image sample into the initial deep learning model, and training the initial deep learning model; and when the accuracy of the output result of the initial deep learning model reaches a preset value or the training iteration times of the form image sample reach preset times, stopping training to obtain the deep learning model.
An embodiment of the present invention further provides an electronic device. As shown in fig. 12, the electronic device may include a processor 1201, a communication interface 1202, a memory 1203, and a communication bus 1204, where the processor 1201, the communication interface 1202, and the memory 1203 communicate with one another through the communication bus 1204:
a memory 1203 for storing a computer program;
the processor 1201 is configured to implement the following steps when executing the program stored in the memory 1203:
receiving a target image with a table;
determining a form image containing a form from the target image;
performing text line detection on the form image, and determining the position of a text line in the form image;
and identifying the form image according to the position of the text line to obtain form information of the form image.
The table information comprises character information and table structure information.
As can be seen, in the scheme provided in the embodiment of the present invention, the electronic device may first receive a target image with a table, then determine a table image containing the table from the target image, perform text line detection on the table image to determine the positions of the text lines in the table image, and then identify the table image according to the positions of the text lines to obtain the table information of the table image, where the table information includes character information and table structure information. Because the recognized table information includes both the character information and the table structure information, not only is the character content in the table obtained, but the diversity of table recognition results is also improved, which facilitates subsequent processing such as table recovery.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The Processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, before the step of identifying the form image according to the position of the text line to obtain the form information of the form image, the method further includes:
removing table lines of the table image;
the step of identifying the form image according to the position of the text line to obtain the form information of the form image comprises the following steps:
according to the position of the text line, segmenting a text image from the form image with the form lines removed;
identifying the segmented text image to obtain character information of the form image;
determining whether a form line of the form image is complete;
if the table line of the table image is incomplete, completing the table line of the table image;
and carrying out table identification on the table image with the complete table lines to obtain the table structure information of the table image.
Wherein the step of determining whether the form lines of the form image are complete comprises the following steps (an illustrative sketch of the final comparison follows this list):
removing characters in the form image based on the position of the text line in the form image;
determining the number of intersection points and the number of closed cells in the table image after the characters are removed;
determining the number of cells of the table according to the number of the intersection points of the table lines;
judging whether the number of the closed cells is equal to the number of the cells or not;
if the number of the closed cells is equal to the number of the cells, determining that the table line of the table image is complete;
and if the number of the closed cells is not equal to the number of the cells, determining that the table lines of the table image are incomplete.
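As referenced above, here is a minimal sketch of the completeness comparison; it assumes a regular grid, that the intersection coordinates and the closed-cell count are already available (e.g. from the pipeline sketched after the next list), and an arbitrary clustering tolerance:

```python
import numpy as np

def table_lines_complete(intersections, closed_cell_count, tol=5.0):
    # `intersections` is a hypothetical (N, 2) array of table-line
    # crossing points; nearly-equal coordinates are merged to count
    # distinct grid lines, and a grid with v vertical and h horizontal
    # lines encloses (v - 1) * (h - 1) cells
    pts = np.asarray(intersections, dtype=float)
    n_vertical = len(np.unique(np.round(pts[:, 0] / tol)))    # distinct x positions
    n_horizontal = len(np.unique(np.round(pts[:, 1] / tol)))  # distinct y positions
    expected_cells = (n_vertical - 1) * (n_horizontal - 1)
    # complete only if every expected cell is actually closed
    return closed_cell_count == expected_cells
```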
The step of determining the number of intersection points and the number of closed cells in the form image after the characters are removed comprises the following steps (a sketch of one possible OpenCV realization follows this list):
carrying out binarization processing on the table image with the characters removed and carrying out negation processing on pixel values to obtain an intermediate image;
carrying out corrosion treatment on the intermediate image to obtain a corrosion image;
performing expansion processing on the corrosion image to obtain an expansion image;
performing horizontal and vertical table line separation processing on the expansion image to obtain a horizontal line image and a vertical line image;
performing union processing on the horizontal line image and the vertical line image to obtain a table line image;
performing intersection processing on the horizontal line image and the vertical line image to obtain an intersection point image;
determining the number of intersection points in the table image after the characters are removed according to the intersection point image;
and determining the number of closed cells in the table image after the characters are removed according to the table line image.
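As referenced above, the sketch below shows one possible OpenCV realization of this pipeline. The embodiment names the adaptiveThreshold and findContours algorithms elsewhere; the kernel sizes and the OpenCV 4 return convention used here are illustrative assumptions:

```python
import cv2
import numpy as np

def count_intersections_and_cells(table_no_text):
    gray = cv2.cvtColor(table_no_text, cv2.COLOR_BGR2GRAY)
    # binarize and invert so table lines become white foreground
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 15, 10)
    # erosion then dilation suppresses small noise while keeping long lines
    kernel = np.ones((3, 3), np.uint8)
    cleaned = cv2.dilate(cv2.erode(binary, kernel), kernel)
    # elongated kernels keep only long horizontal or vertical runs
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 25))
    h_lines = cv2.morphologyEx(cleaned, cv2.MORPH_OPEN, h_kernel)
    v_lines = cv2.morphologyEx(cleaned, cv2.MORPH_OPEN, v_kernel)
    table_lines = cv2.bitwise_or(h_lines, v_lines)   # union -> table line image
    crossings = cv2.bitwise_and(h_lines, v_lines)    # intersection -> crossing points
    n_intersections = len(cv2.findContours(crossings, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)[0])
    # closed cells appear as hole contours inside the table line image
    contours, hierarchy = cv2.findContours(table_lines, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    n_cells = sum(1 for i in range(len(contours)) if hierarchy[0][i][3] != -1)
    return n_intersections, n_cells
```

Opening with an elongated structuring element erases any stroke shorter than the kernel in that direction, which is why it separates the horizontal table lines from the vertical ones.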
Wherein the step of identifying the segmented text image to obtain the character information of the form image comprises:
performing character recognition on the segmented text image to obtain a character recognition result of the form image;
performing semantic analysis on the character recognition result to obtain semantics corresponding to each text line;
classifying the character recognition results according to the corresponding semantics of each text line to obtain the corresponding category of each character recognition result;
and storing the character recognition result according to the category corresponding to the character recognition result to obtain the character information of the form image.
Wherein the step of determining a form image containing a form from the target image comprises:
inputting the target image into a deep learning model which is trained in advance to obtain a target position of a table in the target image;
judging whether a table area corresponding to the target position is distorted or not according to the target position;
if yes, affine transformation processing is carried out on the table area, and a table image corresponding to the target image is obtained.
Wherein the step of performing text line detection on the form image and determining the position of the text line in the form image comprises:
and detecting text lines of the tabular image by using a pixel link algorithm, and determining the positions of the text lines in the tabular image.
Wherein the locations of the text lines in the form image comprise locations of all the text lines in the form image;
the position of the text line is the vertex coordinate of the minimum circumscribed rectangle of the text line, and the vertex coordinate is the coordinates of four vertexes of the minimum circumscribed rectangle, or the vertex coordinate is the coordinate of the diagonal vertex of the minimum circumscribed rectangle.
Wherein the step of removing the form lines of the form image comprises:
filling the form lines of the form image with the background color of the form image.
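A minimal sketch of this filling is given below; `line_mask` (a binary mask of the table-line pixels, e.g. the table line image from the morphology pipeline sketched above) and the estimate of the background color as the most frequent pixel value are assumptions:

```python
import numpy as np

def fill_lines_with_background(table_img, line_mask):
    out = table_img.copy()
    # estimate the background color as the most frequent pixel value
    flat = out.reshape(-1, out.shape[-1])
    colors, counts = np.unique(flat, axis=0, return_counts=True)
    background = colors[counts.argmax()]
    out[line_mask > 0] = background  # overwrite line pixels
    return out
```

The same filling approach applies to the character removal described next, with the rectangle of each text line used as the mask.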
Wherein the step of removing the characters in the form image based on the positions of the text lines in the form image comprises:
and filling the rectangular area corresponding to the position of each text line in the form image with the background color of the form image.
The step of determining the number of intersection points and the number of closed cells in the form image after the characters are removed comprises the following steps:
and detecting the number of closed cells and the number of intersections of the table lines in the table image after the characters are removed by using a findContours algorithm.
The step of performing binarization processing on the table image with the characters removed and performing negation processing on pixel values to obtain an intermediate image comprises the following steps:
carrying out binarization processing on the table image with the characters removed by using an adaptiveThreshold algorithm;
and performing negation processing on the pixel values of the table image after the binarization processing to obtain an intermediate image.
The deep learning model comprises a corresponding relation between a table image and table vertex coordinates;
the step of inputting the target image into a deep learning model trained in advance to obtain the target position of the table in the target image comprises the following steps:
and inputting the target image into a deep learning model which is trained in advance to obtain table vertex coordinates of a table in the target image.
The training mode of the deep learning model comprises the following steps:
obtaining a form image sample and an initial deep learning model;
marking the position of a table area in the table image sample;
inputting the marked form image sample into the initial deep learning model, and training the initial deep learning model;
and when the accuracy of the output result of the initial deep learning model reaches a preset value or the training iteration times of the form image sample reach preset times, stopping training to obtain the deep learning model.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when executed by a processor, the computer program implements the following steps:
receiving a target image with a table;
determining a form image containing a form from the target image;
performing text line detection on the form image, and determining the position of a text line in the form image;
and identifying the form image according to the position of the text line to obtain form information of the form image.
The table information comprises character information and table structure information.
As can be seen, in the solution provided in the embodiment of the present invention, the computer program, when executed by the processor, may first receive a target image with a table, then determine a table image containing the table from the target image, perform text line detection on the table image to determine the positions of the text lines in the table image, and then identify the table image according to the positions of the text lines to obtain the table information of the table image, where the table information includes character information and table structure information. Because the recognized table information includes both the character information and the table structure information, not only is the character content in the table obtained, but the diversity of table recognition results is also improved, which facilitates subsequent processing such as table recovery.
Optionally, before the step of identifying the form image according to the position of the text line to obtain the form information of the form image, the method further includes:
removing table lines of the table image;
the step of identifying the form image according to the position of the text line to obtain the form information of the form image comprises the following steps:
according to the position of the text line, segmenting a text image from the form image with the form lines removed;
identifying the segmented text image to obtain character information of the form image;
determining whether a form line of the form image is complete;
if the table line of the table image is incomplete, completing the table line of the table image;
and carrying out table identification on the table image with the complete table lines to obtain the table structure information of the table image.
Wherein the step of determining whether the form line of the form image is complete comprises:
removing characters in the form image based on the position of the text line in the form image;
determining the number of intersection points and the number of closed cells in the table image after the characters are removed;
determining the number of cells of the table according to the number of the intersection points of the table lines;
judging whether the number of the closed cells is equal to the number of the cells or not;
if the number of the closed cells is equal to the number of the cells, determining that the table line of the table image is complete;
and if the number of the closed cells is not equal to the number of the cells, determining that the table lines of the table image are incomplete.
The step of determining the number of intersection points and the number of closed cells in the form image after the characters are removed comprises the following steps:
carrying out binarization processing on the table image with the characters removed and carrying out negation processing on pixel values to obtain an intermediate image;
carrying out corrosion treatment on the intermediate image to obtain a corrosion image;
performing expansion processing on the corrosion image to obtain an expansion image;
performing horizontal and vertical table line separation processing on the expansion image to obtain a horizontal line image and a vertical line image;
performing union processing on the horizontal line image and the vertical line image to obtain a table line image;
performing intersection processing on the horizontal line image and the vertical line image to obtain an intersection point image;
determining the number of intersection points in the table image after the characters are removed according to the intersection point image;
and determining the number of closed cells in the table image after the characters are removed according to the table line image.
Wherein the step of identifying the segmented text image to obtain the character information of the form image comprises:
performing character recognition on the segmented text image to obtain a character recognition result of the form image;
performing semantic analysis on the character recognition result to obtain semantics corresponding to each text line;
classifying the character recognition results according to the corresponding semantics of each text line to obtain the corresponding category of each character recognition result;
and storing the character recognition result according to the category corresponding to the character recognition result to obtain the character information of the form image.
Wherein the step of determining a form image containing a form from the target image comprises:
inputting the target image into a deep learning model which is trained in advance to obtain a target position of a table in the target image;
judging whether a table area corresponding to the target position is distorted or not according to the target position;
if yes, affine transformation processing is carried out on the table area, and a table image corresponding to the target image is obtained.
Wherein the step of performing text line detection on the form image and determining the position of the text line in the form image comprises:
and detecting text lines of the tabular image by using a pixel link algorithm, and determining the positions of the text lines in the tabular image.
Wherein the locations of the text lines in the form image comprise locations of all the text lines in the form image;
the position of the text line is the vertex coordinate of the minimum circumscribed rectangle of the text line, and the vertex coordinate is the coordinates of four vertexes of the minimum circumscribed rectangle, or the vertex coordinate is the coordinate of the diagonal vertex of the minimum circumscribed rectangle.
Wherein the step of removing form lines of the form image comprises:
filling the color of the form line of the form image as the background color of the form image.
Wherein the step of removing characters in the form image based on the location of the text line in the form image comprises:
and filling a rectangular area corresponding to the position of the text line in the tabular image into the background color of the tabular image.
The step of determining the number of intersection points and the number of closed cells in the form image after the characters are removed comprises the following steps:
and detecting the number of closed cells and the number of intersections of the table lines in the table image after the characters are removed by using a findContours algorithm.
The step of performing binarization processing on the table image with the characters removed and performing negation processing on pixel values to obtain an intermediate image comprises the following steps:
carrying out binarization processing on the table image with the characters removed by using an adaptiveThreshold algorithm;
and performing negation processing on the pixel values of the table image after the binarization processing to obtain an intermediate image.
The deep learning model comprises a corresponding relation between a table image and table vertex coordinates;
the step of inputting the target image into a deep learning model trained in advance to obtain the target position of the table in the target image comprises the following steps:
and inputting the target image into a deep learning model which is trained in advance to obtain table vertex coordinates of a table in the target image.
The training mode of the deep learning model comprises the following steps:
obtaining a form image sample and an initial deep learning model;
marking the position of a table area in the table image sample;
inputting the marked form image sample into the initial deep learning model, and training the initial deep learning model;
and when the accuracy of the output result of the initial deep learning model reaches a preset value or the training iteration times of the form image sample reach preset times, stopping training to obtain the deep learning model.
It should be noted that, for the above-mentioned apparatus, electronic device and computer-readable storage medium embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (26)

1. A method for identifying table information in an image, the method comprising:
receiving a target image with a table;
determining a form image containing a form from the target image;
performing text line detection on the form image, and determining the position of a text line in the form image;
removing table lines of the table image;
according to the position of the text line, segmenting a text image from the form image with the form lines removed;
identifying the segmented text image to obtain character information of the form image;
removing characters in the form image based on the position of the text line in the form image;
carrying out binarization processing on the table image with the characters removed and carrying out negation processing on pixel values to obtain an intermediate image;
carrying out corrosion treatment on the intermediate image to obtain a corrosion image, and carrying out expansion treatment on the corrosion image to obtain an expansion image;
performing horizontal and vertical table line separation processing on the expansion image to obtain a horizontal line image and a vertical line image;
performing union processing on the horizontal line image and the vertical line image to obtain a table line image;
performing intersection processing on the horizontal line image and the vertical line image to obtain an intersection point image;
determining the number of intersection points in the table image after the characters are removed according to the intersection point image;
determining the number of closed cells in the table image after the characters are removed according to the table line image;
determining the number of cells of the table according to the number of the intersection points of the table lines;
determining whether a table line of the table image is complete based on the number of closed cells and the number of cells;
if the table line of the table image is incomplete, completing the table line of the table image;
and carrying out table identification on the table image with the complete table lines to obtain the table structure information of the table image.
2. The method of claim 1, wherein the step of determining whether a table line of the table image is complete based on the number of closed cells and the number of cells comprises:
judging whether the number of the closed cells is equal to the number of the cells or not;
if the number of the closed cells is equal to the number of the cells, determining that the table line of the table image is complete;
and if the number of the closed cells is not equal to the number of the cells, determining that the table lines of the table image are incomplete.
3. The method of claim 1, wherein the step of recognizing the segmented text image to obtain the text information of the table comprises:
performing character recognition on the segmented text image to obtain a character recognition result of the form image;
performing semantic analysis on the character recognition result to obtain semantics corresponding to each text line;
classifying the character recognition results according to the corresponding semantics of each text line to obtain the corresponding category of each character recognition result;
and storing the character recognition result according to the category corresponding to the character recognition result to obtain the character information of the form image.
4. A method according to any one of claims 1 to 3, wherein the step of determining a form image containing a form from the target image comprises:
inputting the target image into a deep learning model which is trained in advance to obtain a target position of a table in the target image;
judging whether a table area corresponding to the target position is distorted or not according to the target position;
if yes, affine transformation processing is carried out on the table area, and a table image corresponding to the target image is obtained.
5. A method according to any of claims 1-3, wherein said step of performing text line detection on said form image and determining the location of text lines in said form image comprises:
and detecting text lines of the tabular image by using a pixel link algorithm, and determining the positions of the text lines in the tabular image.
6. The method of any of claims 1-3, wherein the locations of the text lines in the form image comprise locations of all text lines in the form image;
the position of the text line is the vertex coordinate of the minimum circumscribed rectangle of the text line, and the vertex coordinate is the coordinates of four vertexes of the minimum circumscribed rectangle, or the vertex coordinate is the coordinate of the diagonal vertex of the minimum circumscribed rectangle.
7. The method of any of claims 1-3, wherein the step of removing form lines of the form image comprises:
filling the color of the form line of the form image as the background color of the form image.
8. A method according to any of claims 1-3, wherein the step of removing characters in the form image based on the locations of the lines of text in the form image comprises:
and filling a rectangular area corresponding to the position of the text line in the tabular image into the background color of the tabular image.
9. A method according to any of claims 1-3, wherein the step of determining the number of intersections and the number of closed cells in the character-removed form image comprises:
and detecting the number of closed cells and the number of intersections of the table lines in the table image after the characters are removed by using a findContours algorithm.
10. The method according to any one of claims 1 to 3, wherein the step of subjecting the character-removed form image to binarization processing and inverting pixel values to obtain an intermediate image comprises:
carrying out binarization processing on the table image with the characters removed by using an adaptiveThreshold algorithm;
and performing negation processing on the pixel values of the table image after the binarization processing to obtain an intermediate image.
11. The method of claim 4, wherein the deep learning model includes correspondence of table images to table vertex coordinates;
the step of inputting the target image into a deep learning model trained in advance to obtain the target position of the table in the target image comprises the following steps:
and inputting the target image into a deep learning model which is trained in advance to obtain table vertex coordinates of a table in the target image.
12. The method of claim 4, wherein the deep learning model is trained by:
obtaining a form image sample and an initial deep learning model;
marking the position of a table area in the table image sample;
inputting the marked form image sample into the initial deep learning model, and training the initial deep learning model;
and when the accuracy of the output result of the initial deep learning model reaches a preset value or the training iteration times of the form image sample reach preset times, stopping training to obtain the deep learning model.
13. An apparatus for recognizing table information in an image, the apparatus comprising:
a target image receiving module for receiving a target image having a table;
a table image determining module for determining a table image containing a table from the target image;
the text line position determining module is used for detecting the text lines of the form image and determining the positions of the text lines in the form image;
the table line removing module is used for removing the table lines of the table image;
an information identification module comprising:
the image segmentation unit is used for segmenting a text image from the table image with the table lines removed according to the positions of the text lines;
the character recognition unit is used for recognizing the segmented text image to obtain character information of the form image;
the character removing unit is used for removing characters in the form image based on the position of the text line in the form image;
a first quantity determination unit comprising:
a binarization processing subunit, configured to perform binarization processing on the table image from which the characters are removed and perform negation processing on pixel values to obtain an intermediate image; the image corrosion subunit is used for carrying out corrosion treatment on the intermediate image to obtain a corrosion image; the image expansion subunit is used for performing expansion processing on the corrosion image to obtain an expansion image; the table line separating subunit is used for carrying out horizontal and vertical table line separating processing on the expansion image to obtain a horizontal line image and a vertical line image; a table line image determining subunit, configured to perform union processing on the horizontal line image and the vertical line image to obtain a table line image; the intersection point image determining subunit is used for performing intersection processing on the horizontal line image and the vertical line image to obtain an intersection point image; the intersection point number determining subunit is used for determining the number of intersection points in the table image after the characters are removed according to the intersection point image; the cell number determining subunit is used for determining the number of the closed cells in the table image after the characters are removed according to the table line image;
a second number determination unit for determining the number of cells of the table according to the number of intersections of the table lines;
a table line determination unit configured to determine whether a table line of the table image is complete based on the number of closed cells and the number of cells;
a table line completion unit configured to complete the table line of the table image if the table line of the table image is incomplete;
and the table identification unit is used for carrying out table identification on the table image with complete table lines to obtain the table structure information of the table image.
14. The apparatus of claim 13, wherein the table line determining unit comprises:
the quantity judging unit is used for judging whether the quantity of the closed cells is equal to the quantity of the cells or not;
a first table line determining unit, configured to determine that the table line of the table image is complete if the number of the closed cells is equal to the number of the cells;
a second table line determining unit, configured to determine that the table line of the table image is incomplete if the number of closed cells is not equal to the number of cells.
15. The apparatus of claim 13, wherein the text recognition unit comprises:
a character recognition subunit, configured to perform character recognition on the segmented text image to obtain a character recognition result of the form image;
the semantic analysis subunit is used for performing semantic analysis on the character recognition result to obtain the semantics corresponding to each text line;
the classification subunit is used for classifying the character recognition results according to the semantics corresponding to the text lines to obtain the category corresponding to each character recognition result;
and the recognition result storage subunit is used for storing the character recognition result according to the category corresponding to the character recognition result to obtain the character information of the form image.
16. The apparatus of any of claims 13-15, wherein the form image determination module comprises:
the target position determining unit is used for inputting the target image into a deep learning model which is trained in advance to obtain a target position of a table in the target image;
the distortion judging unit is used for judging whether the table area corresponding to the target position is distorted or not according to the target position;
and the table image determining unit is used for performing affine transformation processing on the table area if the table area corresponding to the target position is distorted, so as to obtain a table image corresponding to the target image.
17. The apparatus of any of claims 13-15, wherein the text line position determination module comprises:
and the text line position determining unit is used for detecting the text lines of the tabular image by using a pixel link algorithm and determining the positions of the text lines in the tabular image.
18. The apparatus of any of claims 13-15, wherein the locations of the text lines in the form image comprise locations of all text lines in the form image;
the position of the text line is the vertex coordinate of the minimum circumscribed rectangle of the text line, and the vertex coordinate is the coordinates of four vertexes of the minimum circumscribed rectangle, or the vertex coordinate is the coordinate of the diagonal vertex of the minimum circumscribed rectangle.
19. The apparatus of any of claims 13-15, wherein the form line removal module comprises:
a table line removing unit for filling a color of a table line of the table image as a background color of the table image.
20. The apparatus of any one of claims 13-15, wherein the character removal unit comprises:
and the character removing subunit is used for filling the rectangular area corresponding to the position of the text line in the tabular image as the background color of the tabular image.
21. The apparatus of any one of claims 13-15, wherein the first quantity determination unit is specifically configured to detect, by using a findContours algorithm, the number of closed cells and the number of intersections of the table lines in the table image after the characters are removed.
22. The apparatus according to any one of claims 13 to 15, wherein the binarization processing sub-unit includes:
a binarization subunit, configured to perform binarization processing on the table image with the characters removed by using an adaptiveThreshold algorithm;
and the negation subunit is used for negating the pixel values of the table image after the binarization processing to obtain an intermediate image.
23. The apparatus of claim 16, in which the deep learning model comprises a correspondence of table images to table vertex coordinates;
the target position determining unit is specifically configured to input the target image into a deep learning model which is trained in advance, and obtain table vertex coordinates of a table in the target image.
24. The apparatus of claim 16, wherein the deep learning model is trained by a model training module;
the model training module is used for acquiring a form image sample and an initial deep learning model; marking the position of a table area in the table image sample;
inputting the marked form image sample into the initial deep learning model, and training the initial deep learning model; and when the accuracy of the output result of the initial deep learning model reaches a preset value or the training iteration times of the form image sample reach preset times, stopping training to obtain the deep learning model.
25. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-12 when executing a program stored in the memory.
26. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-12.
CN202110112628.0A 2018-12-13 2018-12-13 Method and device for identifying table information in image, electronic equipment and storage medium Pending CN112818813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110112628.0A CN112818813A (en) 2018-12-13 2018-12-13 Method and device for identifying table information in image, electronic equipment and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811528393.8A CN109726643B (en) 2018-12-13 2018-12-13 Method and device for identifying table information in image, electronic equipment and storage medium
CN202110112628.0A CN112818813A (en) 2018-12-13 2018-12-13 Method and device for identifying table information in image, electronic equipment and storage medium

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811528393.8A Division CN109726643B (en) 2018-12-13 2018-12-13 Method and device for identifying table information in image, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112818813A true CN112818813A (en) 2021-05-18

Family

ID=66296007

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202110112546.6A Active CN112818812B (en) 2018-12-13 2018-12-13 Identification method and device for table information in image, electronic equipment and storage medium
CN202110112628.0A Pending CN112818813A (en) 2018-12-13 2018-12-13 Method and device for identifying table information in image, electronic equipment and storage medium
CN201811528393.8A Active CN109726643B (en) 2018-12-13 2018-12-13 Method and device for identifying table information in image, electronic equipment and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110112546.6A Active CN112818812B (en) 2018-12-13 2018-12-13 Identification method and device for table information in image, electronic equipment and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201811528393.8A Active CN109726643B (en) 2018-12-13 2018-12-13 Method and device for identifying table information in image, electronic equipment and storage medium

Country Status (1)

Country Link
CN (3) CN112818812B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297975A (en) * 2021-05-25 2021-08-24 新东方教育科技集团有限公司 Method and device for identifying table structure, storage medium and electronic equipment
CN113486848A (en) * 2021-07-27 2021-10-08 平安国际智慧城市科技股份有限公司 Document table identification method, device, equipment and storage medium
CN113989314A (en) * 2021-10-26 2022-01-28 深圳前海环融联易信息科技服务有限公司 Method for removing header and footer based on Hough transform linear detection
CN116824611A (en) * 2023-08-28 2023-09-29 星汉智能科技股份有限公司 Table structure identification method, electronic device, and computer-readable storage medium

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818812B (en) * 2018-12-13 2024-03-12 北京金山数字娱乐科技有限公司 Identification method and device for table information in image, electronic equipment and storage medium
CN110210400B (en) * 2019-06-03 2020-11-17 上海眼控科技股份有限公司 Table file detection method and equipment
CN110287854B (en) * 2019-06-20 2022-06-10 北京百度网讯科技有限公司 Table extraction method and device, computer equipment and storage medium
CN110363095B (en) * 2019-06-20 2023-07-04 华南农业大学 Identification method for form fonts
CN110347994B (en) * 2019-07-12 2023-06-30 北京香侬慧语科技有限责任公司 Form processing method and device
CN110516208B (en) * 2019-08-12 2023-06-09 深圳智能思创科技有限公司 System and method for extracting PDF document form
CN111259854B (en) * 2020-02-04 2023-04-18 北京爱医生智慧医疗科技有限公司 Method and device for identifying structured information of table in text image
CN111368638A (en) * 2020-02-10 2020-07-03 深圳追一科技有限公司 Spreadsheet creation method and device, computer equipment and storage medium
CN111460927B (en) * 2020-03-17 2024-04-09 北京交通大学 Method for extracting structured information of house property evidence image
CN111382717B (en) * 2020-03-17 2022-09-09 腾讯科技(深圳)有限公司 Table identification method and device and computer readable storage medium
CN111651971A (en) * 2020-05-27 2020-09-11 张天澄 Form information transcription method, system, electronic equipment and storage medium
CN111640130A (en) * 2020-05-29 2020-09-08 深圳壹账通智能科技有限公司 Table reduction method and device
CN111695517B (en) * 2020-06-12 2023-08-18 北京百度网讯科技有限公司 Image form extraction method and device, electronic equipment and storage medium
CN112036365A (en) * 2020-09-15 2020-12-04 中国工商银行股份有限公司 Information importing method and device, and image processing method and device
CN112257629A (en) * 2020-10-29 2021-01-22 广联达科技股份有限公司 Text information identification method and device for construction drawing
CN112434496B (en) * 2020-12-11 2021-06-22 深圳司南数据服务有限公司 Method and terminal for identifying form data of bulletin document
CN112541435B (en) * 2020-12-14 2023-03-28 贝壳技术有限公司 Image processing method, device and storage medium
CN112861736B (en) * 2021-02-10 2022-08-09 上海大学 Document table content identification and information extraction method based on image processing
CN113269153B (en) * 2021-06-26 2024-03-19 中国电子系统技术有限公司 Form identification method and device
CN113591746A (en) * 2021-08-05 2021-11-02 上海金仕达软件科技有限公司 Document table structure detection method and device
CN116798056B (en) * 2023-08-28 2023-11-17 星汉智能科技股份有限公司 Form image positioning method, apparatus, device and computer readable storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02252078A (en) * 1989-03-25 1990-10-09 Sony Corp Method for identifying area of document
CN1294523C (en) * 2004-06-04 2007-01-10 北京大学计算机科学技术研究所 Automatic typeface orientation determination and positioning method for known tables
CN101833546A (en) * 2009-03-10 2010-09-15 株式会社理光 Method and device for extracting form from portable electronic document
JP5854774B2 (en) * 2011-11-11 2016-02-09 株式会社Pfu Image processing apparatus, straight line detection method, and computer program
CN103258198B (en) * 2013-04-26 2015-12-23 四川大学 Character extraction method for table document images
CN105426856A (en) * 2015-11-25 2016-03-23 成都数联铭品科技有限公司 Image table character identification method
CN106156761B (en) * 2016-08-10 2020-01-10 北京交通大学 Image table detection and identification method for mobile terminal shooting
CN107066997B (en) * 2016-12-16 2019-07-30 浙江工业大学 Electrical component price quoting method based on image recognition
CN106940804B (en) * 2017-02-23 2018-02-27 杭州仟金顶信息科技有限公司 Method for automatically inputting form data in a construction engineering material management system
CN108491788A (en) * 2018-03-20 2018-09-04 上海眼控科技股份有限公司 Intelligent extraction method and device for financial statement cells
CN108446264B (en) * 2018-03-26 2022-02-15 阿博茨德(北京)科技有限公司 Method and device for analyzing table vectors in PDF documents
CN108734089B (en) * 2018-04-02 2023-04-18 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for identifying table content in picture file

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407883A (en) * 2016-08-10 2017-02-15 北京工业大学 Complex table and method for identifying handwritten numbers therein
KR101811581B1 (en) * 2016-11-15 2017-12-26 주식회사 셀바스에이아이 Apparatus and method for cell decomposition for table recognition in document images
US20180211106A1 (en) * 2017-01-26 2018-07-26 Ricoh Company, Ltd. Image processing apparatus, image processing method, and non-transitory recording medium storing program for causing computer to execute image processing method
CN106897690A (en) * 2017-02-22 2017-06-27 南京述酷信息技术有限公司 PDF table extraction method
CN107862303A (en) * 2017-11-30 2018-03-30 平安科技(深圳)有限公司 Information recognition method for table-type images, electronic device, and readable storage medium
CN108416279A (en) * 2018-02-26 2018-08-17 阿博茨德(北京)科技有限公司 Table analysis method and device for document images
CN109726643A (en) * 2018-12-13 2019-05-07 北京金山数字娱乐科技有限公司 Method and device for recognizing table data in images, electronic device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Wei; Ping Xijian; Guo Ge: "Table recognition preprocessing algorithm based on character-line separation", Computer Engineering and Design, no. 19 *
Zhang Yan; Yu Shengyang; Zhang Chongyang; Yang Jingyu: "Frame line detection and removal algorithm for table-form bills", Journal of Computer Research and Development, no. 05 *
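
Both references above concern detecting table frame lines (and separating them from characters) before recognition. For orientation only, the following minimal Python/OpenCV sketch illustrates frame-line detection with the probabilistic Hough transform in the spirit of those works; it is an editorial illustration, not code from this patent or the cited papers, and every function choice and threshold here is an assumption.

import cv2
import numpy as np

def detect_frame_lines(image_path, min_len_ratio=0.5):
    # Illustrative sketch only: find long horizontal/vertical segments,
    # which in a form image are far more likely to be table frame lines
    # than text strokes.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu binarization, inverted so dark rulings become foreground.
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    h, w = binary.shape
    # Probabilistic Hough transform; the minimum segment length filters
    # out character strokes, which are short relative to table rulings.
    segments = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180,
                               threshold=100,
                               minLineLength=int(min(h, w) * min_len_ratio),
                               maxLineGap=5)
    horizontal, vertical = [], []
    if segments is not None:
        for x1, y1, x2, y2 in segments[:, 0]:
            if abs(y2 - y1) <= 2:    # near-horizontal ruling
                horizontal.append((x1, y1, x2, y2))
            elif abs(x2 - x1) <= 2:  # near-vertical ruling
                vertical.append((x1, y1, x2, y2))
    return horizontal, vertical

Detected segments could then be merged and intersected to recover a cell grid, or masked out of the binary image before character recognition, which is the role frame-line removal plays in the second reference.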

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297975A (en) * 2021-05-25 2021-08-24 新东方教育科技集团有限公司 Method and device for identifying table structure, storage medium and electronic equipment
CN113297975B (en) * 2021-05-25 2024-03-26 新东方教育科技集团有限公司 Table structure identification method and device, storage medium and electronic equipment
CN113486848A (en) * 2021-07-27 2021-10-08 平安国际智慧城市科技股份有限公司 Document table identification method, device, equipment and storage medium
CN113486848B (en) * 2021-07-27 2024-04-16 平安国际智慧城市科技股份有限公司 Document table identification method, device, equipment and storage medium
CN113989314A (en) * 2021-10-26 2022-01-28 深圳前海环融联易信息科技服务有限公司 Method for removing headers and footers based on Hough transform line detection
CN116824611A (en) * 2023-08-28 2023-09-29 星汉智能科技股份有限公司 Table structure identification method, electronic device, and computer-readable storage medium
CN116824611B (en) * 2023-08-28 2024-04-05 星汉智能科技股份有限公司 Table structure identification method, electronic device, and computer-readable storage medium

Also Published As

Publication number Publication date
CN109726643B (en) 2021-08-20
CN109726643A (en) 2019-05-07
CN112818812B (en) 2024-03-12
CN112818812A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN109726643B (en) Method and device for identifying table information in image, electronic equipment and storage medium
CN109993112B (en) Method and device for identifying table in picture
US11003941B2 (en) Character identification method and device
CN108710866B (en) Chinese character model training method, Chinese character recognition method, device, equipment and medium
CN105868758B (en) Method and device for detecting text area in image and electronic equipment
US11410407B2 (en) Method and device for generating collection of incorrectly-answered questions
CN106156766B (en) Method and device for generating text line classifier
CN109685055B (en) Method and device for detecting text area in image
CN110647829A (en) Bill text recognition method and system
CN109740606B (en) Image identification method and device
CN106874909A (en) Image character recognition method and device
CN113344857B (en) Defect detection network training method, defect detection method and storage medium
CN105512611A (en) Detection and identification method for form image
CN105447522A (en) Complex image character identification system
CN111259878A (en) Method and equipment for detecting text
CN113158808A (en) Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction
CN109389115B (en) Text recognition method, device, storage medium and computer equipment
CN109389110B (en) Region determination method and device
CN111737478B (en) Text detection method, electronic device and computer readable medium
CN112001406A (en) Text region detection method and device
CN113033558B (en) Text detection method and device for natural scene and storage medium
Wicht et al. Camera-based sudoku recognition with deep belief network
EP2521071A2 (en) Method and system for text segmentation
CN112580624A (en) Method and device for detecting multidirectional text area based on boundary prediction
CN116912857A (en) Handwriting and printed text separation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination