CN111814722A - Method and device for identifying table in image, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111814722A
CN111814722A (application CN202010697220.XA), granted as CN111814722B
Authority
CN
China
Prior art keywords: image, lines, text, line, transverse
Prior art date
Legal status: Granted
Application number: CN202010697220.XA
Other languages: Chinese (zh)
Other versions: CN111814722B (en)
Inventor
孔垂鑫
王鉴宇
郑嘉文
李文
段立新
Current Assignee: University of Electronic Science and Technology of China
Original Assignee: University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010697220.XA
Publication of CN111814722A
Application granted
Publication of CN111814722B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/43 Editing text-bitmaps, e.g. alignment, spacing; Semantic analysis of bitmaps of text without OCR


Abstract

The invention provides a method and device for identifying a table in an image, an electronic device, and a storage medium. The method comprises: acquiring the image to be identified; cascaded rectification based on deep learning and straight-line detection; table region detection based on deep learning; detecting the row and column grid lines in the table region image and reconstructing the table structure; recognizing the text in each cell with a deep-learning text recognition model; and formatting and outputting the recognition result. Through this design, the invention simultaneously handles the identification of various table types, such as fully ruled tables, partially ruled tables, and borderless tables, and improves the accuracy of table structure and content recognition.

Description

Method and device for identifying table in image, electronic equipment and storage medium
Technical Field
The invention belongs to the field of image recognition technology, and in particular relates to a method and a device for recognizing a table in an image, an electronic device, and a storage medium.
Background
Tables are a common content form in documents: they represent structured content, carry a large amount of information, and therefore occupy an important position in documents. Paper form documents are visible everywhere in daily life, for example goods lists, information collection forms, and various list reports. With the massive growth of paper form documents, the work of checking information becomes increasingly heavy and burdensome, bringing huge labor costs, and manually reading data from documents has a high error rate, so the demand for automatically digitizing form documents with computer technology becomes more and more widespread. Automatic digitization of form documents by computer has the following advantages: (1) it reduces the cost of manual table lookup, reduces workload, accelerates processing, and improves economic efficiency; (2) a well-performing form-document digitization system reduces errors caused by manual entry; (3) a digitized form document lends itself to deeper information mining by computer, yielding more information. Currently, for the table reconstruction problem, the typical methods are as follows:
A table reconstruction method based on detecting grid points. First, transverse and vertical lines are filtered out of the table area by transverse-line and vertical-line filters; the coordinates of the intersections of transverse and vertical lines, i.e. the grid point coordinates, are then located and the grid point types are classified; finally, all grid points are traversed and the table structure is reconstructed according to their types. The disadvantage of this method is that a large number of parameters must be set manually and the method is sensitive to parameter changes: if the transverse or vertical line filter is too long, some table lines are missed, and if it is too short, many spurious line segments are detected. Different parameters must be set for different types of tables, so generalization is poor.
A table reconstruction method based on template matching. A template image is designed, the input image is registered against the template image, and the input image is identified based on the structure of the template image. The disadvantages are that this method only suits forms with fixed formats, such as invoices and identity cards; a template must be prepared separately for each form, so universality is poor; the identification precision depends strongly on the precision of image registration; the textures within form images are highly similar, which makes registration difficult; and registration at high resolution is slow.
Most existing table structure identification methods target fully ruled tables, while tables without ruled lines are more difficult to process, yet such borderless tables are also a common form of table. A borderless table has no visible ruled lines, so identification is harder. To extract the row and column dividing lines of a borderless table, conventional methods usually dilate the table image to thicken the characters so that the dividing lines become clearer during projection. This kind of method has several problems: the parameters of the dilation are difficult to control; excessive dilation may stick text pixels of different rows or columns to line pixels and noise pixels in the image, so the division becomes poor; and too little dilation leaves too many peaks in the projection sequence after pixels are projected in a single direction, causing serious erroneous division. For example, a column of data in a table is often aligned at the decimal point, and such methods easily split at the decimal point position. As a result, the row and column division of these methods is often poor, their universality is weak, and their practical value is low.
Disclosure of Invention
In view of the above deficiencies in the prior art, the invention provides a method, an apparatus, an electronic device, and a storage medium for identifying tables in an image, which simultaneously handle the identification of various table types, such as fully ruled tables, partially ruled tables, and borderless tables, and improve the accuracy of table structure and content recognition.
In order to achieve the above purpose, the invention adopts the technical scheme that:
the scheme provides a method for identifying a table in an image, which comprises the following steps:
s1, acquiring an image to be identified, wherein the image comprises a table area;
s2, performing cascade correction on the image by respectively using a deep learning model and a straight line detection method;
s3, detecting a table area in the image after the cascade correction by using a deep learning model;
s4, detecting a grid line of a row list from the table area, and reconstructing a table structure;
s5, segmenting the texts in the cells in the table structure, and recognizing the texts in the cells by using a text recognition model;
and S6, formatting and outputting the recognition result, and finishing the recognition of the table in the image.
The invention has the beneficial effects that: the method acquires the image to be identified; performs cascaded rectification based on deep learning and straight-line detection; detects the table region based on deep learning; detects the row and column grid lines in the table region image and reconstructs the table structure; recognizes the text in each cell with a deep-learning text recognition model; and formats and outputs the recognition result. It thereby handles the identification of various table types, such as fully ruled tables, partially ruled tables, and borderless tables, and improves the accuracy of table structure and content recognition. Compared with the prior art, the image direction classification is more precise, the table structure identification is more accurate, and ruled and borderless tables alike can be processed at the same time.
Further, the step S2 includes the following steps:
s201, classifying the images in the horizontal and vertical directions by using a deep learning model, and adjusting the images to be in the horizontal direction;
s202, forward and backward classification is carried out on the transverse images by respectively utilizing a text detection method and a text recognition method, and the images are adjusted to be forward;
s203, correcting the forward image by using a Hough transform straight line detection method.
The beneficial effects of the further scheme are as follows: the method combines horizontal/vertical image classification based on a deep learning model, forward/reverse classification based on text detection and text recognition, and fine tilt correction based on Hough-transform line detection, and can realize high-precision tilt correction. Compared with conventional algorithms that classify the direction of the whole image directly, the image direction correction in S2 exploits the direction information of the characters themselves, classifies the image direction accurately, and generalizes better.
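The patent describes S203 only at the algorithm level. The following is a minimal, hypothetical Python sketch of the final estimation step, assuming line segments have already been obtained from a Hough transform (for example OpenCV's `HoughLinesP`): the mean inclination of the near-horizontal segments gives the residual skew angle to rotate away.

```python
import math

def estimate_skew_angle(segments, max_abs_deg=15.0):
    """Estimate the mean inclination (in degrees) of near-horizontal line
    segments, such as those produced by Hough line detection.

    segments: iterable of (x1, y1, x2, y2) endpoints.
    Segments steeper than max_abs_deg are ignored (likely vertical rules).
    """
    angles = []
    for x1, y1, x2, y2 in segments:
        ang = math.degrees(math.atan2(y2 - y1, x2 - x1))
        if abs(ang) <= max_abs_deg:
            angles.append(ang)
    return sum(angles) / len(angles) if angles else 0.0

# A slightly tilted page: two near-horizontal rules at about 2 degrees,
# plus one vertical rule that must be ignored.
segs = [(0, 0, 100, 3), (0, 50, 100, 53), (10, 0, 10, 80)]
angle = estimate_skew_angle(segs)
```

The returned angle would then be fed to a rotation of the original-resolution image; the threshold `max_abs_deg` is an assumption, not a value from the patent.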
Still further, the step S4 includes the steps of:
s401, performing text detection processing on the table area to obtain a text detection box, and performing post-processing on the text detection box;
s402, carrying out binarization processing on the table area to obtain a binary image;
s403, eliminating characters and noise in the binary image, and obtaining the binary image with the noise and the characters removed;
s404, detecting visible table lines from the binary image with the noise and characters removed;
s405, classifying the table types according to the number of the visible table lines;
s406, according to the classification result, invisible table lines in the table area are detected from the mask image of the text detection frame;
s407, performing post-processing on the visible table lines and the invisible table lines, removing wrong table lines, and completing reconstruction of the table structure.
The beneficial effects of the further scheme are as follows: in this step, the table line detection algorithm uses the pixel information of the image and the text detection box information at the same time, so noise is removed more thoroughly and table lines are detected more accurately. The algorithm extracts visible and invisible table lines simultaneously, making it more universal: it can process fully ruled tables, partially ruled tables, borderless tables, and so on.
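As an illustration of the reconstruction idea in S407, the sketch below (hypothetical Python, not code from the patent) enumerates the minimal cells of the base grid once the final transverse and longitudinal separator coordinates are known; merging cells then amounts to removing the separator segments that are absent in the concrete table.

```python
def base_cells(row_coords, col_coords):
    """Enumerate the minimal cells of the base grid formed by sorted
    transverse separator ordinates and longitudinal separator abscissas.
    Each cell is returned as (top, left, bottom, right)."""
    cells = []
    for r in range(len(row_coords) - 1):
        for c in range(len(col_coords) - 1):
            cells.append((row_coords[r], col_coords[c],
                          row_coords[r + 1], col_coords[c + 1]))
    return cells

# Three transverse and three longitudinal separators -> a 2 x 2 base grid.
cells = base_cells([0, 20, 40], [0, 50, 100])
```

The coordinate values here are illustrative only; in the method they would come from steps S404 and S406.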
Still further, step S400 is further included before step S401;
and S400, performing secondary inclination correction on the table area in the image after the cascade correction.
The beneficial effects of the further scheme are as follows: the secondary inclination correction is carried out on the table area, so that the inclination correction precision can be improved, and the effect of the subsequent steps can be improved.
Still further, the step S403 includes the steps of:
s4031, constructing a text detection box mask image based on a text detection box according to the text detection box of the table area;
s4032, eliminating character pixels in the binary table region image according to the text detection box mask image;
s4033, traversing all connected domains on the binary table region image, and calculating to obtain a minimum circumscribed rectangle of each connected domain;
s4034, whether the size of the minimum circumscribed rectangle meets a preset size or not is judged, if yes, pixels corresponding to the connected domain are removed from the form image, a binary image with noise and characters removed is obtained, and the step S404 is entered, otherwise, the step S4033 is returned to.
The beneficial effects of the further scheme are as follows: the interference caused by the extraction of the subsequent table lines is effectively reduced by eliminating characters and noise of the binary image.
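The patent leaves the "preset size" criterion of S4034 open. The sketch below is hypothetical Python (4-connectivity, plain nested lists instead of an image library) that assumes a component small in both dimensions is noise, while long thin components, the candidates for table lines, are kept.

```python
from collections import deque

def remove_small_components(img, min_w, min_h):
    """Remove foreground (1) connected components whose bounding box is
    smaller than min_w x min_h in BOTH dimensions -- a sketch of
    S4033/S4034. img is a list of rows of 0/1; returns a cleaned copy."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if out[r][c] == 1 and not seen[r][c]:
                # BFS to collect one connected component
                comp, q = [], deque([(r, c)])
                seen[r][c] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and out[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                ys = [p[0] for p in comp]
                xs = [p[1] for p in comp]
                h, w = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
                if w < min_w and h < min_h:  # small in both axes: noise
                    for y, x in comp:
                        out[y][x] = 0
    return out

# A long horizontal line (kept) and a 2x2 noise blob (removed).
img = [[0] * 10 for _ in range(6)]
for c in range(8):
    img[0][c] = 1
for r in (3, 4):
    for c in (3, 4):
        img[r][c] = 1
clean = remove_small_components(img, min_w=5, min_h=5)
```

A production implementation would more likely use `cv2.connectedComponentsWithStats`; the thresholds shown are illustrative.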
Still further, the step S404 includes the steps of:
s4041, respectively generating a transverse convolution kernel and a longitudinal convolution kernel;
s4041, respectively generating a transverse convolution kernel and a longitudinal convolution kernel;
the width w expression of the transverse convolution kernel is as follows:
w=W*ratio
the height h expression of the longitudinal convolution kernel is as follows:
h=H*ratio
wherein, W and H represent the width and height of the table area image respectively, and ratio represents the proportionality coefficient;
s4042, performing two-dimensional convolution operation on the transverse convolution kernel in the binary image with the noise and the characters removed, filtering out transverse lines and removing longitudinal lines; and
performing two-dimensional convolution operation on the longitudinal convolution kernel in the binary image with the noise and the characters removed, filtering out longitudinal lines and removing transverse lines;
s4043, projecting the binary image after the transverse convolution kernel convolution along a horizontal method, counting the number of foreground pixels, and obtaining a transverse projection sequence; and
projecting the binary image after the longitudinal convolution kernel convolution along a vertical method, and counting the number of foreground pixels to obtain a longitudinal projection sequence;
s4044, respectively searching for peak coordinates in the transverse projection sequence and the longitudinal projection sequence by using a peak detection algorithm to obtain a longitudinal coordinate corresponding to the transverse line and a transverse coordinate corresponding to the longitudinal line, completing the detection of the visible table line, and going to step S405.
The beneficial effects of the further scheme are as follows: conventional image-processing-based denoising of table images often needs intricate processing and elaborate parameter settings to guarantee that all irrelevant pixels apart from the table lines are removed, so its generalization is weak and the parameters must frequently be adjusted for specific data. The denoising here removes text pixels with the text detection box mask image and removes smaller noise pixels with connected-domain-based screening; it achieves a good denoising effect without complex parameter settings and generalizes well.
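Steps S4043 and S4044 can be illustrated with a small sketch (hypothetical Python; a real implementation would more likely use a peak-detection routine such as `scipy.signal.find_peaks`): the rows of the line-filtered binary image are summed, and rows whose counts form local maxima above a threshold are taken as transverse line ordinates.

```python
def find_line_rows(binary, min_count):
    """Project a line-filtered binary image along the horizontal direction
    (one foreground count per row) and return the row indices that are
    local maxima above min_count -- the ordinates of transverse lines."""
    proj = [sum(row) for row in binary]
    peaks = []
    for i, v in enumerate(proj):
        if v < min_count:
            continue
        left = proj[i - 1] if i > 0 else -1
        right = proj[i + 1] if i < len(proj) - 1 else -1
        if v >= left and v > right:  # last row of a flat plateau wins
            peaks.append(i)
    return peaks

# A 9-row image with full-width lines at rows 2 and 6.
binary = [[0] * 12 for _ in range(9)]
for c in range(12):
    binary[2][c] = binary[6][c] = 1
rows = find_line_rows(binary, min_count=6)
```

The vertical case is symmetric (sum per column instead of per row); `min_count` is an assumed threshold, not a value from the patent.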
Still further, the step S406 includes the steps of:
s4061, according to the classification result, detecting invisible table lines from the mask image of the text detection box to obtain coordinate sets of the transverse and longitudinal invisible table lines;
s4062, according to the coordinate set, judging a certain invisible table line l in the same direction1Some invisible table line l closest to it2If the distance is less than the preset threshold value, deleting the invisible table line l1And completing the detection of the invisible table lines in the table image, and proceeding to step S407, otherwise, returning to step S4061.
The beneficial effects of the further scheme are as follows: the detection of visible table lines is more accurate than that of invisible table lines, so when a visible table line and an invisible table line lie close together, the invisible line is removed and the visible line is kept as the reference. Screening every pair of visible and invisible table lines by distance in this way combines the two detection results cleanly.
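A minimal sketch (hypothetical Python) of the distance-based screening described above: separator coordinates from the more reliable source (the visible lines) are kept as-is, and an invisible line is admitted only when no already kept line lies within the threshold.

```python
def merge_invisible_lines(visible, invisible, min_gap):
    """Sketch of the screening in S4062 and its combination with visible
    lines: visible separator coordinates always win; an invisible one is
    kept only if every already kept line is at least min_gap away."""
    kept = sorted(visible)
    for coord in sorted(invisible):
        if all(abs(coord - k) >= min_gap for k in kept):
            kept.append(coord)
            kept.sort()
    return kept

# Invisible lines at 12 and 97 duplicate visible lines at 10 and 100
# and are dropped; the one at 55 fills a genuine gap and is kept.
merged = merge_invisible_lines(visible=[10, 100],
                               invisible=[12, 55, 97], min_gap=8)
```

The threshold `min_gap` is the "preset threshold" of S4062 and its value here is purely illustrative.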
Based on the method, the invention also discloses a device for identifying the table in the image, which comprises the following steps:
the image acquisition module is used for acquiring an image to be identified, which contains a table area;
the image correction module is used for performing cascade correction on the image by respectively utilizing a deep learning model and a straight line detection method;
the table area detection module is used for detecting the table area in the image after the cascade correction by using the deep learning model;
the table structure identification module is used for detecting the row and column grid lines from the table area image and reconstructing the table structure;
the cell text recognition module is used for segmenting the texts in the cells in the table structure and recognizing the texts in the cells by using a text recognition model;
and the output module is used for outputting the identification result in a formatted manner to complete the identification of the form in the image.
Based on the method, the invention also provides an electronic device, which comprises a processor and a memory connected with the processor, wherein the memory is used for storing executable instructions of the processor;
the executable instructions are loaded and executed by a processor to implement the operations performed in the method of table recognition in an image according to any of claims 1 to 7.
Based on the method, the invention further provides a computer-readable storage medium having executable instructions stored therein; the instructions are loaded and executed by a processor to implement the operations performed in the method for identifying a table in an image according to any one of claims 1 to 7.
The invention has the beneficial effects that: the method acquires the image to be identified; performs cascaded rectification based on deep learning and straight-line detection; detects the table region based on deep learning; detects the row and column grid lines in the table region image and reconstructs the table structure; recognizes the text in each cell with a deep-learning text recognition model; and formats and outputs the recognition result. It thereby handles the identification of various table types, such as fully ruled tables, partially ruled tables, and borderless tables, and improves the accuracy of table structure and content recognition.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is an exemplary diagram of an input image in the present embodiment.
Fig. 3 is an exemplary diagram of the corrected image in the present embodiment.
Fig. 4 is a schematic diagram of table reconstruction in this embodiment.
Fig. 5 is a schematic diagram of the table area image input in the embodiment.
Fig. 6 is a schematic diagram of an image after binarization in this embodiment.
Fig. 7 is a schematic diagram of a text detection box mask image of an input image in this embodiment.
Fig. 8 is a schematic diagram of an image with text and noise removed from the input image in the present embodiment.
FIG. 9 is an image of the text detection box mask pattern of this embodiment after being appropriately expanded for column projection segmentation.
FIG. 10 is a pixel projection statistical chart in the present embodiment.
Fig. 11 is a schematic diagram illustrating the principle of the cell merging algorithm in this embodiment.
FIG. 12 is a schematic diagram of the system of the present invention.
Detailed Description
The following description of the embodiments of the invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, as long as the various changes fall within the spirit and scope of the invention as defined by the appended claims, everything made using the inventive concept is protected.
Example 1
As shown in fig. 1, the present invention provides a method for identifying a table in an image, comprising the following steps:
s1, acquiring an image to be identified, wherein the image comprises a table area;
in this embodiment, the image to be recognized includes one or more table areas, which are not limited in this embodiment. If the form is in a paper format, the paper form can be scanned by a scanner or shot by electronic equipment such as a mobile phone with a scanning function and the like, and an image to be recognized is obtained by projection conversion and correction. As shown in fig. 2.
S2, performing cascade correction on the image by respectively using a deep learning model and a straight line detection method; the realization method comprises the following steps:
s201, classifying the images in the horizontal and vertical directions by using a deep learning model, and adjusting the images to be in the horizontal direction;
s202, forward and backward classification is carried out on the transverse images by respectively utilizing a text detection method and a text recognition method, and the images are adjusted to be forward;
and S203, correcting the forward image by using a Hough transform straight line detection method.
In this embodiment, the method combines horizontal/vertical image classification based on a deep learning model, forward/reverse classification based on text detection and text recognition, and a fine tilt-correction algorithm based on Hough-transform line detection, and can realize direction correction with 100% accuracy together with high-precision tilt correction. For convenience of description, the image direction is defined by the direction of the characters in the image: when the characters face upward, the image is forward; when the characters face downward, the image is reverse. The horizontal or vertical orientation of the image is defined by the direction of the text lines: when the text lines run horizontally, the image orientation is horizontal, and depending on the character direction it may additionally be leftward or rightward; similarly, when the text lines run vertically, the orientation is vertical, and by character direction it may be upward, downward, forward, or reverse. In fig. 2, for example, the image orientation is vertical. Horizontal and vertical orientations are visually quite distinct, whereas forward and reverse can be difficult to distinguish without knowledge of the language of the text in the image. The horizontal/vertical classification is performed by a deep convolutional neural network model, for example an efficient ResNet18 model fine-tuned from a pre-trained model.
The forward/reverse classification comprises: detecting the text in the image with a text detection model to obtain text coordinate boxes. The text detection model can be an efficient detector such as Craft, fine-tuned from a pre-trained model; Craft detects text by predicting a Gaussian heat map for individual characters and the connectivity between characters, and so offers both high detection speed and high detection precision. From the boxes produced by the text detection model, the longest few, for example 10, are selected; the text recognition model computes a recognition confidence p1 for them, the boxes are then rotated by 180 degrees and recognized again to obtain a confidence p2, and the direction with the higher of p1 and p2 is taken as the page forward direction. The Hough-transform-based correction then detects straight lines in the image that has been adjusted to the forward direction, computes the average inclination angle of those lines, and rotates the image to compensate. For an image acquired and processed in step S1, the image orientation is unknown: it is first classified horizontal/vertical and adjusted to horizontal, then classified forward/reverse and adjusted to forward, and finally finely corrected with the Hough-transform-based algorithm.
It should be noted that this step places no particular demand on the accuracy of text detection and recognition, so the input image can first be downscaled to a low resolution for correction; the rotation angle (or transformation matrix) from the uncorrected to the corrected image is then applied to the original-resolution image, which speeds up this step. Optionally, some post-processing may follow the correction, for example removing large blank areas in the picture, to improve the accuracy of subsequent table detection and text recognition. Fig. 3 shows the corrected result for the image of fig. 2.
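The forward/reverse decision of S202 reduces to comparing recognition confidences. The sketch below is hypothetical Python in which text crops are stand-in strings, reversing a string stands in for rotating an image crop by 180 degrees, and `recog_conf` stands in for the deep-learning recognizer's confidence output; none of these names come from the patent.

```python
def choose_orientation(crops, recog_conf, top_k=10):
    """Decide page forward/reverse per S202: compare the mean recognition
    confidence of the longest crops as-is (p1) against the same crops
    rotated 180 degrees (p2), and keep the higher-scoring direction."""
    sel = sorted(crops, key=len, reverse=True)[:top_k]
    p1 = sum(recog_conf(c) for c in sel) / len(sel)
    p2 = sum(recog_conf(c[::-1]) for c in sel) / len(sel)  # 180-degree stand-in
    return ('forward', p1) if p1 >= p2 else ('reverse', p2)

# A toy recognizer: high confidence on readable text, low on reversed text.
conf = {'hello world': 0.95, 'table line': 0.90,
        'dlrow olleh': 0.20, 'enil elbat': 0.15}
direction, score = choose_orientation(['hello world', 'table line'],
                                      lambda s: conf.get(s, 0.1), top_k=2)
```

In the real pipeline the stand-ins would be image crops, an image rotation, and a CRNN-style recognizer's confidence, respectively.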
S3, detecting a table area in the image after the cascade correction by using a deep learning model;
in this embodiment, the target detection model based on deep learning is a target detection model obtained based on pre-training model training. Such as YOLO, FasterRCNN, CascadeRCNN, and the like. In the step, firstly, the collected original form image data is manually marked, and all form areas on each image are marked. The number of original image samples acquired in this step may be 2000, for example. The marked image is used for training based on a pre-training model, and data Augmentation (Augmentation) is carried out on input data in a data reading link in the training process so as to improve the generalization capability of the model and avoid overfitting. The augmentation may be, for example, random cropping, scaling, rotation, affine transformation, contrast adjustment, random erasure, and so forth.
And S4, detecting the row and column grid lines in the table area and reconstructing the table structure.
In this embodiment, a table is composed of visible or invisible transverse and longitudinal separation lines, and a cell is generally the minimum enclosed area surrounded by separation lines. Any table can be regarded as a "checkerboard" basic table with some cells merged; it should be noted that "checkerboard" is used only to aid understanding, as the row height of each row and the column width of each column of the basic table may differ, unlike a real checkerboard. For example, as shown in fig. 4, fig. 4a is a specific table and fig. 4b is the "checkerboard" basic table corresponding to the table in fig. 4a. The "name" cell in fig. 4a is the combination of cells 1 and 2 in fig. 4b, and other merged cells are formed in the same way. Therefore, all visible and invisible transverse and longitudinal separation lines in the table can first be detected, and cells can then be merged by removing the unnecessary separation line segments, thereby reconstructing the table structure. The implementation comprises the following steps:
s401, performing text detection processing on the form area to obtain a text detection box, and performing post-processing on the text detection box;
in this embodiment, a fully trained deep learning based text detection model is used to perform text detection on the spreadsheet image to detect all text detection boxes, for example, using a Craft text detector. The text detection box refers to a minimum rectangular surrounding box which can frame a piece of text in the image, and a detection box (bounding box) can be expressed as (x)1,y1,x2,y2) Wherein x is1,y1Respectively the horizontal and vertical coordinates, x, of the upper left corner of the detection frame2,y2Respectively the horizontal and vertical coordinates of the lower right corner of the detection frame. After the text detection box is obtained, some post-processing is performed on the detection box, for example, a box with an excessively large inclination angle of the text detection box is removed, which may be an inclined watermark word on the background.
S402, carrying out binarization processing on the table area to obtain a binary image;
in this embodiment, foreground pixels in the table image take the value 255 and background pixels the value 0. Foreground pixels are elements such as table lines and characters; background pixels are all other pixels. Binarization is performed with a locally adaptive method: compared with a global binarization method, it better handles uneven image brightness or multi-colored backgrounds caused by the illumination conditions at acquisition time, and makes the characters and table lines in the image clearer and more complete. Optionally, preprocessing the binary image, for example dilating it, can improve the detection of dashed table lines. As shown in figs. 5 to 6, fig. 5 shows an input table area image and fig. 6 the binarized image.
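A locally adaptive threshold of the kind contrasted with global binarization here can be sketched with an integral image over local means; the block size and offset below are illustrative assumptions (OpenCV's cv2.adaptiveThreshold provides an equivalent, optimized routine):

```python
import numpy as np

def adaptive_binarize(gray, block=15, c=10):
    """Inverse local-mean threshold: a pixel becomes foreground (255) when it
    is darker than the mean of its block x block neighborhood minus offset c."""
    pad = block // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))  # integral image
    h, w = gray.shape
    window = (ii[block:block + h, block:block + w] - ii[:h, block:block + w]
              - ii[block:block + h, :w] + ii[:h, :w])
    mean = window / (block * block)
    return np.where(gray < mean - c, 255, 0).astype(np.uint8)
```

Because each pixel is compared against its own neighborhood mean, a shadow covering half the page shifts the local means along with the pixel values, which is exactly what a single global threshold cannot do.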
S403, eliminating characters and noise in the binary image, and obtaining the binary image with the noise and the characters removed, wherein the implementation method comprises the following steps:
s4031, constructing a text detection box mask image based on a text detection box according to the text detection box of the table area;
s4032, eliminating character pixels in the binary table area image according to the text detection box mask image;
s4033, traversing all connected domains on the binary table region image, and calculating to obtain a minimum circumscribed rectangle of each connected domain;
s4034, judging whether the size of the minimum circumscribed rectangle meets a preset size; if so, removing the pixels corresponding to the connected domain from the table image to obtain a binary image with noise and characters removed, and proceeding to step S404; otherwise, returning to step S4033.
In this embodiment, the characters and noise in the binary image would interfere with the subsequent table line extraction, so they must first be eliminated. The specific steps are as follows: (1) Create a text-detection-box Mask image from the text detection boxes of the table image. A new grayscale image, denoted M, is created with the same size as the table image and filled with 0. All text detection boxes are traversed, and the pixel value of the solid rectangular area covered by each box is set to 255 on M, completing the mask image. Optionally, because the boxes produced by an actual text detector may be slightly larger or smaller than the minimum bounding rectangle of the actual text, the boxes may be adjusted according to the detector's behavior when building the mask, for example shrinking each edge of a box by 2 pixels toward its center; this yields boxes as tight as possible and reduces occlusion of the original table lines in the image. It should be noted that after text removal through the mask, a few pixels at the character edges that fall outside the shrunk boxes remain; because they are few they can be ignored, and if necessary they are easily eliminated as noise by removing small contours. As shown in fig. 7, fig. 7 is the text-detection-box mask image of the input image. (2) Eliminate character pixels in the binary table area image using the mask image.
The mask image M is applied as a mask on the binary table area image, setting the pixels of the binary image that correspond to the white areas of the mask to 0, thereby eliminating the text. (3) Optionally, noise is eliminated by a contour detection method: all connected domains on the binary image are traversed, the minimum bounding rectangle of each is computed, and if the rectangle satisfies a size condition, for example both length and width less than 3 pixels, the pixels of that connected domain are removed from the image. As shown in fig. 8, fig. 8 is the input image with text and noise removed.
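Steps (1)-(3) — masking out the (shrunk) text boxes and dropping tiny connected components — can be sketched as follows, assuming SciPy's ndimage for connected-component labeling:

```python
import numpy as np
from scipy import ndimage

def erase_text_and_noise(binary, boxes, shrink=2, min_side=3):
    """binary: uint8 image, foreground = 255; boxes: (x1, y1, x2, y2) text boxes.
    Zeroes the shrunk text boxes, then drops connected components whose
    bounding rectangle is smaller than min_side pixels in both axes."""
    out = binary.copy()
    for x1, y1, x2, y2 in boxes:               # mask-based text erasure
        out[y1 + shrink:y2 - shrink, x1 + shrink:x2 - shrink] = 0
    labels, _ = ndimage.label(out > 0)
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        if sl is None:
            continue
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        if h < min_side and w < min_side:      # tiny blob => noise
            region = out[sl]
            region[labels[sl] == i] = 0
    return out
```

Note that a table line is a long, thin component, so the "both sides small" condition leaves it untouched while isolated specks are removed.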
S404, detecting visible table lines from the binary image with the noise and the characters removed, wherein the method for realizing the visible table lines comprises the following steps:
s4041, respectively generating a transverse convolution kernel and a longitudinal convolution kernel;
the width w of the transverse convolution kernel is expressed as follows:
w=W*ratio
the height h of the longitudinal convolution kernel is expressed as follows:
h=H*ratio
wherein, W and H represent the width and height of the table area image respectively, and ratio represents the proportionality coefficient;
s4042, performing a two-dimensional convolution with the transverse convolution kernel on the binary image with the noise and the characters removed, so as to filter out the transverse lines and remove the longitudinal lines; and
performing a two-dimensional convolution with the longitudinal convolution kernel on the binary image with the noise and the characters removed, so as to filter out the longitudinal lines and remove the transverse lines;
s4043, projecting the binary image after the transverse convolution along the horizontal direction and counting the number of foreground pixels to obtain a transverse projection sequence; and
projecting the binary image after the longitudinal convolution along the vertical direction and counting the number of foreground pixels to obtain a longitudinal projection sequence;
s4044, respectively searching for peak coordinates in the transverse projection sequence and the longitudinal projection sequence by using a peak detection algorithm to obtain a longitudinal coordinate corresponding to the transverse line and a transverse coordinate corresponding to the longitudinal line, completing the detection of the visible table line, and going to step S405.
In this embodiment, after the text and noise in the binary image have been eliminated, visible table lines are extracted from the resulting image. Taking transverse visible table line detection as an example, the process is as follows: a transverse convolution kernel with a height of 1 and a width of l = W · ratio is generated. The kernel width l can be neither too large nor too small: if too large, short line segments are missed; if too small, too much noise is introduced. Taking l = W · ratio lets the filter width adapt to images of different sizes, which improves the generalization of the algorithm; the ratio value may be, for example, 1/40.
The input image is then convolved with the transverse convolution kernel to filter out the transverse lines and remove the longitudinal lines. The convolution can be iterated several times to filter out the transverse lines more thoroughly, after which the image is dilated to thicken the table lines and fill small gaps in them, so that the subsequent steps process them better. Optionally, the input image itself is dilated; in the present embodiment, this dilation is applied only during transverse line detection.
The image is then projected along the horizontal direction and the number of foreground pixels counted to obtain a projection sequence. If the height of the image is H, this projection yields a one-dimensional vector of length H.
A peak is then found in the projection sequence. A peak detection algorithm with suitable conditions finds the required peak coordinates in the sequence; these coordinates correspond to the vertical coordinates of the transverse lines. The condition may be, for example, that the peak prominence is greater than 0.3 × H. It should be noted that the peak detection conditions in this step are deliberately loose, in order to avoid missing short transverse lines as far as possible; the subsequent algorithm deletes wrong transverse line coordinates. After this step, the sets of visible table lines in the transverse and longitudinal directions are obtained: Lr = {r1, r2, r3, ..., rn} and Lc = {c1, c2, c3, ..., cm}, where ri and ci denote, respectively, the vertical coordinate of a transverse table line and the horizontal coordinate of a longitudinal table line in the table area image.
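For the transverse case, steps S4041-S4044 can be sketched as below. Morphological erosion/dilation with a 1 × l structuring element stands in for the iterated convolution-and-threshold described above (an assumption, not the claimed implementation), and scipy.signal.find_peaks supplies the peak detection, with the embodiment's example prominence condition of 0.3 × H:

```python
import numpy as np
from scipy import ndimage
from scipy.signal import find_peaks

def detect_horizontal_lines(binary, ratio=1 / 40, prominence_frac=0.3):
    """binary: uint8 image, foreground = 255. Returns the y-coordinates
    (row indices) of visible transverse table lines."""
    H, W = binary.shape
    l = max(2, int(W * ratio))            # S4041: kernel width l = W * ratio
    fg = binary > 0
    # S4042: keep only horizontal runs of length >= l (vertical lines vanish)
    lines = ndimage.binary_erosion(fg, structure=np.ones((1, l)))
    lines = ndimage.binary_dilation(lines, structure=np.ones((1, l)))
    proj = lines.sum(axis=1)              # S4043: horizontal projection
    # S4044: loose peak detection, prominence > 0.3 * H per the embodiment
    peaks, _ = find_peaks(proj, prominence=prominence_frac * H)
    return peaks
```

The longitudinal case is symmetric: an l × 1 structuring element with l = H · ratio, projection along the vertical direction, and peaks giving the x-coordinates of vertical lines.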
S405, classifying the table types according to the number of the visible table lines;
in this embodiment, table types are classified according to the numbers of transverse and longitudinal visible table lines; the specific thresholds used for classification may be adjusted for the application at hand. For example, in this embodiment the input data is a certain information statistics table with many rows and columns, and the table type classification rule may be set as: when len(Lr) < 5 and len(Lc) < 5, the table is considered borderless; when len(Lr) < 5 and len(Lc) > 5, a table with vertical lines only; when len(Lr) > 5 and len(Lc) < 5, a table with horizontal lines only; and when len(Lr) > 5 and len(Lc) > 5, a full-frame table, where the len(X) function gives the number of elements in set X. It should be noted that the table type classification only serves to achieve the best recognition effect and the most efficient recognition process; even if the table type is misclassified in some extreme cases, the table can still be recognized well. For example, a full-frame table with 2 rows and 10 columns has 3 transverse lines and 11 longitudinal lines and would be classified as a table with vertical lines only; yet even with this misclassification the table is recognized correctly by the method of the present application, because after the misclassification the invisible table lines in the table are further detected, and the post-processing of the obtained visible and invisible table lines finally yields the correct table structure.
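The classification rule can be written directly; note that the embodiment leaves counts exactly equal to 5 unspecified, so this sketch lumps them in with the ">" branches:

```python
def classify_table(len_lr, len_lc, th=5):
    """len_lr / len_lc: numbers of detected visible transverse / longitudinal
    lines. Counts equal to th are treated like the '>' case here."""
    if len_lr < th and len_lc < th:
        return "borderless"
    if len_lr < th:
        return "vertical-lines-only"
    if len_lc < th:
        return "horizontal-lines-only"
    return "full-frame"
```

For the 2 × 10 full-frame example above, classify_table(3, 11) indeed returns "vertical-lines-only", and the later invisible-line detection plus post-processing absorbs the error.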
S406, according to the classification result, invisible table lines in the table area are detected from the mask image of the text detection box, and the implementation method is as follows:
s4061, according to the classification result, detecting invisible table lines from the mask image of the text detection frame to obtain coordinate sets of the transverse and longitudinal invisible table lines;
s4062, according to the coordinate sets, judging whether the distance between an invisible table line l1 and the visible table line l2 closest to it in the same direction is less than a preset threshold; if so, deleting the invisible table line l1, completing the detection of the invisible table lines in the table image, and proceeding to step S407; otherwise, returning to step S4061;
the expression for determining whether an invisible table line is to be removed is as follows:
d(l1,l2)<th
where th denotes the threshold and d(l1, l2) denotes the distance between an invisible table line l1 and a visible table line l2 in the same direction.
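The rule d(l1, l2) < th can be sketched as a small filter over line coordinates, where l1 ranges over the invisible lines and l2 over the visible lines in the same direction:

```python
def prune_invisible_lines(invisible, visible, th):
    """Keep an invisible line only if no visible line in the same direction
    lies within distance th of it (visible lines take precedence)."""
    kept = []
    for l1 in invisible:
        if visible and min(abs(l1 - l2) for l2 in visible) < th:
            continue                      # duplicates a visible line: drop it
        kept.append(l1)
    return kept
```

The same function serves both directions, called once with row coordinates and once with column coordinates.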
In this embodiment, visible table lines are detected from the table image with noise and text removed by exploiting the pixel features of the lines themselves; invisible table lines have no such pixel features, so the features of the text and the background must be used instead. Taking a table with only vertical lines as an example, the invisible transverse table lines need to be detected: the position of each text line can be located by detecting peaks in a projection statistic of the foreground character pixels, from which the positions of the invisible table lines between text lines follow. Projecting the background gives better statistics than projecting the foreground characters: because foreground text lengths vary widely, it is hard to set a filter condition that finds the correct peaks in a transverse foreground projection; in a transverse background projection, however, the accumulated pixel count at the row of an invisible table line is close to the image width, so the peak height is fixed and the peaks are easier to filter out of the projection sequence.
In addition, if the binarized table image were projected directly in this step, wrong line segmentation could result: text lines contain characters of different shapes, and many Chinese characters have a top-bottom structure, so a background-pixel projection peak may appear in the middle of a text line. When detecting invisible longitudinal table lines, wrong column segmentation is even more severe, because the characters in each column of a table are generally aligned, so the spaces between characters form pronounced peaks in the background projection; invisible longitudinal table line detection is therefore harder. A simple idea to solve these projection segmentation errors is to dilate the image appropriately to fill the gaps between characters that interfere with the projection, but the dilation kernel size and number of iterations are hard to set: too little dilation has no effect, and too much sticks the characters together excessively. The present application therefore proposes detecting invisible table lines from the text-detection-box mask image: each text detection box is the minimum bounding rectangle of its text pixels, so the gaps inside a piece of text are filled and wrong projection segmentation, i.e. detection of wrong peaks, is naturally avoided; moreover, the text detection boxes and mask image from step S403 are reused, so no extra text detection cost is introduced. The specific steps for detecting invisible table lines from the text-detection-box mask image are as follows:
Invisible table lines are detected. According to the table type: if the table is a full-frame table, this step is skipped; if it has only vertical lines, the transverse invisible table lines are detected; if it has only horizontal lines, the longitudinal invisible table lines are detected; and if it has no frame lines at all, both the transverse and the longitudinal invisible table lines are detected. After this step, the coordinate sets of the transverse and longitudinal invisible table lines are obtained: L'r = {r1, r2, r3, ..., rn} and L'c = {c1, c2, c3, ..., cm}, where ri and ci denote, respectively, the vertical coordinate of a transverse invisible table line and the horizontal coordinate of a longitudinal invisible table line in the table area image; if transverse or longitudinal invisible line detection is not performed, L'r or L'c is empty.
Invisible table line post-processing. Since the table image may contain some visible table lines, the detected invisible lines may duplicate visible lines; in that case the visible table lines take precedence. Specifically, for an invisible table line l1 and the visible table line l2 closest to it in the same direction, if the distance between them is less than a threshold th, l1 is deleted; th may be set, for example, to 0.5 times the average height of all detection boxes. Fig. 9 is a text-detection-box mask image appropriately dilated for column projection segmentation; fig. 10 is the pixel projection statistic, where the abscissa is the x-axis coordinate, the ordinate is the pixel count, and the "x" symbols mark the x-axis coordinates of the column segmentation positions, i.e. the longitudinal table lines.
S407, performing post-processing on the visible table lines and the invisible table lines, removing the wrong table lines, and completing reconstruction of the table structure.
In this embodiment, the transverse table line coordinates are sorted in ascending order, and the area between every 2 adjacent table lines is a row. All rows are traversed and empty rows, i.e. rows that do not contain any text detection box, are deleted by removing the larger of the row's two coordinates; if a transverse table line passes through the centers of several text boxes, that line is also deleted. Optionally, the cells of the header rows are merged; this step may be skipped to increase processing speed if the input images contain no merged cells. In this embodiment, cell merging can occur in the input data only in the header, so the cell merging algorithm is run only on the first 3 rows of the basic table. As shown in fig. 11, the solid lines in the figure are real table lines and the dotted lines are lines that must be removed from the basic table: cells 1 and 2 must be merged vertically, and cells 3 and 4 horizontally. The cell merging algorithm proceeds as follows: all line segments of the basic table are traversed; taking the vertical segments as an example, for each row of the table it is judged whether each vertical separation segment actually exists, and if not, the 2 cells on its left and right are merged horizontally; the transverse separation lines are processed in the same way. The method for judging whether a separation segment exists is as follows:
For the 2 end points of the segment to be tested, such as points A and B in fig. 11, the distance between them is denoted L, and a rectangular region of length L and a width of several pixels, for example 10 pixels, is cropped from the image according to the two end-point coordinates, so that the segment to be tested is centered in this small region. The angle of the segment formed by the two end points is then computed, the region is rotated to the horizontal by this angle and projected in the horizontal direction; if the height of the projection peak exceeds half of L, a line segment is judged to exist between the two end points, otherwise not.
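For an already axis-aligned candidate segment, the test reduces to cropping a thin band around the two end points, projecting it, and comparing the peak with L/2; the rotation step for tilted segments is omitted in this sketch:

```python
import numpy as np

def segment_exists(binary, p1, p2, band=5):
    """Judge whether a horizontal separation segment between end points
    p1 = (x1, y) and p2 = (x2, y) is actually drawn, by projecting a thin
    band of the binary image around it."""
    (x1, y), (x2, _) = p1, p2
    L = abs(x2 - x1)
    crop = binary[max(0, y - band):y + band, min(x1, x2):max(x1, x2)] > 0
    if crop.size == 0:
        return False
    proj = crop.sum(axis=1)           # horizontal projection of the band
    return bool(proj.max() > L / 2)   # peak higher than L/2 => segment present
```

Vertical candidate segments are handled symmetrically by transposing the roles of x and y.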
S5, segmenting the texts in the cells in the table structure, and recognizing the texts in the cells by using a text recognition model;
in this embodiment, the text in each cell is recognized with a deep-learning-based text recognition model, and the cell recognition result is filled into the restored table. For a cell that may contain multiple lines of text, the foreground text is dilated horizontally, a horizontal projection is taken, the range of each line in the cell is divided according to the peaks, and each line is then recognized by a deep-learning-based text recognition model, such as CRNN, to obtain the text content of the cell.
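The multi-line splitting inside a cell — horizontal dilation followed by horizontal projection — can be sketched as follows, returning one row range per text line (the dilation width is an illustrative assumption):

```python
import numpy as np
from scipy import ndimage

def split_cell_lines(cell_binary, dilate_w=15):
    """Dilate the foreground horizontally, project along the horizontal
    direction, and return one (row_start, row_end) range per text line."""
    fg = ndimage.binary_dilation(cell_binary > 0,
                                 structure=np.ones((1, dilate_w)))
    proj = fg.sum(axis=1)
    ranges, start = [], None
    for y, v in enumerate(proj):
        if v > 0 and start is None:
            start = y                      # a text line begins
        elif v == 0 and start is not None:
            ranges.append((start, y))      # a gap closes the current line
            start = None
    if start is not None:
        ranges.append((start, len(proj)))
    return ranges
```

Each returned range is then cropped out and fed to the text recognition model, and the per-line results are concatenated into the cell content.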
Example 2
Based on the above method, the present invention further provides an apparatus for identifying a table in an image, as shown in fig. 12, including:
the image acquisition module is used for acquiring an image to be identified, which contains a table area; the image correction module is used for performing cascade correction on the image by respectively utilizing a deep learning model and a straight line detection method; the table area detection module is used for detecting the table area in the image after the cascade correction by using the deep learning model; the table structure identification module is used for detecting the row and column grid lines in the table area image and reconstructing the table structure; the cell text recognition module is used for segmenting the texts in the cells in the table structure and recognizing the texts in the cells by using a text recognition model; and the output module is used for outputting the identification result in a formatted manner to complete the identification of the form in the image.
In this embodiment, the image to be identified is obtained; cascade correction based on deep learning and line detection is performed; table regions are detected based on deep learning; the row and column grid lines are detected in the table area image and the table structure reconstructed; the text in the cells is recognized with a deep-learning-based text recognition model; and the recognition result is output in a formatted manner. This solves the recognition problems of various table types, such as full-frame, partial-frame and borderless tables, and improves the recognition accuracy of both table structure and content.
Example 3
The invention also provides an electronic device, which comprises a processor and a memory connected with the processor, wherein the memory is used for storing the executable instructions of the processor; the executable instructions are loaded and executed by the processor to implement the operations performed in the method of table recognition in an image as described in embodiment 1.
In this embodiment, the memory may be used to store a software program and various data, and may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function. The data storage area may store the thumbnail image information or the object image information, etc. Further, the memory may include high-speed random access memory or non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
Example 4
The invention also provides a computer-readable storage medium, in which executable instructions are stored, and the executable instructions are loaded and executed by a processor to implement the operations performed in the method for identifying a table in an image according to embodiment 1.
In this embodiment, the computer-readable storage medium includes, but is not limited to, various media that can store program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A method for identifying a form in an image, comprising the steps of:
s1, acquiring an image to be identified, wherein the image comprises a table area;
s2, performing cascade correction on the image by respectively using a deep learning model and a straight line detection method;
s3, detecting a table area in the image after the cascade correction by using a deep learning model;
s4, detecting the row and column grid lines in the table area, and reconstructing a table structure;
s5, segmenting the texts in the cells in the table structure, and recognizing the texts in the cells by using a text recognition model;
and S6, formatting and outputting the recognition result, and finishing the recognition of the table in the image.
2. The method for recognizing a form in an image according to claim 1, wherein said step S2 includes the steps of:
s201, classifying the images in the horizontal and vertical directions by using a deep learning model, and adjusting the images to be in the horizontal direction;
s202, performing forward and backward classification on the horizontally oriented image by respectively utilizing a text detection method and a text recognition method, and adjusting the image to be forward;
s203, correcting the forward image by using a Hough transform straight line detection method.
3. The method for recognizing a form in an image according to claim 1, wherein said step S4 includes the steps of:
s401, performing text detection processing on the table area to obtain a text detection box, and performing post-processing on the text detection box;
s402, carrying out binarization processing on the table area to obtain a binary image;
s403, eliminating characters and noise in the binary image, and obtaining the binary image with the noise and the characters removed;
s404, detecting visible table lines from the binary image with the noise and characters removed;
s405, classifying the table types according to the number of the visible table lines;
s406, according to the classification result, invisible table lines in the table area are detected from the mask image of the text detection frame;
s407, performing post-processing on the visible table lines and the invisible table lines, removing wrong table lines, and completing reconstruction of the table structure.
4. The method for identifying a table in an image according to claim 3, further comprising a step S400 before the step S401;
and S400, performing secondary inclination correction processing on the table area in the image after the cascade correction.
5. The method for identifying a table in an image according to claim 3, wherein the step S403 comprises the steps of:
s4031, constructing a text detection box mask image based on a text detection box according to the text detection box of the table area;
s4032, eliminating character pixels in the binary table region image according to the text detection box mask image;
s4033, traversing all connected domains on the binary table region image, and calculating to obtain a minimum circumscribed rectangle of each connected domain;
s4034, judging whether the size of the minimum circumscribed rectangle meets a preset size; if so, removing the pixels corresponding to the connected domain from the table image to obtain a binary image with noise and characters removed, and proceeding to step S404; otherwise, returning to step S4033.
6. The method for identifying tables in images according to claim 3, wherein said step S404 comprises the steps of:
s4041, respectively generating a transverse convolution kernel and a longitudinal convolution kernel;
the width w expression of the transverse convolution kernel is as follows:
w=W*ratio
the height h of the longitudinal convolution kernel is expressed as follows:
h=H*ratio
wherein, W and H represent the width and height of the table area image respectively, and ratio represents the proportionality coefficient;
s4042, performing a two-dimensional convolution with the transverse convolution kernel on the binary image with the noise and the characters removed, so as to filter out the transverse lines and remove the longitudinal lines; and
performing a two-dimensional convolution with the longitudinal convolution kernel on the binary image with the noise and the characters removed, so as to filter out the longitudinal lines and remove the transverse lines;
s4043, projecting the binary image after the transverse convolution along the horizontal direction and counting the number of foreground pixels to obtain a transverse projection sequence; and
projecting the binary image after the longitudinal convolution along the vertical direction and counting the number of foreground pixels to obtain a longitudinal projection sequence;
s4044, respectively searching for peak coordinates in the transverse projection sequence and the longitudinal projection sequence by using a peak detection algorithm to obtain a longitudinal coordinate corresponding to the transverse line and a transverse coordinate corresponding to the longitudinal line, completing the detection of the visible table line, and going to step S405.
7. The method for identifying a table in an image according to claim 3, wherein the step S406 comprises the steps of:
s4061, according to the classification result, detecting invisible table lines from the mask image of the text detection box to obtain coordinate sets of the transverse and longitudinal invisible table lines;
s4062, according to the coordinate sets, judging whether the distance between an invisible table line l1 and the visible table line l2 closest to it in the same direction is less than a preset threshold; if so, deleting the invisible table line l1, completing the detection of the invisible table lines in the table image, and proceeding to step S407; otherwise, returning to step S4061.
8. An apparatus for recognizing a table in an image, comprising:
the image acquisition module, used for acquiring an image to be identified that contains a table area;
the image correction module, used for performing cascade correction on the image by using a deep learning model and a straight-line detection method in turn;
the table area detection module, used for detecting the table area in the cascade-corrected image with the deep learning model;
the table structure identification module, used for detecting row and column table lines in the table area image and reconstructing the table structure;
the cell text recognition module, used for segmenting the text in the cells of the table structure and recognizing it with a text recognition model;
and the output module, used for outputting the recognition result in a formatted manner to complete the identification of the table in the image.
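The module chain of claim 8 is a straight pipeline, which can be sketched as a driver that threads an image through injected module callables (a minimal sketch; the key names and signature are hypothetical, not from the patent):

```python
def recognize_table(image, modules):
    """Run the claim-8 pipeline: correction -> region -> structure -> text -> output.

    `modules` maps hypothetical stage names to callables, each taking the
    previous stage's result and returning its own.
    """
    corrected = modules["correct"](image)              # image correction module
    region = modules["detect_region"](corrected)       # table area detection module
    structure = modules["recognize_structure"](region) # table structure identification module
    cells = modules["recognize_text"](structure)       # cell text recognition module
    return modules["format_output"](cells)             # output module
```

Injecting the stages as callables keeps each module independently replaceable, which matches the apparatus claim's decomposition.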
9. An electronic device, comprising a processor and a memory coupled to the processor and configured to store executable instructions of the processor;
the executable instructions are loaded and executed by the processor to implement the operations performed in the method for identifying a table in an image according to any one of claims 1 to 7.
10. A computer-readable storage medium having executable instructions stored thereon, the executable instructions being loaded and executed by a processor to perform the operations performed in the method for identifying a table in an image according to any one of claims 1 to 7.
CN202010697220.XA 2020-07-20 2020-07-20 Method and device for identifying table in image, electronic equipment and storage medium Active CN111814722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010697220.XA CN111814722B (en) 2020-07-20 2020-07-20 Method and device for identifying table in image, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010697220.XA CN111814722B (en) 2020-07-20 2020-07-20 Method and device for identifying table in image, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111814722A true CN111814722A (en) 2020-10-23
CN111814722B CN111814722B (en) 2022-04-19

Family

ID=72865601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010697220.XA Active CN111814722B (en) 2020-07-20 2020-07-20 Method and device for identifying table in image, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111814722B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170236037A1 (en) * 2013-04-11 2017-08-17 Digimarc Corporation Methods for object recognition and related arrangements
CN109766749A (en) * 2018-11-27 2019-05-17 上海眼控科技股份有限公司 A kind of detection method of the bending table line for financial statement
CN110321889A (en) * 2019-04-23 2019-10-11 成都数之联科技有限公司 Illustration positioning extracting method and system in a kind of picture file

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
H. Kawanaka et al.: "Document Recognition and XML Generation of Tabular Form Discharge Summaries for Analogous Case Search System", Methods of Information in Medicine *
Peng Hao: "Research on Local Invariant Feature Extraction Algorithms and Their Application in Image Recognition", Electronic Measurement Technology *

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528813B (en) * 2020-12-03 2021-07-23 上海云从企业发展有限公司 Table recognition method, device and computer readable storage medium
CN112528813A (en) * 2020-12-03 2021-03-19 上海云从企业发展有限公司 Table recognition method, device and computer readable storage medium
CN112364834A (en) * 2020-12-07 2021-02-12 上海叠念信息科技有限公司 Form identification restoration method based on deep learning and image processing
CN112800824B (en) * 2020-12-08 2024-02-02 北京方正印捷数码技术有限公司 Method, device, equipment and storage medium for processing scanned file
CN112800824A (en) * 2020-12-08 2021-05-14 北京方正印捷数码技术有限公司 Processing method, device and equipment for scanning file and storage medium
CN112733855B (en) * 2020-12-30 2024-04-09 科大讯飞股份有限公司 Table structuring method, table recovering device and device with storage function
CN112733855A (en) * 2020-12-30 2021-04-30 科大讯飞股份有限公司 Table structuring method, table recovery equipment and device with storage function
CN112766073B (en) * 2020-12-31 2022-06-10 贝壳找房(北京)科技有限公司 Table extraction method and device, electronic equipment and readable storage medium
CN112766073A (en) * 2020-12-31 2021-05-07 贝壳技术有限公司 Table extraction method and device, electronic equipment and readable storage medium
CN112883795A (en) * 2021-01-19 2021-06-01 贵州电网有限责任公司 Rapid and automatic table extraction method based on deep neural network
CN112926421B (en) * 2021-02-07 2024-01-09 杭州睿胜软件有限公司 Image processing method and device, electronic equipment and storage medium
CN112926421A (en) * 2021-02-07 2021-06-08 杭州睿胜软件有限公司 Image processing method and apparatus, electronic device, and storage medium
WO2022166833A1 (en) * 2021-02-07 2022-08-11 杭州睿胜软件有限公司 Image processing method and apparatus, and electronic device and storage medium
US11887393B2 (en) * 2021-03-02 2024-01-30 Claritrics Inc. End-to-end system for extracting tabular data present in electronic documents and method thereof
US20220284722A1 (en) * 2021-03-02 2022-09-08 CLARITRICS INC. d.b.a BUDDI AI End-to-end system for extracting tabular data present in electronic documents and method thereof
CN113139445A (en) * 2021-04-08 2021-07-20 招商银行股份有限公司 Table recognition method, apparatus and computer-readable storage medium
CN113139445B (en) * 2021-04-08 2024-05-31 招商银行股份有限公司 Form recognition method, apparatus, and computer-readable storage medium
CN113435240A (en) * 2021-04-13 2021-09-24 北京易道博识科技有限公司 End-to-end table detection and structure identification method and system
CN113435240B (en) * 2021-04-13 2024-06-14 北京易道博识科技有限公司 End-to-end form detection and structure identification method and system
CN113012075A (en) * 2021-04-22 2021-06-22 中国平安人寿保险股份有限公司 Image correction method and device, computer equipment and storage medium
CN113191277A (en) * 2021-05-06 2021-07-30 北京惠朗时代科技有限公司 Table image region identification method and system based on entropy check
CN113191277B (en) * 2021-05-06 2023-12-19 北京惠朗时代科技有限公司 Table image area identification method and system based on entropy verification
CN113221743A (en) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Table analysis method and device, electronic equipment and storage medium
CN113111858A (en) * 2021-05-12 2021-07-13 数库(上海)科技有限公司 Method, device, equipment and storage medium for automatically detecting table in picture
CN113221743B (en) * 2021-05-12 2024-01-12 北京百度网讯科技有限公司 Table analysis method, apparatus, electronic device and storage medium
CN113673338B (en) * 2021-07-16 2023-09-26 华南理工大学 Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
CN113673338A (en) * 2021-07-16 2021-11-19 华南理工大学 Natural scene text image character pixel weak supervision automatic labeling method, system and medium
CN113591746A (en) * 2021-08-05 2021-11-02 上海金仕达软件科技有限公司 Document table structure detection method and device
CN113989823A (en) * 2021-09-14 2022-01-28 北京左医科技有限公司 Image table restoration method and system based on OCR coordinates
WO2023045277A1 (en) * 2021-09-27 2023-03-30 上海合合信息科技股份有限公司 Method and device for converting table in image into spreadsheet
CN114511863B (en) * 2021-12-20 2023-10-03 北京百度网讯科技有限公司 Table structure extraction method and device, electronic equipment and storage medium
CN114511863A (en) * 2021-12-20 2022-05-17 北京百度网讯科技有限公司 Table structure extraction method and device, electronic equipment and storage medium
CN114419647A (en) * 2021-12-31 2022-04-29 北京译图智讯科技有限公司 Table information extraction method and system
CN114511862B (en) * 2022-02-17 2023-11-10 北京百度网讯科技有限公司 Form identification method and device and electronic equipment
CN114511862A (en) * 2022-02-17 2022-05-17 北京百度网讯科技有限公司 Form identification method and device and electronic equipment
CN114565927A (en) * 2022-03-03 2022-05-31 上海恒生聚源数据服务有限公司 Table identification method and device, electronic equipment and storage medium
CN114581806B (en) * 2022-03-18 2024-03-19 重庆科技学院 Industrial part empty rate calculation method based on trunk edge feature extraction
CN114581806A (en) * 2022-03-18 2022-06-03 重庆科技学院 Industrial part no-load rate calculation method based on trunk edge feature extraction
CN115273113A (en) * 2022-09-27 2022-11-01 深圳擎盾信息科技有限公司 Table text semantic recognition method and device
CN116311301A (en) * 2023-02-17 2023-06-23 北京感易智能科技有限公司 Wireless form identification method and system
CN116311301B (en) * 2023-02-17 2024-06-07 北京感易智能科技有限公司 Wireless form identification method and system
CN116205601B (en) * 2023-02-27 2024-04-05 开元数智工程咨询集团有限公司 Internet-based engineering list rechecking and data statistics method and system
CN116205601A (en) * 2023-02-27 2023-06-02 开元数智工程咨询集团有限公司 Internet-based engineering list rechecking and data statistics method and system
CN116110071B (en) * 2023-04-07 2023-09-12 济南大学 Image format pipeline and instrument diagram pipeline identification method based on deep learning
CN116110071A (en) * 2023-04-07 2023-05-12 济南大学 Image format pipeline and instrument diagram pipeline identification method based on deep learning
CN117576699A (en) * 2023-11-06 2024-02-20 华南理工大学 Locomotive work order information intelligent recognition method and system based on deep learning
CN117475459A (en) * 2023-12-28 2024-01-30 杭州恒生聚源信息技术有限公司 Table information processing method and device, electronic equipment and storage medium
CN117475459B (en) * 2023-12-28 2024-04-09 杭州恒生聚源信息技术有限公司 Table information processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111814722B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN111814722B (en) Method and device for identifying table in image, electronic equipment and storage medium
CN109933756B (en) Image file transferring method, device and equipment based on OCR (optical character recognition), and readable storage medium
US9542752B2 (en) Document image compression method and its application in document authentication
US5410611A (en) Method for identifying word bounding boxes in text
JP4771804B2 (en) Layout analysis program, layout analysis apparatus, layout analysis method
US6327384B1 (en) Character recognition apparatus and method for recognizing characters
CN103034848B (en) A kind of recognition methods of form types
JPH0652354A (en) Skew correcting method, skew angle detecting method, document segmentation system and skew angle detector
CN111353961B (en) Document curved surface correction method and device
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
US6532302B2 (en) Multiple size reductions for image segmentation
CN112200117A (en) Form identification method and device
CN112507782A (en) Text image recognition method and device
US20120082372A1 (en) Automatic document image extraction and comparison
CN112364834A (en) Form identification restoration method based on deep learning and image processing
Boudraa et al. An improved skew angle detection and correction technique for historical scanned documents using morphological skeleton and progressive probabilistic hough transform
CN113139535A (en) OCR document recognition method
CN111626145A (en) Simple and effective incomplete form identification and page-crossing splicing method
CN112329641B (en) Form identification method, device, equipment and readable storage medium
CN112036294B (en) Method and device for automatically identifying paper form structure
CN116824608A (en) Answer sheet layout analysis method based on target detection technology
CN113989823B (en) Image table restoration method and system based on OCR coordinates
CN111612045B (en) Universal method for acquiring target detection data set
Chakraborty et al. Marginal Noise Reduction in Historical Handwritten Documents--A Survey
CN114782975A (en) OCR (optical character recognition) method for electronic file table format

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant