CN112528813B - Table recognition method, device and computer readable storage medium

Info

Publication number: CN112528813B
Application number: CN202011407580.8A
Authority: CN (China)
Prior art keywords: line, text, image, candidate, foreground
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Other languages: Chinese (zh)
Other versions: CN112528813A
Inventor: 陈静
Current assignee: Shanghai Yuncong Enterprise Development Co., Ltd. (the listed assignees may be inaccurate)
Original assignee: Shanghai Yuncong Enterprise Development Co., Ltd.
Events: application filed by Shanghai Yuncong Enterprise Development Co., Ltd.; priority to CN202011407580.8A; publication of CN112528813A; application granted; publication of CN112528813B

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06V: Image or Video Recognition or Understanding
    • G06V30/00: Character recognition; recognising digital ink; document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/414: Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text
    • G06V30/413: Classification of content, e.g. text, photographs or tables

Abstract

The invention relates to the technical field of table recognition and provides a table recognition method, aiming at solving the technical problems of poor generalization and poor accuracy in existing table recognition methods. According to the method provided by the embodiment of the invention, a preset image recognition model is used to obtain a table line foreground map and a text foreground map of the table image to be recognized; the table structure of the table image to be recognized is obtained according to the table line foreground map; text line detection and the text foreground map respectively provide a first and a second text line position for each text line; a final text line position associated with each cell is obtained according to the position of the cell, the first text line position and the second text line position; a text line image corresponding to the associated cell is then extracted from the table image to be recognized according to the final text line position, text recognition is performed on the text line image, and the recognized text information is stored in the cell to form the recognized table. Through these steps, the accuracy of table recognition is improved, with good generalization performance.

Description

Table recognition method, device and computer readable storage medium
Technical Field
The invention relates to the technical field of table recognition, and in particular to a table recognition method, a table recognition apparatus and a computer-readable storage medium.
Background
Tables are a common component of documents, and in practice the tables contained in images often need to be converted into an editable file format. Manual entry is the simplest method, but it is inefficient and error-prone when a large number of tables must be processed. At present, a more common method is to capture a table image with an image acquisition device and then extract the table frame lines using image features, such as text block features, table region logical relationship features and line intersection features, to recognize the table image and output a recognition result.
However, while such recognition methods work well on clear or relatively simple table images, they perform poorly on low-quality table images or relatively complex table images with problems such as broken or bent table lines, and may even miss parts of the table, resulting in poor recognition accuracy.
Accordingly, there is a need in the art for a new table recognition scheme to address the above-mentioned problems.
Disclosure of Invention
In order to overcome the above-mentioned drawbacks, the present invention provides a table recognition method, a table recognition apparatus and a computer-readable storage medium, to solve or at least partially solve the technical problems of poor generalization and poor recognition accuracy in existing table recognition methods.
In a first aspect, a table identification method is provided, where the table identification method includes:
acquiring a table line foreground image and a text foreground image of a table image to be identified by adopting a preset image identification model;
acquiring a table structure of the table image to be identified according to the table line foreground image;
performing text line detection on the form image to be recognized to obtain a first text line position of a text line in the form image to be recognized;
acquiring, according to the position of a cell in the table structure, a second text line position of a text line in the table image to be recognized that is stored at the corresponding position in the text foreground map;
acquiring a final text line position associated with the cell according to the position of the cell, the first text line position and the second text line position;
and acquiring a text line image corresponding to the associated cell from the table image to be recognized according to the final text line position, performing text recognition on the text line image, and storing recognized text information into the cell to form a recognized table.
In one technical solution of the above form recognition method, the step of "obtaining a form line foreground image and a text foreground image of a form image to be recognized" specifically includes:
adopting the preset image recognition model to carry out position recognition on form lines and text lines on the form image to be recognized, and acquiring a form line foreground image and a text foreground image according to a position recognition result;
the preset image recognition model is obtained by training based on a table image sample and a corresponding table line foreground label graph and a corresponding text foreground label graph;
the table line foreground label graph and the table image sample have the same size, and the label value stored in the position of each pixel point in the table line foreground label graph depends on whether a table line exists at the corresponding position in the table image sample;
the text foreground label image and the form image sample have the same size, and the label value stored in the position of each pixel point in the text foreground label image depends on whether a text line exists at the corresponding position in the form image sample.
In a technical solution of the above table recognition method, the preset image recognition model is obtained by training in the following manner:

calculating a loss value of the image recognition model according to the table image sample and the corresponding table line foreground label map and text foreground label map, using a loss function $L$ which may, for example, take the per-pixel cross-entropy form

$$L=-\frac{1}{N\,h\,w}\sum_{n=1}^{N}\sum_{i=1}^{h}\sum_{j=1}^{w}\Big[y_{n}(i,j)\log \hat{y}_{n}(i,j)+\big(1-y_{n}(i,j)\big)\log\big(1-\hat{y}_{n}(i,j)\big)\Big]$$

where $N$ represents the number of table image samples and $n\in\{1,\dots,N\}$ indexes the samples; $h$ and $w$ respectively represent the height and width of the table image sample, with $i\in\{1,\dots,h\}$ and $j\in\{1,\dots,w\}$; $y_{n}(i,j)$ represents the label value at position $(i,j)$ in the $n$-th table image sample, determined from the table line foreground label map and text foreground label map corresponding to the $n$-th table image sample; and $\hat{y}_{n}(i,j)$ represents the label value predicted by the image recognition model at position $(i,j)$ in the $n$-th table image sample;

and calculating the gradient corresponding to each model parameter in the image recognition model according to the loss value, and updating the model parameters by gradient back-propagation to optimize the model and complete training.
In one technical solution of the above table identification method, the table image sample and the corresponding table line foreground label map and text foreground label map are obtained by the following method:
randomly setting a filling text for each cell in the table image;
generating an initial table line foreground label graph according to the position of the edge line of the cell; generating an initial text foreground label image according to the position of the filling text;
respectively carrying out horizontal direction alignment and vertical direction alignment on the filled texts, wherein the horizontal direction alignment and the vertical direction alignment respectively comprise left alignment, middle alignment and right alignment;
carrying out random blank filling on the area between the edge line of the cell and the filling text in the cell;
randomly setting the edge line width of the cell and the proportion of the dotted line in the edge line to obtain a regular form image;
decomposing the regular form image into a filling text foreground, a form line foreground, a cell inner background and a form outer background, and respectively carrying out random pixel value filling on the filling text foreground, the form line foreground, the cell inner background and the form outer background to form an initial form image sample;
carrying out random perspective processing and rotation processing on the initial form image sample to obtain a final form image sample;
and simultaneously respectively carrying out the same perspective processing and rotation processing on the initial text foreground label image and the initial table line foreground label image according to the perspective processing mode and the rotation processing mode adopted by the final table image sample to obtain the final text foreground label image and the final table line foreground label image.
In one embodiment of the above table identification method, "acquiring a final text line position associated with the cell" specifically includes:
according to the result of text line detection on the form image to be recognized, acquiring a text line detection frame and a first text line position corresponding to a text line in the form image to be recognized;
according to the positions of the cells, acquiring overlapping detection boxes of the text line detection boxes, which have overlapping areas with the cells, and calculating the overlapping proportion of the cells and each overlapping detection box;
and selecting the overlapping detection frame with the overlapping proportion more than or equal to a preset overlapping threshold value, and taking the first text line position corresponding to the overlapping detection frame as the final text line position associated with the cell.
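For illustration, the overlap proportion described above can be computed as in the minimal Python sketch below. It assumes rectangles given as (x1, y1, x2, y2) and takes the overlap proportion as the share of the text line detection box that falls inside the cell, which is one plausible reading of this step rather than the patent's prescribed definition.

```python
def overlap_ratio(cell, box):
    """cell, box: rectangles as (x1, y1, x2, y2).
    Returns the fraction of the detection box's area that overlaps the cell."""
    ix1, iy1 = max(cell[0], box[0]), max(cell[1], box[1])
    ix2, iy2 = min(cell[2], box[2]), min(cell[3], box[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlapping area
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / box_area if box_area else 0.0

# A detection box is associated with the cell when
# overlap_ratio(cell, box) >= preset_overlap_threshold.
```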
In an embodiment of the above table identifying method, after the step of "setting the first text line position corresponding to the overlap detection box as the final text line position associated with the cell", the method further includes:
judging whether there is a cell that is not associated with any text line position;
when a cell which is not associated with a text line position exists, acquiring a second text line position corresponding to the cell;
respectively carrying out horizontal projection and vertical projection on the second text line position to obtain a horizontal projection line segment in the horizontal direction and a vertical projection line segment in the vertical direction;
judging whether the sum of the lengths of the horizontal projection line segment and the vertical projection line segment is greater than or equal to a length threshold value;
and if so, taking the second text line position as the final text line position associated with the cell.
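A minimal sketch of this projection check follows; it assumes the second text line positions are rectangles (x1, y1, x2, y2) and, for brevity, omits the merging of overlapping projection segments.

```python
def passes_projection_check(second_text_boxes, length_threshold):
    """second_text_boxes: rectangles (x1, y1, x2, y2) read from the text
    foreground map inside an unassociated cell. Projects them onto the
    horizontal and vertical axes and compares the summed projection
    lengths with the length threshold."""
    horizontal = sum(x2 - x1 for x1, _, x2, _ in second_text_boxes)  # widths
    vertical = sum(y2 - y1 for _, y1, _, y2 in second_text_boxes)    # heights
    return horizontal + vertical >= length_threshold
```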
In one technical solution of the above form recognition method, "obtaining a text line image corresponding to an associated cell from the form image to be recognized according to the final text line position" specifically includes:
acquiring a text line image corresponding to the final text line position in the form image to be identified;
if the final text line position is associated with a cell, directly taking the text line image as a text line image corresponding to the cell;
and if the final text line position is associated with a plurality of cells, segmenting the text line image according to the position of each cell and the final text line position to obtain a text line image segment corresponding to each cell.
In one technical solution of the above table identification method, "obtaining the table structure of the table image to be identified according to the table line foreground map" specifically includes:
extracting a horizontal contour line and a vertical contour line in the table line foreground graph;
acquiring the intersection point of the transverse contour line and the vertical contour line and taking the intersection point as a candidate intersection point;
judging whether other adjacent candidate intersection points exist on the transverse contour line corresponding to each candidate intersection point;
if so, connecting each candidate intersection point and the adjacent other candidate intersection points corresponding to the candidate intersection points to obtain a first candidate connecting line;
if not, connecting each candidate intersection point and a first candidate intersection point arranged behind each candidate intersection point according to a preset candidate intersection point arrangement sequence, and selecting a connecting line with a horizontal angle smaller than or equal to a preset angle threshold value as a second candidate connecting line;
calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground graph, and selecting the candidate connecting line to be analyzed with the coverage proportion more than or equal to a preset coverage threshold value as an effective connecting line;
searching a cell path according to the effective connecting line to obtain a candidate cell;
connecting the candidate cells having common vertices to generate one or more table structures;
the candidate connecting line to be analyzed is a first candidate connecting line or a second candidate connecting line, and the preset coverage threshold corresponding to the first candidate connecting line is smaller than the preset coverage threshold corresponding to the second candidate connecting line.
In one technical solution of the above table identification method, the step of "extracting the horizontal contour lines and the vertical contour lines in the table line foreground map" specifically includes:
performing erosion and dilation on the table line foreground map with a preset horizontal kernel to obtain the horizontal contour lines;
performing erosion and dilation on the table line foreground map with a preset vertical kernel to obtain the vertical contour lines;
the preset horizontal kernel being a matrix of 1 row and N columns with all elements equal to 1, and the preset vertical kernel being a matrix of N rows and 1 column with all elements equal to 1;
and/or,
the step of "calculating the coverage ratio of each candidate connecting line to be analyzed in the table line foreground map" specifically includes:
acquiring the total number of pixels that the candidate connecting line to be analyzed passes through on the table line foreground map;
acquiring, among those pixels, the number of pixels that belong to table lines in the table line foreground map;
calculating the ratio of the number of table line pixels to the total number of pixels to obtain the coverage proportion of the candidate connecting line to be analyzed;
and/or,
the step of performing cell path search according to the effective connection line to obtain the candidate cell specifically includes:
sequentially searching for effective connecting lines from each effective connecting line according to a clockwise searching direction formed by horizontal-vertical-horizontal-vertical to obtain other corresponding effective connecting lines of each effective connecting line;
and generating candidate cells according to each effective connecting line and the other corresponding effective connecting lines.
In one embodiment of the above table identification method, after the step of "connecting candidate cells having a common vertex to generate one or more table structures", the method further includes generating a table index for each of the table structures in the following manner:
connecting the horizontal edges of the candidate cells with the common vertex in the table structure to form a candidate horizontal line;
connecting longitudinal edges of candidate cells having a common vertex in the table structure to form a candidate longitudinal line;
acquiring a candidate transverse line to be processed which can be extended and overlapped with other candidate transverse lines, and extending and combining the candidate transverse line to be processed and the other candidate transverse lines to form a combined candidate transverse line;
acquiring candidate vertical lines to be processed which can be extended and overlapped with other candidate vertical lines, and extending and combining the candidate vertical lines to be processed and the other candidate vertical lines to form combined candidate vertical lines;
distributing horizontal index numbers to the merged candidate horizontal lines and other candidate horizontal lines which are not merged in the extending way according to a preset horizontal line arrangement sequence;
respectively allocating longitudinal index numbers to the merged candidate longitudinal lines and other candidate longitudinal lines which are not merged in the extending way according to a preset longitudinal line arrangement sequence;
and generating a table index of a table structure according to the transverse index number and the longitudinal index number.
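One way to realize the extend-and-merge indexing above is to cluster the candidate lines by their axis coordinate; the sketch below approximates the extension and merging with a coordinate tolerance (the tolerance value and the data layout are assumptions for illustration, not taken from the patent).

```python
def assign_line_indices(coords, tol=3.0):
    """coords: one axis coordinate per candidate line (the y coordinate of a
    transverse line, or the x coordinate of a longitudinal line).
    Lines whose coordinates differ by at most tol are treated as extendable
    into one merged line and therefore share an index number.
    Returns a dict mapping each coordinate to its index number."""
    index_of, current, prev = {}, -1, None
    for c in sorted(coords):
        if prev is None or c - prev > tol:
            current += 1  # a new, distinct merged line
        index_of[c] = current
        prev = c
    return index_of

# A cell whose top edge lies on transverse line i and whose left edge lies on
# longitudinal line j then receives the table index (row i, column j).
```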
In a second aspect, there is provided a form recognition apparatus, comprising:
the foreground image acquisition module is configured to acquire a table line foreground image and a text foreground image of a table image to be recognized by adopting a preset image recognition model;
a table structure obtaining module configured to obtain a table structure of the table image to be identified according to the table line foreground map;
a text line association module configured to perform text line detection on the form image to be recognized to obtain a first text line position of a text line in the form image to be recognized; acquiring a second text line position of a text line in the table image to be identified, which is stored at a corresponding position in the text foreground image, according to the position of a cell in the table structure; acquiring a final text line position associated with the cell according to the position of the cell, the first text line position and the second text line position;
and the table generating module is configured to acquire a text line image corresponding to the associated cell from the table image to be recognized according to the final text line position, perform text recognition on the text line image and store the recognized text information into the cell to form a recognized table.
In an aspect of the above table identifying apparatus, the foreground map obtaining module is further configured to perform the following operations:
adopting the preset image recognition model to carry out position recognition on form lines and text lines on the form image to be recognized, and acquiring a form line foreground image and a text foreground image according to a position recognition result;
the preset image recognition model is obtained by training based on a table image sample and a corresponding table line foreground label graph and a corresponding text foreground label graph;
the table line foreground label graph and the table image sample have the same size, and the label value stored in the position of each pixel point in the table line foreground label graph depends on whether a table line exists at the corresponding position in the table image sample;
the text foreground label image and the form image sample have the same size, and the label value stored in the position of each pixel point in the text foreground label image depends on whether a text line exists at the corresponding position in the form image sample.
In an aspect of the above table identifying apparatus, the foreground map obtaining module is further configured to perform the following operations:
the preset image recognition model is obtained by training in the following way:
calculating a loss value of the image recognition model according to the table image sample and the corresponding table line foreground label map and text foreground label map, using a loss function $L$ which may, for example, take the per-pixel cross-entropy form

$$L=-\frac{1}{N\,h\,w}\sum_{n=1}^{N}\sum_{i=1}^{h}\sum_{j=1}^{w}\Big[y_{n}(i,j)\log \hat{y}_{n}(i,j)+\big(1-y_{n}(i,j)\big)\log\big(1-\hat{y}_{n}(i,j)\big)\Big]$$

where $N$ represents the number of table image samples and $n\in\{1,\dots,N\}$ indexes the samples; $h$ and $w$ respectively represent the height and width of the table image sample, with $i\in\{1,\dots,h\}$ and $j\in\{1,\dots,w\}$; $y_{n}(i,j)$ represents the label value at position $(i,j)$ in the $n$-th table image sample, determined from the table line foreground label map and text foreground label map corresponding to the $n$-th table image sample; and $\hat{y}_{n}(i,j)$ represents the label value predicted by the image recognition model at position $(i,j)$ in the $n$-th table image sample;

and calculating the gradient corresponding to each model parameter in the image recognition model according to the loss value, and updating the model parameters by gradient back-propagation to optimize the model and complete training.
In an aspect of the above table identifying apparatus, the foreground map obtaining module is further configured to perform the following operations:
the table image sample and the corresponding table line foreground label image and text foreground label image are obtained by the following method:
randomly setting a filling text for each cell in the table image;
generating an initial table line foreground label graph according to the position of the edge line of the cell; generating an initial text foreground label image according to the position of the filling text;
respectively carrying out horizontal direction alignment and vertical direction alignment on the filled texts, wherein the horizontal direction alignment and the vertical direction alignment respectively comprise left alignment, middle alignment and right alignment;
carrying out random blank filling on the area between the edge line of the cell and the filling text in the cell;
randomly setting the edge line width of the cell and the proportion of the dotted line in the edge line to obtain a regular form image;
decomposing the regular form image into a filling text foreground, a form line foreground, a cell inner background and a form outer background, and respectively carrying out random pixel value filling on the filling text foreground, the form line foreground, the cell inner background and the form outer background to form an initial form image sample;
carrying out random perspective processing and rotation processing on the initial form image sample to obtain a final form image sample;
and simultaneously respectively carrying out the same perspective processing and rotation processing on the initial text foreground label image and the initial table line foreground label image according to the perspective processing mode and the rotation processing mode adopted by the final table image sample to obtain the final text foreground label image and the final table line foreground label image.
In an aspect of the above table identifying apparatus, the text line association module is further configured to perform the following operations:
according to the result of text line detection on the form image to be recognized, acquiring a text line detection frame and a first text line position corresponding to a text line in the form image to be recognized;
according to the positions of the cells, acquiring overlapping detection boxes of the text line detection boxes, which have overlapping areas with the cells, and calculating the overlapping proportion of the cells and each overlapping detection box;
and selecting the overlapping detection frame with the overlapping proportion more than or equal to a preset overlapping threshold value, and taking the first text line position corresponding to the overlapping detection frame as the final text line position associated with the cell.
In one embodiment of the above table identifying apparatus, after the step of "taking the first text line position corresponding to the overlap detection box as the final text line position associated with the cell", the text line associating module is further configured to perform the following operations:
judging whether there is a cell that is not associated with any text line position;
when a cell which is not associated with a text line position exists, acquiring a second text line position corresponding to the cell;
respectively carrying out horizontal projection and vertical projection on the second text line position to obtain a horizontal projection line segment in the horizontal direction and a vertical projection line segment in the vertical direction;
judging whether the sum of the lengths of the horizontal projection line segment and the vertical projection line segment is greater than or equal to a length threshold value;
and if so, taking the second text line position as the final text line position associated with the cell.
In an aspect of the above table identifying apparatus, the table generating module is further configured to perform the following operations:
acquiring a text line image corresponding to the final text line position in the form image to be identified;
if the final text line position is associated with a cell, directly taking the text line image as a text line image corresponding to the cell;
and if the final text line position is associated with a plurality of cells, segmenting the text line image according to the position of each cell and the final text line position to obtain a text line image segment corresponding to each cell.
In an aspect of the above table identifying apparatus, the table structure obtaining module is further configured to perform the following operations:
extracting a horizontal contour line and a vertical contour line in the table line foreground graph;
acquiring the intersection point of the transverse contour line and the vertical contour line and taking the intersection point as a candidate intersection point;
judging whether other adjacent candidate intersection points exist on the transverse contour line corresponding to each candidate intersection point;
if so, connecting each candidate intersection point and the adjacent other candidate intersection points corresponding to the candidate intersection points to obtain a first candidate connecting line;
if not, connecting each candidate intersection point and a first candidate intersection point arranged behind each candidate intersection point according to a preset candidate intersection point arrangement sequence, and selecting a connecting line with a horizontal angle smaller than or equal to a preset angle threshold value as a second candidate connecting line;
calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground graph, and selecting the candidate connecting line to be analyzed with the coverage proportion more than or equal to a preset coverage threshold value as an effective connecting line;
searching a cell path according to the effective connecting line to obtain a candidate cell;
connecting the candidate cells having common vertices to generate one or more table structures;
the candidate connecting line to be analyzed is a first candidate connecting line or a second candidate connecting line, and the preset coverage threshold corresponding to the first candidate connecting line is smaller than the preset coverage threshold corresponding to the second candidate connecting line.
In an aspect of the above table identifying apparatus, the table structure obtaining module is further configured to perform the following operations:
performing erosion and dilation on the table line foreground map with a preset horizontal kernel to obtain the horizontal contour lines;
performing erosion and dilation on the table line foreground map with a preset vertical kernel to obtain the vertical contour lines;
the preset horizontal kernel being a matrix of 1 row and N columns with all elements equal to 1, and the preset vertical kernel being a matrix of N rows and 1 column with all elements equal to 1;
and/or,
the step of "calculating the coverage ratio of each candidate connecting line to be analyzed in the table line foreground map" specifically includes:
acquiring the total number of pixels that the candidate connecting line to be analyzed passes through on the table line foreground map;
acquiring, among those pixels, the number of pixels that belong to table lines in the table line foreground map;
calculating the ratio of the number of table line pixels to the total number of pixels to obtain the coverage proportion of the candidate connecting line to be analyzed;
and/or,
the step of performing cell path search according to the effective connection line to obtain the candidate cell specifically includes:
sequentially searching for effective connecting lines from each effective connecting line according to a clockwise searching direction formed by horizontal-vertical-horizontal-vertical to obtain other corresponding effective connecting lines of each effective connecting line;
and generating candidate cells according to each effective connecting line and the other corresponding effective connecting lines.
In an aspect of the above table identifying apparatus, after the step of "connecting candidate cells having a common vertex to generate one or more table structures", the table structure obtaining module is further configured to perform the following operations:
connecting the horizontal edges of the candidate cells with the common vertex in the table structure to form a candidate horizontal line;
connecting longitudinal edges of candidate cells having a common vertex in the table structure to form a candidate longitudinal line;
acquiring a candidate transverse line to be processed which can be extended and overlapped with other candidate transverse lines, and extending and combining the candidate transverse line to be processed and the other candidate transverse lines to form a combined candidate transverse line;
acquiring candidate vertical lines to be processed which can be extended and overlapped with other candidate vertical lines, and extending and combining the candidate vertical lines to be processed and the other candidate vertical lines to form combined candidate vertical lines;
distributing horizontal index numbers to the merged candidate horizontal lines and other candidate horizontal lines which are not merged in the extending way according to a preset horizontal line arrangement sequence;
respectively allocating longitudinal index numbers to the merged candidate longitudinal lines and other candidate longitudinal lines which are not merged in the extending way according to a preset longitudinal line arrangement sequence;
and generating a table index of a table structure according to the transverse index number and the longitudinal index number.
In a third aspect, a table recognition apparatus is provided, which includes a processor and a storage device, wherein the storage device is adapted to store a plurality of program codes, and the program codes are adapted to be loaded and run by the processor to perform the table recognition method according to any one of the above-mentioned technical solutions.
In a fourth aspect, a computer-readable storage medium is provided, in which a plurality of program codes are stored, the program codes being adapted to be loaded and run by a processor to perform the table identification method according to any of the above-mentioned technical solutions.
One or more technical schemes of the invention at least have one or more of the following beneficial effects:
in the technical scheme of the invention, a table line foreground image and a text foreground image of a table image to be recognized are obtained by adopting a preset image recognition model; acquiring a table structure of a table image to be identified according to the table line foreground image; performing text line detection on the form image to be recognized to obtain a first text line position of a text line in the form image to be recognized; acquiring a second text line position of a text line in the table image to be identified, which is stored at a corresponding position in the text foreground image, according to the positions of the cells in the table structure; acquiring a final text line position associated with the cell according to the position of the cell, the position of the first text line and the position of the second text line; and acquiring a text line image corresponding to the associated cell from the table image to be recognized according to the final text line position, performing text recognition on the text line image, and storing recognized text information into the cell to form a recognized table.
When the table structure in the table image to be recognized is obtained, it can be reconstructed according to the position information of the table lines in the table line foreground map. Even if a low-quality image suffers from problems such as broken or bent table lines, the table lines can be reconnected according to their position information in the table line foreground map to construct a regular table structure; that is, table lines with problems such as breakage and bending are corrected.
When the text information to be stored in each cell of the table structure is obtained, the text line position corresponding to the cell can be obtained according to the text line position (first text line position) of each text line in the table image to be recognized and the position of the cell, and the corresponding image region can then be taken from the table image to be recognized at that position for text recognition; the recognized text information is the information to be stored in the cell. In practical applications, however, part of the text in the table image to be recognized may be in a relatively small font, and such text lines may be missed during text line detection. To address this, the text line position (second text line position) of each text line stored in the text foreground map can be compared with the position of the cell to obtain a text line position that is the same as or close to the cell's position, and the corresponding image region can be taken from the table image to be recognized at that text line position (the final text line position) for text recognition; the recognized text information is the information to be stored in the cell. In this way, text in the image to be recognized is prevented from being missed, and recognized text information is prevented from being stored in the wrong cells, thereby improving the text recognition accuracy for the table image to be recognized.
Drawings
Embodiments of the invention are described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart illustrating the main steps of a table identification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a sample form image of one embodiment of the present invention;
FIG. 3 is a label diagram of a form image sample of one embodiment of the invention;
FIG. 4 is a schematic diagram of a table line foreground diagram of one embodiment of the present invention;
FIG. 5 is a schematic illustration of a transverse profile of one embodiment of the present invention;
FIG. 6 is a schematic illustration of the vertical profile of one embodiment of the present invention;
FIG. 7 is a schematic diagram of candidate intersections for one embodiment of the present invention;
FIG. 8 is a schematic diagram of a candidate link according to one embodiment of the invention;
FIG. 9 is a diagram illustrating coverage of candidate links in a foreground plot of table lines according to an embodiment of the present invention;
FIG. 10 is a schematic illustration of search directions for one embodiment of the present invention;
FIG. 11 is a schematic diagram of a cell path search of one embodiment of the present invention;
FIG. 12 is a table index of a table structure of one embodiment of the present invention;
FIG. 13 is a diagram of a form image text line detection box according to one embodiment of the invention;
FIG. 14 is a schematic illustration of a text foreground of an embodiment of the present invention;
FIG. 15 is an enlarged partial schematic view of the foreground of the text of FIG. 14;
fig. 16 is a main configuration block diagram of a table identifying apparatus according to an embodiment of the present invention.
List of reference numerals:
11: a foreground image acquisition module; 12: a table structure acquisition module; 13: a text line association module; 14: and a table generation module.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports and memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor or any other suitable processor, and has data and/or signal processing functionality. The processor may be implemented in software, hardware or a combination thereof. Non-transitory computer-readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory and random-access memory. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone, or A and B. The singular forms "a", "an" and "the" may include the plural forms as well.
In the conventional table recognition method, after a table image is captured with an image acquisition device, table frame lines are extracted using image features, such as text block features, table region logical relationship features and line intersection features, to recognize the table image, and a recognition result is then output. However, such a recognition method works well on clear or relatively simple table images, but performs poorly, with poor accuracy, on heavily blurred or noisy table images and on relatively complex table images.
In the embodiment of the invention, a preset image recognition model can be adopted to obtain a table line foreground map and a text foreground map of the table image to be recognized; the table structure of the table image to be recognized is obtained according to the table line foreground map; text line detection is performed on the table image to be recognized to obtain a first text line position of each text line; according to the positions of the cells in the table structure, a second text line position of each text line in the table image to be recognized is obtained from the corresponding position in the text foreground map; a final text line position associated with each cell is obtained according to the position of the cell, the first text line position and the second text line position; and a text line image corresponding to the associated cell is obtained from the table image to be recognized according to the final text line position, text recognition is performed on the text line image, and the recognized text information is stored in the cell to form the recognized table. These steps overcome the defects of poor generalization and poor recognition accuracy in conventional table recognition methods: low-quality images with problems such as broken and bent table lines can be recognized, and the text recognition accuracy for the table image to be recognized is also improved.
In an application scenario of the present invention, if an organization wants to recognize the table in a ticket image, the ticket image may be input into a computer device on which a table recognition apparatus according to an embodiment of the present invention is installed, so that the computer device can recognize the table in the ticket image using the apparatus; after recognition is completed, the computer device may display the recognized table on a screen, or load the recognized table into Microsoft Office Excel software for display.
Referring to fig. 1, fig. 1 is a flow chart illustrating the main steps of a table identification method according to an embodiment of the present invention. As shown in fig. 1, the table identification method in the embodiment of the present invention mainly includes the following steps:
step S101: and acquiring a table line foreground image and a text foreground image of the table image to be recognized by adopting a preset image recognition model.
The table line foreground map is an image that separates the table lines from the table image to be recognized and represents their positions in the table image to be recognized; it only indicates whether a table line exists and where it is located, regardless of the line's actual thickness or whether it is broken.
The text foreground map is an image that separates the text from the table image to be recognized and represents its position in the table image to be recognized; it only indicates whether text exists and where it is located, regardless of the actual content of the text.
In this embodiment, the table line foreground map and the text foreground map may be represented by different channels of an RGB three-channel image, so that both foreground maps are displayed in one image at the same time. For example, the table line foreground map may be the channel-0 image (blue channel) and the text foreground map the channel-1 image (green channel).
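For illustration, this channel layout can be sketched as follows; the function and variable names are illustrative assumptions, with H x W binary masks held as NumPy arrays.

```python
import numpy as np

def pack_foregrounds(table_line_mask, text_mask):
    """Store the table line foreground in channel 0 (blue) and the text
    foreground in channel 1 (green) of one 3-channel image.
    Both masks are H x W arrays with values in {0, 1}."""
    h, w = table_line_mask.shape
    packed = np.zeros((h, w, 3), dtype=np.uint8)
    packed[:, :, 0] = table_line_mask * 255  # channel 0: table lines
    packed[:, :, 1] = text_mask * 255        # channel 1: text lines
    return packed

def unpack_foregrounds(packed):
    """Recover the two binary foreground maps from the packed image."""
    table_line_mask = (packed[:, :, 0] > 0).astype(np.uint8)
    text_mask = (packed[:, :, 1] > 0).astype(np.uint8)
    return table_line_mask, text_mask
```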
In one embodiment, the table line foreground map and the text foreground map of the table image to be recognized can be obtained according to the following steps:
carrying out position recognition of table lines and text lines on the table image to be recognized using the preset image recognition model, and acquiring the table line foreground map and the text foreground map according to the position recognition result; the preset image recognition model is obtained by training based on table image samples and the corresponding table line foreground label maps and text foreground label maps; the table line foreground label map has the same size as the table image sample, and the label value stored at each pixel position in the table line foreground label map depends on whether a table line exists at the corresponding position in the table image sample; the text foreground label map likewise has the same size as the table image sample, and the label value stored at each pixel position in the text foreground label map depends on whether a text line exists at the corresponding position in the table image sample. In this embodiment, the preset image recognition model is used to obtain the table line foreground map and the text foreground map: the table line foreground map represents the positions of the table lines in the table image to be recognized, which makes it convenient to subsequently obtain the table structure of the table image to be recognized from the table line foreground map; the text foreground map represents the positions of the text lines in the table image to be recognized, which makes it convenient to subsequently use the text foreground map to find missed text lines and fill in the gaps.
In a possible implementation manner, if a table line exists at a certain position in the table image sample, the tag value stored at the corresponding position in the table line foreground tag map is 1, otherwise, the tag value is 0; similarly, if a text line exists at a certain position in the table image sample, the label value stored at the corresponding position in the text foreground label map is 1, otherwise, the label value is 0.
In one possible embodiment, as shown in fig. 2 and 3, fig. 2 shows a generated table image sample and fig. 3 is the table line foreground-text foreground label image of the table image sample shown in fig. 2. The lines marked 1 represent the table lines in the table line foreground label map and are stored in channel 0 of the label image; that is, the channel-0 image of the table line foreground-text foreground label image is the table line foreground label map of the table image sample. The rectangular boxes marked 2 represent the text lines in the text foreground label map and are stored in channel 1 of the label image; that is, the channel-1 image is the text foreground label map of the table image sample. Decomposing the channel-0 and channel-1 images of fig. 3 into table lines and text lines yields the table line foreground label map and the text foreground label map.
In one embodiment, the preset image recognition model is obtained by training in the following way: the loss function $L$ shown in equation (1), which may, for example, take a per-pixel cross-entropy form, is used to calculate a loss value of the image recognition model according to the table image sample and the corresponding table line foreground label map and text foreground label map;

$$L=-\frac{1}{N\,h\,w}\sum_{n=1}^{N}\sum_{i=1}^{h}\sum_{j=1}^{w}\Big[y_{n}(i,j)\log \hat{y}_{n}(i,j)+\big(1-y_{n}(i,j)\big)\log\big(1-\hat{y}_{n}(i,j)\big)\Big] \quad (1)$$

where $N$ represents the number of table image samples and $n\in\{1,\dots,N\}$ indexes the samples; $h$ and $w$ respectively represent the height and width of the table image sample, with $i\in\{1,\dots,h\}$ and $j\in\{1,\dots,w\}$; $y_{n}(i,j)$ represents the label value at position $(i,j)$ in the $n$-th table image sample, determined from the table line foreground label map and text foreground label map corresponding to the $n$-th table image sample; and $\hat{y}_{n}(i,j)$ represents the label value predicted by the image recognition model at position $(i,j)$ in the $n$-th table image sample. The gradient corresponding to each model parameter in the image recognition model is then calculated according to the loss value, and the model parameters are updated by gradient back-propagation for model optimization to complete training. In this embodiment, this setting improves the accuracy with which the image recognition model recognizes the table line foreground map and the text foreground map.
In this embodiment, the image recognition model may be a Unet neural network model comprising 4 downsampling convolutional layers and 4 upsampling convolutional layers. During model training, the model may be optimized by completing a specified number of training iterations to obtain the preset image recognition model, or the iterative training may be stopped, and the optimized model taken as the preset image recognition model, when the loss value $L$ reaches a preset value.
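Under the cross-entropy reading of equation (1) given above, one training step can be sketched with PyTorch as follows; the stand-in network, tensor shapes and hyperparameters are illustrative assumptions, not the patent's prescribed implementation.

```python
import torch
import torch.nn as nn

# Stand-in for the 4-down / 4-up Unet described above: any network mapping a
# (N, 3, h, w) image batch to (N, 2, h, w) per-pixel probabilities fits here.
model = nn.Sequential(nn.Conv2d(3, 2, kernel_size=3, padding=1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCELoss()  # mean per-pixel binary cross-entropy, matching eq. (1)

def train_step(images, labels):
    """images: (N, 3, h, w) table image samples.
    labels: (N, 2, h, w) stacked table line / text foreground label maps,
    per-pixel values in {0, 1} stored as float tensors."""
    optimizer.zero_grad()
    pred = model(images)      # predicted label values in (0, 1)
    loss = bce(pred, labels)  # loss value L of equation (1)
    loss.backward()           # gradient for every model parameter
    optimizer.step()          # update parameters by back-propagation
    return loss.item()
```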
In one embodiment, the table image sample and the corresponding table line foreground label map and text foreground label map are obtained by the following method: randomly setting a filling text for each cell in the table image; generating an initial table line foreground label map according to the positions of the edge lines of the cells, and generating an initial text foreground label map according to the positions of the filling texts; aligning the filling texts in the horizontal direction and the vertical direction respectively, where the alignment includes left alignment, middle alignment and right alignment; randomly filling blank space in the area between the edge lines of a cell and the filling text within the cell; randomly setting the edge line width of the cells and the proportion of dashed lines among the edge lines to obtain a regular table image; decomposing the regular table image into a filling text foreground, a table line foreground, an in-cell background and an outside-table background, and filling each of them with random pixel values to form an initial table image sample; performing random perspective and rotation processing on the initial table image sample to obtain the final table image sample; and applying the same perspective and rotation processing, in the manner adopted for the final table image sample, to the initial text foreground label map and the initial table line foreground label map to obtain the final text foreground label map and the final table line foreground label map.
Through these steps, the table image is data-enhanced to simulate the various conditions of real tables, which reduces the work of generating and labeling table image samples, yields a large number of training samples, and improves the accuracy of model training. In addition, decomposing the regular table image into the four layers of filling text foreground, table line foreground, in-cell background and outside-table background for random pixel value filling has two benefits. On the one hand, the random pixel value filling ensures diversity in the appearance of the table images, simulating cell color fills in real scenes, differences in table line thickness and color depth caused by photographing, differences in character stroke thickness and color depth, and the like. On the other hand, dividing the image into four layers allows the random range of each layer to be controlled during filling, ensuring that the table lines and characters remain visible and keep a certain contrast with the background instead of being hidden by it; this guarantees the quality of the table image samples and improves the accuracy of model training.
In the present embodiment, the filling text includes, but is not limited to, Chinese characters, letters and numbers; the filling text may be set as one line or multiple lines, and each line of text may contain one or more characters. After the regular table image is decomposed into the filling text foreground, table line foreground, in-cell background and outside-table background and each is filled with random pixel values, the filled layers are combined to form the initial table image sample.
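The perspective and rotation processing applied identically to the sample and its label maps can be sketched with OpenCV as follows; the corner-jitter and angle ranges are assumed values for illustration.

```python
import numpy as np
import cv2

def random_warp(sample, line_label, text_label):
    """Apply one random perspective transform and one random rotation to the
    table image sample and, in exactly the same way, to both label maps,
    so that the sample and its labels stay aligned."""
    h, w = sample.shape[:2]
    # Random perspective: jitter the four corners by up to 5% of the size.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = (src + np.random.uniform(-0.05, 0.05, (4, 2)) * [w, h]).astype(np.float32)
    persp = cv2.getPerspectiveTransform(src, dst)
    # Random small rotation about the image center.
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), np.random.uniform(-5, 5), 1.0)
    def apply(img):
        out = cv2.warpPerspective(img, persp, (w, h))
        return cv2.warpAffine(out, rot, (w, h))
    return apply(sample), apply(line_label), apply(text_label)
```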
Step S102: and acquiring the table structure of the table image to be identified according to the table line foreground image.
In this embodiment, the table structure of the table image to be recognized may be obtained according to the table line foreground map according to the following steps:
step 11: and extracting the horizontal contour lines and the vertical contour lines in the foreground graph of the table lines.
In one embodiment, step 11 specifically includes: performing erosion and dilation on the table line foreground map with a preset horizontal kernel to obtain the horizontal contour lines; performing erosion and dilation on the table line foreground map with a preset vertical kernel to obtain the vertical contour lines; the preset horizontal kernel is a matrix of 1 row and N columns with all elements equal to 1, and the preset vertical kernel is a matrix of N rows and 1 column with all elements equal to 1.
Step 12: and acquiring the intersection point of the horizontal contour line and the vertical contour line and taking the intersection point as a candidate intersection point.
Step 13: judging whether other adjacent candidate intersection points exist on the transverse contour line corresponding to each candidate intersection point; if so, connecting each candidate intersection point and other adjacent candidate intersection points corresponding to each candidate intersection point to obtain a first candidate connecting line; if not, connecting each candidate intersection point and the first candidate intersection point arranged behind each candidate intersection point according to a preset candidate intersection point arrangement sequence, and selecting a connecting line with the horizontal angle smaller than or equal to a preset angle threshold value as a second candidate connecting line.
Step 14: calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground graph, and selecting the candidate connecting line to be analyzed with the coverage proportion more than or equal to a preset coverage threshold value as an effective connecting line; the candidate connecting line to be analyzed is a first candidate connecting line or a second candidate connecting line, and the preset coverage threshold corresponding to the first candidate connecting line is smaller than the preset coverage threshold corresponding to the second candidate connecting line.
In one embodiment, the step of "calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground map" in step 14 specifically includes: acquiring the total number of pixels that the candidate connecting line to be analyzed passes through on the table line foreground map; acquiring, among the pixels that the candidate connecting line to be analyzed passes through on the table line foreground map, the number of pixels that belong to table lines in the table line foreground map; and calculating the ratio of the number of table line pixels to the total number of pixels to obtain the coverage proportion of the candidate connecting line to be analyzed.
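A sketch of this coverage computation, assuming line_fg is a binary map whose table line pixels are non-zero; the segment is rasterized with scikit-image's line helper:

```python
import numpy as np
from skimage.draw import line as raster_line

def coverage(p, q, line_fg):
    """Fraction of the pixels on segment pq (endpoints as (x, y)) that
    fall on table line pixels of the table line foreground map."""
    rr, cc = raster_line(p[1], p[0], q[1], q[0])  # rows are y, cols are x
    total = len(rr)                               # pixels the segment crosses
    on_line = int(np.count_nonzero(line_fg[rr, cc]))
    return on_line / total if total else 0.0
```

A candidate line is then kept as effective when its coverage reaches the preset threshold for its type, with the first-candidate threshold set below the second-candidate one as required above.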
Step 15: and searching a cell path according to the effective connecting line to obtain a candidate cell.
In one embodiment, the step 15 specifically includes: sequentially searching for effective connecting lines from each effective connecting line according to a clockwise searching direction formed by horizontal-vertical-horizontal-vertical to obtain other corresponding effective connecting lines of each effective connecting line; and generating candidate cells according to each effective connecting line and other corresponding effective connecting lines.
Step 16: candidate cells having common vertices are connected to generate one or more table structures.
In this embodiment, selecting only connecting lines whose angle with the horizontal is smaller than or equal to the preset angle threshold as second candidate connecting lines prevents candidate intersection points that do not belong to the table structure from being connected, and thereby prevents connection errors between candidate intersection points. Calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground map further screens out wrongly connected candidate connecting lines. Moreover, because adjacency on the horizontal contour line is checked first and the resulting first candidate connecting lines use a preset coverage threshold smaller than that of the second candidate connecting lines, connecting lines with a certain degree of bending can still be judged as effective connecting lines; bent table images can therefore be processed, which improves the robustness of bent table image recognition.
In this embodiment, the candidate intersection points may be arranged from left to right and from top to bottom. It is first determined whether each candidate intersection point has other adjacent candidate intersection points on its corresponding horizontal contour line; if so, the candidate intersection point is connected with its adjacent candidate intersection points, and once connected, it is not connected horizontally again. The preset angle threshold can be set flexibly by a person skilled in the art according to the actual situation, for example 3°, 4° or another value. Likewise, the preset coverage thresholds can be set flexibly; for example, the preset coverage threshold corresponding to the first candidate connecting line may be 0.6 and that corresponding to the second candidate connecting line may be 0.9, or other values may be used, as long as the threshold for the first candidate connecting line is smaller than the threshold for the second candidate connecting line.
In this embodiment, a specific example of obtaining the table structure of the table image to be recognized is as follows. Fig. 4 shows an obtained table line foreground map. First, erosion and dilation are performed on the table line foreground map (fig. 4) with the preset horizontal kernel to obtain the horizontal contour lines shown in fig. 5, and with the preset vertical kernel to obtain the vertical contour lines shown in fig. 6. Erosion and dilation are conventional operations in image morphology: erosion takes a local minimum and gradually shrinks the highlighted areas in an image, while dilation takes a local maximum and gradually enlarges them. Then, the intersection points of the horizontal contour lines and the vertical contour lines are acquired as candidate intersection points; the points in fig. 7 represent these candidate intersection points. Next, the candidate intersection points are connected to obtain candidate connecting lines. In fig. 8, the circles represent candidate intersection points, which are connected in left-to-right, top-to-bottom order: point A is adjacent to point B on a horizontal contour line, so A is connected to B; similarly, B is connected to C and C to D; point D has no adjacent point on a horizontal contour line, so it is not connected horizontally; returning to point A, the first candidate intersection point arranged after A is point E, so A is connected to E; and so on until all candidate intersection points are connected. Then, each candidate connecting line is judged for validity. In fig. 9, points R, S and T represent candidate intersection points. If R and S are connected, the angle of segment RS with the horizontal is 0 and all pixels that RS passes through on the table line foreground map belong to table line pixels (the white points in the figure), so RS is an effective connecting line. If R and T are connected, the angle of segment RT with the horizontal is greater than the preset angle threshold of 5°, and the ratio of the table line pixels among the pixels that RT passes through to the total number of pixels is less than the preset coverage threshold of 0.9, so RT is not an effective connecting line.

Then, a cell path search is performed according to the effective connecting lines to obtain candidate cells; the specific search process is shown in fig. 11. Candidate cells are divided into closed cells and non-closed cells: the search of a closed cell starts and ends at the same intersection point, while the search of a non-closed cell ends at an end point. Taking intersection point a in the figure as the starting point, the effective connecting lines are searched in the clockwise direction horizontal-vertical-horizontal-vertical to obtain the closed cell A; taking intersection point b as the starting point, the same clockwise search yields the non-closed cell E; taking intersection point c as the starting point, the search stops because there is no effective connecting line in the horizontal direction. By analogy, candidate cells are generated from each effective connecting line and its corresponding other effective connecting lines. As shown in fig. 10, the search directions may be specified by marking the horizontal effective connecting lines as 0 and 2 and the vertical effective connecting lines as 1 and 3, and searching the cell path clockwise in the order 0 → 1 → 2 → 3. When searching in each direction, the next effective connecting line offers two choices, going straight or turning right; a right turn that satisfies the direction transition is preferred, so that the smallest cell structure is generated. Finally, candidate cells having common vertices are connected to generate one or more table structures.
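The right-turn-preferred walk of fig. 10 can be sketched as follows, assuming an idealized axis-aligned grid in which links maps each intersection (x, y) to the set of intersections it is connected to by effective connecting lines; all names are illustrative:

```python
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # 0 right, 1 down, 2 left, 3 up

def step(pos, d, links):
    """Nearest intersection linked to pos along axis direction d."""
    dx, dy = DIRS[d]
    ahead = [q for q in links[pos]
             if (q[0] - pos[0]) * dy == (q[1] - pos[1]) * dx      # collinear
             and (q[0] - pos[0]) * dx + (q[1] - pos[1]) * dy > 0]  # forward
    return min(ahead, key=lambda q: abs(q[0] - pos[0]) + abs(q[1] - pos[1])) \
        if ahead else None

def trace_cell(start, links):
    """Walk effective links clockwise (0 -> 1 -> 2 -> 3) from the
    top-left corner start, preferring a right turn over going straight,
    so the smallest closed cell is produced; returns None when the path
    dead-ends (a non-closed cell)."""
    pos = step(start, 0, links)          # the first leg must be horizontal
    if pos is None:
        return None
    path, d = [start, pos], 0
    while pos != start:
        if len(path) > 4 * len(links):   # guard against malformed input
            return None
        for nd in ((d + 1) % 4, d):      # right turn first, then straight
            nxt = step(pos, nd, links)
            if nxt is not None:
                pos, d = nxt, nd
                path.append(pos)
                break
        else:
            return None                  # dead end: non-closed cell
    return path[:-1]                     # corner points of the closed cell
```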
Through steps 11-16, the table structure can be identified. In practical applications, however, a table is often searched by index to locate the desired content accurately, so after the table structure is identified, an index can be added; this makes it convenient both to locate and search cells and to associate and store the subsequent text with its cells.
In this embodiment, the table index of each table structure may be generated according to the following steps:
and step 17: the horizontal edges of the candidate cells having a common vertex in the table structure are connected to form a horizontal line candidate.
Step 18: the longitudinal edges of candidate cells having a common vertex in the table structure are connected to form a longitudinal line candidate.
Step 19: and acquiring candidate transverse lines to be processed which can be extended and overlapped with other candidate transverse lines, and extending and combining the candidate transverse lines to be processed and the other candidate transverse lines to form combined candidate transverse lines.
Step 20: and acquiring candidate vertical lines to be processed which can be extended and overlapped with other candidate vertical lines, and extending and combining the candidate vertical lines to be processed and the other candidate vertical lines to form combined candidate vertical lines.
Step 21: and respectively distributing transverse index numbers to the merged candidate transverse lines and the other candidate transverse lines which are not merged in an extending way according to a preset transverse line arrangement sequence.
Step 22: and respectively distributing longitudinal index numbers to the merged candidate longitudinal lines and other candidate longitudinal lines which are not merged in an extending way according to a preset longitudinal line arrangement sequence.
Step 23: generating a table index of the table structure according to the transverse index numbers and the longitudinal index numbers.
In this embodiment, a specific example of generating the table index is as follows. As shown in fig. 12, the candidate transverse lines a, b and c to be processed can be overlapped by extension, so they are extended and merged into one merged candidate transverse line; transverse index numbers E1, E2, E3, E4 and E5 are then assigned to the merged candidate transverse line and the other candidate transverse lines that were not merged, in top-to-bottom order. Similarly, longitudinal index numbers F1, F2, F3, F4, F5 and F6 are assigned to the candidate longitudinal lines in left-to-right order, generating the table index of the table structure. The cell marked in the figure then has row start index E2, row end index E4, column start index F2 and column end index F3.
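A sketch of the transverse half of this indexing, assuming each candidate line is given as ((x0, y0), (x1, y1)) and that lines whose mean y coordinates lie within a small tolerance are the ones that can be extended and overlapped; the tolerance value is an illustrative assumption:

```python
def assign_transverse_indices(lines, tol=5):
    """Merge candidate transverse lines whose mean y coordinates are
    within tol pixels, then assign index numbers E1, E2, ... from top
    to bottom; returns a mapping from index name to the group's y."""
    ys = sorted((p[1] + q[1]) / 2 for p, q in lines)
    groups = []
    for y in ys:
        if groups and y - groups[-1][-1] <= tol:
            groups[-1].append(y)          # extendable: merge into the group
        else:
            groups.append([y])            # start a new merged line
    return {f"E{i + 1}": sum(g) / len(g) for i, g in enumerate(groups)}
```

The longitudinal indices F1, F2, ... follow symmetrically from the x coordinates, and a cell's row and column start and end indices are the indices of the four lines that bound it.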
Step S103: performing text line detection on the form image to be recognized to obtain a first text line position of a text line in the form image to be recognized; acquiring a second text line position of a text line in the table image to be identified, which is stored at a corresponding position in the text foreground image, according to the positions of the cells in the table structure; and acquiring a final text line position associated with the cell according to the position of the cell, the position of the first text line and the position of the second text line.
In this embodiment, the final text line position associated with the cell may be obtained as follows:
step 31: and acquiring a text line detection frame and a first text line position corresponding to the text line in the form image to be recognized according to the result of text line detection on the form image to be recognized.
Step 32: acquiring overlapping detection frames in the text line detection frames, which have overlapping areas with the cells, according to the positions of the cells, and calculating the overlapping proportion of the cells and each overlapping detection frame;
step 33: and selecting an overlapping detection frame with the overlapping proportion more than or equal to a preset overlapping threshold value, and taking a first text line position corresponding to the overlapping detection frame as a final text line position associated with the cell.
In the present embodiment, methods for text line detection on the form image to be recognized include, but are not limited to, the PixelLink, SegLink and Pixel-Anchor algorithms. The target of text line detection is the position of a text line, and the text line detection box may be represented by a rectangle; as shown in fig. 13, the rectangles in the figure are text line detection boxes. If a text line detection box has an overlapping area with a cell and the overlap proportion is greater than or equal to the preset overlap threshold, the cell contains that text line, and the detection position corresponding to the overlap detection box is taken as the text line position associated with the cell.
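A sketch of the overlap test, with boxes in (x0, y0, x1, y1) form and the overlap proportion measured as the fraction of the detection box that lies inside the cell (one plausible reading; the patent does not pin the denominator down):

```python
def overlap_ratio(cell, box):
    """Intersection area between cell and text-line box, divided by
    the area of the box itself."""
    ix0, iy0 = max(cell[0], box[0]), max(cell[1], box[1])
    ix1, iy1 = min(cell[2], box[2]), min(cell[3], box[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area else 0.0
```

With the example above, a box would be associated with a cell when overlap_ratio(cell, box) is greater than or equal to the preset overlap threshold.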
Through steps 31-33, the final text line position associated with a cell can be obtained. In practical applications, however, text lines may be missed during text line detection on the form image to be recognized, so after the final text line positions associated with the cells are obtained, the results can be checked and repaired to avoid missing text lines.
In one embodiment, after the step of "taking the first text line position corresponding to the overlap detection box as the final text line position associated with the cell", the method further comprises: judging whether there is a cell not associated with any text line position; when such a cell exists, acquiring the second text line position corresponding to the cell; performing horizontal projection and vertical projection on the second text line position to obtain a horizontal projection line segment in the horizontal direction and a vertical projection line segment in the vertical direction; judging whether the sum of the lengths of the horizontal projection line segment and the vertical projection line segment is greater than or equal to a length threshold; and if so, taking the second text line position as the final text line position associated with the cell.
In this embodiment, fig. 14 shows a text foreground map, where the square in the figure represents the text line position of the corresponding cell. Because the map contains noise, the square is projected horizontally to obtain a horizontal projection line segment and vertically to obtain a vertical projection line segment; if the sum of the lengths of the two projection line segments is greater than or equal to the length threshold, the square is judged not to be noise and a text line exists in the corresponding area, thereby avoiding noise interference.
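A NumPy sketch of this projection test, assuming region is the binarized text-foreground patch cut out at the cell's position; the projection segment lengths are taken as the extents of the foreground pixels along each axis:

```python
import numpy as np

def has_text_line(region, length_threshold):
    """True if the horizontal plus vertical projection extents of the
    foreground pixels are long enough to rule out isolated noise."""
    ys, xs = np.nonzero(region)            # coordinates of foreground pixels
    if xs.size == 0:
        return False
    h_len = int(xs.max() - xs.min()) + 1   # horizontal projection segment
    v_len = int(ys.max() - ys.min()) + 1   # vertical projection segment
    return h_len + v_len >= length_threshold
```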
Step S104: and acquiring a text line image corresponding to the associated cell from the table image to be recognized according to the final text line position, performing text recognition on the text line image, and storing recognized text information into the cell to form a recognized table.
In the present embodiment, the text line image may be subjected to text Recognition by an OCR (Optical Character Recognition) method.
In this embodiment, the text line image corresponding to the associated cell may be acquired from the form image to be recognized according to the following steps:
step 41: acquiring a text line image corresponding to the final text line position in the form image to be identified; if the final text line position is associated with a cell, go to step 42; if the final text line position is associated with multiple cells, go to step 43.
Step 42: and directly taking the text line image as the text line image corresponding to the cell.
Step 43: and segmenting the text line image according to the position of each cell and the final text line position to obtain the text line image segment corresponding to each cell.
In the embodiment, the text line image is segmented according to the position of each cell and the final text line position, so that each cell is associated with the corresponding text line image segment, and the storage error of the text line position is avoided.
In the present embodiment, part of the text line detection boxes in fig. 13 span two cells, that is, one text line position is associated with two cells. As shown in fig. 15, which is a partially enlarged view of fig. 13, the thin-line rectangular box represents a text line detection box indicating a text line position; the text line position containing the numeral 58 is associated with two cells, so the text line image is segmented according to the position of each cell and the text line position, with the thick-line rectangular boxes representing the segmented text line positions, so that each cell is associated with its corresponding text line image segment.
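A sketch of this segmentation for the common case of a horizontal text line spanning side-by-side cells; the box layout and names are assumptions:

```python
def split_text_line(box, cells):
    """Clip a text-line box (x0, y0, x1, y1) against each associated
    cell so that every cell keeps only its own image segment."""
    segments = {}
    for idx, (cx0, cy0, cx1, cy1) in cells.items():
        x0, x1 = max(box[0], cx0), min(box[2], cx1)
        if x1 > x0:                      # the line actually enters this cell
            segments[idx] = (x0, box[1], x1, box[3])
    return segments
```

Each returned segment can then be cropped from the table image to be recognized and sent to text recognition on its own.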
In the embodiment of the invention, a table line foreground map and a text foreground map of the table image to be recognized are obtained with a preset image recognition model; the table structure of the table image to be recognized is obtained according to the table line foreground map; text line detection is performed on the table image to be recognized to obtain the first text line positions of the text lines in it; the second text line positions of the text lines, stored at the corresponding positions in the text foreground map, are obtained according to the positions of the cells in the table structure; the final text line position associated with each cell is obtained according to the position of the cell, the first text line position and the second text line position; and the text line image corresponding to the associated cell is obtained from the table image to be recognized according to the final text line position, text recognition is performed on it, and the recognized text information is stored into the cell to form the recognized table.

When the table structure in the table image to be recognized is obtained, it can be reconstructed from the position information of the table lines in the table line foreground map. Even if a low-quality image suffers from broken or bent table lines, the table lines can be reconnected according to their position information in the table line foreground map to build a regular table structure; that is, table lines with breakage, bending and similar problems are corrected.

When the text information stored in each cell of the table structure is obtained, the text line position corresponding to the cell can be obtained from the text line position of the text line in the table image to be recognized (the first text line position) and the position of the cell, and the relevant image is then taken from the table image to be recognized at that text line position for text recognition; the recognized text information is the information to be stored in the cell. In practical applications, however, some text in the table image to be recognized may use a relatively small font, and such text lines may be missed during text line detection. To address this, the text line position of each text line stored in the text foreground map (the second text line position) can be compared with the position of the cell to obtain a text line position that is the same as or close to the position of the cell, and the relevant image is taken from the table image to be recognized at that position (the final text line position) for text recognition; the recognized text information is the information to be stored in the cell. In this way, text in the image to be recognized is prevented from being missed, and recognized text information is prevented from being stored into the wrong cells, so the text recognition accuracy for the table image to be recognized is improved.
It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.
Furthermore, the invention also provides a table identification device.
Referring to fig. 16, fig. 16 is a main structural block diagram of a table identifying apparatus according to an embodiment of the present invention. As shown in fig. 16, the table identifying apparatus in the embodiment of the present invention mainly includes a foreground diagram obtaining module 11, a table structure obtaining module 12, a text line associating module 13, and a table generating module 14. In some embodiments, one or more of the foreground map acquisition module 11, the table structure acquisition module 12, the text line association module 13, and the table generation module 14 may be combined together into one module. In some embodiments, the foreground map obtaining module 11 may be configured to obtain the form line foreground map and the text foreground map of the form image to be recognized by using a preset image recognition model. The table structure obtaining module 12 may be configured to obtain the table structure of the table image to be identified according to the table line foreground map. The text line association module 13 may be configured to perform text line detection on the form image to be recognized to obtain a first text line position of a text line in the form image to be recognized; acquiring a second text line position of a text line in the table image to be identified, which is stored at a corresponding position in the text foreground image, according to the positions of the cells in the table structure; and acquiring a final text line position associated with the cell according to the position of the cell, the position of the first text line and the position of the second text line. The table generating module 14 may be configured to obtain a text line image corresponding to the associated cell from the table image to be recognized according to the final text line position, perform text recognition on the text line image, and store the recognized text information into the cell to form a recognized table. In one embodiment, the description of the specific implementation function may refer to steps S101 to S104.
In one embodiment, the foreground map obtaining module 11 is further configured to perform the following operations: carrying out position recognition on form lines and text lines on a form image to be recognized by adopting a preset image recognition model, and acquiring a form line foreground image and a text foreground image according to a position recognition result; the preset image recognition model is obtained by training based on the table image sample and the corresponding table line foreground label graph and text foreground label graph; the table line foreground label graph and the table image sample have the same size, and the label value stored at the position of each pixel point in the table line foreground label graph depends on whether a table line exists at the corresponding position in the table image sample; the text foreground label image and the form image sample have the same size, and the label value stored in the position of each pixel point in the text foreground label image depends on whether a text line exists at the corresponding position in the form image sample. In one embodiment, the description of the specific implementation function may be referred to in step S101.
In one embodiment, the foreground map obtaining module 11 is further configured to perform the following operations: the preset image recognition model is obtained by training in the following way: calculating a loss value of the image recognition model according to the table image sample and the corresponding table line foreground label graph and text foreground label graph, using the loss function shown in equation (1), here written as a pixel-wise cross-entropy consistent with the variable definitions given in claim 3:

L = -\frac{1}{n h w} \sum_{k=1}^{n} \sum_{i=1}^{h} \sum_{j=1}^{w} \left[ y_{i,j}^{(k)} \log \hat{y}_{i,j}^{(k)} + \left(1 - y_{i,j}^{(k)}\right) \log\left(1 - \hat{y}_{i,j}^{(k)}\right) \right]   (1)

and calculating the gradient corresponding to each model parameter in the image recognition model according to the loss value, and updating the model parameters of the image recognition model through gradient back propagation for model optimization, so as to complete the training. In one embodiment, the description of the specific implementation function may be referred to in step S101.
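A NumPy sketch of that loss under the same cross-entropy assumption, where y stacks the 0/1 label maps of a batch of n samples and y_hat the model's per-pixel predictions:

```python
import numpy as np

def pixelwise_bce(y, y_hat, eps=1e-7):
    """Mean binary cross-entropy over an (n, h, w) batch of label maps;
    eps keeps the predictions away from exact 0 or 1 for stability."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))
```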
In one embodiment, the foreground map obtaining module 11 is further configured to perform the following operations: the table image sample and the corresponding table line foreground label image and text foreground label image are obtained by the following method: randomly setting a filling text for each cell in the table image; generating an initial table line foreground label graph according to the position of the edge line of the cell; generating an initial text foreground label image according to the position of the filled text; respectively carrying out horizontal direction alignment and vertical direction alignment on the filled texts, wherein the horizontal direction alignment and the vertical direction alignment comprise left alignment, middle alignment and right alignment; random blank filling is carried out on the area between the sideline of the cell and the filling text in the cell; randomly setting the edge line width of the cell and the proportion of the dotted line in the edge line to obtain a regular form image; decomposing the regular form image into a filling text foreground, a form line foreground, a cell inner background and a form outer background, and respectively carrying out random pixel value filling on the filling text foreground, the form line foreground, the cell inner background and the form outer background to form an initial form image sample; carrying out random perspective processing and rotation processing on the initial form image sample to obtain a final form image sample; and simultaneously respectively carrying out the same perspective processing and rotation processing on the initial text foreground label image and the initial table line foreground label image according to the perspective processing mode and the rotation processing mode adopted by the final table image sample to obtain the final text foreground label image and the final table line foreground label image. In one embodiment, the description of the specific implementation function may be referred to in step S101.
In one embodiment, the table structure acquisition module 12 is further configured to perform the following operations: extracting a horizontal contour line and a vertical contour line in the table line foreground graph; acquiring an intersection point of the horizontal contour line and the vertical contour line and taking the intersection point as a candidate intersection point; judging whether other adjacent candidate intersection points exist on the transverse contour line corresponding to each candidate intersection point; if so, connecting each candidate intersection point and other adjacent candidate intersection points corresponding to each candidate intersection point to obtain a first candidate connecting line; if not, connecting each candidate intersection point and a first candidate intersection point arranged behind each candidate intersection point according to a preset candidate intersection point arrangement sequence, and selecting a connecting line with a horizontal angle smaller than or equal to a preset angle threshold value as a second candidate connecting line; calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground graph, and selecting the candidate connecting line to be analyzed with the coverage proportion more than or equal to a preset coverage threshold value as an effective connecting line; searching a cell path according to the effective connecting line to obtain a candidate cell; concatenating the candidate cells having the common vertices to generate one or more table structures; the candidate connecting line to be analyzed is a first candidate connecting line or a second candidate connecting line, and the preset coverage threshold corresponding to the first candidate connecting line is smaller than the preset coverage threshold corresponding to the second candidate connecting line. In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, the table structure acquisition module 12 is further configured to perform the following operations: performing erosion and dilation on the table line foreground map with a preset horizontal kernel to obtain the horizontal contour lines; performing erosion and dilation on the table line foreground map with a preset vertical kernel to obtain the vertical contour lines; the preset horizontal kernel is a 1-row, N-column matrix whose elements are all 1, and the preset vertical kernel is an N-row, 1-column matrix whose elements are all 1; and/or the step of "calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground map" specifically includes: acquiring the total number of pixels that the candidate connecting line to be analyzed passes through on the table line foreground map; acquiring, among those pixels, the number of pixels that belong to table lines in the table line foreground map; and calculating the ratio of the number of table line pixels to the total number of pixels to obtain the coverage proportion of the candidate connecting line to be analyzed; and/or the step of "performing the cell path search according to the effective connecting lines to obtain the candidate cells" specifically includes: sequentially searching the effective connecting lines starting from each effective connecting line in the clockwise search direction horizontal-vertical-horizontal-vertical to obtain the other effective connecting lines corresponding to each effective connecting line; and generating candidate cells according to each effective connecting line and its corresponding other effective connecting lines. In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, after the step of "connecting candidate cells having a common vertex to generate one or more table structures", the table structure acquisition module 12 is further configured to perform the following operations: connecting the horizontal edges of the candidate cells with the common vertex in the table structure to form a candidate horizontal line; connecting longitudinal edges of candidate cells having a common vertex in the table structure to form a candidate longitudinal line; acquiring candidate transverse lines to be processed which can be extended and overlapped with other candidate transverse lines, and extending and combining the candidate transverse lines to be processed and the other candidate transverse lines to form combined candidate transverse lines; acquiring candidate vertical lines to be processed which can be extended and overlapped with other candidate vertical lines, and extending and combining the candidate vertical lines to be processed and the other candidate vertical lines to form combined candidate vertical lines; distributing horizontal index numbers for the merged candidate horizontal lines and other candidate horizontal lines which are not merged in an extending way according to a preset horizontal line arrangement sequence; respectively allocating longitudinal index numbers to the merged candidate longitudinal lines and other candidate longitudinal lines which are not merged in an extending way according to a preset longitudinal line arrangement sequence; and generating a table index of the table structure according to the transverse index number and the longitudinal index number. In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, the text line association module 13 is further configured to perform the following operations: according to the result of text line detection on the form image to be recognized, acquiring a text line detection frame and a first text line position corresponding to a text line in the form image to be recognized; acquiring overlapping detection frames in the text line detection frames, which have overlapping areas with the cells, according to the positions of the cells, and calculating the overlapping proportion of the cells and each overlapping detection frame; and selecting an overlapping detection frame with the overlapping proportion more than or equal to a preset overlapping threshold value, and taking a first text line position corresponding to the overlapping detection frame as a final text line position associated with the cell. In one embodiment, the description of the specific implementation function may refer to that in step S103.
In one embodiment, after the step of "taking the first text line position corresponding to the overlap detection box as the final text line position associated with the cell", the text line association module 13 is further configured to perform the following operations: judging whether a cell at the position of the unassociated text line exists or not; when a cell which is not associated with the text line position exists, acquiring a second text line position corresponding to the cell; respectively carrying out horizontal projection and vertical projection on the position of the second text line to obtain a horizontal projection line segment in the horizontal direction and a vertical projection line segment in the vertical direction; judging whether the sum of the lengths of the horizontal projection line segment and the vertical projection line segment is greater than or equal to a length threshold value or not; if so, the second text line position is taken as the final text line position associated with the cell. In one embodiment, the description of the specific implementation function may refer to that in step S103.
In one embodiment, table generation module 14 is further configured to perform the following operations: acquiring a text line image corresponding to the final text line position in the form image to be identified; if the final text line position is associated with a cell, directly taking the text line image as the text line image corresponding to the cell; and if the final text line position is associated with a plurality of cells, segmenting the text line image according to the position of each cell and the final text line position to obtain a text line image segment corresponding to each cell. In one embodiment, the description of the specific implementation function may refer to that in step S104.
The above table identifying device is used to execute the embodiment of the table identifying method shown in fig. 1; the technical principles of the two, the technical problems they solve and the technical effects they produce are similar. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and related description of the table identifying device may refer to the contents described in the embodiment of the table identifying method, and are not repeated here.
It will be understood by those skilled in the art that all or part of the flow of the method of the above embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, or the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
Furthermore, the invention also provides a table identification device. In an embodiment of the table identifying apparatus according to the present invention, the apparatus comprises a processor and a storage device; the storage device may be configured to store a program for executing the table identifying method of the above method embodiment, and the processor may be configured to execute the program in the storage device, including but not limited to the program for executing the table identifying method of the above method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and specific technical details are not disclosed. The table identifying apparatus may be a control device formed from various electronic devices.
Further, the invention also provides a computer-readable storage medium. In an embodiment of the computer-readable storage medium according to the present invention, the computer-readable storage medium may be configured to store a program that executes the table identification method of the above method embodiment, and the program may be loaded and executed by a processor to implement the above table identification method. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and specific technical details are not disclosed. The computer-readable storage medium may be a storage device formed from various electronic devices; optionally, the computer-readable storage medium in the embodiment of the present invention is a non-transitory computer-readable storage medium.
Further, it should be understood that, since the modules are only configured to illustrate the functional units of the system of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solution of the present invention has been described with reference to one embodiment shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (22)

1. A method of form recognition, the method comprising:
acquiring a table line foreground image and a text foreground image of a table image to be identified by adopting a preset image identification model;
acquiring a table structure of the table image to be identified according to the table line foreground image;
performing text line detection on the form image to be recognized to obtain a first text line position of a text line in the form image to be recognized;
acquiring a second text line position of a text line in the table image to be identified, which is stored at a corresponding position in the text foreground image, according to the position of a cell in the table structure;
acquiring a final text line position associated with the cell according to the position of the cell, the first text line position and the second text line position;
and acquiring a text line image corresponding to the associated cell from the table image to be recognized according to the final text line position, performing text recognition on the text line image, and storing recognized text information into the cell to form a recognized table.
2. The form recognition method according to claim 1, wherein the step of "obtaining a form line foreground map and a text foreground map of the form image to be recognized" specifically comprises:
adopting the preset image recognition model to carry out position recognition on form lines and text lines on the form image to be recognized, and acquiring a form line foreground image and a text foreground image according to a position recognition result;
the preset image recognition model is obtained by training based on a table image sample and a corresponding table line foreground label graph and a corresponding text foreground label graph;
the table line foreground label graph and the table image sample have the same size, and the label value stored in the position of each pixel point in the table line foreground label graph depends on whether a table line exists at the corresponding position in the table image sample;
the text foreground label image and the form image sample have the same size, and the label value stored in the position of each pixel point in the text foreground label image depends on whether a text line exists at the corresponding position in the form image sample.
3. The form recognition method of claim 2, wherein the preset image recognition model is trained by:
using the loss function shown below, here written as a pixel-wise cross-entropy consistent with the definitions that follow:

L = -\frac{1}{n h w} \sum_{k=1}^{n} \sum_{i=1}^{h} \sum_{j=1}^{w} \left[ y_{i,j}^{(k)} \log \hat{y}_{i,j}^{(k)} + \left(1 - y_{i,j}^{(k)}\right) \log\left(1 - \hat{y}_{i,j}^{(k)}\right) \right]

calculating a loss value of the image recognition model according to the table image sample and the corresponding table line foreground label graph and text foreground label graph;

wherein n represents the number of table image samples and k = 1, ..., n indexes the samples; h and w respectively represent the height and width of the table image sample, with i = 1, ..., h and j = 1, ..., w; y_{i,j}^{(k)} represents the label value at position (i, j) in the k-th table image sample, determined according to the table line foreground label graph and text foreground label graph corresponding to the k-th table image sample; and \hat{y}_{i,j}^{(k)} represents the label value predicted by the image recognition model at position (i, j) in the k-th table image sample;
and calculating the gradient corresponding to each model parameter in the image recognition model according to the loss value, and updating the model parameters of the image recognition model according to the gradient back propagation to perform model optimization so as to complete training.
4. The form recognition method of claim 2, wherein the form image sample and the corresponding form line foreground label map and text foreground label map are obtained by:
randomly setting a filling text for each cell in the table image;
generating an initial table line foreground label graph according to the position of the edge line of the cell; generating an initial text foreground label image according to the position of the filling text;
respectively carrying out horizontal direction alignment and vertical direction alignment on the filled texts, wherein the horizontal direction alignment and the vertical direction alignment respectively comprise left alignment, middle alignment and right alignment;
carrying out random blank filling on the area between the edge line of the cell and the filling text in the cell;
randomly setting the edge line width of the cell and the proportion of the dotted line in the edge line to obtain a regular form image;
decomposing the regular form image into a filling text foreground, a form line foreground, a cell inner background and a form outer background, and respectively carrying out random pixel value filling on the filling text foreground, the form line foreground, the cell inner background and the form outer background to form an initial form image sample;
carrying out random perspective processing and rotation processing on the initial form image sample to obtain a final form image sample;
and simultaneously respectively carrying out the same perspective processing and rotation processing on the initial text foreground label image and the initial table line foreground label image according to the perspective processing mode and the rotation processing mode adopted by the final table image sample to obtain the final text foreground label image and the final table line foreground label image.
5. The form recognition method of claim 1, wherein the step of obtaining the final text line position associated with the cell specifically comprises:
according to the result of text line detection on the form image to be recognized, acquiring a text line detection frame and a first text line position corresponding to a text line in the form image to be recognized;
according to the positions of the cells, acquiring overlapping detection boxes of the text line detection boxes, which have overlapping areas with the cells, and calculating the overlapping proportion of the cells and each overlapping detection box;
and selecting the overlapping detection frame with the overlapping proportion more than or equal to a preset overlapping threshold value, and taking the first text line position corresponding to the overlapping detection frame as the final text line position associated with the cell.
6. The form recognition method of claim 5, wherein after the step of "taking the first text line position corresponding to the overlap detection box as the final text line position associated with the cell", the method further comprises:
judging whether a cell at the position of the unassociated text line exists or not;
when a cell which is not associated with a text line position exists, acquiring a second text line position corresponding to the cell;
respectively carrying out horizontal projection and vertical projection on the second text line position to obtain a horizontal projection line segment in the horizontal direction and a vertical projection line segment in the vertical direction;
judging whether the sum of the lengths of the horizontal projection line segment and the vertical projection line segment is greater than or equal to a length threshold value;
and if so, taking the second text line position as the final text line position associated with the cell.
7. The form recognition method of claim 1, wherein the step of obtaining the text line image corresponding to the associated cell from the form image to be recognized according to the final text line position specifically comprises:
acquiring a text line image corresponding to the final text line position in the form image to be identified;
if the final text line position is associated with a cell, directly taking the text line image as a text line image corresponding to the cell;
and if the final text line position is associated with a plurality of cells, segmenting the text line image according to the position of each cell and the final text line position to obtain a text line image segment corresponding to each cell.
8. The form recognition method according to claim 1, wherein the step of obtaining the form structure of the form image to be recognized according to the form line foreground map specifically comprises:
extracting a horizontal contour line and a vertical contour line in the table line foreground graph;
acquiring the intersection point of the transverse contour line and the vertical contour line and taking the intersection point as a candidate intersection point;
connecting the candidate intersection points according to an arrangement sequence from left to right and from top to bottom, and judging whether other candidate intersection points adjacent to the right side of each candidate intersection point exist on the transverse contour line corresponding to each candidate intersection point;
if so, connecting each candidate intersection point and the adjacent other candidate intersection points corresponding to the candidate intersection points to obtain a first candidate connecting line;
if not, connecting each candidate intersection point and a first candidate intersection point arranged behind each candidate intersection point according to a preset candidate intersection point arrangement sequence, and selecting a connecting line with a horizontal angle smaller than or equal to a preset angle threshold value as a second candidate connecting line;
calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground graph, and selecting the candidate connecting line to be analyzed with the coverage proportion more than or equal to a preset coverage threshold value as an effective connecting line;
searching a cell path according to the effective connecting line to obtain a candidate cell;
concatenating the candidate cells having the common vertices to generate one or more table structures;
the candidate connecting line to be analyzed is a first candidate connecting line or a second candidate connecting line, and the preset coverage threshold corresponding to the first candidate connecting line is smaller than the preset coverage threshold corresponding to the second candidate connecting line.
9. The form recognition method of claim 8, wherein the step of extracting the horizontal contour lines and the vertical contour lines in the form line foreground map specifically comprises:
carrying out erosion and dilation processing on the table line foreground graph by using a preset horizontal kernel to obtain the horizontal contour line;
carrying out erosion and dilation processing on the table line foreground graph by using a preset vertical kernel to obtain the vertical contour line;
the preset horizontal kernels are matrixes of 1 row and N columns, matrix elements are all 1, the preset vertical kernels are matrixes of N rows and 1 column, and matrix elements are all 1;
and/or the like and/or,
the step of "calculating the coverage ratio of each candidate connecting line to be analyzed in the table line foreground map" specifically includes:
acquiring the total number of pixels of the candidate connecting line to be analyzed passing through the table line foreground graph;
acquiring the number of pixels belonging to the table line in the table line foreground graph from the pixels of the candidate connecting line to be analyzed passing through the table line foreground graph;
calculating the ratio of the number of the pixels belonging to the table line in the table line foreground graph to the total number of the pixels to obtain the coverage proportion of the candidate connecting line to be analyzed;
and/or the like and/or,
the step of performing cell path search according to the effective connection line to obtain the candidate cell specifically includes:
sequentially searching for effective connecting lines from each effective connecting line according to a clockwise searching direction formed by horizontal-vertical-horizontal-vertical to obtain other corresponding effective connecting lines of each effective connecting line;
and generating candidate cells according to each effective connecting line and the other corresponding effective connecting lines.
10. The form recognition method of claim 8, wherein after the step of concatenating candidate cells having a common vertex to generate one or more form structures, the method further comprises generating a form index for each of the form structures in the following manner:
connecting the horizontal edges of the candidate cells with the common vertex in the table structure to form a candidate horizontal line;
connecting longitudinal edges of candidate cells having a common vertex in the table structure to form a candidate longitudinal line;
acquiring a candidate transverse line to be processed which can be extended and overlapped with other candidate transverse lines, and extending and combining the candidate transverse line to be processed and the other candidate transverse lines to form a combined candidate transverse line;
acquiring candidate vertical lines to be processed which can be extended and overlapped with other candidate vertical lines, and extending and combining the candidate vertical lines to be processed and the other candidate vertical lines to form combined candidate vertical lines;
distributing horizontal index numbers to the merged candidate horizontal lines and other candidate horizontal lines which are not merged in the extending way according to a preset horizontal line arrangement sequence;
respectively allocating longitudinal index numbers to the merged candidate longitudinal lines and other candidate longitudinal lines which are not merged in the extending way according to a preset longitudinal line arrangement sequence;
and generating a table index of a table structure according to the transverse index number and the longitudinal index number.
11. A form recognition apparatus, the apparatus comprising:
a foreground map acquisition module configured to acquire a table line foreground map and a text foreground map of a form image to be recognized using a preset image recognition model;
a table structure acquisition module configured to acquire a table structure of the form image to be recognized according to the table line foreground map;
a text line association module configured to perform text line detection on the form image to be recognized to obtain a first text line position of a text line in the form image to be recognized; to acquire, according to the position of a cell in the table structure, a second text line position of a text line stored at the corresponding position in the text foreground map; and to acquire a final text line position associated with the cell according to the position of the cell, the first text line position, and the second text line position;
and a table generation module configured to acquire, according to the final text line position, a text line image corresponding to the associated cell from the form image to be recognized, perform text recognition on the text line image, and store the recognized text information into the cell to form a recognized table.
12. The form recognition apparatus of claim 11, wherein the foreground map acquisition module is further configured to:
recognize the positions of table lines and text lines in the form image to be recognized using the preset image recognition model, and acquire the table line foreground map and the text foreground map from the position recognition results;
wherein the preset image recognition model is trained on table image samples and their corresponding table line foreground label graphs and text foreground label graphs;
the table line foreground label graph has the same size as the table image sample, and the label value stored at each pixel position in the table line foreground label graph depends on whether a table line exists at the corresponding position in the table image sample;
and the text foreground label graph has the same size as the table image sample, and the label value stored at each pixel position in the text foreground label graph depends on whether a text line exists at the corresponding position in the table image sample.
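
A minimal sketch of how the two label graphs of claim 12 could be rasterized, assuming the table line and text line positions are available as axis-aligned rectangles; the function name and box format are assumptions.

    import numpy as np

    def make_label_graphs(h, w, line_boxes, text_boxes):
        # Each label graph has the same height and width as the sample;
        # a pixel's label is 1 where a table line / text line covers
        # the corresponding position, and 0 elsewhere.
        line_label = np.zeros((h, w), np.float32)
        text_label = np.zeros((h, w), np.float32)
        for x0, y0, x1, y1 in line_boxes:
            line_label[y0:y1, x0:x1] = 1.0
        for x0, y0, x1, y1 in text_boxes:
            text_label[y0:y1, x0:x1] = 1.0
        return line_label, text_label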
13. The form recognition apparatus of claim 12, wherein the foreground map acquisition module is further configured to:
the preset image recognition model is obtained by training in the following way:
calculating a loss value of the image recognition model from the table image samples and their corresponding table line foreground label graphs and text foreground label graphs, using a pixel-wise loss function of the general form (the per-pixel term $\ell$ survives in the source only as an unrecoverable formula image):

$$L = \frac{1}{N h w} \sum_{n=1}^{N} \sum_{i=1}^{h} \sum_{j=1}^{w} \ell\left(y^{(n)}_{(i,j)},\ \hat{y}^{(n)}_{(i,j)}\right)$$

wherein N represents the number of table image samples; h and w respectively represent the height and width of a table image sample; $y^{(n)}_{(i,j)}$ represents the label value at position (i,j) in the nth table image sample, determined from the table line foreground label graph and text foreground label graph corresponding to the nth sample; and $\hat{y}^{(n)}_{(i,j)}$ represents the label value predicted by the image recognition model at position (i,j) in the nth table image sample;
and calculating, from the loss value, the gradient of each model parameter in the image recognition model, and updating the model parameters by gradient back-propagation to optimize the model and complete training.
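
A hedged sketch of one training step of claim 13, assuming PyTorch, assuming a two-channel model output (one table line channel, one text channel), and assuming binary cross-entropy for the per-pixel loss term $\ell$, which the source does not specify.

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, images, line_labels, text_labels):
        # images: (N, C, h, w); each label graph: (N, h, w) in {0, 1}.
        optimizer.zero_grad()
        pred = model(images)                    # assumed (N, 2, h, w)
        target = torch.stack([line_labels, text_labels], dim=1)
        # Pixel-wise loss averaged over N samples and h*w positions.
        loss = F.binary_cross_entropy_with_logits(pred, target)
        loss.backward()                         # gradient per parameter
        optimizer.step()                        # back-propagated update
        return loss.item()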
14. The form recognition apparatus of claim 12, wherein the foreground map acquisition module is further configured to obtain the table image sample and its corresponding table line foreground label graph and text foreground label graph in the following way:
randomly setting a filling text for each cell in a table image;
generating an initial table line foreground label graph according to the positions of the cell edge lines, and generating an initial text foreground label graph according to the positions of the filling texts;
aligning the filling texts horizontally and vertically, the horizontal and vertical alignment each comprising left alignment, middle alignment, and right alignment;
randomly padding the area between the edge lines of a cell and the filling text inside the cell with blank space;
randomly setting the edge line width of the cells and the proportion of dashed segments in the edge lines to obtain a regular table image;
decomposing the regular table image into a filling text foreground, a table line foreground, an in-cell background, and an out-of-table background, and filling each of them with random pixel values to form an initial table image sample;
applying random perspective and rotation processing to the initial table image sample to obtain the final table image sample;
and applying the same perspective and rotation processing used for the final table image sample to the initial text foreground label graph and the initial table line foreground label graph, to obtain the final text foreground label graph and the final table line foreground label graph.
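
A sketch of the shared perspective/rotation processing of claim 14, assuming OpenCV: one transform matrix is sampled and applied identically to the image sample and to both label graphs so that they stay pixel-aligned. The rotation-only example stands in for the random perspective transform.

    import cv2
    import numpy as np

    def warp_sample_and_labels(sample, line_label, text_label, angle=3.0):
        h, w = sample.shape[:2]
        # Example transform: a small rotation about the image centre.
        # A random perspective matrix (cv2.getPerspectiveTransform plus
        # cv2.warpPerspective) would be applied in exactly the same way.
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        warp = lambda img: cv2.warpAffine(img, m, (w, h))
        return warp(sample), warp(line_label), warp(text_label)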
15. The form recognition apparatus of claim 11, wherein the text line association module is further configured to:
acquire, from the result of text line detection on the form image to be recognized, the text line detection boxes and the first text line positions corresponding to the text lines in the form image to be recognized;
acquire, according to the position of the cell, the overlapping detection boxes among the text line detection boxes that have an overlap area with the cell, and calculate the overlap proportion between the cell and each overlapping detection box;
and select the overlapping detection boxes whose overlap proportion is greater than or equal to a preset overlap threshold, and take the first text line positions corresponding to those overlapping detection boxes as the final text line positions associated with the cell.
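
A minimal sketch of the overlap proportion of claim 15. The claim does not fix the normalizer; dividing the intersection area by the detection box area is an assumption here.

    def overlap_proportion(cell, det):
        # Boxes are (x0, y0, x1, y1).
        ix0, iy0 = max(cell[0], det[0]), max(cell[1], det[1])
        ix1, iy1 = min(cell[2], det[2]), min(cell[3], det[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        det_area = (det[2] - det[0]) * (det[3] - det[1])
        return inter / det_area if det_area else 0.0

    # The detection box is associated with the cell when, for example,
    # overlap_proportion(cell, det) >= 0.7 (the preset overlap threshold).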
16. The form recognition apparatus of claim 15, wherein after the step of "taking the first text line position corresponding to the overlapping detection box as the final text line position associated with the cell", the text line association module is further configured to:
judge whether there is a cell with no associated text line position;
when such a cell exists, acquire the second text line position corresponding to that cell;
project the second text line position horizontally and vertically to obtain a horizontal projection segment and a vertical projection segment;
judge whether the sum of the lengths of the horizontal projection segment and the vertical projection segment is greater than or equal to a length threshold;
and if so, take the second text line position as the final text line position associated with the cell.
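
A sketch of the projection test of claim 16, assuming the second text line position is given as a binary mask of foreground pixels; the length threshold is illustrative.

    import numpy as np

    def projection_lengths(text_mask):
        # Horizontal projection segment: columns containing any text
        # pixel; vertical projection segment: rows containing any.
        horizontal_len = int(text_mask.any(axis=0).sum())
        vertical_len = int(text_mask.any(axis=1).sum())
        return horizontal_len, vertical_len

    def accept_second_position(text_mask, length_threshold=10):
        h_len, v_len = projection_lengths(text_mask)
        return h_len + v_len >= length_threshold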
17. The form recognition apparatus of claim 11, wherein the table generation module is further configured to:
acquire the text line image corresponding to the final text line position in the form image to be recognized;
if the final text line position is associated with a single cell, directly take the text line image as the text line image corresponding to that cell;
and if the final text line position is associated with a plurality of cells, segment the text line image according to the position of each cell and the final text line position to obtain a text line image segment corresponding to each cell.
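
A sketch of the segmentation of claim 17 for the multi-cell case, assuming the associated cells are horizontally adjacent and all coordinates are in the same page frame; clipping is simplified.

    def split_text_line(line_img, line_box, cell_boxes):
        # line_box and cell_boxes are (x0, y0, x1, y1) page coordinates;
        # line_img is the image cropped at line_box.
        pieces = []
        lx0 = line_box[0]
        for cx0, _, cx1, _ in cell_boxes:
            a = max(cx0, line_box[0]) - lx0   # left edge inside the line
            b = min(cx1, line_box[2]) - lx0   # right edge inside the line
            if b > a:
                pieces.append(line_img[:, a:b])
        return pieces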
18. The form recognition apparatus of claim 11, wherein the table structure acquisition module is further configured to:
extract the horizontal contour lines and vertical contour lines in the table line foreground map;
acquire the intersection points of the horizontal contour lines and the vertical contour lines as candidate intersection points;
connect the candidate intersection points in left-to-right, top-to-bottom order, judging for each candidate intersection point whether another candidate intersection point adjacent to its right exists on the corresponding horizontal contour line;
if so, connect the candidate intersection point with that adjacent candidate intersection point to obtain a first candidate connecting line;
if not, connect the candidate intersection point with the first candidate intersection point following it in the preset candidate intersection point ordering, and select the connecting lines whose angle to the horizontal is smaller than or equal to a preset angle threshold as second candidate connecting lines;
calculate the coverage proportion of each candidate connecting line to be analyzed in the table line foreground map, and select the candidate connecting lines to be analyzed whose coverage proportion is greater than or equal to a preset coverage threshold as effective connecting lines;
perform a cell path search according to the effective connecting lines to obtain candidate cells;
and concatenate the candidate cells having a common vertex to generate one or more table structures;
wherein a candidate connecting line to be analyzed is a first candidate connecting line or a second candidate connecting line, and the preset coverage threshold corresponding to the first candidate connecting lines is smaller than the preset coverage threshold corresponding to the second candidate connecting lines.
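
A hedged sketch of the clockwise horizontal-vertical-horizontal-vertical path search of claim 18, assuming the effective connecting lines are stored as adjacency maps between candidate intersection points; the lookup tables h_next and v_next are assumptions.

    def find_candidate_cell(p, h_next, v_next):
        # p: a top-left candidate intersection point. h_next[p] steps
        # right along an effective horizontal line; v_next[p] steps
        # down an effective vertical line.
        top_right = h_next.get(p)
        bottom_left = v_next.get(p)
        if top_right is None or bottom_left is None:
            return None
        bottom_right = v_next.get(top_right)
        if bottom_right is None or h_next.get(bottom_left) != bottom_right:
            return None
        # Four effective connecting lines close a clockwise loop:
        # the loop is a candidate cell.
        return (p, top_right, bottom_right, bottom_left)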
19. The form recognition apparatus of claim 18, wherein the table structure acquisition module is further configured to:
erode and dilate the table line foreground map using a preset horizontal kernel to obtain the horizontal contour lines;
erode and dilate the table line foreground map using a preset vertical kernel to obtain the vertical contour lines;
wherein the preset horizontal kernel is a matrix of 1 row and N columns whose elements are all 1, and the preset vertical kernel is a matrix of N rows and 1 column whose elements are all 1;
and/or,
the step of "calculating the coverage proportion of each candidate connecting line to be analyzed in the table line foreground map" specifically comprises:
acquiring the total number of pixels that the candidate connecting line to be analyzed passes through in the table line foreground map;
acquiring, from those pixels, the number of pixels that belong to a table line in the table line foreground map;
calculating the ratio of the number of table line pixels to the total number of pixels to obtain the coverage proportion of the candidate connecting line to be analyzed;
and/or,
the step of "performing a cell path search according to the effective connecting lines to obtain candidate cells" specifically comprises:
starting from each effective connecting line, searching sequentially for effective connecting lines in a clockwise horizontal-vertical-horizontal-vertical direction to obtain the other effective connecting lines corresponding to each effective connecting line;
and generating candidate cells from each effective connecting line and its corresponding other effective connecting lines.
20. The form recognition apparatus of claim 18, wherein after the step of "concatenating candidate cells having a common vertex to generate one or more table structures", the table structure acquisition module is further configured to:
connect the horizontal edges of candidate cells having a common vertex in the table structure to form candidate horizontal lines;
connect the vertical edges of candidate cells having a common vertex in the table structure to form candidate vertical lines;
acquire a candidate horizontal line to be processed that can be extended to coincide with other candidate horizontal lines, and extend and merge it with those candidate horizontal lines to form a merged candidate horizontal line;
acquire a candidate vertical line to be processed that can be extended to coincide with other candidate vertical lines, and extend and merge it with those candidate vertical lines to form a merged candidate vertical line;
assign horizontal index numbers to the merged candidate horizontal lines and to the candidate horizontal lines that were not extension-merged, according to a preset horizontal line ordering;
assign vertical index numbers to the merged candidate vertical lines and to the candidate vertical lines that were not extension-merged, according to a preset vertical line ordering;
and generate a table index for the table structure according to the horizontal index numbers and the vertical index numbers.
21. A form recognition apparatus comprising a processor and a storage device, the storage device adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the form recognition method of any one of claims 1 to 10.
22. A computer-readable storage medium, in which a plurality of program codes are stored, characterized in that the program codes are adapted to be loaded and run by a processor to perform the table identification method of any one of claims 1 to 10.
CN202011407580.8A 2020-12-03 2020-12-03 Table recognition method, device and computer readable storage medium Active CN112528813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011407580.8A CN112528813B (en) 2020-12-03 2020-12-03 Table recognition method, device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN112528813A (en) 2021-03-19
CN112528813B (en) 2021-07-23

Family

ID=74997545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011407580.8A Active CN112528813B (en) 2020-12-03 2020-12-03 Table recognition method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112528813B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287654A (en) * 2019-07-25 2021-01-29 珠海金山办公软件有限公司 Document element alignment method and device
CN113283355A (en) * 2021-05-31 2021-08-20 平安国际智慧城市科技股份有限公司 Form image recognition method and device, computer equipment and storage medium
CN113343866A (en) * 2021-06-15 2021-09-03 杭州数梦工场科技有限公司 Identification method and device of form information and electronic equipment
CN113536951B (en) * 2021-06-22 2023-11-24 科大讯飞股份有限公司 Form identification method, related device, electronic equipment and storage medium
CN113269153B (en) * 2021-06-26 2024-03-19 中国电子系统技术有限公司 Form identification method and device
CN113657274B (en) * 2021-08-17 2022-09-20 北京百度网讯科技有限公司 Table generation method and device, electronic equipment and storage medium
CN116311301A (en) * 2023-02-17 2023-06-23 北京感易智能科技有限公司 Wireless form identification method and system
CN116503888B (en) * 2023-06-29 2023-09-05 杭州同花顺数据开发有限公司 Method, system and storage medium for extracting form from image
CN116612487B (en) * 2023-07-21 2023-10-13 亚信科技(南京)有限公司 Table identification method and device, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102317933A (en) * 2009-01-02 2012-01-11 苹果公司 Content Profiling to Dynamically Configure Content Processing
CN109241894A (en) * 2018-08-28 2019-01-18 南京安链数据科技有限公司 A kind of specific aim ticket contents identifying system and method based on form locating and deep learning
CN109522816A (en) * 2018-10-26 2019-03-26 北京慧流科技有限公司 Table recognition method and device, computer storage medium
CN110188649A (en) * 2019-05-23 2019-08-30 成都火石创造科技有限公司 Pdf document analysis method based on tesseract-ocr
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
CN111062259A (en) * 2019-11-25 2020-04-24 泰康保险集团股份有限公司 Form recognition method and device
CN111178154A (en) * 2019-12-10 2020-05-19 北京明略软件系统有限公司 Table frame prediction model generation method and device and table positioning method and device
CN111460927A (en) * 2020-03-17 2020-07-28 北京交通大学 Method for extracting structured information of house property certificate image
CN111814722A (en) * 2020-07-20 2020-10-23 电子科技大学 Method and device for identifying table in image, electronic equipment and storage medium
CN111860502A (en) * 2020-07-15 2020-10-30 北京思图场景数据科技服务有限公司 Picture table identification method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10242257B2 (en) * 2017-05-18 2019-03-26 Wipro Limited Methods and devices for extracting text from documents
US20190122043A1 (en) * 2017-10-23 2019-04-25 Education & Career Compass Electronic document processing
GB2574608B (en) * 2018-06-11 2020-12-30 Innoplexus Ag System and method for extracting tabular data from electronic document


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"复杂表格文档图像的模板识别与提取";杨靖明;《中国优秀硕士学位论文全文数据库 信息科技辑》;20190815(第8期);I138-884 *


Similar Documents

Publication Publication Date Title
CN112528813B (en) Table recognition method, device and computer readable storage medium
CN110738207B (en) Character detection method for fusing character area edge information in character image
US10817741B2 (en) Word segmentation system, method and device
CN110032998B (en) Method, system, device and storage medium for detecting characters of natural scene picture
CN108960229B (en) Multidirectional character detection method and device
US11042742B1 (en) Apparatus and method for detecting road based on convolutional neural network
CN103455814B (en) Text line segmenting method and text line segmenting system for document images
CN111595850B (en) Slice defect detection method, electronic device and readable storage medium
CN110334709B (en) License plate detection method based on end-to-end multi-task deep learning
CN111275034B (en) Method, device, equipment and storage medium for extracting text region from image
CN111754536B (en) Image labeling method, device, electronic equipment and storage medium
US9082019B2 (en) Method of establishing adjustable-block background model for detecting real-time image object
WO2017088479A1 (en) Method of identifying digital on-screen graphic and device
CN114266794B (en) Pathological section image cancer region segmentation system based on full convolution neural network
CN112446262A (en) Text analysis method, text analysis device, text analysis terminal and computer-readable storage medium
CN111415364A (en) Method, system and storage medium for converting image segmentation samples in computer vision
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN111626145A (en) Simple and effective incomplete form identification and page-crossing splicing method
CN115331245A (en) Table structure identification method based on image instance segmentation
CN113505772A (en) License plate image generation method and system based on generation countermeasure network
CN113158977B (en) Image character editing method for improving FANnet generation network
CN110874170A (en) Image area correction method, image segmentation method and device
CN114357958A (en) Table extraction method, device, equipment and storage medium
CN112541505B (en) Text recognition method, text recognition device and computer-readable storage medium
CN112837329B (en) Tibetan ancient book document image binarization method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant