CN110598575A - Table layout analysis and extraction method and related device - Google Patents

Info

Publication number
CN110598575A
Authority
CN
China
Prior art keywords
single connected
line
chains
picture
connected chain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910773607.6A
Other languages
Chinese (zh)
Other versions
CN110598575B (en)
Inventor
王鹏飞
殷兵
胡金水
柳林
景子君
谢名亮
韩球
刘驰
魏冲洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201910773607.6A
Publication of CN110598575A
Application granted
Publication of CN110598575B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a table layout analysis and extraction method and a related device, wherein the method comprises the following steps: acquiring a picture containing a table layout; determining a plurality of single connected chain sets which meet the table line constraint condition in the picture, wherein each single connected chain set in the plurality of single connected chain sets comprises at least one single connected chain, the single connected chain corresponds to a transverse or vertical short line segment in the picture, and the table line constraint condition refers to that the single connected chains have the same direction and are positioned in the same straight line; generating a reference table line corresponding to each single connected chain set according to at least one single connected chain contained in each single connected chain set; and generating a table layout of the picture according to a plurality of reference table lines corresponding to the plurality of single connected chain sets. The method and the device are beneficial to improving the efficiency and the accuracy of extracting the table layout.

Description

Table layout analysis and extraction method and related device
Technical Field
The application relates to the technical field of computers, in particular to a table layout analysis and extraction method and a related device.
Background
In recent years, with the development of information technology, information systems in industries such as finance, justice, social security, education, and medical care often need to scan existing paper documents and recognize their content in structured form. This requires automatically recognizing the tables in those documents and analyzing and extracting their layout; otherwise, the character recognition results inside the tables cannot be structured or re-typeset afterwards. Existing methods mainly detect single connected chains with the DSCC algorithm and then analyze the table layout. However, when the cells are small and the characters are large, cells are split incorrectly. Moreover, after the table lines are detected, cell positions are determined by connecting intersection points, which is time-consuming and relies too heavily on the intersections, so cells are easily misdetected. As a result, information extraction is costly and inefficient.
Disclosure of Invention
The embodiment of the application provides a table layout analysis and extraction method and a related device, aiming to improve the efficiency and the accuracy of the equipment for analyzing and extracting the table layout.
In a first aspect, an embodiment of the present application provides a table layout analysis and extraction method, including:
acquiring a picture containing a table layout;
determining a plurality of single connected chain sets in the picture that meet the table line constraint condition, wherein each single connected chain set in the plurality of single connected chain sets comprises at least one single connected chain, each single connected chain corresponds to a transverse or vertical short line segment in the picture, and the table line constraint condition means that the single connected chains have the same direction and lie on the same straight line;
generating a reference table line corresponding to each single connected chain set according to at least one single connected chain contained in each single connected chain set;
and generating the table layout of the picture according to a plurality of reference table lines corresponding to the plurality of single connected chain sets.
In a second aspect, an embodiment of the present application provides a table layout analysis and extraction apparatus, including a processing unit and a communication unit, wherein,
the processing unit is used for acquiring a picture containing a table layout through the communication unit; determining a plurality of single connected chain sets in the picture that meet the table line constraint condition, wherein each single connected chain set in the plurality of single connected chain sets comprises at least one single connected chain, each single connected chain corresponds to a transverse or vertical short line segment in the picture, and the table line constraint condition means that the single connected chains have the same direction and lie on the same straight line; generating a reference table line corresponding to each single connected chain set according to at least one single connected chain contained in each single connected chain set; and generating the table layout of the picture according to a plurality of reference table lines corresponding to the plurality of single connected chain sets.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing steps in any method of the first aspect of the embodiment of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform part or all of the steps described in any one of the methods of the first aspect of the present application.
In a fifth aspect, the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps as described in any one of the methods of the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that, in the embodiment of the present application, after a picture containing a table layout is acquired, a plurality of single connected chain sets in the picture that meet the table line constraint condition are determined; a reference table line corresponding to each single connected chain set is then generated from the at least one single connected chain contained in that set; finally, the table layout of the picture is generated from the plurality of reference table lines corresponding to the plurality of single connected chain sets. Because the table layout is obtained from reference table lines generated from single connected chain sets that meet the table line constraint condition, the complexity of table detection is reduced and the efficiency and accuracy of extracting the table layout are improved.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an electronic device provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a table layout analysis and extraction method according to an embodiment of the present application;
Fig. 3a (1) is a schematic diagram of extracting transverse single connected chains according to an embodiment of the present application;
Fig. 3a (2) is a schematic diagram of extracting vertical single connected chains according to an embodiment of the present application;
Fig. 3b is a schematic diagram of screening single connected chains according to an embodiment of the present application;
Fig. 3c is a schematic diagram of extracting linear single connected chains according to an embodiment of the present application;
Fig. 3d is a schematic diagram of extracting single connected chains according to reference grid lines according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a single connected chain according to an embodiment of the present application;
Fig. 5a is a schematic diagram of table line misjudgment caused by oversized characters according to an embodiment of the present application;
Fig. 5b is a schematic diagram of table line misjudgment caused by a square stamp according to an embodiment of the present application;
Fig. 5c is a schematic diagram of another table line misjudgment caused by a square stamp according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 7 is a block diagram of functional units of a table layout analysis and extraction apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the electronic device 100 includes a picture obtaining device 110 and a form extracting device 120, where the picture obtaining device 110 obtains a picture containing a table layout, and the form extracting device 120 is configured to extract single connected chains and process them to obtain the table layout. The electronic device according to the embodiment of the present application may include various handheld devices, vehicle-mounted devices, wearable devices, and computing devices that have wireless communication functions, or other processing devices connected to a wireless modem, as well as various forms of User Equipment (UE), Mobile Stations (MS), terminal devices, and the like.
At present, table layout extraction mainly detects single connected chains with the DSCC (Directional Single-Connected Chain) algorithm, screens out by length the single connected chains that may belong to table lines, connects those chains to restore the table lines, and finally determines the cell positions from the intersections of the table lines, thereby analyzing the table layout. However, this method is time-consuming when a large number of straight lines are detected and is prone to misdetecting cells.
Based on this, the embodiment of the present application provides a method for analyzing and extracting a table layout to solve the above problem, and the following describes the embodiment of the present application in detail.
Referring to fig. 2, fig. 2 is a schematic flowchart of a table layout analysis and extraction method provided in the present embodiment, which is applied to the electronic device shown in fig. 1, and as shown in the figure, the table layout analysis and extraction method includes:
S201, acquiring a picture containing a table layout;
The picture containing the table layout can be any picture provided by a user that contains a table, and it can be obtained by photographing, scanning, or any other means.
S202, determining a plurality of single connected chain sets in the picture that meet the table line constraint condition, wherein each single connected chain set in the plurality of single connected chain sets comprises at least one single connected chain, each single connected chain corresponds to a transverse or vertical short line segment in the picture, and the table line constraint condition means that the single connected chains have the same direction and lie on the same straight line;
A single connected chain is either transverse or vertical and corresponds to a transverse or vertical short line segment of a character, a seal, or a table line in the picture. For example, a transverse single connected chain may be composed of image runs with a transverse width of 1 pixel, where each run is connected to exactly one run at its head and exactly one run at its tail. Each single connected chain set in the plurality of single connected chain sets comprises the one or more single connected chains corresponding to one table line.
In a specific implementation, all single connected chains in the picture are extracted, the single connected chains belonging to the same table line are screened out from them, and these chains are sorted from left to right or from top to bottom.
S203, generating a reference table line corresponding to each single connected chain set according to at least one single connected chain contained in each single connected chain set;
Each single connected chain set comprises one or more single connected chains belonging to the same table line.
In a specific implementation, the sorted single connected chains in each single connected chain set are connected from left to right or from top to bottom to obtain a reference table line.
S204, generating the table layout of the picture according to the plurality of reference table lines corresponding to the plurality of single connected chain sets.
The reference table line is a straight line obtained by connecting the single connected chains in a single connected chain set. A reference table is obtained by connecting and combining the plurality of reference table lines, and illegal tables are filtered out of the reference table to obtain the table layout of the picture.
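The connection step in S203 can be illustrated with a minimal sketch. It assumes each transverse single connected chain has already been reduced to a hypothetical tuple `(x_start, x_end, y)`, and it places the reference table line at the mean row of the chains; the text does not say how the row of the connected line is chosen, so that choice is an assumption made only for illustration.

```python
def reference_line(chain_set):
    """Connect the transverse chains of one set into a single reference table line.

    chain_set: list of (x_start, x_end, y) tuples (hypothetical representation),
    all assumed to belong to the same transverse table line.
    """
    ordered = sorted(chain_set, key=lambda c: c[0])  # sort left to right
    x_start = ordered[0][0]                          # leftmost start
    x_end = max(c[1] for c in ordered)               # rightmost end
    # Assumption: use the mean row of the chains as the row of the line.
    y = sum(c[2] for c in ordered) / len(ordered)
    return (x_start, x_end, y)


line = reference_line([(50, 90, 10), (0, 40, 11)])
```

Vertical sets could be handled the same way with the roles of x and y swapped and the chains sorted from top to bottom.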
It can be seen that, in the embodiment of the present application, after a picture containing a table layout is acquired, a plurality of single connected chain sets in the picture that meet the table line constraint condition are determined; a reference table line corresponding to each single connected chain set is then generated from the at least one single connected chain contained in that set; finally, the table layout of the picture is generated from the plurality of reference table lines corresponding to the plurality of single connected chain sets. Because the table layout is obtained from reference table lines generated from single connected chain sets that meet the table line constraint condition, the complexity of table detection is reduced and the efficiency and accuracy of extracting the table layout are improved.
In one possible example, the determining a plurality of sets of single connected chains in the picture that meet the table line constraint includes: detecting and extracting all single connected chains in the picture; screening a plurality of single connected chains which meet preset conditions in all the single connected chains; and dividing the single connected chains into a plurality of single connected chain sets according to the constraint condition of the table lines, wherein the constraint condition of the table lines refers to that the directions are the same and the single connected chains are in the same straight line.
All single connected chains in the picture, both transverse and vertical, can be detected and extracted with the DSCC algorithm; merging directional single connected chains under certain constraint conditions allows straight lines to be extracted quickly and accurately. After all single connected chains are extracted, those meeting the preset conditions are screened out, and the screened chains are sorted transversely and vertically to determine the single connected chain sets belonging to the same table line.
In a specific implementation, all transverse single connected chains are extracted from the picture as shown in fig. 3a (1), and all vertical single connected chains are extracted as shown in fig. 3a (2). The extracted chains are then screened to remove those that are too short, not straight, and so on, and the single connected chains belonging to the same table line are grouped into one single connected chain set.
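As a rough illustration of what extracting transverse single connected chains involves, the following is a heavily simplified sketch and not the DSCC algorithm as published. It assumes a binary image given as a list of rows with 1 for foreground, takes the 1-pixel-wide runs to be the vertical runs inside each column, and extends a chain only when a run has exactly one neighbour in the previous column and is that neighbour's only successor. All function names are hypothetical.

```python
def column_runs(img, x):
    """Vertical foreground runs in column x as inclusive (y0, y1) pairs."""
    runs, start = [], None
    for y in range(len(img)):
        if img[y][x] and start is None:
            start = y
        elif not img[y][x] and start is not None:
            runs.append((start, y - 1))
            start = None
    if start is not None:
        runs.append((start, len(img) - 1))
    return runs


def overlaps(a, b):
    """True when two runs in adjacent columns share at least one row."""
    return a[0] <= b[1] and b[0] <= a[1]


def horizontal_chains(img):
    """Group runs into chains; each chain is a list of (x, y0, y1)."""
    width = len(img[0])
    cols = [column_runs(img, x) for x in range(width)]
    chains, open_chains = [], {}  # open_chains: run in previous column -> chain
    for x in range(width):
        next_open = {}
        prev = cols[x - 1] if x > 0 else []
        for run in cols[x]:
            left = [r for r in prev if overlaps(r, run)]
            chain = None
            if len(left) == 1:
                # The left neighbour must also have this run as its only successor.
                successors = [r for r in cols[x] if overlaps(r, left[0])]
                if len(successors) == 1:
                    chain = open_chains[left[0]]
            if chain is None:  # start a new chain at branches and gaps
                chain = []
                chains.append(chain)
            chain.append((x, run[0], run[1]))
            next_open[run] = chain
        open_chains = next_open
    return chains


line_img = [[0] * 5, [1] * 5, [0] * 5]
chains = horizontal_chains(line_img)  # one chain covering all five columns
```

Vertical chains could be obtained by applying the same function to the transposed image.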
Therefore, a plurality of single connected chain sets meeting the table line constraint conditions can be obtained by screening and sequencing all the extracted single connected chains, the detection efficiency and accuracy of the single connected chains of the table lines are improved, and the extraction speed of the table layout is accelerated.
In one possible example, the screening a plurality of single connected chains among all the single connected chains, which meet a preset condition, includes: acquiring a self-adaptive length threshold value of the table layout in the picture; comparing each single connected chain in all the single connected chains with the self-adaptive length threshold value, and eliminating the single connected chains with the length smaller than the self-adaptive length threshold value to obtain the single connected chains meeting the self-adaptive length threshold value; rejecting non-linear single connected chains in the single connected chains which meet the self-adaptive length threshold value to obtain single connected chains which meet the linear requirement; calculating the inclination angle of the single-connection chain meeting the straight line requirement, and marking the maximum angle as the main direction of the whole table; rejecting single-connection chains with the difference value between the inclination angle and the main direction exceeding a preset angle threshold; and carrying out inclination correction on the single connected chains meeting the preset angle threshold according to the main direction to obtain a plurality of single connected chains meeting the preset conditions.
The adaptive length threshold of the table layout in the picture can be obtained by Otsu's method (OTSU), and single connected chains that are too short are removed according to it, where the threshold can be obtained from prior knowledge. The picture comprises a foreground (namely the table) and a background. The proportion of foreground pixels in the whole image is denoted p, the proportion of background pixels is denoted q, and the length threshold for single connected chains is determined from the foreground proportion. For example, when the image size is L × W and the foreground proportion is 50%, the transverse length threshold is L × 50% × 15% and the vertical length threshold is W × 50% × 20%; single connected chains shorter than the corresponding threshold are eliminated. Next, a straight-line fitting method can be used, and non-linear single connected chains are removed by checking the variance. Finally, the inclination angle of each single connected chain is calculated, the maximum inclination angle is taken as the main direction of the table layout, and single connected chains whose inclination angle differs from the main direction by more than a set threshold are removed, where the set threshold can be obtained from prior knowledge. The picture is then tilt-corrected by the main direction angle of the table.
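The numerical example above can be written out as a short sketch. The 15% and 20% factors are taken from that example; treating them as default parameters, and representing a chain by its inclusive extent `(start, end)` along its own direction, are assumptions made here for illustration only.

```python
def length_thresholds(length, width, foreground_ratio,
                      h_factor=0.15, v_factor=0.20):
    """Transverse and vertical length thresholds for an L x W image.

    foreground_ratio is p, the share of foreground pixels; the factors
    default to the 15% and 20% used in the example in the text.
    """
    return (length * foreground_ratio * h_factor,
            width * foreground_ratio * v_factor)


def drop_short(chains, threshold):
    """Keep chains whose pixel length is not smaller than the threshold.

    chains: list of inclusive (start, end) extents (hypothetical representation).
    """
    return [c for c in chains if (c[1] - c[0] + 1) >= threshold]


h_thr, v_thr = length_thresholds(1000, 800, 0.5)   # about 75 and 80 pixels
kept = drop_short([(0, 99), (0, 9), (10, 89)], h_thr)
```

Here the chain of 10 pixels is discarded, while the chains of 100 and 80 pixels are kept.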
In a specific implementation, scanned images differ in size and so do the tables in them, so the adaptive threshold is obtained by Otsu's method. As shown in fig. 3b, single connected chains that are too short are rejected according to the adaptive threshold. As shown in fig. 3c, a straight-line fitting method is then used to reject non-linear single connected chains based on the variance. For example, for a single connected chain, the coordinates of each of its pixels are determined and a straight line is fitted to them so that it passes through as many of the pixel coordinates as possible, minimizing the variance between the pixel coordinates and the line; when this variance is greater than a preset variance threshold, the single connected chain is determined to be non-linear and is rejected. The inclination angles of the remaining single connected chains are then calculated, single connected chains that do not run in a table line direction, such as oblique straight lines, are removed, and the picture is tilt-corrected.
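The straight-line test described above can be sketched with an ordinary least-squares fit. The text does not specify the fitting method or the variance threshold, so the fit of y on x, the residual variance, and the example threshold below are assumptions; the sketch is written for transverse chains, and a vertical chain would need x and y swapped.

```python
import math


def fit_line(points):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, variance).

    points: (x, y) pixel coordinates of one transverse chain with at least
    two distinct x values.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Mean squared vertical distance between the pixels and the fitted line.
    variance = sum((y - (slope * x + intercept)) ** 2 for x, y in points) / n
    return slope, intercept, variance


def is_straight(points, max_variance):
    """A chain counts as linear when its residual variance is within the threshold."""
    return fit_line(points)[2] <= max_variance


def inclination_degrees(points):
    """Inclination angle of the fitted line relative to the horizontal axis."""
    return math.degrees(math.atan(fit_line(points)[0]))


flat = [(x, 2) for x in range(5)]
arc = [(0, 0), (1, 1), (2, 4), (3, 9)]
straight_flags = (is_straight(flat, 0.5), is_straight(arc, 0.5))
```

With the hypothetical threshold of 0.5, the flat chain is kept and the curved one is rejected; the inclination angle of each kept chain can then be compared with the main direction.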
Therefore, single connected chains meeting the preset conditions can be screened based on the adaptive length threshold, the inclination angle, and so on, so that short, non-linear, and oblique single connected chains are effectively eliminated, which improves the accuracy and efficiency of table layout extraction.
In one possible example, the dividing the plurality of single connected chains into a plurality of single connected chain sets according to the table line constraint condition includes: acquiring a vertical difference value of any two transverse single-connected chains in the plurality of single-connected chains; when the vertical difference value is smaller than a first preset threshold value, determining that any two transverse single-connected chains are the same transverse table line; sequencing all the transverse single-connected chains on the same transverse table line from left to right; or acquiring a transverse difference value of any two vertical single-connected chains in the plurality of single-connected chains; when the transverse difference value is smaller than a second preset threshold value, determining that any two vertical single-connected chains are the same vertical table line; sequencing all vertical single-connected chains on the same vertical table line from top to bottom; and determining a plurality of single connected chain sets according to the sorted single connected chains, wherein each single connected chain set corresponds to a plurality of single connected chains of one table line.
Each table line corresponds to one single connected chain set, and a plurality of table lines correspond to a plurality of single connected chain sets. The vertical difference value of any two transverse single connected chains is acquired, and when it is smaller than a preset threshold, the two chains are determined to belong to the same transverse table line. The transverse difference value of any two vertical single connected chains is acquired, and when it is smaller than a preset threshold, the two chains are determined to belong to the same vertical table line. The single connected chains belonging to the same transverse table line are sorted from left to right, and the single connected chains belonging to the same vertical table line are sorted from top to bottom.
Therefore, whether two single connected chains belong to the same table line can be determined by calculating the difference value between any two chains of the same direction, and the chains belonging to the same table line are sorted to obtain a plurality of single connected chain sets meeting the table line constraint condition, which improves the accuracy of extracting the table lines and the table layout.
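The grouping of transverse chains by vertical difference can be sketched as follows. The text compares any two chains; this sketch instead sorts the chains by row and compares each chain with the first member of the current group, which is a simplifying assumption, as are the `(x_start, x_end, y)` representation and the threshold value.

```python
def group_transverse(chains, max_dy):
    """Split transverse chains into sets, one set per transverse table line.

    chains: list of (x_start, x_end, y) tuples (hypothetical representation).
    max_dy: the first preset threshold on the vertical difference value.
    """
    groups = []
    for chain in sorted(chains, key=lambda c: c[2]):       # sort by row
        if groups and abs(chain[2] - groups[-1][0][2]) < max_dy:
            groups[-1].append(chain)                       # same table line
        else:
            groups.append([chain])                         # new table line
    # Sort the chains of every table line from left to right.
    return [sorted(g, key=lambda c: c[0]) for g in groups]


sets = group_transverse([(50, 90, 10), (0, 40, 11), (0, 90, 50)], max_dy=3)
```

In this example the two chains at rows 10 and 11 form one set ordered from left to right, and the chain at row 50 forms a second set. Vertical chains could be grouped the same way by comparing columns and sorting from top to bottom.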
In one possible example, the determining a plurality of sets of single connected chains in the picture that meet the table line constraint includes: extracting all single connected chains in the picture; determining a gridline reference point for each of the all single connected chains; determining a longitudinal reference grid line and a transverse reference grid line of each single connected chain in the picture by taking the grid line reference point as a reference point; and determining a plurality of single connected chain sets which are completely positioned at the same reference grid line in all the single connected chains.
Any point on each single connected chain is determined as its grid line reference point, and a longitudinal reference grid line and a transverse reference grid line are drawn through that point. All transverse single connected chains lying on the same transverse reference grid line form one single connected chain set, and all vertical single connected chains lying on the same longitudinal reference grid line form one single connected chain set, which gives a plurality of single connected chain sets corresponding to the plurality of reference grid lines.
Therefore, the reference grid lines can be generated based on each single connected chain, a plurality of single connected chain sets in the same reference grid line are determined according to the reference grid lines and the single connected chains, and the efficiency of extracting the table layout is improved.
In one possible example, the determining a plurality of single connected chain sets that are all at the same reference grid line in all the single connected chains includes: determining a plurality of linear single connected chains in all the single connected chains; determining at least one single connected chain in the same reference grid line according to the reference grid line of each linear single connected chain in the plurality of linear single connected chains; and sequencing the at least one single connected chain to obtain a plurality of single connected chain sets.
The reference grid lines are auxiliary lines drawn for each single connected chain in order to determine the plurality of single connected chain sets. A longitudinal reference grid line and a transverse reference grid line are determined for each extracted single connected chain. When the transverse reference grid lines of several transverse single connected chains are detected to coincide, those chains are determined to lie on the same transverse grid line; when the longitudinal reference grid lines of several vertical single connected chains are detected to coincide, those chains are determined to lie on the same vertical grid line. Whether a single connected chain lies on its reference grid line is determined from the chain and the corresponding reference grid line: if so, the chain is a linear single connected chain; otherwise it is a non-linear single connected chain and is rejected. Among the linear single connected chains, the transverse reference grid lines of the transverse chains are determined, and if the transverse reference grid lines of several transverse chains lie on the same straight line, those chains are determined to belong to the same reference table line, which gives a linear single connected chain set. Finally, among the single connected chains in the linear single connected chain set that coincide with a transverse or vertical reference line, those shorter than a preset threshold are removed to obtain the plurality of single connected chain sets.
As shown in fig. 3d, the dotted line in fig. 3d is a reference grid line, one point a of the horizontal single-connected chain H1 is taken as a reference grid line, the single-connected chain H1 coincides with the horizontal reference grid line, the point b of the horizontal single-connected chain H2 coincides with the reference grid line, the single-connected chain H2 coincides with the horizontal reference grid line, and the horizontal reference lines of the single-connected chains H1 and H2 coincide, so that it can be determined that the single-connected chains H1 and H2 are straight lines and belong to the same horizontal grid line. One point c of the vertical single-connected chain S1 is taken as a reference grid line, the single-connected chain S1 is overlapped with the longitudinal reference grid line, the point d of the vertical single-connected chain S2 is taken as a reference grid line, the single-connected chain S2 is overlapped with the longitudinal reference grid line, and the vertical reference lines of the single-connected chain S1 and the single-connected chain S2 are overlapped, so that the single-connected chain S1 and the single-connected chain S2 can be judged to be straight lines and be on the same straight line, namely belong to the same vertical grid line. Dividing a plurality of single connected chains belonging to the same table line into a single connected chain set, determining the single connected chain which is not overlapped with the transverse reference line or the vertical reference line as a non-linear single connected chain, and removing the non-linear single connected chain.
Therefore, the non-linear single connected chains can be removed based on the reference grid lines of the single connected chains to obtain a single connected chain set, and the accuracy of extracting the table layout is improved.
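The grouping by reference grid line can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: each horizontal single connected chain is assumed to be a list of (x, y) pixel coordinates, its first pixel is taken as the grid line reference point, and the parameters `tol` (allowed deviation in pixels) and `min_len` (the preset length threshold) are hypothetical names. Vertical chains could be handled the same way with x and y swapped.

```python
def group_horizontal_chains(chains, tol=1, min_len=3):
    """Group straight horizontal chains that share a transverse reference line.

    chains: list of chains, each a list of (x, y) pixels (assumed layout).
    Returns a list of chain sets, each sorted from left to right.
    """
    straight = []
    for chain in chains:
        ref_y = chain[0][1]  # reference line passes through the first pixel
        long_enough = len(chain) >= min_len
        on_line = all(abs(y - ref_y) <= tol for _, y in chain)
        if long_enough and on_line:
            straight.append((ref_y, chain))  # keep linear chains only

    # Sort by reference line so chains on coinciding lines become neighbours.
    straight.sort(key=lambda item: item[0])

    groups = []
    for ref_y, chain in straight:
        if groups and abs(ref_y - groups[-1][0]) <= tol:
            groups[-1][1].append(chain)
        else:
            groups.append((ref_y, [chain]))

    # Order the chains of each set from left to right.
    return [sorted(members, key=lambda c: min(x for x, _ in c))
            for _, members in groups]


h1 = [(0, 5), (1, 5), (2, 5)]
h2 = [(6, 5), (7, 5), (8, 5)]
curve = [(0, 9), (1, 10), (2, 12)]   # leaves its reference line: rejected
short = [(0, 20)]                    # shorter than min_len: rejected
sets = group_horizontal_chains([h2, curve, h1, short])
```

In this toy input, H1 and H2 end up in one set because their reference lines coincide, while the curved and the too-short chains are dropped.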
In one possible example, generating the table layout of the picture according to the plurality of reference table lines corresponding to the plurality of single connected chain sets includes: determining a reference table according to the plurality of reference table lines corresponding to the plurality of single connected chain sets; and filtering out illegal tables according to an illegal table source and the reference table to obtain the table layout of the picture, wherein an illegal table is a table generated by the illegal table source, and the illegal table source comprises straight lines in the picture, other than table lines, that cause reference table lines to be misjudged.
As shown in fig. 4, which is a schematic diagram of single connected chain connection provided in an embodiment of the present application, the at least one single connected chain included in each single connected chain set is connected to obtain a reference table line, and the plurality of reference table lines obtained from the plurality of single connected chain sets are connected to obtain a reference table. Connecting the at least one single connected chain included in each single connected chain set to obtain a reference table line includes: judging whether the plurality of single connected chains on the same table line form a broken line; and if so, connecting the breaks between the plurality of single connected chains to obtain the reference table line. A broken line may be caused by a table line in the image being printed unclearly or being covered by another image.
In a specific implementation, to extract the table layout more accurately, the reference table is post-processed after it is generated, so that illegal tables are filtered out and the table layout is extracted. Illegal tables include tables formed when lines are misjudged as table lines because of a square seal or oversized characters.
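The broken-line connection step can be illustrated with a short Python sketch. The application does not state how a break is judged, so the `max_gap` criterion below is an assumption made only for illustration, as is the representation of each single connected chain on one horizontal table line by its (start_x, end_x) extent.

```python
def connect_broken_segments(segments, max_gap):
    """Join segments on the same table line whose gap is at most max_gap.

    segments: list of (start_x, end_x) extents (assumed representation).
    max_gap:  hypothetical break tolerance in pixels.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Small gap: treat as a break in one line and bridge it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


lines = connect_broken_segments([(13, 30), (0, 10), (80, 100)], max_gap=5)
```

Here the first two segments are bridged into one reference table line, while the distant third segment stays separate.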
Therefore, the reference table can be generated based on the reference table line, illegal tables in the reference table are filtered, and the accuracy of extracting the table layout is improved.
In one possible example, the filtering out illegal tables according to the illegal table source and the reference table to obtain the table layout of the picture includes: identifying the colors of all reference table lines; and identifying, according to the colors, the straight lines other than table lines and filtering out the misjudged reference table lines to obtain the table layout of the picture; or, obtaining the line widths of the reference table lines to obtain a line width set; judging whether the line width set contains a line width larger than a preset width threshold; and if so, filtering out the reference table line corresponding to the line width larger than the preset width threshold to obtain the table layout of the picture; or, identifying the morphological features of the table layout; and filtering out the reference table lines corresponding to a reference table whose morphological features do not conform to a preset morphological rule to obtain the table layout of the picture; or, detecting straight lines in the picture other than table lines by an image processing technique; calculating the horizontal pixels or vertical pixels of each such straight line; calculating the pixel overlap degree between the straight line and all the reference table lines according to the horizontal pixels or vertical pixels; obtaining an accumulated overlap value for each reference table line according to the pixel overlap degree; and when the accumulated value is judged to be larger than a preset character overlap threshold, removing the reference table line corresponding to the straight line to obtain the table layout of the picture.
Straight lines other than table lines that cause reference table lines to be misjudged can be identified from the colors of all reference table lines. For example, seals are commonly red or blue, so table lines formed by a square seal can be removed based on the seal color. In practice, the border of a square seal is much wider than an ordinary table line, so by measuring the line widths in the table layout, lines wider than the preset width threshold can be removed. Reference table lines that do not conform to the morphological features of a table can also be removed based on those features. Illegal tables generated by a square seal can be removed by any one or more of these methods.
In addition, illegal tables can be removed according to the pixel overlap degree between the table lines and straight lines other than table lines. For example, the characters in the picture are first extracted as connected domains during preprocessing. Then the pixel overlap degree between each character and every detected table line is calculated in turn, where the pixel overlap degree is simply the number of pixels shared by the strokes of the character and the table line. Next, for each table line, the accumulated value of the overlap degrees on that line is obtained. Finally, any reference table line whose accumulated overlap value is larger than a preset character overlap threshold is determined to be a table line misidentified because of oversized characters and is rejected.
In a specific implementation, fig. 5a is a schematic diagram of table line misjudgment caused by oversized characters according to an embodiment of the present application. When the Chinese characters for "eleven" in the figure are too large, the horizontal stroke of the character for "ten" and the character for "one" are misjudged as horizontal table lines, and the vertical stroke of the character for "ten" is misjudged as a vertical table line. To handle oversized characters, the picture is preprocessed to extract the characters for "eleven"; the overlap between the pixels occupied by the horizontal strokes of the characters and the pixels of each misjudged horizontal table line is calculated to obtain the accumulated overlap value of that horizontal table line, and the overlap between the pixels occupied by the vertical stroke and the pixels of the misjudged vertical table line is calculated to obtain the accumulated overlap value of that vertical table line. When the accumulated value is larger than the preset character overlap threshold, the line is determined to be a table line misjudged because of oversized characters and is rejected; when the accumulated value is smaller than the preset character overlap threshold, the line is determined to be a legal table line.
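As a rough illustration of the color and line width checks, the following Python sketch filters a list of reference table lines. Everything concrete in it is assumed for the example: the `RefLine` record, the RGB representation, the `margin` used to decide that a color is predominantly red or blue, and the width threshold value. The application only states that seal color and an over-wide border can each be used to reject seal lines.

```python
from dataclasses import dataclass


@dataclass
class RefLine:
    name: str
    width: int    # line width in pixels
    rgb: tuple    # mean (r, g, b) color of the line


def looks_like_seal_color(rgb, margin=60):
    """Hypothetical test: red or blue clearly dominates the other channels."""
    r, g, b = rgb
    return (r - max(g, b) >= margin) or (b - max(r, g) >= margin)


def filter_seal_lines(lines, width_threshold):
    """Keep lines that are neither seal-colored nor wider than the threshold."""
    return [ln for ln in lines
            if ln.width <= width_threshold and not looks_like_seal_color(ln.rgb)]


candidates = [
    RefLine("table line", 2, (20, 20, 20)),
    RefLine("red seal edge", 2, (200, 30, 30)),
    RefLine("thick border", 9, (10, 10, 10)),
]
kept = filter_seal_lines(candidates, width_threshold=4)
```

Combining both checks in one pass is a design choice of this sketch; either check could equally be applied on its own.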
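The overlap accumulation can be sketched as follows, assuming (for illustration only) that each table line and each extracted character is available as a set of (x, y) pixel coordinates. The function name and the dictionary layout are hypothetical; a line whose accumulated value exactly equals the threshold is kept here, a case the text does not specify.

```python
def filter_lines_by_character_overlap(lines, characters, overlap_threshold):
    """Drop reference table lines that mostly coincide with character strokes.

    lines:      dict mapping a line name to its set of (x, y) pixels.
    characters: list of pixel sets, one per extracted character.
    """
    kept = {}
    for name, line_pixels in lines.items():
        # Accumulate the shared-pixel count over all characters.
        total = sum(len(line_pixels & char_pixels) for char_pixels in characters)
        if total <= overlap_threshold:
            kept[name] = line_pixels
    return kept


lines = {
    "real": {(x, 0) for x in range(20)},
    "fake": {(x, 10) for x in range(20)},
}
# One oversized cross-shaped character whose horizontal stroke lies on "fake".
big_char = {(x, 10) for x in range(2, 18)} | {(10, y) for y in range(5, 15)}
kept = filter_lines_by_character_overlap(lines, [big_char], overlap_threshold=8)
```

The "fake" line shares 16 pixels with the character and is rejected, while the "real" line shares none and is kept.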
Therefore, the speed of extracting the table layout can be increased and the accuracy of extracting the table layout can be improved by removing illegal tables caused by straight lines except the table lines in the picture.
In one possible example, the filtering out a reference table line corresponding to a reference table whose morphological features do not conform to a preset morphological rule includes: determining the corner points of the table from the form of the intersection points of the horizontal reference table lines and the vertical reference table lines; determining the number of unit tables in the reference table according to the corner points; determining the morphological features of the table according to the number of unit tables, wherein the morphological features comprise the number of corner points and the number of intersection points; and filtering out, according to the morphological features of the table, the reference table lines corresponding to any reference table that does not conform to the preset morphological rule.
First, the number of tables is obtained from the intersection points of the horizontal and vertical table lines of the table layout, that is, each intersection point is judged as to whether it is a corner point of a table. Fig. 5b is a schematic diagram of table line misjudgment caused by a square seal provided by an embodiment of the present application. In the figure, intersection point 1 is the intersection of the horizontal table line X1 and the vertical table line Y1; if no pixel exists either to the left of or above intersection point 1, then intersection point 1 is the upper-left corner point of the table to which X1 and Y1 belong. The number of independent tables is determined from the number of corner points so judged; for example, 8 corner points are found in fig. 5b, so two independent tables are detected. Also in fig. 5b, the table generated by the square seal has two edge lines with intersection points on them, namely intersection point 2 and intersection point 3, but these two edge lines are adjacent edges rather than opposite edges; the table generated by the square seal is therefore determined to be an illegal table, and the corresponding table lines are removed.
In a specific implementation, the square seal may lie inside the table. Fig. 5c is another schematic diagram of table line misjudgment caused by a square seal according to an embodiment of the present application. In this case, the number of corner points is determined by detecting the intersection points of the table, and the table lines generated by the square seal are then removed according to intersection points 4 and 5 and the morphological features of the table. Alternatively, after the number of corner points is determined, the coordinates of the corner points are determined, and whether any corner point lies inside a cell of the table layout is judged from those coordinates; if so, the morphological features of the cell containing the corner point are determined, and if they do not conform to the preset morphological rule, the table that does not conform is removed.
Therefore, illegal forms generated by the square seal can be removed in various ways, the speed of extracting the form layout is increased, and the accuracy of extracting the form layout is improved.
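The corner point test described for intersection point 1 can be sketched in Python. The text only spells out the upper-left case (no pixel to the left and none above); the other three cases below are a symmetric extension assumed for illustration. The sketch also assumes one-pixel-wide lines stored as a set of (x, y) coordinates with y increasing downward, and `draw_rectangle` is a hypothetical helper used only to build test input.

```python
def corner_type(pixels, x, y):
    """Classify a line pixel as a table corner from its four neighbours."""
    left = (x - 1, y) in pixels
    right = (x + 1, y) in pixels
    up = (x, y - 1) in pixels
    down = (x, y + 1) in pixels
    kinds = {
        (False, True, False, True): "top-left",      # nothing left or above
        (True, False, False, True): "top-right",
        (False, True, True, False): "bottom-left",
        (True, False, True, False): "bottom-right",
    }
    return kinds.get((left, right, up, down))  # None if not a corner


def draw_rectangle(x0, y0, x1, y1):
    """Hypothetical helper: pixel set of a one-pixel-wide rectangle outline."""
    pixels = set()
    for x in range(x0, x1 + 1):
        pixels.update({(x, y0), (x, y1)})
    for y in range(y0, y1 + 1):
        pixels.update({(x0, y), (x1, y)})
    return pixels


page = draw_rectangle(0, 0, 6, 4) | draw_rectangle(10, 0, 16, 4)
corner_count = sum(1 for (x, y) in page if corner_type(page, x, y))
```

With two separate outlines on the page, eight corner points are found, matching the fig. 5b example in which eight corner points correspond to two independent tables.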
The present application is described in detail below with reference to some examples.
Consistent with the embodiment shown in fig. 2, refer to fig. 6, which is a schematic structural diagram of an electronic device 600 according to an embodiment of the present application. As shown in the figure, the electronic device 600 includes an application processor 610, a memory 620, a communication interface 630, and one or more programs 621, where the one or more programs 621 are stored in the memory 620 and configured to be executed by the application processor 610, and the one or more programs 621 include instructions for performing the following steps:
acquiring a picture containing a table layout;
determining a plurality of single connected chain sets which meet the constraint condition of the form line in the picture, wherein each single connected chain set in the plurality of single connected chain sets comprises at least one single connected chain, the single connected chain corresponds to a transverse or vertical short line segment in the picture, and the constraint condition of the form line means that the directions of the single connected chains are the same and are in the same straight line;
generating a reference table line corresponding to each single connected chain set according to at least one single connected chain contained in each single connected chain set;
and generating the table layout of the picture according to a plurality of reference table lines corresponding to the plurality of single connected chain sets.
It can be seen that, in the embodiment of the present application, after an image including a form layout is obtained, a plurality of single connected chain sets meeting a form line constraint condition in the image may be determined, and then a reference form line corresponding to each single connected chain set is generated according to at least one single connected chain included in each single connected chain set; finally, the form layout of the picture is generated according to the plurality of reference form lines corresponding to the plurality of single connected chain sets, and therefore the reference form lines generated by the plurality of single connected chain sets meeting the form line constraint conditions in the picture are determined, the form layout is obtained, the complexity of form detection is reduced, and the efficiency and the accuracy of extracting the form layout are improved.
In one possible example, in the aspect of determining the plurality of single connected chain sets in the picture that satisfy the table line constraint condition, the instructions in the program are specifically configured to: detecting and extracting all single connected chains in the picture; screening out, from all the single connected chains, a plurality of single connected chains that meet preset conditions; and dividing the plurality of single connected chains into a plurality of single connected chain sets according to the table line constraint condition.
In one possible example, in the aspect of the screening of the plurality of single connected chains meeting the preset condition in all the single connected chains, the instructions in the program are specifically configured to: acquiring a self-adaptive length threshold value of the table layout in the picture; comparing each single connected chain in all the single connected chains with the self-adaptive length threshold value, and eliminating the single connected chains with the length smaller than the self-adaptive length threshold value to obtain the single connected chains meeting the self-adaptive length threshold value; rejecting non-linear single connected chains in the single connected chains which meet the self-adaptive length threshold value to obtain single connected chains which meet the linear requirement; calculating the inclination angle of the single-connection chain meeting the straight line requirement, and marking the maximum angle as the main direction of the whole table; rejecting single-connection chains with the difference value between the inclination angle and the main direction exceeding a preset angle threshold; and carrying out inclination correction on the single connected chains meeting the preset angle threshold according to the main direction to obtain a plurality of single connected chains meeting the preset conditions.
In one possible example, in terms of the dividing the plurality of single connected chains into a plurality of single connected chain sets according to the table line constraint condition, the instructions in the program are specifically configured to perform the following operations: acquiring a vertical difference value of any two transverse single connected chains in the plurality of single connected chains; when the vertical difference value is smaller than a first preset threshold value, determining that the two transverse single connected chains are on the same transverse table line; sorting all the transverse single connected chains on the same transverse table line from left to right; or acquiring a transverse difference value of any two vertical single connected chains in the plurality of single connected chains; when the transverse difference value is smaller than a second preset threshold value, determining that the two vertical single connected chains are on the same vertical table line; sorting all the vertical single connected chains on the same vertical table line from top to bottom; and determining a plurality of single connected chain sets according to the sorted single connected chains, wherein each single connected chain set corresponds to the plurality of single connected chains of one table line.
In one possible example, in the aspect of determining the plurality of single connected chain sets in the picture that satisfy the table line constraint condition, the instructions in the program are specifically configured to: extracting all single connected chains in the picture; determining a grid line reference point for each of the single connected chains; determining a longitudinal reference grid line and a transverse reference grid line of each single connected chain in the picture by taking the grid line reference point as a reference point; and determining, among all the single connected chains, a plurality of single connected chain sets each lying entirely on the same reference grid line.
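The division of transverse single connected chains by vertical difference can be sketched as follows. The sketch assumes each transverse chain is summarized as a (left_x, y) pair and that chains are compared in order of y, so that a chain joins the current set when its vertical difference from the previously added chain is below the first preset threshold; this ordering strategy and the function name are assumptions made for illustration. Vertical chains would be treated symmetrically with the second preset threshold and a top-to-bottom sort.

```python
def split_into_row_sets(chains, first_threshold):
    """Divide transverse chains into sets, one per transverse table line.

    chains: list of (left_x, y) summaries of transverse chains (assumed form).
    """
    sets = []
    for chain in sorted(chains, key=lambda c: c[1]):
        if sets and abs(chain[1] - sets[-1][-1][1]) < first_threshold:
            sets[-1].append(chain)      # same transverse table line
        else:
            sets.append([chain])        # start a new table line
    # Sort the chains of each table line from left to right.
    return [sorted(s, key=lambda c: c[0]) for s in sets]


rows = split_into_row_sets(
    [(50, 10), (0, 11), (0, 40), (30, 41), (90, 10)], first_threshold=3
)
```

The five chains fall into two sets, one near y = 10 and one near y = 40, each ordered from left to right.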
In one possible example, in the determining a plurality of single connected chain sets that are all at the same reference grid line in all the single connected chains, the instructions in the program are specifically configured to: determining a plurality of linear single connected chains in all the single connected chains; determining at least one single connected chain in the same reference grid line according to the reference grid line of each linear single connected chain in the plurality of linear single connected chains; and sequencing the at least one single connected chain to obtain a plurality of single connected chain sets.
In one possible example, in the aspect that the table layout of the picture is generated according to a plurality of reference table lines corresponding to the plurality of single connected chain sets, the instructions in the program are specifically configured to perform the following operations: determining a reference table according to a plurality of reference table lines corresponding to the single connected chain sets; and filtering illegal forms according to an illegal form source and the reference form to obtain the form layout of the picture, wherein the illegal forms are forms generated by the illegal form source, and the illegal form source comprises straight lines, except for form lines, in the picture, which cause misjudgment of reference form lines.
In a possible example, when the illegal source is a reference table line misjudgment caused by a square seal, the illegal table is filtered according to the illegal table source and the reference table to obtain the table layout of the picture, and the instruction in the program is specifically configured to perform the following operations: identifying the colors of all reference form lines; identifying the straight lines except the table lines according to the colors and filtering misjudged reference table lines to obtain the table layout of the picture; or, obtaining the line width of the reference table line to obtain a line width set; judging whether the line width larger than a preset width threshold value exists in the line width set or not; if so, filtering out a reference table line corresponding to the line width larger than a preset width threshold value to obtain the table layout of the picture; or, identifying the form characteristics of the table layout; filtering out a reference table line corresponding to a reference table with the morphological characteristics not conforming to a preset morphological rule to obtain the table layout of the picture; or detecting straight lines in the picture except for the table lines by an image processing technology; calculating a horizontal pixel or a vertical pixel of the straight line; calculating the pixel overlapping degree of the straight line and all the reference table lines according to the horizontal pixels or the vertical pixels; obtaining the accumulated value of the overlapping degree of each reference table line according to the pixel overlapping degree; and when the accumulated value is judged to be larger than a preset character overlapping threshold value, eliminating the reference table line corresponding to the straight line to obtain the table layout of the picture.
In one possible example, in terms of filtering out a reference table line corresponding to a reference table whose morphological feature does not meet a preset morphological rule, the instructions in the program are specifically configured to perform the following operations: determining the corner points of the table through the intersection point form of the horizontal reference table lines and the vertical reference table lines; determining the number of unit tables in the reference table according to the corner points; determining the form features of the table according to the number of the unit tables, wherein the form features comprise the number of corner points and the number of intersection points; and filtering out reference table lines corresponding to the reference tables which do not accord with the preset form rule according to the form characteristics of the tables.
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, to realize the above functions, the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments provided herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 7 is a block diagram of functional units of a table layout analysis and extraction apparatus 700 according to an embodiment of the present application. The form layout analysis and extraction apparatus 700 is applied to an electronic device including a processing unit 701 and a communication unit 702, wherein,
the processing unit 701 is configured to acquire a picture including a form layout through the communication unit 702; determining a plurality of single connected chain sets which meet the constraint condition of the form line in the picture, wherein each single connected chain set in the plurality of single connected chain sets comprises at least one single connected chain, the single connected chain corresponds to a transverse or vertical short line segment in the picture, and the constraint condition of the form line means that the directions of the single connected chains are the same and are in the same straight line; generating a reference table line corresponding to each single connected chain set according to at least one single connected chain contained in each single connected chain set; and generating the table layout of the picture according to a plurality of reference table lines corresponding to the plurality of single connected chain sets.
The table layout analyzing and extracting apparatus 700 may further include a storage unit 703 for storing program codes and data of the electronic device. The processing unit 701 may be a processor, the communication unit 702 may be an internal communication interface, and the storage unit 703 may be a memory.
It can be seen that, in the embodiment of the present application, after an image including a form layout is obtained, a plurality of single connected chain sets meeting a form line constraint condition in the image may be determined, and then a reference form line corresponding to each single connected chain set is generated according to at least one single connected chain included in each single connected chain set; finally, the form layout of the picture is generated according to the plurality of reference form lines corresponding to the plurality of single connected chain sets, and therefore the reference form lines generated by the plurality of single connected chain sets meeting the form line constraint conditions in the picture are determined, the form layout is obtained, the complexity of form detection is reduced, and the efficiency and the accuracy of extracting the form layout are improved.
In one possible example, in the aspect of determining a plurality of single connected chain sets meeting the table line constraint condition in the picture, the processing unit 701 is specifically configured to: detecting and extracting all single connected chains in the picture; screening a plurality of single connected chains which meet preset conditions in all the single connected chains; and dividing the single connected chains into a plurality of single connected chain sets according to the table line constraint condition.
In a possible example, in the aspect of the screening a plurality of single connected chains among all the single connected chains that meet a preset condition, the processing unit 701 is specifically configured to: acquiring a self-adaptive length threshold value of the table layout in the picture; comparing each single connected chain in all the single connected chains with the self-adaptive length threshold value, and eliminating the single connected chains with the length smaller than the self-adaptive length threshold value to obtain the single connected chains meeting the self-adaptive length threshold value; rejecting non-linear single connected chains in the single connected chains which meet the self-adaptive length threshold value to obtain single connected chains which meet the linear requirement; calculating the inclination angle of the single-connection chain meeting the straight line requirement, and marking the maximum angle as the main direction of the whole table; rejecting single-connection chains with the difference value between the inclination angle and the main direction exceeding a preset angle threshold; and carrying out inclination correction on the single connected chains meeting the preset angle threshold according to the main direction to obtain a plurality of single connected chains meeting the preset conditions.
In a possible example, in terms of dividing the plurality of single connected chains into a plurality of single connected chain sets according to a table line constraint, the processing unit 701 is specifically configured to: acquiring a vertical difference value of any two transverse single-connected chains in the plurality of single-connected chains; when the vertical difference value is smaller than a first preset threshold value, determining that any two transverse single-connected chains are the same transverse table line; sequencing all the transverse single-connected chains on the same transverse table line from left to right; or acquiring a transverse difference value of any two vertical single-connected chains in the plurality of single-connected chains; when the transverse difference value is smaller than a second preset threshold value, determining that any two vertical single-connected chains are the same vertical table line; sequencing all vertical single-connected chains on the same vertical table line from top to bottom; and determining a plurality of single connected chain sets according to the sorted single connected chains, wherein each single connected chain set corresponds to a plurality of single connected chains of one table line.
In one possible example, in the aspect of determining a plurality of single connected chain sets meeting the table line constraint condition in the picture, the processing unit 701 is specifically configured to: extracting all single connected chains in the picture; determining a gridline reference point for each of the all single connected chains; determining a longitudinal reference grid line and a transverse reference grid line of each single connected chain in the picture by taking the grid line reference point as a reference point; and determining a plurality of single connected chain sets which are completely positioned at the same reference grid line in all the single connected chains.
In one possible example, in the aspect of determining a plurality of single connected chain sets that are all at the same reference grid line in all the single connected chains, the processing unit 701 is specifically configured to: determining a plurality of linear single connected chains in all the single connected chains; determining at least one single connected chain in the same reference grid line according to the reference grid line of each linear single connected chain in the plurality of linear single connected chains; and sorting the at least one single connected chain to obtain a plurality of single connected chain sets.
In a possible example, in the aspect that the table layout of the picture is generated according to a plurality of reference table lines corresponding to the plurality of single connected chain sets, the processing unit 701 is specifically configured to: determining a reference table according to a plurality of reference table lines corresponding to the single connected chain sets; and filtering illegal forms according to an illegal form source and the reference form to obtain the form layout of the picture, wherein the illegal forms are forms generated by the illegal form source, and the illegal form source comprises straight lines, except for form lines, in the picture, which cause misjudgment of reference form lines.
In a possible example, when the illegal table source is a misjudged reference table line caused by a square seal, in terms of filtering out illegal tables according to the illegal table source and the reference table to obtain the table layout of the picture, the processing unit 701 is specifically configured to: identify the colors of all reference table lines, identify the straight lines other than table lines according to the colors, and filter out the misjudged reference table lines to obtain the table layout of the picture; or obtain the line widths of the reference table lines to obtain a line width set, judge whether the line width set contains a line width greater than a preset width threshold, and if so, filter out the reference table lines whose line width is greater than the preset width threshold to obtain the table layout of the picture; or identify the morphological features of the table layout, and filter out the reference table lines corresponding to a reference table whose morphological features do not conform to a preset morphological rule to obtain the table layout of the picture; or detect straight lines in the picture other than table lines by an image processing technique, calculate the horizontal pixels or vertical pixels of each such straight line, calculate the pixel overlap degree between the straight line and all the reference table lines according to the horizontal pixels or vertical pixels, obtain an accumulated overlap value for each reference table line according to the pixel overlap degree, and when the accumulated value is judged to be greater than a preset character overlapping threshold, eliminate the reference table line corresponding to the straight line to obtain the table layout of the picture.
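The line width alternative described above may be illustrated with a minimal Python sketch. It assumes that each reference table line is represented as a dictionary carrying a hypothetical `width` field measured in pixels; neither this representation nor the function name `filter_wide_lines` comes from the application, and how the width is measured is left open.

```python
def filter_wide_lines(reference_lines, width_threshold):
    """Drop reference table lines whose line width exceeds the preset threshold."""
    # Collect the line width set of all reference table lines.
    widths = [line["width"] for line in reference_lines]
    # If no width exceeds the threshold, nothing needs to be filtered.
    if not any(w > width_threshold for w in widths):
        return list(reference_lines)
    # Otherwise keep only the lines at or below the threshold.
    return [line for line in reference_lines if line["width"] <= width_threshold]
```

For example, with a threshold of 3 pixels, a 6-pixel-wide line would be removed while 2-pixel ruling lines are kept.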
In a possible example, in terms of filtering out the reference table lines corresponding to a reference table whose morphological features do not conform to the preset morphological rule, the processing unit 701 is specifically configured to: determine the corner points of the table from the form of the intersection points of the horizontal reference table lines and the vertical reference table lines; determine the number of unit tables (cells) in the reference table according to the corner points; determine the morphological features of the table according to the number of unit tables, wherein the morphological features comprise the number of corner points and the number of intersection points; and filter out, according to the morphological features of the table, the reference table lines corresponding to reference tables that do not conform to the preset morphological rule.
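The morphological check described above may be sketched as follows, under several assumptions that are not stated in the application: horizontal lines are given as `(y, x1, x2)` and vertical lines as `(x, y1, y2)`; a corner point is taken to be an L-shaped intersection from which exactly two perpendicular arms extend; a unit table is counted wherever four neighbouring intersections exist; and the preset morphological rule is assumed to require at least four corner points and one unit table. All function names are hypothetical.

```python
def intersections(h_lines, v_lines, tol=2):
    """Map each intersection (x, y) to the set of directions in which a line continues."""
    points = {}
    for y, hx1, hx2 in h_lines:
        for x, vy1, vy2 in v_lines:
            if hx1 - tol <= x <= hx2 + tol and vy1 - tol <= y <= vy2 + tol:
                arms = set()
                if x - hx1 > tol:
                    arms.add("left")
                if hx2 - x > tol:
                    arms.add("right")
                if y - vy1 > tol:
                    arms.add("up")
                if vy2 - y > tol:
                    arms.add("down")
                points[(x, y)] = arms
    return points


def morphological_features(h_lines, v_lines, tol=2):
    """Count corner points, intersection points, and unit tables (cells)."""
    points = intersections(h_lines, v_lines, tol)
    straight = ({"left", "right"}, {"up", "down"})
    # L-shaped intersections: two arms that are perpendicular to each other.
    corners = [p for p, arms in points.items() if len(arms) == 2 and arms not in straight]
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    # A cell is counted when all four of its neighbouring intersections exist.
    cells = sum(
        1
        for i in range(len(xs) - 1)
        for j in range(len(ys) - 1)
        if all((xs[i + a], ys[j + b]) in points for a in (0, 1) for b in (0, 1))
    )
    return {"corners": len(corners), "intersections": len(points), "cells": cells}


def looks_like_table(features):
    """Assumed morphological rule: at least four corners enclosing at least one cell."""
    return features["corners"] >= 4 and features["cells"] >= 1
```

Under this assumed rule, two stray lines that merely cross each other yield one intersection, no corner points, and no cells, so the reference table lines they produced would be filtered out.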
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware, and the program may be stored in a computer-readable memory, which may include: flash memory disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), magnetic disks, optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. A table layout analysis and extraction method, characterized by comprising the following steps:
acquiring a picture containing a table layout;
determining a plurality of single connected chain sets in the picture that satisfy a table line constraint condition, wherein each of the plurality of single connected chain sets comprises at least one single connected chain, each single connected chain corresponds to a transverse or vertical short line segment in the picture, and the table line constraint condition means that the single connected chains have the same direction and lie on the same straight line;
generating a reference table line corresponding to each single connected chain set according to at least one single connected chain contained in each single connected chain set;
and generating the table layout of the picture according to a plurality of reference table lines corresponding to the plurality of single connected chain sets.
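As a hedged illustration of the third step of claim 1, the following Python sketch derives one reference table line from a single connected chain set. The claim does not specify how the line is fitted; the sketch assumes that the chains are given as `(x1, y1, x2, y2)` segments, that the line spans the extreme endpoints of the set, and that its position is the mean of the endpoint coordinates. The function name `reference_table_line` is hypothetical.

```python
def reference_table_line(chain_set, horizontal=True):
    """Fit one reference table line to a set of collinear short segments.

    Each chain is an (x1, y1, x2, y2) tuple. The result is a single segment
    covering the whole set, so gaps between broken chains are bridged.
    """
    xs = [x for x1, _, x2, _ in chain_set for x in (x1, x2)]
    ys = [y for _, y1, _, y2 in chain_set for y in (y1, y2)]
    if horizontal:
        y = sum(ys) / len(ys)  # average row of the transverse chains
        return (min(xs), y, max(xs), y)
    x = sum(xs) / len(xs)      # average column of the vertical chains
    return (x, min(ys), x, max(ys))
```

Because the line runs from the smallest to the largest endpoint, a table line that was broken into several short chains by noise or faint printing is recovered as one continuous reference table line.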
2. The method of claim 1, wherein determining a plurality of single connected chain sets in the picture that satisfy a table line constraint condition comprises:
detecting and extracting all single connected chains in the picture;
screening out, from all the single connected chains, a plurality of single connected chains that meet preset conditions;
and dividing the single connected chains into a plurality of single connected chain sets according to the table line constraint condition.
3. The method according to claim 2, wherein screening the plurality of single connected chains that meet preset conditions among all the single connected chains comprises:
acquiring an adaptive length threshold for the table layout in the picture;
comparing the length of each of the single connected chains with the adaptive length threshold, and eliminating the single connected chains whose length is smaller than the adaptive length threshold, to obtain the single connected chains that meet the adaptive length threshold;
eliminating the non-linear single connected chains among the single connected chains that meet the adaptive length threshold, to obtain the single connected chains that meet the linearity requirement;
calculating the inclination angle of each single connected chain that meets the linearity requirement, and marking the maximum angle as the main direction of the whole table;
eliminating the single connected chains for which the difference between the inclination angle and the main direction exceeds a preset angle threshold;
and performing inclination correction, according to the main direction, on the single connected chains that meet the preset angle threshold, to obtain the plurality of single connected chains that meet the preset conditions.
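The screening steps of claim 3 may be sketched in Python as follows, for the transverse chains only. Several points are assumptions rather than statements of the application: each chain is a list of `(x, y)` points; the adaptive length threshold is supplied by the caller because the claim does not say how it is computed; linearity is tested by the distance of every point from the line through the endpoints; and "the maximum angle" is interpreted here as the most frequent inclination angle after rounding to whole degrees. All function names are hypothetical.

```python
import math
from collections import Counter


def chain_length(chain):
    """Distance between the first and last point of the chain."""
    (x1, y1), (x2, y2) = chain[0], chain[-1]
    return math.hypot(x2 - x1, y2 - y1)


def is_linear(chain, tol=1.5):
    """True if every point lies within tol pixels of the endpoint-to-endpoint line."""
    (x1, y1), (x2, y2) = chain[0], chain[-1]
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        return False
    return all(
        abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / length <= tol
        for x, y in chain
    )


def tilt_deg(chain):
    """Inclination angle of the chain in degrees."""
    (x1, y1), (x2, y2) = chain[0], chain[-1]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def screen_chains(chains, length_threshold, angle_threshold=3.0):
    """Return (corrected chains, main direction in degrees)."""
    kept = [c for c in chains if chain_length(c) >= length_threshold]
    kept = [c for c in kept if is_linear(c)]
    if not kept:
        return [], 0.0
    # Assumed reading of "maximum angle": the most common rounded angle.
    main = Counter(round(tilt_deg(c)) for c in kept).most_common(1)[0][0]
    kept = [c for c in kept if abs(tilt_deg(c) - main) <= angle_threshold]
    # Inclination correction: rotate every point by -main about the origin.
    t = math.radians(-main)
    cos_t, sin_t = math.cos(t), math.sin(t)
    corrected = [
        [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in c] for c in kept
    ]
    return corrected, main
```

Vertical chains would be handled in the same way relative to a direction perpendicular to the main direction; that branch is omitted to keep the sketch short.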
4. The method according to claim 2 or 3, wherein the dividing the plurality of single connected chains into a plurality of single connected chain sets according to the table line constraint condition comprises:
acquiring a vertical difference value between any two transverse single connected chains in the plurality of single connected chains;
when the vertical difference value is smaller than a first preset threshold, determining that the two transverse single connected chains belong to the same transverse table line;
sorting all the transverse single connected chains on the same transverse table line from left to right; or
acquiring a transverse difference value between any two vertical single connected chains in the plurality of single connected chains;
when the transverse difference value is smaller than a second preset threshold, determining that the two vertical single connected chains belong to the same vertical table line;
sorting all the vertical single connected chains on the same vertical table line from top to bottom;
and determining a plurality of single connected chain sets according to the sorted single connected chains, wherein each single connected chain set corresponds to a plurality of single connected chains of one table line.
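The grouping in claim 4 may be sketched for the transverse case as follows. The sketch assumes that each transverse chain is reduced to a hypothetical `(x1, y, x2)` tuple and that chains are merged greedily after sorting by `y`, so that a chain joins the current set when its vertical difference from the previously added chain is below the threshold; the claim itself only states the pairwise condition. The function name `group_horizontal_chains` is an assumption.

```python
def group_horizontal_chains(chains, max_vertical_diff):
    """Split transverse chains into sets, one set per transverse table line.

    chains: list of (x1, y, x2) tuples.
    Returns a list of sets, each sorted from left to right.
    """
    groups = []
    for chain in sorted(chains, key=lambda c: c[1]):
        # Compare with the last chain added to the current set.
        if groups and abs(chain[1] - groups[-1][-1][1]) < max_vertical_diff:
            groups[-1].append(chain)
        else:
            groups.append([chain])
    # Sort the chains on each table line from left to right.
    return [sorted(g, key=lambda c: c[0]) for g in groups]
```

The vertical case is symmetric: the roles of `x` and `y` are swapped, the second preset threshold is used, and each set is sorted from top to bottom.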
5. The method of claim 1, wherein determining a plurality of single connected chain sets in the picture that satisfy a table line constraint condition comprises:
extracting all single connected chains in the picture;
determining a gridline reference point for each of the all single connected chains;
determining a longitudinal reference grid line and a transverse reference grid line of each single connected chain in the picture by taking the grid line reference point as a reference point;
and determining a plurality of single connected chain sets which are completely positioned at the same reference grid line in all the single connected chains.
6. The method of claim 5, wherein determining a plurality of single connected chain sets that are all at the same reference grid line in all the single connected chains comprises:
determining a plurality of linear single connected chains in all the single connected chains;
determining at least one single connected chain in the same reference grid line according to the reference grid line of each linear single connected chain in the plurality of linear single connected chains;
and sequencing the at least one single connected chain to obtain a plurality of single connected chain sets.
7. The method according to any one of claims 1-6, wherein the generating the table layout of the picture according to the plurality of reference table lines corresponding to the plurality of single connected chain sets comprises:
determining a reference table according to a plurality of reference table lines corresponding to the single connected chain sets;
and filtering out illegal tables according to an illegal table source and the reference table to obtain the table layout of the picture, wherein an illegal table is a table generated from the illegal table source, and the illegal table source comprises straight lines in the picture, other than table lines, that cause reference table lines to be misjudged.
8. The method of claim 7, wherein the filtering out illegal tables according to the illegal table source and the reference table to obtain the table layout of the picture comprises:
identifying the colors of all reference table lines;
identifying the straight lines other than table lines according to the colors, and filtering out the misjudged reference table lines to obtain the table layout of the picture; or
acquiring the line widths of the reference table lines to obtain a line width set;
judging whether the line width set contains a line width greater than a preset width threshold;
if so, filtering out the reference table lines whose line width is greater than the preset width threshold to obtain the table layout of the picture; or
identifying morphological features of the table layout;
filtering out the reference table lines corresponding to a reference table whose morphological features do not conform to a preset morphological rule to obtain the table layout of the picture; or
detecting straight lines in the picture except for the table lines through an image processing technology;
calculating a horizontal pixel or a vertical pixel of the straight line;
calculating the pixel overlapping degree of the straight line and all the reference table lines according to the horizontal pixels or the vertical pixels;
obtaining the accumulated value of the overlapping degree of each reference table line according to the pixel overlapping degree;
and when the accumulated value is judged to be larger than a preset character overlapping threshold value, eliminating the reference table line corresponding to the straight line to obtain the table layout of the picture.
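The last alternative of claim 8 may be illustrated with the following Python sketch for transverse lines. It assumes that the straight lines other than table lines have already been detected by some image processing step, which the claim leaves open, and that both kinds of line are given as hypothetical `(y, x1, x2)` tuples. The pixel overlap degree is taken to be the length of the shared `x` interval when the two lines lie within `y_tol` rows of each other. The function names and the `y_tol` parameter are assumptions.

```python
def overlap_1d(a_start, a_end, b_start, b_end):
    """Length, in pixels, of the overlap between two intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def drop_overlapped_lines(reference_lines, stray_lines, overlap_threshold, y_tol=2):
    """Remove reference table lines that mostly coincide with non-table lines.

    reference_lines, stray_lines: lists of (y, x1, x2) tuples.
    """
    kept = []
    for ry, rx1, rx2 in reference_lines:
        # Accumulate the overlap of this reference line with every stray line.
        total = sum(
            overlap_1d(rx1, rx2, sx1, sx2)
            for sy, sx1, sx2 in stray_lines
            if abs(sy - ry) <= y_tol
        )
        if total <= overlap_threshold:
            kept.append((ry, rx1, rx2))
    return kept
```

A reference table line is discarded only when its accumulated overlap exceeds the threshold, so a short accidental overlap with a stray line does not remove a genuine table line.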
9. The method according to claim 8, wherein the filtering out the reference table line corresponding to the reference table whose morphological feature does not conform to the preset morphological rule comprises:
determining the corner points of the table through the intersection point form of the horizontal reference table lines and the vertical reference table lines;
determining the number of unit tables in the reference table according to the corner points;
determining the morphological features of the table according to the number of the unit tables, wherein the morphological features comprise the number of corner points and the number of intersection points;
and filtering out, according to the morphological features of the table, the reference table lines corresponding to the reference tables that do not conform to the preset morphological rule.
10. A table layout analysis and extraction device is characterized by comprising a processing unit and a communication unit, wherein,
the processing unit is configured to: acquire a picture containing a table layout through the communication unit; determine a plurality of single connected chain sets in the picture that satisfy a table line constraint condition, wherein each of the plurality of single connected chain sets comprises at least one single connected chain, each single connected chain corresponds to a transverse or vertical short line segment in the picture, and the table line constraint condition means that the single connected chains have the same direction and lie on the same straight line; generate a reference table line corresponding to each single connected chain set according to the at least one single connected chain contained in that set; and generate the table layout of the picture according to a plurality of reference table lines corresponding to the plurality of single connected chain sets.
11. An electronic device comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-9.
12. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the method according to any one of claims 1-9.
CN201910773607.6A 2019-08-21 2019-08-21 Form layout analysis and extraction method and related device Active CN110598575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910773607.6A CN110598575B (en) 2019-08-21 2019-08-21 Form layout analysis and extraction method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910773607.6A CN110598575B (en) 2019-08-21 2019-08-21 Form layout analysis and extraction method and related device

Publications (2)

Publication Number Publication Date
CN110598575A true CN110598575A (en) 2019-12-20
CN110598575B CN110598575B (en) 2023-06-02

Family

ID=68854963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910773607.6A Active CN110598575B (en) 2019-08-21 2019-08-21 Form layout analysis and extraction method and related device

Country Status (1)

Country Link
CN (1) CN110598575B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014018482A2 (en) * 2012-07-24 2014-01-30 Alibaba Group Holding Ltd Form recognition method and device
CN110147774A (en) * 2019-05-23 2019-08-20 阳光保险集团股份有限公司 Sheet format picture printed page analysis method and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bu Feiyu et al.: "Discrimination of tables and graphics in layout analysis", Computer Engineering and Applications *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611883A (en) * 2020-05-07 2020-09-01 北京智通云联科技有限公司 Table layout analysis method, system and equipment based on minimum cell clustering
CN111611883B (en) * 2020-05-07 2023-08-15 北京智通云联科技有限公司 Table layout analysis method, system and equipment based on minimum cell clustering
CN113705576A (en) * 2021-11-01 2021-11-26 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment
CN113705576B (en) * 2021-11-01 2022-03-25 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment

Also Published As

Publication number Publication date
CN110598575B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
WO2019237549A1 (en) Verification code recognition method and apparatus, computer device, and storage medium
CN113139445A (en) Table recognition method, apparatus and computer-readable storage medium
CN111259891B (en) Method, device, equipment and medium for identifying identity card in natural scene
CN110378351B (en) Seal identification method and device
CN112507782A (en) Text image recognition method and device
US7110568B2 (en) Segmentation of a postal object digital image by Hough transform
CN111737478A (en) Text detection method, electronic device and computer readable medium
CN108241859A (en) The bearing calibration of car plate and device
CN110598575A (en) Table layout analysis and extraction method and related device
CN108830275A (en) Dot character, the recognition methods of dot matrix digit and device
CN110210467B (en) Formula positioning method of text image, image processing device and storage medium
CN110378337B (en) Visual input method and system for drawing identification information of metal cutting tool
CN113033562A (en) Image processing method, device, equipment and storage medium
CN115410191B (en) Text image recognition method, device, equipment and storage medium
CN111583156B (en) Document image shading removing method and system
CN111401352B (en) Text picture underline identification method, text picture underline identification device, computer equipment and storage medium
CN108230538B (en) Paper money identification method, device, equipment and storage medium
CN114494052A (en) Book counting method and device, computer equipment and storage medium
Santiago et al. Efficient 2× 2 block-based connected components labeling algorithms
CN106846603A (en) A kind of recognition methods of forge or true or paper money and its device
CN112883977A (en) License plate recognition method and device, electronic equipment and storage medium
CN112749599A (en) Image enhancement method and device and server
CN112215783B (en) Image noise point identification method, device, storage medium and equipment
CN115984865B (en) Text recognition method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant