CN117133002A - Table identification method, system, electronic equipment and storage medium

Info

Publication number
CN117133002A
Authority
CN
China
Legal status
Pending
Application number
CN202311140761.2A
Other languages
Chinese (zh)
Inventor
张海洋
Current Assignee
Glodon Co Ltd
Original Assignee
Glodon Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The application relates to the technical field of computer-aided design and discloses a table identification method, system, electronic device and storage medium. The table identification method comprises: screening, from the lines of a designated drawing area, table lines used to construct a table, wherein the table lines comprise transverse lines and longitudinal lines and each line has an abscissa and an ordinate; constructing an initial table based on the ordinates of the transverse lines and the abscissas of the longitudinal lines; in the initial table, if the common edge line between adjacent cells does not lie on any transverse line or longitudinal line, merging the adjacent cells to obtain a drawing table; and, in the drawing table, identifying and outputting the contents of each cell, cell by cell. Tables in drawings can thus be identified automatically, improving the working efficiency of technicians.

Description

Table identification method, system, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computer-aided design, in particular to a table identification method, a table identification system, an electronic device and a storage medium.
Background
In the construction field, drawing recognition is the basis for the quantity calculation and pricing of building works. Among the various requirements for drawing recognition, identifying the tables in a drawing is one of the most central.
Currently, for some drawings (such as PDF vector drawings), when a technician uses a drawing for quantity calculation or pricing, the tables in the drawing cannot be identified automatically by software, and the table contents can only be entered into the calculation software manually, which is inefficient.
Therefore, there is a need for a table identification method that can automatically identify the contents of a table.
Disclosure of Invention
In view of the above, embodiments of the present application provide a table identification method, a table identification system, an electronic device and a computer-readable storage medium, which can automatically identify tables in a drawing and improve the working efficiency of technicians.
In one aspect, the present application provides a method for identifying a table, where the method includes:
screening, from the lines of a designated drawing area, table lines for constructing a table, wherein the table lines comprise transverse lines and longitudinal lines, and each line has an abscissa and an ordinate;
constructing an initial table based on the ordinates of the transverse lines and the abscissas of the longitudinal lines;
in the initial table, if the common edge line between adjacent cells does not lie on any transverse line or longitudinal line, merging the adjacent cells to obtain a drawing table;
in the drawing table, identifying and outputting the contents of each cell, cell by cell.
In the technical solutions of some embodiments of the present application, transverse lines and longitudinal lines are obtained by screening the lines in the designated drawing area. An initial table is constructed based on the ordinates of the transverse lines and the abscissas of the longitudinal lines. The cells of the initial table are then merged according to whether the common edge line between adjacent cells lies on a transverse or longitudinal line, yielding a drawing table. Finally, the contents of each cell of the drawing table are identified and output cell by cell. Tables and their contents in a drawing can therefore be identified automatically, so technicians no longer need to enter table contents manually, which improves their working efficiency.
In some embodiments, whether the common edge line between adjacent cells does not lie on a transverse line or longitudinal line is determined as follows:
if one or more target points on the common edge line do not lie on any transverse line or longitudinal line, it is determined that the common edge line does not lie on a transverse line or longitudinal line.
Judging whether the common edge line lies on a transverse or longitudinal line from the positions of one or more target points on it simplifies the judgment logic.
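A minimal Python sketch of the target-point test follows. It assumes each line is stored as a pair of endpoints in the first coordinate system and, because the text does not say which target points are used, it takes the midpoint of the common edge as the only target point. The names `point_on_axis_segment`, `edge_on_table_lines` and the tolerance `EPS` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

EPS = 1e-6  # assumed tolerance for floating-point drawing coordinates


def point_on_axis_segment(p: Point, seg: Segment, eps: float = EPS) -> bool:
    """True if p lies on a horizontal or vertical segment (within eps)."""
    (x1, y1), (x2, y2) = seg
    x, y = p
    if abs(y1 - y2) <= eps:  # transverse line: fixed ordinate
        return abs(y - y1) <= eps and min(x1, x2) - eps <= x <= max(x1, x2) + eps
    if abs(x1 - x2) <= eps:  # longitudinal line: fixed abscissa
        return abs(x - x1) <= eps and min(y1, y2) - eps <= y <= max(y1, y2) + eps
    return False  # oblique lines are never table lines


def edge_on_table_lines(edge: Segment, table_lines: List[Segment]) -> bool:
    """Judge a common edge line by its target points.

    Assumption: the midpoint is the only target point. If any target point
    misses every table line, the edge does not lie on a table line.
    """
    (x1, y1), (x2, y2) = edge
    targets = [((x1 + x2) / 2, (y1 + y2) / 2)]
    return all(any(point_on_axis_segment(t, s) for s in table_lines) for t in targets)
```

With a transverse line at y = 4 and a short longitudinal line at x = 5, the edge from (3, 4) to (5, 4) is reported as lying on a table line, while an edge at y = 5 is not. Using more target points (for example the quarter points) would make the test stricter at the cost of more checks.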
In some embodiments, the abscissa of each line is represented by the abscissas of its two endpoints, and the ordinate of each line is represented by the ordinates of its two endpoints;
screening the table lines for constructing the table comprises:
in the designated drawing area, if the ordinates of the two endpoints of a line are the same, determining the line as a candidate transverse line, and if the abscissas of the two endpoints of a line are the same, determining the line as a candidate longitudinal line;
removing, from the candidate transverse lines and the candidate longitudinal lines, lines that do not meet a preset condition, to obtain the transverse lines and the longitudinal lines.
Screening lines by whether the abscissas or ordinates of their two endpoints are equal matches the way line coordinates are stored in memory, so candidate transverse and longitudinal lines can be screened directly from the stored data, and the screening logic is simpler.
In some embodiments, removing the lines that do not meet the preset condition from the candidate transverse lines and the candidate longitudinal lines comprises:
removing, from the candidate transverse lines and the candidate longitudinal lines, lines whose length is less than a length threshold.
Removing lines shorter than the length threshold reduces, on the one hand, the interference of such lines in subsequent table identification and, on the other hand, the volume of line data and hence the amount of data processing.
In some embodiments, removing the lines that do not meet the preset condition from the candidate transverse lines and the candidate longitudinal lines comprises:
removing, from the candidate transverse lines and the candidate longitudinal lines, lines that do not form a loop with other lines.
Removing lines that do not form a loop with other lines eliminates lines that cannot be table lines, which reduces their interference in subsequent table identification and improves identification accuracy. It also reduces the amount of data processing.
In some embodiments, removing lines that do not form a loop with other lines comprises:
for each candidate transverse line and candidate longitudinal line, if the line intersects another line at a position other than its endpoints, breaking the line at the intersection point;
after the breaking operation is completed, counting the number of lines associated with each line endpoint;
for any line endpoint, if the number of lines associated with it is 1, regarding the line associated with that endpoint as a line that does not form a loop with other lines, and removing it.
After the breaking operation, lines that do not form a loop with other lines are identified and removed by counting the lines associated with each endpoint; this screening is more accurate and prevents omissions.
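The break-and-count procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: candidate lines are axis-aligned segments stored as endpoint pairs, only perpendicular crossings (including T-junctions) are used as break points, and overlapping collinear segments are not handled. `break_at_intersections` and `drop_dangling` are hypothetical names.

```python
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _key(p: Point) -> Point:
    # Round so that nearly equal float coordinates map to the same endpoint.
    return (round(p[0], 6), round(p[1], 6))


def _is_horizontal(s: Segment) -> bool:
    return s[0][1] == s[1][1]


def break_at_intersections(segments: List[Segment]) -> List[Segment]:
    """Split each axis-aligned segment at crossings interior to it."""
    pieces: List[Segment] = []
    for s in segments:
        (x1, y1), (x2, y2) = s
        horizontal = _is_horizontal(s)
        lo, hi = (min(x1, x2), max(x1, x2)) if horizontal else (min(y1, y2), max(y1, y2))
        cuts = set()
        for t in segments:
            if _is_horizontal(t) == horizontal:
                continue  # only perpendicular lines cross at a single point
            (u1, v1), (u2, v2) = t
            if horizontal:  # t is vertical at x = u1, crossing at (u1, y1)
                if lo < u1 < hi and min(v1, v2) <= y1 <= max(v1, v2):
                    cuts.add(u1)
            else:  # t is horizontal at y = v1, crossing at (x1, v1)
                if lo < v1 < hi and min(u1, u2) <= x1 <= max(u1, u2):
                    cuts.add(v1)
        stops = [lo, *sorted(cuts), hi]
        for a, b in zip(stops, stops[1:]):
            pieces.append(((a, y1), (b, y1)) if horizontal else ((x1, a), (x1, b)))
    return pieces


def drop_dangling(segments: List[Segment]) -> List[Segment]:
    """Break lines, count lines per endpoint, drop pieces with a count-1 endpoint."""
    pieces = break_at_intersections(segments)
    degree = Counter(_key(p) for seg in pieces for p in seg)
    return [seg for seg in pieces if all(degree[_key(p)] > 1 for p in seg)]
```

For a square split by a middle longitudinal line, with a short stub sticking out of its right side, the square and middle line survive as eight pieces and the stub is removed because its free endpoint is associated with only one line. The text describes a single counting pass; repeating `drop_dangling` until nothing changes would be a natural extension but is not stated in the source.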
In some embodiments, the content of each cell of the drawing table includes text content composed of lines;
identifying the content of each cell comprises:
printing (rendering) the lines in each cell into a picture, and recognizing the picture to obtain the text content of the cell.
Printing the cell content into a picture and obtaining the text through image recognition reduces the dependence on text primitives and widens the range of drawings to which the scheme applies.
In some embodiments, the abscissa and the ordinate of each line are determined based on a first coordinate system;
printing the lines in a cell into the picture comprises:
constructing a first bounding box for the cell, and acquiring the lines located within the first bounding box;
constructing a second bounding box for the set of lines within the first bounding box, and constructing a second coordinate system based on the second bounding box;
converting the coordinates of each line in the cell from the first coordinate system to the second coordinate system, and printing the lines in the cell into the picture based on their coordinates in the second coordinate system.
Converting the line coordinates from the first coordinate system to the second coordinate system facilitates implementation of the scheme. Because the second coordinate system is constructed from a bounding box, the shape of the cell or line set is simplified to a rectangle, which reduces the amount of computation.
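The three printing steps can be illustrated with a small Python sketch. The text does not fix the picture size, axis orientation or recognition engine, so the following are assumptions: the second coordinate system has its origin at the top-left corner of the second bounding box with y pointing down (as in image rows), a single uniform scale fits the box into the picture, and the "picture" is a plain 0/1 grid drawn with Bresenham's line algorithm, which an OCR engine (not shown) would then read. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Box = Tuple[Point, Point]  # (lower-left, upper-right) in the first coordinate system


def lines_in_cell(lines: List[Segment], cell: Box) -> List[Segment]:
    """Lines strictly inside the cell's first bounding box (border lines excluded)."""
    (cx0, cy0), (cx1, cy1) = cell

    def inside(p: Point) -> bool:
        return cx0 < p[0] < cx1 and cy0 < p[1] < cy1

    return [seg for seg in lines if inside(seg[0]) and inside(seg[1])]


def to_image_coords(lines: List[Segment], width: int, height: int, margin: int = 1):
    """Convert lines to the second coordinate system built on their bounding box."""
    xs = [x for seg in lines for x, _ in seg]
    ys = [y for seg in lines for _, y in seg]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    scale = float("inf")
    if x_max > x_min:
        scale = min(scale, (width - 2 * margin - 1) / (x_max - x_min))
    if y_max > y_min:
        scale = min(scale, (height - 2 * margin - 1) / (y_max - y_min))
    if scale == float("inf"):
        scale = 1.0  # every line collapsed to a single point

    def px(p: Point) -> Tuple[int, int]:
        return (margin + round((p[0] - x_min) * scale),
                margin + round((y_max - p[1]) * scale))  # flip y for image rows

    return [(px(a), px(b)) for a, b in lines]


def rasterize(pixel_lines, width: int, height: int) -> List[List[int]]:
    """Draw integer-coordinate lines into a 0/1 grid with Bresenham's algorithm."""
    grid = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in pixel_lines:
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        while True:
            grid[y0][x0] = 1
            if x0 == x1 and y0 == y1:
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x0 += sx
            if e2 <= dx:
                err += dx
                y0 += sy
    return grid
```

Excluding lines that touch the first bounding box is a design choice meant to keep cell borders out of the picture, so that only the strokes of the text remain; a real implementation may need a small inward offset instead.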
In another aspect, the present application further provides a table identification system, comprising:
a screening module, configured to screen, from the lines of a designated drawing area, table lines for constructing a table, wherein the table lines comprise transverse lines and longitudinal lines, and each line has an abscissa and an ordinate;
an initial construction module, configured to construct an initial table based on the ordinates of the transverse lines and the abscissas of the longitudinal lines;
a merging module, configured to merge adjacent cells in the initial table to obtain a drawing table if the common edge line between the adjacent cells does not lie on any transverse line or longitudinal line;
a content identification module, configured to identify and output the text contents of each cell of the drawing table, cell by cell.
In a further aspect, the application provides an electronic device comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the method described above.
In a further aspect, the application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
Drawings
The features and advantages of the present application will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the application in any way, in which:
FIG. 1 is a flowchart of a table identification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of line coordinates according to an embodiment of the present application;
FIG. 3 is a schematic diagram of transverse and longitudinal lines according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an initial table constructed from abscissas and ordinates according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an actual table in a drawing according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the position of a line in a first coordinate system according to an embodiment of the present application;
FIG. 7 is a schematic diagram of candidate transverse lines and candidate longitudinal lines according to an embodiment of the present application;
FIG. 8 is a block diagram of a table identification system according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, based on the embodiments of the application, which a person skilled in the art would obtain without making any inventive effort, are within the scope of the application.
Before the scheme of the application is described, the related concepts it involves are introduced.
1) Vector diagram
A vector diagram is a graphic described by mathematical formulas. Unlike a bitmap, a vector diagram is not composed of pixels but of basic geometric elements such as line segments, curves, shapes and colors, and each element in the vector diagram has corresponding coordinate information.
Vector diagrams can be created with specialized software such as Adobe Illustrator and CorelDRAW, and such software also supports exporting vector graphics as DWG vector drawings and PDF vector drawings. A DWG vector drawing differs from a PDF vector drawing in that the DWG drawing contains layers and text information (i.e., text primitives), whereas the PDF drawing contains neither; in a PDF drawing, text is represented by vector lines.
2) World coordinate system
This is a spatial coordinate system in three-dimensional computer graphics, also called the global, inertial or absolute coordinate system. The world coordinate system is the basic mathematical model for describing the position and orientation of three-dimensional objects in space; its coordinate axes and origin are fixed.
3) Local coordinate system
This is a coordinate system established relative to a selected object. Generally, its origin may be the geometric center or another specific location of the object, and the orientation of its axes may be determined by the orientation and shape of the object. Each selected object may have its own local coordinate system, whose position relative to the object does not change: when the object moves, the origin and axis orientations of its local coordinate system move with it.
4) Coordinate system transformation
Refers to the process of transforming the coordinates of a selected object in one coordinate system to another coordinate system.
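As a worked formula (a standard illustration, not taken from the source), consider a two-dimensional local coordinate system whose origin sits at $(x_0, y_0)$ in world coordinates and whose X-axis is rotated by an angle $\theta$ from the world X-axis. A point $(x, y)$ in world coordinates has local coordinates $(x', y')$ given by:

```latex
\begin{aligned}
x' &= (x - x_0)\cos\theta + (y - y_0)\sin\theta,\\
y' &= -(x - x_0)\sin\theta + (y - y_0)\cos\theta.
\end{aligned}
```

With $\theta = 0$ this reduces to a pure translation, $x' = x - x_0$ and $y' = y - y_0$, which is the case when the local system is aligned with the world axes, as for a coordinate system built on an axis-aligned bounding box.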
This concludes the description of the related concepts.
The table identification method of the present application may be applied in drawing software or in an electronic device running the drawing software. Electronic devices include, but are not limited to, tablet computers, notebook computers and servers. Referring to FIG. 1, which is a flowchart of a table identification method according to an embodiment of the application, the table identification method comprises the following steps:
Step S11, among the lines of the designated drawing area, table lines for constructing a table are screened, the table lines comprising transverse lines and longitudinal lines, each line having an abscissa and an ordinate.
The designated drawing area may be an area of the drawing selected manually by a technician according to actual conditions. For example, when a technician needs to perform quantity calculation based on the table data in area A of a drawing, area A can be selected manually as the designated drawing area.
A table line is a line that divides the cells of a table. Table lines can be screened from the lines of the designated drawing area based on the abscissa and ordinate of each line. Specifically, the abscissa and ordinate of each line may be determined in a first coordinate system, which may be the world coordinate system and includes an X-axis and a Y-axis. The abscissa of a line is its coordinate along the X-axis, and its ordinate is its coordinate along the Y-axis. Referring to FIG. 2, a schematic diagram of line coordinates according to an embodiment of the present application, the abscissa of the line lies between 3 and 7 and its ordinate lies between 4 and 5.
A transverse line is a line whose ordinate is a fixed value, i.e., a line parallel to the X-axis of the first coordinate system. A longitudinal line is a line whose abscissa is a fixed value, i.e., a line parallel to the Y-axis of the first coordinate system. Referring to FIG. 3, a schematic diagram of transverse and longitudinal lines according to an embodiment of the present application: line AB has an abscissa between 3 and 7 and an ordinate between 4 and 5, so neither coordinate is fixed and AB is not a table line; line DE has an abscissa between 5 and 9 and an ordinate of 5, a fixed value (parallel to the X-axis), so DE can be used as a transverse line; line CD has an abscissa of 5, a fixed value (parallel to the Y-axis), and an ordinate between 3 and 5, so CD can be used as a longitudinal line.
The transverse lines screened from the designated drawing area can be used as a first line set, and the longitudinal lines screened from the designated drawing area can be used as a second line set.
Step S12, an initial table is constructed based on the ordinate of the transverse line and the abscissa of the longitudinal line.
As described for step S11, the ordinate of each transverse line and the abscissa of each longitudinal line are fixed values. The ordinate of each transverse line can be extracted from the first line set, and the abscissa of each longitudinal line from the second line set. The extracted ordinates are then sorted in descending order along the Y-axis of the first coordinate system, and the abscissas in descending order along the X-axis. From the sorted values, the first position (minimum ordinate, minimum abscissa), the second position (minimum ordinate, maximum abscissa), the third position (maximum ordinate, minimum abscissa) and the fourth position (maximum ordinate, maximum abscissa) are taken as the four corner points of the initial table, and the intervals between adjacent ordinates and between adjacent abscissas are taken as the side lengths of the individual cells, thereby constructing the initial table.
For ease of understanding, consider an example. Assume the ordinates of the extracted transverse lines are 3, 4, 5 and 6, and the abscissas of the longitudinal lines are 3, 5, 7, 9 and 11. Referring to FIG. 4, a schematic diagram of an initial table constructed from abscissas and ordinates according to an embodiment of the present application: after 3, 4, 5, 6 are sorted along the Y-axis and 3, 5, 7, 9, 11 along the X-axis, the first position (minimum ordinate, minimum abscissa) is position A, the second position (minimum ordinate, maximum abscissa) is position B, the third position (maximum ordinate, minimum abscissa) is position C, and the fourth position (maximum ordinate, maximum abscissa) is position D. Taking positions A, B, C and D as the four corner points of the initial table, and the intervals between adjacent ordinates and between adjacent abscissas as the side lengths of individual cells, the initial table shown in FIG. 4 is constructed.
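A minimal Python sketch of step S12, assuming cells are represented as (lower-left, upper-right) corner pairs in the first coordinate system; `build_initial_table` is a hypothetical name. It sorts in ascending order rather than the descending order described, which does not change the resulting grid.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cell = Tuple[Point, Point]  # (lower-left, upper-right)


def build_initial_table(ordinates: List[float], abscissas: List[float]) -> List[List[Cell]]:
    """Build the initial grid from transverse-line ordinates and longitudinal-line abscissas.

    Rows run from the smallest ordinate upward, columns from the smallest
    abscissa rightward; repeated coordinates are collapsed.
    """
    ys = sorted(set(ordinates))
    xs = sorted(set(abscissas))
    return [
        [((x0, y0), (x1, y1)) for x0, x1 in zip(xs, xs[1:])]
        for y0, y1 in zip(ys, ys[1:])
    ]


table = build_initial_table([3, 4, 5, 6], [3, 5, 7, 9, 11])
print(len(table), len(table[0]))  # 3 rows, 4 columns
```

For the example values this gives a grid of 3 rows and 4 columns: the lower-left corner of the first cell is position A (3, 3) and the upper-right corner of the last cell is position D (11, 6).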
Step S13, in the initial table, if the common edge line between adjacent cells does not lie on any transverse line or longitudinal line, the adjacent cells are merged to obtain a drawing table.
Referring to FIG. 4, the initial table is constructed only from the ordinates of the transverse lines and the abscissas of the longitudinal lines, without considering the abscissa ranges of the transverse lines or the ordinate ranges of the longitudinal lines, so it may differ from the actual table in the drawing. For example, FIG. 5 is a schematic diagram of an actual table in a drawing according to an embodiment of the present application. From the actual table in FIG. 5, step S11 extracts the transverse lines gj, fk, cd and en and the longitudinal lines ge, ab, hl, im and jn. The abscissa range of transverse line cd differs from those of the other transverse lines, and the ordinate range of longitudinal line ab differs from those of the other longitudinal lines. Nevertheless, step S12 still constructs the initial table shown in FIG. 4 from the ordinates of the transverse lines and the abscissas of the longitudinal lines. Clearly, the initial table of FIG. 4 differs from the actual table of FIG. 5, so the cells of the initial table must be merged further.
In this embodiment, whether adjacent cells are merged is determined by whether their common edge line lies on a transverse or longitudinal line. For example, referring to FIGS. 4 and 5, the common edge line op of cells 41 and 42 in FIG. 4 does not lie on any of the transverse or longitudinal lines extracted from FIG. 5, so cells 41 and 42 are merged into the combined cell shown in the upper left corner of FIG. 5. As another example, take cells 41 and 43 in FIG. 4: their common edge line qp lies on the transverse line fk extracted from FIG. 5, so cells 41 and 43 are not merged.
By traversing the adjacent cells of the initial table in turn according to this principle, a drawing table consistent with the actual table in the drawing is obtained.
In the drawing table, a cell obtained by merging several adjacent cells of the initial table is treated as a single cell; for example, cells 41 and 42 in FIG. 4 are merged into the single cell in the upper left corner of FIG. 5.
Step S14, in the drawing table, the contents of each cell are identified and output cell by cell.
Specifically, the contents of a cell may include, but are not limited to, the text within the cell and the fill color of the cell. When the drawing is a DWG vector drawing, which contains information such as text primitives and colors, the text primitive information and color information of each cell can be extracted directly. When the drawing is a PDF vector drawing, whose text is formed by combinations of lines and which carries no color information, the text content of each cell can be identified directly from the combination relationships of the lines.
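For the DWG case, one simple way to extract the text primitive information of each cell is to assign each text primitive to the cell that contains its insertion point. The sketch below assumes the text primitives have already been read from the drawing as (insertion point, string) pairs (reading DWG files requires a CAD library, not shown) and that cells are axis-aligned (lower-left, upper-right) boxes. `texts_by_cell` is a hypothetical name, and the reading-order rule (top to bottom, then left to right, with y pointing up) is an assumption.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[Point, Point]  # (lower-left, upper-right) in the first coordinate system


def texts_by_cell(cells: List[Box], texts: List[Tuple[Point, str]]) -> Dict[Box, str]:
    """Group text primitives by the drawing-table cell containing their insertion point."""
    result: Dict[Box, List[Tuple[Point, str]]] = {cell: [] for cell in cells}
    for pos, text in texts:
        for cell in cells:
            (x0, y0), (x1, y1) = cell
            if x0 <= pos[0] <= x1 and y0 <= pos[1] <= y1:
                result[cell].append((pos, text))
                break  # a point on a shared border goes to the first matching cell
    # Join each cell's texts in reading order: higher y first, then smaller x.
    return {
        cell: " ".join(t for _, t in sorted(items, key=lambda it: (-it[0][1], it[0][0])))
        for cell, items in result.items()
    }
```

For two stacked cells, texts "Name" and "B" placed side by side in the upper cell come out as "Name B", and "C30" in the lower cell is returned on its own.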
After the contents of each cell of the drawing table are identified, they may be output to a designated window, i.e., the location to which the drawing table contents are to be output, such as a table in calculation software. Automatic identification of table contents is thus achieved, and technicians no longer need to enter the contents of the drawing table into the designated window manually.
In summary, in the technical solutions of some embodiments of the present application, transverse and longitudinal lines are screened from the lines of the designated drawing area, an initial table is constructed from the ordinates of the transverse lines and the abscissas of the longitudinal lines, cells are merged according to whether their common edge lines lie on transverse or longitudinal lines to obtain a drawing table, and the contents of each cell are identified and output cell by cell. Tables and their contents in a drawing can thus be identified automatically, sparing technicians manual entry and improving their working efficiency.
The present application is further described below.
In some embodiments, when the coordinates of a line are stored in memory, the abscissa of the line may be represented by the abscissas of its two endpoints and the ordinate by the ordinates of its two endpoints; that is, only the coordinates of the two endpoints are stored, and the abscissa and ordinate ranges of the line in the first coordinate system are determined from them. Referring to FIG. 6, a schematic diagram of the position of a line in the first coordinate system according to an embodiment of the present application, the coordinates of the line shown may be stored as (3, 5) and (7, 4), where (3, 5) gives the abscissa and ordinate of one endpoint and (7, 4) those of the other. From these two endpoints it can be determined that the abscissa of the line lies between 3 and 7 and the ordinate between 4 and 5.
Based on this storage form of line coordinates, screening the table lines for constructing a table in step S11 may comprise:
in the designated drawing area, if the ordinates of the two endpoints of a line are the same, determining the line as a candidate transverse line, and if the abscissas of the two endpoints of a line are the same, determining the line as a candidate longitudinal line;
removing, from the candidate transverse lines and the candidate longitudinal lines, lines that do not meet the preset conditions, to obtain the transverse lines and the longitudinal lines.
For example, the endpoints of line A are (3, 1) and (5, 7), the endpoints of line B are (2, 4) and (2, 8), and the endpoints of line C are (3, 5) and (4, 5). Among lines A, B and C, both the abscissas and the ordinates of the endpoints of line A differ, so line A is neither a candidate transverse line nor a candidate longitudinal line; the abscissas of the endpoints of line B are the same, so line B can be a candidate longitudinal line; the ordinates of the endpoints of line C are the same, so line C can be a candidate transverse line. Equal endpoint ordinates mean the line is parallel to the X-axis of the world coordinate system, so it may be a candidate transverse line; equal endpoint abscissas mean the line is parallel to the Y-axis, so it may be a candidate longitudinal line. The candidate transverse and longitudinal lines are then screened further according to preset conditions to obtain the transverse and longitudinal lines; the preset conditions are described below. Screening lines by whether the abscissas or ordinates of their endpoints are equal matches the way line coordinates are stored in memory, so candidates can be screened directly from the stored data, and the screening logic is simpler.
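The endpoint comparison can be sketched in Python as follows, assuming lines are stored as endpoint pairs as described; `classify_candidates` is a hypothetical name, and the optional tolerance `eps` is an assumption (the text compares coordinates for equality, which is the default `eps=0.0`).

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def classify_candidates(lines: List[Segment], eps: float = 0.0):
    """Split lines into candidate transverse and longitudinal lines by their endpoints."""
    transverse: List[Segment] = []
    longitudinal: List[Segment] = []
    for (x1, y1), (x2, y2) in lines:
        if abs(y1 - y2) <= eps:  # equal ordinates: parallel to the X-axis
            transverse.append(((x1, y1), (x2, y2)))
        elif abs(x1 - x2) <= eps:  # equal abscissas: parallel to the Y-axis
            longitudinal.append(((x1, y1), (x2, y2)))
        # otherwise the line is oblique, like line A, and is discarded
    return transverse, longitudinal


line_a = ((3, 1), (5, 7))
line_b = ((2, 4), (2, 8))
line_c = ((3, 5), (4, 5))
print(classify_candidates([line_a, line_b, line_c]))
```

For lines A, B and C this returns line C as the only candidate transverse line and line B as the only candidate longitudinal line. A small positive `eps` could absorb floating-point noise in exported drawings, at the risk of admitting slightly slanted lines.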
In some embodiments, the removing the lines that do not meet the preset condition from the candidate transverse lines and the candidate longitudinal lines includes:
and removing the lines with the length smaller than the length threshold value from the alternative transverse lines and the alternative longitudinal lines.
The length threshold may be set according to the actual situation; lines shorter than the length threshold are treated as non-table lines. In this embodiment, the length threshold is one tenth of the height of the text in the drawing.
Removing lines shorter than the length threshold, on the one hand, reduces the interference of such lines in subsequent table recognition and, on the other hand, reduces the volume of line data and therefore the amount of data processing.
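A minimal sketch of the length filter, assuming the text height of the drawing is already known (how it is measured is not specified here) and that a line exactly at the threshold is kept, since only lines smaller than the threshold are removed. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def segment_length(seg: Segment) -> float:
    """Euclidean length of a line given by its two endpoints."""
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def drop_short_lines(lines: List[Segment], text_height: float) -> List[Segment]:
    """Keep only lines at least one tenth of the text height long."""
    threshold = text_height / 10.0  # the ratio used in this embodiment
    return [seg for seg in lines if segment_length(seg) >= threshold]


candidates = [((0, 0), (0.5, 0)), ((0, 0), (0, 1)), ((2, 2), (2, 5))]
kept = drop_short_lines(candidates, text_height=10.0)
print(kept)  # [((0, 0), (0, 1)), ((2, 2), (2, 5))]
```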
In some embodiments, removing the lines that do not meet the preset condition from the candidate transverse lines and the candidate longitudinal lines includes:
removing, from the candidate transverse lines and the candidate longitudinal lines, the lines that do not form a loop with other lines.
Here, a loop refers to a closed region enclosed by several lines. If a line encloses a closed region together with other lines, the line is considered to form a loop with those lines. A table line must form a loop with other lines, because table cells are closed regions. Conversely, a line that forms no loop with other lines, such as an isolated line in the drawing that is not connected to any other line, cannot be a table line. Removing such lines therefore discards lines that cannot belong to the table, which reduces their interference in subsequent table recognition, improves recognition accuracy, and also reduces the amount of data processing.
In some embodiments, removing the lines that do not form a loop with other lines includes:
for each of the candidate transverse lines and candidate longitudinal lines, if the line intersects another line at a point that is not one of its own endpoints, breaking the line at that intersection point;
counting, among the lines obtained after the breaking operation, the number of lines associated with each line endpoint;
for any line endpoint, if the number of lines associated with it is 1, treating the line associated with that endpoint as a line that does not form a loop with other lines, and removing it.
For ease of understanding, refer to fig. 7, which is a schematic diagram of candidate transverse lines and candidate longitudinal lines according to an embodiment of the present application. In fig. 7, the candidate transverse line ab intersects the candidate longitudinal line ce at c, which is not an endpoint of ab, so ab is broken at c into lines ac and cb. Similarly, the candidate transverse line df intersects the candidate longitudinal line ce at e, which is not an endpoint of df, so df is broken at e into lines de and ef.
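The breaking operation can be sketched as follows for axis-aligned lines. This is an illustrative sketch, not the applicant's code: it assumes transverse and longitudinal lines are stored as endpoint pairs, that a crossing counts only when it lies strictly inside the line being broken, and that coordinates match exactly. The fig. 7 coordinates in the usage example are hypothetical, since the figure gives no numbers.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def split_at_intersections(transverse: List[Segment],
                           longitudinal: List[Segment]) -> List[Segment]:
    """Break each axis-aligned line wherever another line meets it at a
    point that is not one of its own endpoints."""
    pieces: List[Segment] = []
    for (x1, y), (x2, _) in transverse:
        lo, hi = min(x1, x2), max(x1, x2)
        # Abscissas of longitudinal lines that meet this line strictly inside it.
        cuts = sorted({vx for (vx, vy1), (_, vy2) in longitudinal
                       if lo < vx < hi and min(vy1, vy2) <= y <= max(vy1, vy2)})
        xs = [lo, *cuts, hi]
        pieces.extend(((a, y), (b, y)) for a, b in zip(xs, xs[1:]))
    for (x, y1), (_, y2) in longitudinal:
        lo, hi = min(y1, y2), max(y1, y2)
        # Ordinates of transverse lines that meet this line strictly inside it.
        cuts = sorted({hy for (hx1, hy), (hx2, _) in transverse
                       if lo < hy < hi and min(hx1, hx2) <= x <= max(hx1, hx2)})
        ys = [lo, *cuts, hi]
        pieces.extend(((x, a), (x, b)) for a, b in zip(ys, ys[1:]))
    return pieces


# Hypothetical coordinates for fig. 7: a=(0,10), c=(5,10), b=(10,10),
# d=(0,0), e=(5,0), f=(10,0), g=(20,5), h=(25,5).
transverse = [((0, 10), (10, 10)), ((0, 0), (10, 0)), ((20, 5), (25, 5))]   # ab, df, gh
longitudinal = [((0, 0), (0, 10)), ((10, 0), (10, 10)), ((5, 0), (5, 10))]  # ad, bf, ce
pieces = split_at_intersections(transverse, longitudinal)
print(len(pieces))  # 8: ac, cb, de, ef, gh, ad, bf, ce
```

Line ce is not broken because it meets ab and df only at its own endpoints c and e, which matches the description above.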
A line associated with a line endpoint is a line that starts or ends at that endpoint. To count the number of lines associated with each endpoint among the lines obtained after the breaking operation, a mapping table from each line endpoint to its associated lines can be built, and the count for each endpoint can be read from it. In fig. 7, for example, the lines after the breaking operation are ad, ac, cb, de, ef, bf, gh, and ce. Taking endpoints a, c, and g as examples, the mapping table is shown in Table 1.
Table 1 Mapping table
Line endpoint | Associated lines
a | line ac, line ad
c | line ac, line ce, line cb
g | line gh
Based on the mapping table, endpoint a is associated with 2 lines, endpoint c with 3 lines, and endpoint g with 1 line.
If a line is to form a loop with other lines, each of its endpoints must be associated with at least 2 lines. Therefore, if an endpoint is associated with only 1 line, that line cannot form a loop with other lines. In fig. 7, for example, endpoint g is associated only with line gh, so gh is an isolated line and is removed.
In the above embodiment, lines that do not form a loop with other lines are identified and removed by counting, after the breaking operation, the number of lines associated with each endpoint. This screening is more accurate and avoids omissions.
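The mapping table and the single-association rule can be sketched with a dictionary keyed by endpoint. This is a minimal sketch under stated assumptions: endpoints are compared as exact coordinate tuples (real drawings may need coordinates snapped first), the fig. 7 coordinates are hypothetical, and the removal is done in one pass as the text describes; whether to repeat it until no dangling line remains is not stated in the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def endpoint_map(pieces: List[Segment]) -> Dict[Point, List[Segment]]:
    """Mapping table: line endpoint -> lines that start or end there."""
    table: Dict[Point, List[Segment]] = defaultdict(list)
    for seg in pieces:
        for endpoint in seg:
            table[endpoint].append(seg)
    return table


def drop_open_lines(pieces: List[Segment]) -> List[Segment]:
    """Remove every line that has an endpoint associated with only one line."""
    table = endpoint_map(pieces)
    dangling = {segs[0] for segs in table.values() if len(segs) == 1}
    return [seg for seg in pieces if seg not in dangling]


# The eight lines of fig. 7 after breaking (hypothetical coordinates).
a, c, b = (0, 10), (5, 10), (10, 10)
d, e, f = (0, 0), (5, 0), (10, 0)
g, h = (20, 5), (25, 5)
fig7 = [(a, d), (a, c), (c, b), (d, e), (e, f), (b, f), (g, h), (c, e)]
table = endpoint_map(fig7)
print(len(table[a]), len(table[c]), len(table[g]))  # 2 3 1
print((g, h) in drop_open_lines(fig7))               # False
```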
It should be noted that after line screening based on the breaking operation, collinear broken lines need to be merged into single lines, so that the broken pieces are recombined into complete lines to obtain the transverse and longitudinal lines. Two lines are considered collinear if they have the same direction (both transverse or both longitudinal) and partially overlap. For example, line A has endpoints (4, 5) and (4, 8) and line B has endpoints (4, 7) and (4, 10); both are longitudinal lines and they partially overlap, so lines A and B are collinear.
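Merging collinear lines amounts to grouping lines by direction and fixed coordinate and then merging overlapping intervals. A minimal sketch, with one added assumption: lines that merely touch end to end are merged as well, since broken pieces such as ac and cb share only an endpoint. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _make_segment(kind: str, fixed: float, lo: float, hi: float) -> Segment:
    """Rebuild a segment from its direction, fixed coordinate and span."""
    if kind == "transverse":
        return ((lo, fixed), (hi, fixed))
    return ((fixed, lo), (fixed, hi))


def merge_collinear(segments: List[Segment]) -> List[Segment]:
    """Merge transverse/longitudinal lines that lie on the same line and
    overlap (or, as an assumption, merely touch end to end)."""
    groups: Dict[Tuple[str, float], List[Tuple[float, float]]] = defaultdict(list)
    for (x1, y1), (x2, y2) in segments:
        if y1 == y2:
            groups[("transverse", y1)].append((min(x1, x2), max(x1, x2)))
        elif x1 == x2:
            groups[("longitudinal", x1)].append((min(y1, y2), max(y1, y2)))
    merged: List[Segment] = []
    for (kind, fixed), spans in groups.items():
        spans.sort()
        cur_lo, cur_hi = spans[0]
        for lo, hi in spans[1:]:
            if lo <= cur_hi:            # overlaps or touches the current run
                cur_hi = max(cur_hi, hi)
            else:                       # gap: close the current run
                merged.append(_make_segment(kind, fixed, cur_lo, cur_hi))
                cur_lo, cur_hi = lo, hi
        merged.append(_make_segment(kind, fixed, cur_lo, cur_hi))
    return merged


# Lines A and B from the example above.
print(merge_collinear([((4, 5), (4, 8)), ((4, 7), (4, 10))]))    # [((4, 5), (4, 10))]
# Broken pieces ac and cb (hypothetical fig. 7 coordinates) rejoin into ab.
print(merge_collinear([((0, 10), (5, 10)), ((5, 10), (10, 10))]))  # [((0, 10), (10, 10))]
```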
The above describes how lines that do not meet the preset conditions are removed from the candidate transverse and longitudinal lines to obtain the transverse and longitudinal lines.
In some embodiments, when cells are merged in step S13, whether the common edge between adjacent cells is not located on a transverse line or a longitudinal line may be determined as follows:
if one or more target points on the common edge are not located on a transverse line or a longitudinal line, it is determined that the common edge is not located on a transverse line or a longitudinal line.
A target point may be any point on the common edge other than its endpoints, for example the midpoint of the common edge or the point that divides the common edge in a 4:3 ratio.
In this embodiment, the target point is the midpoint of the common edge: if the midpoint is not located on a transverse line or a longitudinal line, the common edge is determined not to be located on a transverse line or a longitudinal line. Judging this from the positions of one or more target points on the common edge simplifies the judgment logic.
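With the midpoint as the single target point, the check reduces to a point-on-segment test against the table lines. A minimal sketch, assuming axis-aligned lines stored as endpoint pairs and a small tolerance `EPS`; the tolerance and all names are added assumptions, since the text does not mention them.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

EPS = 1e-6  # assumed tolerance for floating-point drawing coordinates


def point_on_line(p: Point, seg: Segment, eps: float = EPS) -> bool:
    """True if p lies on an axis-aligned (transverse or longitudinal) line."""
    (x1, y1), (x2, y2) = seg
    px, py = p
    if abs(y1 - y2) <= eps:   # transverse line
        return abs(py - y1) <= eps and min(x1, x2) - eps <= px <= max(x1, x2) + eps
    if abs(x1 - x2) <= eps:   # longitudinal line
        return abs(px - x1) <= eps and min(y1, y2) - eps <= py <= max(y1, y2) + eps
    return False


def common_edge_is_open(edge: Segment, table_lines: List[Segment]) -> bool:
    """Midpoint test: the common edge is treated as not lying on any table
    line (so the two cells are merged) when its midpoint lies on no line."""
    (x1, y1), (x2, y2) = edge
    midpoint = ((x1 + x2) / 2, (y1 + y2) / 2)
    return not any(point_on_line(midpoint, seg) for seg in table_lines)


outline = [((0, 0), (10, 0)), ((0, 10), (10, 10)), ((0, 0), (0, 10)), ((10, 0), (10, 10))]
shared = ((5, 0), (5, 10))                                        # edge between two grid cells
print(common_edge_is_open(shared, outline))                       # True: no line at x=5, merge
print(common_edge_is_open(shared, outline + [((5, 0), (5, 10))]))  # False: keep separate
```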
In some embodiments, where the drawing is a PDF vector drawing, identifying the content in each cell in step S14 includes:
printing the lines in each cell into a picture, and recognizing the picture to obtain the text content in that cell.
Specifically, the ellipse, line, fillPoly, and similar functions in the OpenCV library can be used to print the lines in each cell into a separate picture, and each picture can then be recognized with OCR (Optical Character Recognition) technology to obtain the text content of the cell. OCR is a conventional technology known to those skilled in the art and is not described in detail here. Printing the cell content into a picture and obtaining the text through image recognition reduces the dependence on text primitives, so the scheme has a wider range of application.
In some embodiments, when the lines in a cell are printed into a picture, their coordinates need to be converted into a local coordinate system for printing. Therefore, for any cell, printing the lines in the cell into the picture includes:
constructing a first bounding box for the cell, and acquiring the lines located in the first bounding box;
constructing a second bounding box for the set of lines in the cell bounding box, and constructing a second coordinate system based on the second bounding box;
converting the coordinates of each line in the cell from the first coordinate system into the second coordinate system, and printing the lines in the cell into the picture based on their coordinates in the second coordinate system.
Here, the second coordinate system is a local coordinate system. Its origin may be the upper-left vertex of the second bounding box, its X-axis may point in the same direction as the X-axis of the first coordinate system, and its Y-axis may point in the direction opposite to the Y-axis of the first coordinate system.
Specifically, to construct the second bounding box for the set of lines in the cell bounding box, a bounding box can be constructed for each line in the cell bounding box, and these line bounding boxes can then be merged to obtain the second bounding box.
Converting the line coordinates from the first coordinate system into the second coordinate system facilitates implementation of the scheme. Because the second coordinate system is constructed from a bounding box, the shape of the cell or line set is simplified and the amount of computation is reduced.
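The bounding-box steps and the coordinate conversion can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: bounding boxes are `(xmin, ymin, xmax, ymax)` tuples, the first coordinate system has Y pointing up, and a line counts as inside the cell only if both endpoints lie strictly inside the first bounding box (an assumption that keeps the cell's own border lines out). All names and coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def segment_bbox(seg: Segment) -> BBox:
    """Bounding box of a single line."""
    (x1, y1), (x2, y2) = seg
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def merge_bboxes(boxes: List[BBox]) -> BBox:
    """Union of several bounding boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def lines_in_cell(cell: BBox, lines: List[Segment]) -> List[Segment]:
    """Lines whose endpoints lie strictly inside the first (cell) bounding box."""
    xmin, ymin, xmax, ymax = cell
    return [seg for seg in lines
            if all(xmin < x < xmax and ymin < y < ymax for x, y in seg)]


def to_local(cell: BBox, lines: List[Segment]) -> Tuple[Optional[BBox], List[Segment]]:
    """Convert a cell's lines into the second coordinate system: origin at the
    upper-left vertex of the second bounding box, X unchanged, Y reversed."""
    inside = lines_in_cell(cell, lines)
    if not inside:
        return None, []
    second = merge_bboxes([segment_bbox(seg) for seg in inside])
    left, _, _, top = second
    local = [tuple((x - left, top - y) for x, y in seg) for seg in inside]
    return second, local


cell = (0, 0, 10, 10)
drawing = [((0, 0), (0, 10)),     # cell border: excluded
           ((2, 3), (6, 3)),      # text strokes inside the cell
           ((4, 1), (4, 7)),
           ((20, 20), (30, 20))]  # belongs to another cell
second_box, local_lines = to_local(cell, drawing)
print(second_box)   # (2, 1, 6, 7)
print(local_lines)  # [((0, 4), (4, 4)), ((2, 6), (2, 0))]
```

The picture size in local units is then the width and height of the second bounding box (4 by 6 in this example).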
Referring to fig. 8, a schematic diagram of a table recognition system according to an embodiment of the application is shown. In fig. 8, the table recognition system includes:
a screening module, configured to screen, from the lines of a designated drawing area, the table lines used to construct a table, wherein the table lines include transverse lines and longitudinal lines, and each line has an abscissa and an ordinate;
an initial construction module, configured to construct an initial table based on the ordinates of the transverse lines and the abscissas of the longitudinal lines;
a merging module, configured to merge adjacent cells in the initial table to obtain a drawing table if the common edge between the adjacent cells is not located on a transverse line or a longitudinal line;
a content identification module, configured to identify and output the text content in each cell of the drawing table, taking the cell as the unit.
In some embodiments, the merging module is specifically configured to:
determine that the common edge is not located on a transverse line or a longitudinal line if one or more target points on the common edge are not located on a transverse line or a longitudinal line.
In some embodiments, the abscissa of each line is represented by the abscissas of its two endpoints, and the ordinate of each line is represented by the ordinates of its two endpoints; the screening module is specifically configured to:
in the designated drawing area, determine a line as a candidate transverse line if the ordinates of its two endpoints are the same, and as a candidate longitudinal line if the abscissas of its two endpoints are the same;
remove lines that do not meet preset conditions from the candidate transverse lines and candidate longitudinal lines to obtain the transverse lines and the longitudinal lines.
In some embodiments, the screening module is specifically configured to:
remove lines whose length is smaller than a length threshold from the candidate transverse lines and candidate longitudinal lines.
In some embodiments, the screening module is specifically configured to:
remove lines that do not form a loop with other lines from the candidate transverse lines and candidate longitudinal lines.
In some embodiments, the screening module is specifically configured to:
for each of the candidate transverse lines and candidate longitudinal lines, if the line intersects another line at a point that is not one of its own endpoints, break the line at that intersection point;
count, among the lines obtained after the breaking operation, the number of lines associated with each line endpoint;
for any line endpoint, if the number of lines associated with it is 1, treat the line associated with that endpoint as a line that does not form a loop with other lines, and remove it.
In some embodiments, the content in each cell of the drawing table includes text content, the text content being composed of lines; the content identification module is specifically configured to:
print the lines in each cell into a picture, and recognize the picture to obtain the text content in that cell.
In some embodiments, the abscissa and the ordinate of each line are determined based on a first coordinate system; the content identification module is specifically configured to:
construct a first bounding box for a cell, and acquire the lines located in the first bounding box;
construct a second bounding box for the set of lines in the cell bounding box, and construct a second coordinate system based on the second bounding box;
convert the coordinates of each line in the cell from the first coordinate system into the second coordinate system, and print the lines in the cell into the picture based on their coordinates in the second coordinate system.
Referring to fig. 9, a schematic diagram of an electronic device according to an embodiment of the application is provided. The electronic device includes a processor and a memory storing a computer program which, when executed by the processor, implements the method described above.
The processor may be a central processing unit (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof.
As a non-transitory computer-readable storage medium, the memory may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present application. By running the non-transitory software programs, instructions, and modules stored in the memory, the processor executes its various functional applications and data processing, that is, implements the methods of the above method embodiments.
The memory may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory optionally includes memory located remotely from the processor and connected to it through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method.
Although embodiments of the present application have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the application, and such modifications and variations fall within the scope of the application as defined by the appended claims.

Claims (11)

1. A method of table identification, the method comprising:
screening, from the lines of a designated drawing area, table lines for constructing a table, wherein the table lines comprise transverse lines and longitudinal lines, and each line has an abscissa and an ordinate;
constructing an initial table based on the ordinates of the transverse lines and the abscissas of the longitudinal lines;
in the initial table, if the common edge between adjacent cells is not located on the transverse line or the longitudinal line, merging the adjacent cells to obtain a drawing table;
in the drawing table, recognizing and outputting the content in each cell, taking the cell as the unit.
2. The method of claim 1, wherein whether the common edge between adjacent cells is not located on the transverse line or the longitudinal line is determined as follows:
if one or more target points on the common edge are not located on the transverse line or the longitudinal line, determining that the common edge is not located on the transverse line or the longitudinal line.
3. The method of claim 1, wherein the abscissa of each line is represented by the abscissas of the two endpoints of the line, and the ordinate of each line is represented by the ordinates of the two endpoints of the line;
the screening of table lines for constructing a table comprises:
in the designated drawing area, determining a line as a candidate transverse line if the ordinates of its two endpoints are the same, and determining a line as a candidate longitudinal line if the abscissas of its two endpoints are the same;
removing lines that do not meet a preset condition from the candidate transverse lines and the candidate longitudinal lines to obtain the transverse lines and the longitudinal lines.
4. The method of claim 3, wherein the removing of lines that do not meet a preset condition from the candidate transverse lines and the candidate longitudinal lines comprises:
removing lines whose length is smaller than a length threshold from the candidate transverse lines and the candidate longitudinal lines.
5. The method of claim 3, wherein the removing of lines that do not meet a preset condition from the candidate transverse lines and the candidate longitudinal lines comprises:
removing lines that do not form a loop with other lines from the candidate transverse lines and the candidate longitudinal lines.
6. The method of claim 5, wherein the removing of lines that do not form a loop with other lines comprises:
for each of the candidate transverse lines and the candidate longitudinal lines, if the line intersects another line at a point other than its own endpoints, breaking the line at that intersection point;
counting, among the lines obtained after the breaking operation, the number of lines associated with each line endpoint;
for any line endpoint, if the number of lines associated with the line endpoint is 1, taking the line associated with that endpoint as a line that does not form a loop with other lines, and removing it.
7. The method of claim 1, wherein the content in each cell of the drawing table comprises text content, the text content being composed of lines;
the identifying of the content in each cell comprises:
printing the lines in the cells into pictures, and recognizing the pictures to obtain the text content in each cell.
8. The method of claim 7, wherein the abscissa and the ordinate of each line are determined based on a first coordinate system;
the printing of the lines in the cells into pictures comprises:
constructing a first bounding box for a cell, and acquiring the lines located in the first bounding box;
constructing a second bounding box for the set of lines in the cell bounding box, and constructing a second coordinate system based on the second bounding box;
converting the coordinates of each line in the cell from the first coordinate system into the second coordinate system, and printing the lines in the cell into the picture based on their coordinates in the second coordinate system.
9. A table identification system, the system comprising:
a screening module, configured to screen, from the lines of a designated drawing area, table lines for constructing a table, wherein the table lines comprise transverse lines and longitudinal lines, and each line has an abscissa and an ordinate;
an initial construction module, configured to construct an initial table based on the ordinates of the transverse lines and the abscissas of the longitudinal lines;
a merging module, configured to merge adjacent cells in the initial table to obtain a drawing table if the common edge between the adjacent cells is not located on the transverse line or the longitudinal line;
a content identification module, configured to identify and output the text content in each cell of the drawing table, taking the cell as the unit.
10. A computer readable storage medium for storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8.
11. An electronic device comprising a processor and a memory for storing a computer program which, when executed by the processor, implements the method of any of claims 1 to 8.
CN202311140761.2A 2023-09-05 2023-09-05 Table identification method, system, electronic equipment and storage medium Pending CN117133002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311140761.2A CN117133002A (en) 2023-09-05 2023-09-05 Table identification method, system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117133002A true CN117133002A (en) 2023-11-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination