CN111753763A - Method and device for identifying table content of construction drawing and computer equipment - Google Patents
Method and device for identifying table content of construction drawing and computer equipment
- Publication number
- CN111753763A (application number CN202010599424.XA)
- Authority
- CN
- China
- Prior art keywords
- identified
- sample
- sample table
- construction drawing
- identifying
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Abstract
The invention provides a method, a device and computer equipment for identifying table contents of construction drawings. The method comprises: determining a table to be identified on a construction drawing; extracting feature data of the table to be identified, wherein the feature data comprise structural features and text features of the table to be identified; matching, in a preset sample table library, a sample table corresponding to the table to be identified according to the feature data; and determining a table data set according to the table structure of the sample table. The invention can improve the efficiency of identifying tables on construction drawings.
Description
Technical Field
The invention relates to the technical field of drawing identification, in particular to a method and a device for identifying table contents of construction drawings and computer equipment.
Background
In the construction field, which includes civil engineering, steel structure, installation and other disciplines, structural models often need to be built at various stages of a project, such as budgeting, bidding and implementation. In the prior art, the contents of the tables on a construction drawing must be read manually during modeling to obtain the structural data. For large projects the volume of structural data is large, so manual reading is laborious and costly, and the complexity of the information on a construction drawing interferes with the reading process. For example, to obtain structural data from a floor table by hand, the reader must determine each floor from the table structure of the floor table on the CAD drawing and then look up the concrete grade data of members such as frame beams, wall columns and reinforcing beams in the floor table. The workload is large, and the reader also needs some understanding of how CAD legends are drawn and what they mean.
Therefore, how to quickly identify the contents of tables on construction drawings has become a technical problem to be solved in the field.
Disclosure of Invention
The invention aims to provide a method, a device and computer equipment for identifying table contents of construction drawings, which are used for solving the technical problems in the prior art.
In one aspect, the present invention provides a method for identifying the contents of a form of a construction drawing.
The method for identifying the table content of the construction drawing comprises the following steps: determining a form to be identified on the construction drawing; extracting feature data of the table to be recognized, wherein the feature data comprise structural features and character features of the table to be recognized; matching a sample table corresponding to the table to be identified in a preset sample table library according to the characteristic data; a table data set is determined from a table structure of the sample table.
Further, the step of extracting the structural features of the table to be identified comprises: acquiring the line primitives in the table to be identified; taking the end points of the line primitives and the intersection points of two or more line primitives as the feature points of the table to be identified; and extracting the features of the feature points as the structural features, wherein the features of a feature point comprise the number, angles, line types and colors of the line primitives at that feature point, and/or statistical characteristics of the feature points within the table to be identified.
Further, the step of extracting the structural features of the table to be recognized comprises: acquiring line primitives on the building drawing; and calculating the relation characteristics of the line graphic elements in the table to be identified and all the line graphic elements on the building drawing as the structural characteristics.
Further, the step of extracting the character features of the table to be recognized comprises: acquiring a text primitive in the table to be identified; and performing word segmentation and filtering on the text primitive to obtain the text characteristic.
Further, in a preset sample table library, the step of matching the sample table corresponding to the table to be identified according to the feature data includes: inputting the characteristic data of the table to be recognized into a trained random forest model, wherein the trained random forest model outputs the similarity between the table to be recognized and each sample table; and determining the sample table with the maximum similarity to the table to be identified as the sample table corresponding to the table to be identified.
Further, before inputting the feature data of the table to be recognized into the trained random forest model, the method further comprises: constructing an initial random forest model; obtaining the sample table in the sample table library as a training table; and taking the feature data of the training table as the input of the initial random forest model, taking the similarity between the training table and the sample table in the sample table library as the output of the initial random forest model, and training the initial random forest model to obtain the trained random forest model.
Further, after the step of determining a form data set from the table structure of the sample form, the method of identifying construction drawing form content further comprises: acquiring correction information of the table data set; and according to the correction information, constructing the sample table, and updating the preset sample table library.
In another aspect, the present invention provides an apparatus for identifying the contents of a form of construction drawing.
The device for identifying the contents of the construction drawing form comprises the following components: the determining module is used for determining a table to be identified on the construction drawing; the extraction module is used for extracting feature data of the table to be identified, wherein the feature data comprises structural features and character features of the table to be identified; the matching module is used for matching the sample table corresponding to the table to be identified in a preset sample table library according to the characteristic data; an identification module to determine a table data set from a table structure of the sample table.
In another aspect, to achieve the above object, the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and running on the processor, and when the processor executes the computer program, the steps of the method are implemented.
In a further aspect, to achieve the above object, the present invention further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
In the method, the device and the computer equipment for identifying table contents of construction drawings provided by the invention, a sample table library is preset, which comprises a plurality of sample tables with known table structures. For a table to be identified on the construction drawing, the feature data of the table are first extracted, the sample table corresponding to the table to be identified is then matched according to the extracted feature data, and finally a table data set is determined according to the table structure of the sample table. The invention thus enables automatic identification of construction drawing tables and, compared with manual identification in the prior art, can save labor cost, shorten identification time and improve identification accuracy.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a method for identifying the contents of a form of construction drawing according to an embodiment of the present invention;
FIG. 2 is a block diagram of an apparatus for identifying the contents of a form of construction drawing according to a second embodiment of the present invention;
FIG. 3 is a hardware structure diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To address the heavy workload of manually identifying table contents of construction drawings in the prior art, the inventors studied actual drawing tables in the construction field and found that tables of the same kind usually have largely similar table structures, and that the contents of a table can be determined once its table structure and the text in it are known. Based on these findings, the invention provides a method, a device, computer equipment and a readable storage medium for identifying table contents of construction drawings. The contents of the table to be identified are identified automatically, which, compared with manual identification, can save labor cost, shorten identification time and improve identification accuracy.
The method, the device, the computer equipment and the readable storage medium for identifying table contents of construction drawings provided by the invention are described in detail below.
Example one
Specifically, FIG. 1 is a flowchart of a method for identifying table contents of a construction drawing according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps S101 to S104.
Step S101: determining a table to be identified on the construction drawing.
Specifically, in an embodiment, a user may import a construction drawing to be identified into a software system for executing the method of the embodiment, after the drawing is displayed by the system, the user may select a form to be identified through a frame selection operation or a fold line selection operation according to a service requirement, and the system determines the form to be identified based on an area selected by the user.
Alternatively, in an embodiment, the user may import the construction drawing to be identified into a software system executing the method of the embodiment and enter, in a parameter receiving window, parameters that define the region occupied by the table to be identified on the drawing, such as the position coordinates of the region and its length and width; the system then determines the table to be identified based on the parameters entered by the user.
Alternatively, the table to be recognized may be determined based on other operations or instructions, which is not limited in this application.
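As an illustration of the parameter-based selection described above, the following is a minimal sketch rather than the implementation of the embodiment. It assumes that line primitives are available as simple coordinate pairs and that the region is an axis-aligned rectangle given by a corner position, a width and a height; the names `Line` and `lines_in_region` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """Hypothetical line primitive given by its two end points."""
    x1: float
    y1: float
    x2: float
    y2: float


def lines_in_region(lines, x, y, width, height):
    """Keep the lines whose two end points both lie inside the rectangle
    with lower-left corner (x, y) and the given width and height."""
    def inside(px, py):
        return x <= px <= x + width and y <= py <= y + height

    return [ln for ln in lines
            if inside(ln.x1, ln.y1) and inside(ln.x2, ln.y2)]


# Two lines belong to a table near the origin; the third lies elsewhere.
drawing = [Line(0, 0, 10, 0), Line(0, 0, 0, 5), Line(50, 50, 60, 50)]
selected = lines_in_region(drawing, x=-1, y=-1, width=12, height=7)
```

A frame selection made with the mouse could be reduced to the same four parameters before calling such a function.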
Step S102: extracting the feature data of the table to be identified.
Specifically, the feature data of the table to be identified may include structural features of the table, literal features of the table, and the like, where the structural features refer to features capable of reflecting the structure of the table, such as the size, the number of rows, the number of columns, the number of lines constituting the table, and the like; the text features refer to features extracted from text in a table, such as text included in the table, whether the table includes preset keywords, repetition degree of contents in the table, and the like, and feature data of the table to be identified is constructed by using part or all of the features.
Step S103: matching, in a preset sample table library, the sample table corresponding to the table to be identified according to the feature data.
In step S103, a sample table library is preset, where the sample table library includes a plurality of sample tables, and a sample table corresponding to the table to be identified, that is, a sample table with a higher similarity to the table to be identified, is matched in the preset sample table library according to the feature data of the table to be identified.
Optionally, in an embodiment, the sample table library stores data corresponding to the sample table, specifically includes feature data of the sample table, and determines similarity between the table to be identified and the sample table by comparing the feature data of the table to be identified and the feature data of the sample table.
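The comparison of feature data can be sketched as follows. The source does not name a similarity measure, so cosine similarity is used here purely as an assumption, and the feature vectors, table names and the function `best_match` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no direction
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Return the name of the sample table most similar to the query."""
    return max(library, key=lambda name: cosine_similarity(query, library[name]))


# Hypothetical feature vectors: (number of columns, number of rows, keyword flag).
library = {
    "floor_table": [3.0, 10.0, 1.0],
    "door_window_table": [6.0, 4.0, 0.0],
}
match = best_match([3.0, 9.0, 1.0], library)
```

In practice the features would likely need scaling so that no single dimension dominates the comparison.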
Or in another embodiment, a machine learning model is set, the feature data corresponding to the sample table is used as the input of the machine learning model, the table structure of the sample table is used as the output of the machine learning model, and the machine learning model is trained to determine the structure and/or parameters of the machine learning model, so that the trained machine learning model is obtained; in this step, after the feature data of the table to be recognized is input to the trained machine learning model, the machine learning model may output a predicted table structure, and in a preset sample table library, the sample table having the table structure is the sample table corresponding to the table to be recognized.
Alternatively, the sample table corresponding to the table to be identified may also be determined in other manners, which is not limited herein.
Step S104: a table data set is determined from the table structure of the sample table.
Specifically, the table structure refers to the format of a table and the framework of its contents, that is, which information is placed in each row (or column) of the table. Therefore, once the table structure and the text primitives in the table are known, the contents of the table can be identified. For example, suppose the table structure of a floor table is: the first column holds the floor, the second column holds the floor bottom elevation, and the third column holds the floor height. If the text primitives of a certain row are 5, 11.6 and 2.9 in sequence, the content of that row of the floor table is: floor 5, floor bottom elevation 11.6, floor height 2.9.
The sample table is a table with high similarity to the table to be identified, so the two have a highly similar or identical table structure, and the contents of the table to be identified can be identified by referring to the table structure of the sample table. After the contents of the table are identified, they can be output in the form of a table data set. Optionally, the table data set comprises a plurality of data records, each data record comprises a plurality of fields, and each field is stored as a key-value pair in which the key is the field name and the value is the value of the field. When a table data set is displayed, each row shows one data record and each column shows one field. For example, if the table to be identified is a floor table, its table data set includes a plurality of data records corresponding to floors, that is, each row shows the data record of one floor. The fields of each data record are, specifically, the concrete grades of the wall stud, wall beam, frame stud, frame beam, shear wall and slab, together with the floor identifier, floor elevation, floor height and the like. The floor table finally displayed therefore has N rows and M columns, where N is the number of floors and M is the number of fields in each data record; for example, the data record above includes 9 fields, so M is 9.
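The construction of a table data set from a known table structure can be sketched as follows. This is a simplified illustration: the table structure is reduced to an ordered list of column names, the first row reuses the floor table values given above, and the second row and the function name `build_table_data_set` are hypothetical.

```python
def build_table_data_set(column_names, rows):
    """Turn each row of text primitives into a record of key-value pairs,
    using the column names taken from the matched sample table."""
    return [dict(zip(column_names, row)) for row in rows]


# Table structure of the sample floor table: one field name per column.
floor_structure = ["floor", "floor_bottom_elevation", "floor_height"]

# Text primitives read row by row from the table to be identified.
rows = [
    ["5", "11.6", "2.9"],
    ["6", "14.5", "2.9"],  # hypothetical additional row
]

data_set = build_table_data_set(floor_structure, rows)
for record in data_set:
    print(record)
```

Each dictionary corresponds to one data record, so displaying the list row by row gives the N-row, M-column layout described above.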
In the method for identifying the table content of the construction drawing, a sample table library is preset, the sample table library includes a plurality of sample tables with known table structures, for the table to be identified on the construction drawing, the feature data of the table to be identified is firstly extracted, then the sample table corresponding to the table to be identified is matched according to the extracted feature data, and finally, the table data set can be determined according to the table structures of the sample tables to identify the content in the table to be identified. By adopting the method for identifying the contents of the construction drawing form, the automatic identification of the construction drawing form can be realized, compared with the manual identification in the prior art, the manual operation cost can be saved, the identification time can be shortened, and the identification accuracy can be improved.
Optionally, in an embodiment, when the step S102 extracts the feature data of the table to be recognized, the method specifically includes: extracting the structural features of the table to be identified; and extracting character features of the table to be recognized.
By adopting the method for identifying the table content of the construction drawing, the characteristic data of the table to be identified is extracted from the structural characteristics and the character characteristics of the table, so that the characteristic data can comprehensively reflect the characteristics of the table, the subsequent matching of the sample table by utilizing the characteristic data is facilitated, the matching accuracy of the sample table is improved, and the identification accuracy of the table content is further improved.
For the structural features of the table to be identified, the method further provides the following three extraction modes, which can be combined with one another.
Firstly, the step of extracting the structural features of the table to be identified comprises: acquiring the line primitives in the table to be identified; taking the end points of the line primitives and the intersection points of two or more line primitives as the feature points of the table to be identified; and extracting the features of the feature points as the structural features, wherein the features of a feature point comprise the number, angles, line types and colors of the line primitives at that feature point, and/or statistical characteristics of the feature points within the table to be identified.
Specifically, for a drawing in a format such as CAD, a table consists of line primitives and text primitives. When the structural features of the table to be identified are extracted, the line primitives of the table are obtained first, and then the end points of the line primitives and the intersection points between line primitives are obtained. Each line primitive has two end points, and each intersection point may be a point where two line primitives intersect or a point where more than two line primitives intersect. The end points and the intersection points constitute the feature points of the table to be identified; optionally, the end points and intersection points are de-duplicated after they are determined, and the de-duplicated points are used as the feature points. After the feature points are determined, the features of the feature points are extracted as the structural features of the table to be identified, wherein the features of a feature point comprise the number, angles, line types and colors of the line primitives at that feature point, and/or statistical characteristics of the feature points within the table to be identified.
The number of line primitives at feature points is further explained as follows: one characteristic point is an endpoint of the line graphic elements, and when the endpoint is not an intersection point, the number of the line graphic elements at the characteristic point is 1; if a feature point is the intersection point of two line primitives, the number of the line primitives at the feature point is 2; if another characteristic point is the intersection point of three line primitives, the number of the line primitives at the characteristic point is 3, and so on.
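The collection of feature points and the counting of line primitives at each of them can be sketched as follows. This is a simplified illustration under stated assumptions: every line primitive is an axis-aligned segment with integer coordinates, so exact comparison is safe; real drawings would need a tolerance and support for oblique lines. The function names are hypothetical.

```python
from itertools import combinations


def on_segment(p, seg):
    """True if point p lies on the axis-aligned segment seg."""
    (x1, y1), (x2, y2) = seg
    px, py = p
    return (min(x1, x2) <= px <= max(x1, x2)
            and min(y1, y2) <= py <= max(y1, y2))


def crossing(a, b):
    """Intersection of one horizontal and one vertical segment, else None."""
    for h, v in ((a, b), (b, a)):
        if h[0][1] == h[1][1] and v[0][0] == v[1][0]:
            p = (v[0][0], h[0][1])
            if on_segment(p, h) and on_segment(p, v):
                return p
    return None


def feature_points(segments):
    """Map each feature point to the number of line primitives at it."""
    # End points; the set removes duplicates.
    points = {p for seg in segments for p in seg}
    # Intersection points of every pair of segments.
    for a, b in combinations(segments, 2):
        p = crossing(a, b)
        if p is not None:
            points.add(p)
    return {p: sum(on_segment(p, s) for s in segments) for p in points}


# A horizontal and a vertical line crossing at (1, 1).
segments = [((0, 1), (2, 1)), ((1, 0), (1, 2))]
counts = feature_points(segments)
```

For this input the four end points each have one line primitive and the crossing point has two, which matches the counting rule explained above.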
The angle of the line primitive at the feature point refers to the angle of each line primitive that passes through or ends at the feature point. The angle of a line primitive is the included angle between the line primitive and a 0-degree vector, where the 0-degree vector may be a vector extending horizontally to the right; accordingly, a line primitive extending horizontally to the left has an angle of 180 degrees, one extending vertically upward has an angle of 90 degrees, and one extending vertically downward has an angle of 270 degrees.
The line type of the line primitive at the feature point refers to the line type of each line primitive at the feature point, which may be a solid line, a dotted line, a dot-dash line, and the like.
The color of the line primitive at the feature point refers to the color of each line primitive at the feature point, which may be represented by an RGB color value.
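The angle convention described above can be sketched as follows. The sketch assumes a coordinate system whose y axis points upward, as is usual in CAD model space, and measures the angle of a line primitive from the feature point toward its other end; the function name `primitive_angle` is hypothetical.

```python
import math


def primitive_angle(feature_point, other_end):
    """Angle in degrees, in [0, 360), of the line running from the feature
    point to its other end, measured counter-clockwise from a vector that
    extends horizontally to the right."""
    dx = other_end[0] - feature_point[0]
    dy = other_end[1] - feature_point[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


right = primitive_angle((0, 0), (1, 0))
left = primitive_angle((0, 0), (-1, 0))
up = primitive_angle((0, 0), (0, 1))
down = primitive_angle((0, 0), (0, -1))
```

A line primitive that passes through a feature point rather than ending at it extends in two opposite directions, so it would contribute two angles that differ by 180 degrees.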
The statistical characteristics of the feature points in the table to be identified refer to statistics over the feature points of the same category in the table, and the category dimension may be any feature of the feature points. Further optionally, the step of calculating the statistical characteristics of the feature points in the table to be identified includes: counting all the feature points in the table to be identified to obtain the total number of feature points; counting the feature points at which the number of line primitives is N to obtain the number of feature points of the Nth category, wherein N is an integer greater than or equal to 1; and calculating the ratio of the number of feature points of the Nth category to the total number of feature points to obtain the Nth ratio as a structural feature.
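The ratio calculation can be sketched as follows, taking as input the number of line primitives at each feature point. The input values and the function name `line_count_ratios` are hypothetical.

```python
from collections import Counter


def line_count_ratios(line_counts):
    """For each N, the share of feature points that have exactly N line
    primitives, relative to the total number of feature points."""
    total = len(line_counts)
    if total == 0:
        return {}
    tally = Counter(line_counts)
    return {n: tally[n] / total for n in sorted(tally)}


# Five feature points: four plain end points and one two-line crossing.
ratios = line_count_ratios([1, 1, 1, 1, 2])
```

Here the first ratio is 4/5 and the second ratio is 1/5; tables with many cells would show a larger share of feature points with higher line counts.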
For tables on drawings in formats such as CAD, both the attribute relationships between line primitives (such as line type and color) and the positional relationships between line primitives reflect the table structure. In the method provided by this embodiment, the end points and intersection points of the line primitives in the table to be identified are taken as feature points, and the attribute relationships and positional relationships of the line primitives at the feature points are taken as the structural features of the table to be identified. The extracted structural features therefore reflect both kinds of relationship and better reflect the table structure of the table to be identified, which facilitates the subsequent matching of the sample table using the feature data, improves the accuracy of sample table matching, and in turn improves the accuracy of table content identification.
Secondly, the step of extracting the structural features of the table to be recognized comprises the following steps: and determining the table row number and the table column number of the table to be identified.
Specifically, the line primitives of the table are extracted and their intersection points within the table are calculated, which yields the table style and the coordinate information of each cell, from which the number of rows and the number of columns of the table are determined.
In a table, a row (or a column) usually serves as one unit of the table, and the columns (or rows) then form the components of that unit, so the number of rows and the number of columns also reflect the table structure of the table.
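Determining the number of rows and columns from the intersection points can be sketched as follows. The sketch assumes a regular grid without merged cells and with exact coordinates, so that each distinct y coordinate is one horizontal grid line and each distinct x coordinate is one vertical grid line; the function name is hypothetical.

```python
def count_rows_and_columns(intersections):
    """Return (rows, columns) of a regular grid from its intersection points."""
    xs = {x for x, _ in intersections}  # one entry per vertical grid line
    ys = {y for _, y in intersections}  # one entry per horizontal grid line
    # k parallel grid lines bound k - 1 rows or columns.
    return max(len(ys) - 1, 0), max(len(xs) - 1, 0)


# Grid with four vertical and three horizontal lines.
intersections = [(x, y) for x in (0, 10, 20, 30) for y in (0, 5, 10)]
rows, columns = count_rows_and_columns(intersections)
```

Tables with merged cells would require deriving the cells themselves from the intersection points instead of counting grid lines.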
Thirdly, the step of extracting the structural features of the table to be recognized includes: acquiring line primitives on a building drawing; and calculating the relation characteristics of the line graphic elements in the table to be identified and all the line graphic elements on the building drawing as the structural characteristics.
Specifically, the relationship characteristics between the line primitives in the table to be identified and all the line primitives on the building drawing may include the ratio of the line primitives in the table to be identified to all the line primitives on the building drawing, the ratio of the line primitive length mean value in the table to be identified to all the line primitive length mean values on the building drawing, and the like.
By adopting the method for identifying the table content of the building drawing, provided by the embodiment, the relationship characteristics of the line primitives in the table to be identified and all the line primitives on the building drawing are used as the structural characteristics of the table to be identified, so that the characteristics of the table to be identified on the building drawing are reflected, the table matching dimensionality is increased, the sample table matching accuracy is improved, and the table content identification accuracy is further improved.
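The two relation features mentioned above, the share of line primitives and the ratio of mean line lengths, can be sketched as follows. Line primitives are assumed to be given as pairs of end points, and the function names and sample coordinates are hypothetical.

```python
import math


def length(seg):
    """Euclidean length of a segment given by two end points."""
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def relation_features(table_lines, drawing_lines):
    """Return (count ratio, mean-length ratio) of the table's line
    primitives relative to all line primitives on the drawing."""
    if not table_lines or not drawing_lines:
        raise ValueError("both line lists must be non-empty")
    count_ratio = len(table_lines) / len(drawing_lines)
    mean_table = sum(map(length, table_lines)) / len(table_lines)
    mean_drawing = sum(map(length, drawing_lines)) / len(drawing_lines)
    length_ratio = mean_table / mean_drawing if mean_drawing else 0.0
    return count_ratio, length_ratio


table_lines = [((0, 0), (4, 0)), ((0, 0), (0, 2))]
drawing_lines = table_lines + [((10, 10), (10, 22)), ((10, 10), (16, 10))]
count_ratio, length_ratio = relation_features(table_lines, drawing_lines)
```

In this example the table holds half of the drawing's line primitives, and their mean length is half the mean length over the whole drawing.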
The step of extracting the character features of the table to be recognized comprises the following steps of: acquiring a text primitive in a table to be identified; and performing word segmentation and filtering on the character primitives to obtain character characteristics.
Specifically, all the text primitives in the table to be identified are first obtained and segmented according to their semantics: when a text primitive is a number, the number in that text primitive can be treated as one word, and when a text primitive consists of characters, it can be segmented into words according to the text semantics. The word segmentation result is then filtered to remove meaningless words, synonyms and repeated words, and the words remaining after filtering are the text features.
With the method for identifying the table content of the construction drawing provided by this embodiment, the text features are constructed from the text primitives, which increases the dimensionality of the feature data of the table to be identified, and the text primitives are segmented and filtered, which avoids the problem of an excessive data volume slowing down identification.
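The segmentation and filtering steps can be sketched as follows. This is a simplified illustration rather than the patented procedure: it assumes a regular-expression tokenizer in place of semantic word segmentation, and the stop-word set and synonym map are hypothetical inputs supplied by the caller.

```python
import re
from typing import Dict, Iterable, List, Set

# A number (with optional sign and decimals) is kept as one word;
# any other run of letters is treated as one word.
TOKEN = re.compile(r"[-+]?\d+(?:\.\d+)?|[^\W\d_]+")


def text_features(text_primitives: Iterable[str],
                  stop_words: Set[str],
                  synonyms: Dict[str, str]) -> List[str]:
    """Segment text primitives into words, then drop noise, synonyms and repeats."""
    features: List[str] = []
    seen: Set[str] = set()
    for text in text_primitives:
        for token in TOKEN.findall(text.lower()):
            token = synonyms.get(token, token)   # collapse synonyms to one form
            if token in stop_words or token in seen:
                continue                         # filter meaningless and repeated words
            seen.add(token)
            features.append(token)
    return features


words = text_features(
    ["Floor No.", "Storey height", "floor", "3.600", "the elevation"],
    stop_words={"the", "no"},
    synonyms={"storey": "floor"},
)
```

For Chinese drawings, a dedicated word segmentation library would replace the regular expression; the filtering logic would stay the same.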
Optionally, in an embodiment, when step S103 matches, in the preset sample table library, the sample table corresponding to the table to be identified according to the feature data, the specifically executed steps include: training a machine learning model, wherein the input of the machine learning model is the feature data of a training table, and the output of the machine learning model is the similarity between the training table and a sample table; inputting the feature data of the table to be identified into the trained machine learning model to obtain the similarity between the table to be identified and each sample table; and determining the sample table with the maximum similarity to the table to be identified as the sample table corresponding to the table to be identified.
Specifically, a machine learning model is preset and a sample table library is constructed, and the samples in the sample table library are used to train the preset machine learning model. The sample table library comprises a plurality of tables from construction drawings, for example table 1, table 2, table 3, and so on. The feature data of each table is extracted as the input of the machine learning model, and the output of the machine learning model is the table similarity. Specifically, when the feature data of table 1 is the input, the output of the machine learning model is 1,0,0 …; when the feature data of table 2 is the input, the output is 0,1,0 …; when the feature data of table 3 is the input, the output is 0,0,1 …, and so on. After training is finished, the machine learning model is stored. The feature data of the table to be identified is then input into the model to obtain the similarity between the table to be identified and each sample table, and the sample table with the maximum similarity to the table to be identified is finally determined as the sample table corresponding to the table to be identified.
By adopting the method for identifying the table content of the construction drawing, the machine learning model is introduced into the automatic identification table, and based on the strong data learning capability of the machine learning model, the matching accuracy of the table to be identified and the sample table can be improved, so that the identification accuracy of the table content is further improved.
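The training targets (1,0,0 …; 0,1,0 …) and the final selection of the most similar sample table can be sketched as below. The model itself is left out; the function names and the sample identifiers are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def one_hot_targets(sample_ids: Sequence[str]) -> Dict[str, List[int]]:
    """Training target per sample table: 1 at its own position, 0 elsewhere."""
    return {
        sample_id: [1 if i == j else 0 for j in range(len(sample_ids))]
        for i, sample_id in enumerate(sample_ids)
    }


def best_match(sample_ids: Sequence[str],
               similarities: Sequence[float]) -> Tuple[str, float]:
    """Pick the sample table whose similarity to the table to be identified is largest."""
    if not sample_ids or len(sample_ids) != len(similarities):
        raise ValueError("expected one similarity per sample table")
    index = max(range(len(similarities)), key=similarities.__getitem__)
    return sample_ids[index], similarities[index]


ids = ["table_1", "table_2", "table_3"]
targets = one_hot_targets(ids)
# Similarities as a trained model might output them for one table to be identified.
match = best_match(ids, [0.1, 0.7, 0.2])
```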
Optionally, in an embodiment, when the step S103 matches, in the preset sample table library, the sample table corresponding to the table to be identified according to the feature data, the specifically executed step includes: inputting the characteristic data of the table to be recognized into a trained random forest model, wherein the trained random forest model outputs the similarity between the table to be recognized and each sample table; and determining the sample table with the maximum similarity to the table to be identified as the sample table corresponding to the table to be identified.
With the method for identifying the table content of the construction drawing provided by this embodiment, the machine learning model is a random forest model. A random forest is a special bagging method that uses decision trees as the base models. When a node of a decision tree is split, the feature that maximizes an index (such as information gain) is not searched for among all the features; instead, a subset of the features is drawn at random, and the best split found among the drawn features is applied to the node. The random forest is based on the bagging idea and in effect samples both the samples and the features, which helps to avoid overfitting and gives the table identification high accuracy.
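The node-splitting rule described above (draw a random subset of features, then take the split with the highest information gain among them) can be illustrated with a small pure-Python sketch. It covers a single node only, not a full random forest, and all names and the toy data are assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy of a label list, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature: int, threshold: float) -> float:
    """Entropy reduction from splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def choose_split(rows: List[List[float]], labels: List[str],
                 n_candidates: int, rng: random.Random) -> Tuple[Tuple[int, float, float], List[int]]:
    """Return ((feature, threshold, gain), candidates) for one tree node."""
    n_features = len(rows[0])
    # Only a random subset of the features is considered at this node.
    candidates = rng.sample(range(n_features), min(n_candidates, n_features))
    best = (candidates[0], rows[0][candidates[0]], -1.0)
    for feature in candidates:
        for threshold in sorted({row[feature] for row in rows}):
            gain = information_gain(rows, labels, feature, threshold)
            if gain > best[2]:
                best = (feature, threshold, gain)
    return best, candidates


rows = [[0.0, 5.0], [0.0, 6.0], [1.0, 5.0], [1.0, 6.0]]
labels = ["a", "a", "b", "b"]
best, candidates = choose_split(rows, labels, n_candidates=2, rng=random.Random(0))
```

In the toy data, feature 0 separates the two labels perfectly, so it is chosen whenever it is among the drawn candidates.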
Optionally, in an embodiment, before inputting the feature data of the table to be recognized into the trained random forest model, the method further includes: constructing an initial random forest model; obtaining a sample table from a sample table library as a training table; and taking the characteristic data of the training table as the input of the initial random forest model, taking the similarity between the training table and the sample table in the sample table library as the output of the initial random forest model, and training the initial random forest model to obtain the trained random forest model.
Optionally, in an embodiment, after the step of identifying the content in the table to be identified according to the table structure of the sample table, the method of identifying the content of the construction drawing table further includes: acquiring correction information of a table data set; and according to the correction information, constructing a sample table, and updating a preset sample table library.
Specifically, when the sample table is constructed, the table data set is corrected according to the correction information, and the feature data of the table to be identified is corrected as well (mainly the text features). The corrected table data set is used as the table structure of the sample table and the corrected feature data as the feature data of the sample table, which completes the construction of the sample table. To obtain the correction information of the table data set, the content of the table to be identified can be displayed in a preset data display mode corresponding to the table structure, and the user's modification operations on that content are received to obtain the correction information.
By adopting the method for identifying the table content of the construction drawing, the correction information of the table data set is obtained after the identification is completed, the sample table is further constructed and the preset sample table library is updated, further, the identification result can be displayed to the user, and the user can correct the identification result. After correction, it is uploaded to the sample table library as a sample table in the sample table library. By adopting the embodiment, the sample table library can be expanded, the matching accuracy of the table to be identified and the sample table is improved, and the identification accuracy of the table content is further improved.
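A minimal sketch of this correction loop follows, assuming the table data set is a mapping from cell keys to cell content and the text features are a list of words. The `SampleTable` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SampleTable:
    structure: Dict[str, str]  # corrected table data set, keyed by cell
    features: List[str]        # corrected text features


def build_sample(recognized: Dict[str, str], features: List[str],
                 corrections: Dict[str, str]) -> SampleTable:
    """Apply the user's corrections to the table data set and to the text features."""
    corrected = dict(recognized)
    replaced: Dict[str, str] = {}
    for cell, new_value in corrections.items():
        old_value = corrected.get(cell)
        if old_value is not None and old_value != new_value:
            replaced[old_value] = new_value  # remember the word-level fix
        corrected[cell] = new_value
    corrected_features: List[str] = []
    for word in features:
        word = replaced.get(word, word)
        if word not in corrected_features:
            corrected_features.append(word)
    return SampleTable(corrected, corrected_features)


def update_library(library: List[SampleTable], sample: SampleTable) -> List[SampleTable]:
    """Add the newly constructed sample table to the sample table library."""
    library.append(sample)
    return library


library: List[SampleTable] = []
sample = build_sample({"A1": "floor", "B1": "heigth"}, ["floor", "heigth"], {"B1": "height"})
update_library(library, sample)
```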
Specifically, taking a floor table as an example, the implementation of the method for identifying the contents of the construction drawing table provided by the embodiment is described as follows.
In this embodiment, the user only needs to import the drawing, and the software executing the method for identifying the table content of the construction drawing automatically identifies the floor table in the drawing and exports the identification result. Specifically, the software comprises a client and a server. Interaction between the user and the software takes place at the client, and data processing is completed through interaction between the client and the server. The server stores a large number of floor tables as sample tables in the sample table library, together with the machine learning model for calculating the similarity between the table to be identified and the sample tables.
When a user wants to identify a floor table in a drawing, the drawing is imported into the software through the client and the identification function is triggered. The client then automatically collects the feature information of all primitives in the drawing and converts the primitives into the feature data required for identifying the floor table, that is, instance information is converted into vector information. The vector information is uploaded to the cloud server for similarity calculation. The server determines the sample table through the similarity calculation, determines the content of the floor table according to the table structure of the sample table, and returns the content to the client. The client displays the identified content in a display window according to a content display format; for example, the window shows the floor name, floor code, floor height, floor bottom elevation, and the concrete grade corresponding to each component (including non-frame beams, wall columns, wall beams, frame columns, frame beams, shear walls and plates). The user can modify and confirm the displayed result at the client. After the result is confirmed to be correct, the client uploads the related information to the server for storage and for use in subsequent identification, so that identification accuracy increases as server data accumulates, with the ultimate aim of avoiding user modification and achieving rapid modeling.
With the method for identifying the table content of the construction drawing provided by this embodiment, a feature-engineering-based artificial intelligence algorithm calculates the similarity of floor tables as professional graphics, and floor table information is continuously collected, so that the accuracy of the model keeps improving and user modification operations are gradually reduced. A uniform flow handles the identification of floor tables of different specialties, floor table identification is automated, the user's operation cost is saved, accuracy is improved and time is shortened. The method is applicable to all scenarios that use floor tables and is therefore general.
Example two
Corresponding to the first embodiment, a second embodiment of the present invention provides an apparatus for identifying table content of a construction drawing. Fig. 2 is a block diagram of the apparatus provided by the second embodiment of the present invention. As shown in fig. 2, the apparatus includes a determining module 201, an extraction module 202, a matching module 203, and an identification module 204: the determining module 201 is used for determining a table to be identified on the construction drawing; the extraction module 202 is configured to extract feature data of the table to be identified, where the feature data includes structural features and text features of the table to be identified; the matching module 203 is used for matching a sample table corresponding to the table to be identified in a preset sample table library according to the feature data; the identification module 204 is configured to determine a table data set according to the table structure of the sample table.
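The way the four modules chain together can be sketched as a simple pipeline. This is only an illustration of the data flow between modules 201 to 204; the class name, the callable signatures and the stub implementations are assumptions, not the patented apparatus.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class TableRecognizer:
    determine: Callable[[Any], Any]                   # determining module 201
    extract: Callable[[Any], List[float]]             # extraction module 202
    match: Callable[[List[float]], Dict[str, Any]]    # matching module 203
    identify: Callable[[Any, Dict[str, Any]], Dict]   # identification module 204

    def run(self, drawing: Any) -> Dict:
        table = self.determine(drawing)      # locate the table to be identified
        features = self.extract(table)       # structural and text features
        sample = self.match(features)        # most similar sample table
        return self.identify(table, sample)  # table data set from the sample's structure


# Stub modules, purely for demonstration.
recognizer = TableRecognizer(
    determine=lambda drawing: drawing["table"],
    extract=lambda table: [float(len(table))],
    match=lambda features: {"columns": ["name", "height"]},
    identify=lambda table, sample: dict(zip(sample["columns"], table)),
)
result = recognizer.run({"table": ["F1", "3.6"]})
```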
Optionally, in an embodiment, the extraction module 202 includes: the system comprises a first acquisition unit, a second acquisition unit and an extraction unit, wherein the first acquisition unit is used for acquiring line primitives in a table to be identified; the second acquisition unit is used for acquiring the end points of the line primitives and the intersection points of two or more line primitives as the characteristic points of the table to be identified; and the extraction unit is used for extracting the characteristics of the characteristic points as structural characteristics, wherein the characteristics of the characteristic points comprise the number, angles, line types, colors and/or statistical characteristics of the characteristic points in the table to be identified.
Optionally, in an embodiment, the extraction module 202 includes: the system comprises a third acquisition unit and a calculation unit, wherein the third acquisition unit is used for acquiring line primitives on the construction drawing; the calculation unit is used for calculating the relation characteristics of the line graphic elements in the table to be identified and all the line graphic elements on the building drawing as the structural characteristics.
Optionally, in an embodiment, the extraction module 202 includes: the fourth acquisition unit is used for acquiring the character primitives in the table to be identified; the processing unit is used for segmenting and filtering the character primitives to obtain character characteristics.
Optionally, in an embodiment, the matching module 203 includes: the system comprises an output unit and a determination unit, wherein the output unit is used for inputting the characteristic data of the table to be recognized into a trained random forest model, and the trained random forest model outputs the similarity between the table to be recognized and each sample table; and the determining unit is used for determining the sample table with the maximum similarity with the table to be identified as the sample table corresponding to the table to be identified.
Optionally, in an embodiment, the apparatus further includes a training module, configured to construct an initial random forest model before the matching module 203 inputs the feature data of the table to be recognized into the trained random forest model; obtaining a sample table from a sample table library as a training table; and taking the characteristic data of the training table as the input of the initial random forest model, taking the similarity between the training table and the sample table in the sample table library as the output of the initial random forest model, and training the initial random forest model to obtain the trained random forest model.
Optionally, in an embodiment, the apparatus for identifying the contents of the construction drawing form further includes: the system comprises an acquisition module and an updating module, wherein the acquisition module is used for acquiring the correction information of the table data set; and the updating module is used for constructing a sample table according to the correction information and updating the preset sample table library.
Example three
The third embodiment further provides a computer device capable of executing programs, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server or a tower server (including an independent server or a server cluster composed of multiple servers), and the like. As shown in fig. 3, the computer device 01 of the present embodiment at least includes but is not limited to: a memory 011 and a processor 012, which are communicatively connected to each other via a system bus. It is noted that fig. 3 only shows the computer device 01 having the memory 011 and the processor 012, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the memory 011 (i.e., a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 011 can be an internal storage unit of the computer device 01, such as a hard disk or a memory of the computer device 01. In other embodiments, the memory 011 can also be an external storage device of the computer device 01, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), etc. provided on the computer device 01. Of course, the memory 011 can also include both internal and external storage units of the computer device 01. In this embodiment, the memory 011 is generally used for storing the operating system and various application software installed in the computer device 01, such as the program code of the apparatus for identifying table content of a construction drawing in the second embodiment. Further, the memory 011 can also be used to temporarily store various kinds of data that have been output or are to be output.
The processor 012 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or other data Processing chip in some embodiments. The processor 012 is generally used to control the overall operation of the computer device 01. In this embodiment, the processor 012 is configured to execute program codes or process data stored in the memory 011, such as a method of identifying the contents of a construction drawing form.
Example four
The fourth embodiment further provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored, which when executed by a processor implements corresponding functions. The computer-readable storage medium of this embodiment is used for storing the program of the apparatus for identifying table content of a construction drawing, and the program, when executed by a processor, implements the method for identifying table content of a construction drawing of the first embodiment.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A method of identifying construction drawing form content, comprising:
determining a form to be identified on the construction drawing;
extracting feature data of the table to be recognized, wherein the feature data comprise structural features and character features of the table to be recognized;
matching a sample table corresponding to the table to be identified in a preset sample table library according to the characteristic data;
a table data set is determined from a table structure of the sample table.
2. The method of identifying construction drawing form content as recited in claim 1, wherein the step of extracting structural features of the form to be identified comprises:
acquiring line primitives in the table to be identified;
acquiring the end points of the line primitives and the intersection points of two or more line primitives as the characteristic points of the table to be identified; and
extracting the characteristics of the characteristic points as the structural characteristics, wherein the characteristics of the characteristic points comprise the number, angles, line types, colors and/or statistical characteristics of the characteristic points in the table to be recognized of the line primitives.
3. The method of identifying construction drawing form content as recited in claim 1, wherein the step of extracting structural features of the form to be identified comprises:
acquiring line primitives on the building drawing;
and calculating the relation characteristics of the line graphic elements in the table to be identified and all the line graphic elements on the building drawing as the structural characteristics.
4. The method of identifying construction drawing form content as recited in claim 1, wherein the step of extracting text features of the form to be identified comprises:
acquiring a text primitive in the table to be identified;
and performing word segmentation and filtering on the text primitive to obtain the text characteristic.
5. The method for identifying the table content of the construction drawing as recited in claim 1, wherein the step of matching the sample table corresponding to the table to be identified according to the feature data in a preset sample table library comprises:
inputting the characteristic data of the table to be recognized into a trained random forest model, wherein the trained random forest model outputs the similarity between the table to be recognized and each sample table; and
determining the sample table with the maximum similarity to the table to be identified as the sample table corresponding to the table to be identified.
6. The method of identifying construction drawing form content as recited in claim 5, wherein prior to inputting feature data of the form to be identified into the trained random forest model, the method further comprises:
constructing an initial random forest model;
obtaining the sample table in the sample table library as a training table;
and taking the feature data of the training table as the input of the initial random forest model, taking the similarity between the training table and the sample table in the sample table library as the output of the initial random forest model, and training the initial random forest model to obtain the trained random forest model.
7. The method of identifying construction drawing form content of claim 1, wherein after the step of determining a form data set from the table structure of the sample form, the method of identifying construction drawing form content further comprises:
acquiring correction information of the table data set;
and according to the correction information, constructing the sample table, and updating the preset sample table library.
8. An apparatus for identifying the contents of a construction drawing form, comprising:
the determining module is used for determining a table to be identified on the construction drawing;
the extraction module is used for extracting feature data of the table to be identified, wherein the feature data comprises structural features and character features of the table to be identified;
the matching module is used for matching the sample table corresponding to the table to be identified in a preset sample table library according to the characteristic data;
an identification module to determine a table data set from a table structure of the sample table.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented by the processor when executing the computer program.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010599424.XA CN111753763A (en) | 2020-06-28 | 2020-06-28 | Method and device for identifying table content of construction drawing and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010599424.XA CN111753763A (en) | 2020-06-28 | 2020-06-28 | Method and device for identifying table content of construction drawing and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111753763A true CN111753763A (en) | 2020-10-09 |
Family
ID=72676900
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010599424.XA Pending CN111753763A (en) | 2020-06-28 | 2020-06-28 | Method and device for identifying table content of construction drawing and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111753763A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112257629A (en) * | 2020-10-29 | 2021-01-22 | 广联达科技股份有限公司 | Text information identification method and device for construction drawing |
CN113779685A (en) * | 2021-09-27 | 2021-12-10 | 万翼科技有限公司 | Data processing method and related device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304490A (en) * | 2018-01-08 | 2018-07-20 | 有米科技股份有限公司 | Text based similarity determines method, apparatus and computer equipment |
WO2019174130A1 (en) * | 2018-03-14 | 2019-09-19 | 平安科技(深圳)有限公司 | Bill recognition method, server, and computer readable storage medium |
CN110390269A (en) * | 2019-06-26 | 2019-10-29 | 平安科技(深圳)有限公司 | PDF document table extracting method, device, equipment and computer readable storage medium |
Non-Patent Citations (3)
Title |
---|
席晓鹏, 糜宁芳, 罗志伟, 蔡士杰: "工程图中的模板识别和匹配方法", 计算机应用研究, no. 12, 28 December 2003 (2003-12-28) * |
彭欢, 谭建荣, 张树有: "基于矩阵表达的工程图纸表信息提取方法研究", 机械, no. 09, 30 September 2005 (2005-09-30) * |
马纯: "随机森林相似度算法研究", 工程硕士学位论文, 15 May 2019 (2019-05-15) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106909901B (en) | Method and device for detecting object from image | |
CN111259854B (en) | Method and device for identifying structured information of table in text image | |
CN111753763A (en) | Method and device for identifying table content of construction drawing and computer equipment | |
CN108959453B (en) | Information extraction method and device based on text clustering and readable storage medium | |
CN111061933A (en) | Picture sample library construction method and device, readable storage medium and terminal equipment | |
CN112699923A (en) | Document classification prediction method and device, computer equipment and storage medium | |
CN114581639A (en) | Method for generating information of beam steel bars in BIM (building information modeling) model based on beam leveling construction drawing | |
CN111046472B (en) | Method, device, computer equipment and storage medium for displaying model component information | |
CN111797685B (en) | Identification method and device of table structure | |
CN111752958A (en) | Intelligent associated label method, device, computer equipment and storage medium | |
CN109101973B (en) | Character recognition method, electronic device and storage medium | |
CN110598995A (en) | Intelligent customer rating method and device and computer readable storage medium | |
CN115630422A (en) | BIM model display method and system | |
CN115544620A (en) | Method, device and equipment for analyzing door and window tables in drawing and storage medium | |
CN116935010A (en) | Method, device and equipment for marking inner and outer walls and readable storage medium | |
CN111695441B (en) | Image document processing method, device and computer readable storage medium | |
CN114066948A (en) | Point cloud registration method and system, electronic device and storage medium | |
CN113887422A (en) | Table picture content extraction method, device and equipment based on artificial intelligence | |
CN111882534A (en) | Method and device for identifying line type and readable storage medium | |
CN117290915A (en) | Modeling method and device for prefabricated wall, computer equipment and readable storage medium | |
CN116227479B (en) | Entity identification method, entity identification device, computer equipment and readable storage medium | |
CN113177995B (en) | Text reorganization method of CAD drawing and computer readable storage medium | |
CN117236794B (en) | BIM-based engineering supervision information management method, system, medium and equipment | |
CN115982358B (en) | Document splitting method, device, terminal equipment and computer readable storage medium | |
CN114139512B (en) | Electronic form control method, electronic form control device, computer readable storage medium and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||