CN114677696A - Bridge steel structure quota data identification method - Google Patents
Bridge steel structure quota data identification method
- Publication number
- CN114677696A
- Authority
- CN
- China
- Prior art keywords
- data
- information
- quota
- document image
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/183—Tabulation, i.e. one-dimensional positioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Character Discrimination (AREA)
Abstract
The invention relates to a bridge steel structure quota data identification method comprising the following steps: uploading a file: acquiring a document image corresponding to a target drawing and uploading the document image to an OCR system for analysis; after the analysis succeeds, performing a manual recheck and then submitting for audit; if the audit succeeds, downloading the data file, and if the audit fails, performing the manual recheck again. OCR recognition technology is used to improve the existing bridge steel structure quota calculation mode, raising the efficiency and accuracy of quota calculation and freeing quota personnel from repetitive, time-consuming work. Data from the quota calculation process are collected and summarized in real time, providing strong data support for the production capacity analysis and decision making of production management departments.
Description
Technical Field
The invention relates to the field of image processing, in particular to a bridge steel structure quota data identification method for positioning and identifying tables and texts in engineering drawings of bridge steel structure projects.
Background
Bridge steel structure output is rising year by year, with production exceeding 11 million tons in 2019.
The quota is an important link in bridge steel structure production and is critical to controlling the production cost of a project. Research shows that bridge steel structure quotas are still calculated in a fairly traditional way: the main material quota, the coating quota and the welding quota are calculated from the data of the material table in the bridge design drawing, and that data is conventionally entered into Excel by hand.
As the number of bridge steel structure projects keeps growing, the limitations of the traditional quota calculation mode are increasingly exposed: low efficiency, proneness to error, strong human subjectivity, poor data traceability, difficult querying of historical data, insufficient material consumption statistics and so on. The prior art therefore needs to be improved.
Disclosure of Invention
The invention aims to solve the technical problem of providing a bridge steel structure quota data identification method to overcome the defects in the prior art.
The technical scheme for solving the technical problems is as follows:
a bridge steel structure quota data identification method comprises the following steps:
1) uploading a file: acquiring a document image corresponding to a target drawing, and uploading the document image to an OCR system for analysis, wherein the document image at least comprises a table to be identified; if the uploading is successful, the next step is carried out to analyze the document image specifically, and if the uploading is failed, the document image is uploaded again;
2) the document image enters the OCR system for analysis; if the analysis is fully successful, the step 4) is executed; if the analysis is only partially successful, the step 3) is executed; if the analysis fails, the cause of the error is checked manually and, after it has been handled, the process returns to the step 1);
3) manually completing the unsuccessful part, and then executing the step 4);
4) manually rechecking;
5) submitting the audit, if the audit is successful, executing data file downloading, and if the audit is failed, skipping to the step 4) to re-execute manual recheck.
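The five steps above form a small state machine. The following is a minimal sketch of it, reading step 2) as having three outcomes (full success, partial success, failure); the step names and the outcome labels `"ok"`, `"partial"` and `"failed"` are hypothetical and do not come from the patent.

```python
from enum import Enum, auto


class Step(Enum):
    UPLOAD = auto()    # step 1: upload the document image
    PARSE = auto()     # step 2: OCR analysis
    COMPLETE = auto()  # step 3: manually complete the unparsed part
    RECHECK = auto()   # step 4: manual recheck
    AUDIT = auto()     # step 5: submit for audit
    DOWNLOAD = auto()  # data file download after a successful audit


# (current step, outcome) -> next step; outcome labels are hypothetical.
TRANSITIONS = {
    (Step.UPLOAD, "ok"): Step.PARSE,
    (Step.UPLOAD, "failed"): Step.UPLOAD,
    (Step.PARSE, "ok"): Step.RECHECK,
    (Step.PARSE, "partial"): Step.COMPLETE,
    (Step.PARSE, "failed"): Step.UPLOAD,
    (Step.COMPLETE, "ok"): Step.RECHECK,
    (Step.RECHECK, "ok"): Step.AUDIT,
    (Step.AUDIT, "ok"): Step.DOWNLOAD,
    (Step.AUDIT, "failed"): Step.RECHECK,
}


def next_step(step: Step, outcome: str) -> Step:
    """Return the step that follows `step` for the given outcome."""
    try:
        return TRANSITIONS[(step, outcome)]
    except KeyError:
        raise ValueError(f"no transition from {step.name} on {outcome!r}") from None
```

Keeping the transitions in one table makes the two loops in the method (re-upload after a failed analysis, recheck after a failed audit) easy to see and to audit.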
Further, in the step 2), the analyzing includes:
2.1) establishing a table detector using an RCNN-based neural network and locating each table to be identified, so as to determine its specific position in the document image;
2.2) performing a perspective transformation on each table to be identified to obtain a corrected, independent material table, and splitting the result into independent table images;
2.3) for each table image, computing a character direction score map in which the character direction is normalized to [0, 1], corresponding to an angle in [0, 2π]; performing text recognition on each located line of text; sorting the recognized texts from top to bottom and from left to right by the coordinates of their upper-left points;
2.4) judging, from the recognized content of the table, whether the table is a target table;
2.5) table structuring: for the various kinds of tables, classifying each entry in the text, distinguishing the row index, column index and span information of the table, and then extracting the information in all the tables; for the extraction of drawing names and drawing numbers, since the drawing number and drawing name are located at the lower part of each PDF page, the PDF is cropped and table recognition is performed on the cropped region, and the information corresponding to the drawing name and drawing number is then extracted according to rules;
2.6) outputting the structured material table information: after the material table has been structured, the structured information is converted into data that can be transmitted and stored; the output is in JSON format, formatted according to an enterprise-defined standard;
2.7) returning the information of all located regions in the image: all located content in the images is recognized to obtain the formatted information of each image, which is finally packaged into a larger JSON, with each single PDF being one value in the recognition result list;
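Steps 2.6) and 2.7) can be illustrated with a small packaging sketch. The patent does not disclose the enterprise-defined JSON standard, so every key name below (`results`, `file`, `drawing_name`, `drawing_number`, `tables`) is a hypothetical placeholder.

```python
import json


def pdf_entry(file_name, drawing_name, drawing_number, tables):
    """Build the entry for a single PDF (all key names are hypothetical)."""
    return {
        "file": file_name,
        "drawing_name": drawing_name,
        "drawing_number": drawing_number,
        "tables": tables,
    }


def package_results(pdf_entries):
    """Wrap the per-PDF entries in one larger JSON document.

    Each PDF becomes one value in the result list, as in step 2.7).
    ensure_ascii=False keeps non-ASCII (for example Chinese) text readable.
    """
    return json.dumps({"results": list(pdf_entries)}, ensure_ascii=False, indent=2)
```

A caller would build one entry per PDF, for example `pdf_entry("a.pdf", "Main girder", "S-01", [{"columns": ["name"], "rows": [["plate"]]}])`, and pass the list to `package_results`.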
Further, in the step 2.3), the text recognition uses a CNN + RNN + CTC model to obtain the full character code string of each line of text; the model is a pre-trained whole-line character recognition model, trained as follows: the text corresponding to each whole-line image is labeled and, without labeling any character segmentation information, the labeled whole-line images are fed directly into a CRNN (convolutional recurrent neural network); the loss is then calculated with CTC (connectionist temporal classification) and gradient updates are performed to obtain the whole-line recognition network model;
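The patent describes training with a CTC loss but not how the network output is turned back into a character string. A common choice, assumed here rather than taken from the source, is greedy CTC decoding: take the best class per time frame, collapse consecutive repeats, then drop the blank symbol. The function name, the alphabet layout and the blank index 0 are all illustrative.

```python
def ctc_greedy_decode(frame_ids, alphabet, blank=0):
    """Decode per-frame class indices into text.

    frame_ids: best class index for each time frame of the line image.
    alphabet:  list mapping class index to character; alphabet[blank] is unused.
    Consecutive repeats are collapsed first, then blanks are removed, so a
    blank between two identical indices preserves a doubled character.
    """
    chars = []
    previous = None
    for idx in frame_ids:
        if idx != previous and idx != blank:
            chars.append(alphabet[idx])
        previous = idx
    return "".join(chars)
```

Because the blank separates repeated characters, the model never needs per-character segmentation labels, which matches the whole-line training described above.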
further, in the step 4), the manual review includes the following steps:
4.1) omission check: the quota system checks the material tables in the file recognized by the OCR system for omissions, to see whether any material table was left unrecognized and whether the recognized tables are the required tables;
4.2) format conversion: the quota system lets the user map the columns of the material table recognized by the OCR system to the fields in the system, which solves the problem that material table column headers are inconsistent across files issued by different design institutes, and converts them into uniform names in the system;
4.3) error checking: the quota system checks the detailed data in the material table recognized by the OCR system for errors, to see whether any data was left unrecognized, and the recognized data is manually checked for errors;
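The format conversion in step 4.2) amounts to mapping varying column headers onto one set of system field names. The sketch below assumes a simple alias dictionary; the field names and alias sets are hypothetical examples, and in the described method the mapping is selected by the user rather than fixed in code.

```python
# Hypothetical system fields and the header spellings that map onto them.
HEADER_ALIASES = {
    "unit_weight": {"unit weight", "unit mass", "unit net weight"},
    "quantity": {"number", "quantity"},
    "material": {"material", "steel material"},
}


def normalize_headers(headers, aliases=HEADER_ALIASES):
    """Map recognized column headers to system field names.

    Returns (mapped, unmapped): `mapped` is {original header: field name},
    `unmapped` lists headers that still need a manual choice.
    """
    lookup = {alias: field for field, names in aliases.items() for alias in names}
    mapped, unmapped = {}, []
    for header in headers:
        key = " ".join(header.lower().split())  # case- and whitespace-insensitive
        if key in lookup:
            mapped[header] = lookup[key]
        else:
            unmapped.append(header)
    return mapped, unmapped
```

Returning the unmapped headers separately keeps the manual selection in the loop: only headers the dictionary does not know are shown to the reviewer.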
The beneficial effects of the invention are as follows: OCR recognition technology is used to improve the existing bridge steel structure quota calculation mode, raising the efficiency and accuracy of quota calculation and freeing quota personnel from repetitive, time-consuming work. Data from the quota calculation process are collected and summarized in real time, providing strong data support for the production capacity analysis and decision making of production management departments.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a diagram illustrating the table position recognition result according to the present invention;
FIG. 3 is a schematic diagram of a single table obtained by perspective transformation of FIG. 2 according to the present invention;
FIG. 4 is a diagram illustrating the recognition result of FIG. 3 according to the present invention;
FIG. 5 is a schematic diagram of the structural result of the present invention;
FIG. 6 is a diagram illustrating a final recognition result according to the present invention;
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth to illustrate, but are not to be construed to limit the scope of the invention.
As shown in fig. 1 to 6, a bridge steel structure quota data identification method includes the following steps:
1) uploading a file: acquiring a document image corresponding to a target drawing, uploading the document image to an OCR system for analysis, wherein the document image at least comprises a table to be identified; if the uploading is successful, the next step is carried out to analyze the document image specifically, and if the uploading is failed, the document image is uploaded again;
2) the document image enters the OCR system for analysis; if the analysis is fully successful, the step 4) is executed; if the analysis is only partially successful, the step 3) is executed; if the analysis fails, the cause of the error is checked manually and, after it has been handled, the process returns to the step 1);
3) manually completing the unsuccessful part, and then executing the step 4);
4) manually rechecking;
5) submitting the audit, if the audit is successful, executing data file downloading, and if the audit is failed, skipping to the step 4) to re-execute manual recheck.
In the step 2), the analysis includes the following steps:
2.1) establishing a table detector using an RCNN-based neural network and locating each table to be identified, so as to determine its specific position in the document image;
2.2) performing a perspective transformation on each table to be identified to obtain a corrected, independent material table, and splitting the result into independent table images;
2.3) for each table image, computing a character direction score map in which the character direction is normalized to [0, 1], corresponding to an angle in [0, 2π]; performing text recognition on each located line of text; sorting the recognized texts from top to bottom and from left to right by the coordinates of their upper-left points;
2.4) judging, from the recognized content of the table, whether the table is a target table;
Embodiment 1
In this embodiment the target table is a material table. Whether a table is a material table is judged from the characteristics of its column names: specifically, if any four or more of the words "transmission number, name, portion, structure type, item, rod, type, material, serial number, material number, unit number, total length, number, unit weight, statistical center, unit meter, unit net weight, unit mass, unit weight, net weight, total, material, type, remark, subtotal, steel plate, steel material" appear in the table, the table is determined to be the target table.
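The rule in this embodiment (at least four matching column-name keywords) can be sketched as a set intersection. The keyword set below is only an English subset of the translated list above, chosen for illustration; a real system would presumably match the original Chinese headers. Exact matching of normalized cell text is also an assumption, since the patent does not say whether partial matches count.

```python
# Illustrative subset of the translated keyword list from the embodiment.
MATERIAL_KEYWORDS = frozenset({
    "name", "material", "serial number", "material number", "total length",
    "number", "unit weight", "net weight", "total", "remark", "subtotal",
    "steel plate",
})


def is_material_table(cell_texts, keywords=MATERIAL_KEYWORDS, threshold=4):
    """Return True if at least `threshold` distinct keywords occur as cell texts.

    Cell texts are lower-cased and whitespace-normalized before comparison;
    each keyword is counted once no matter how often it appears.
    """
    normalized = {" ".join(text.lower().split()) for text in cell_texts}
    return len(normalized & keywords) >= threshold
```

For example, a header row of `["Serial number", "Name", "Material", "Number", "Unit weight"]` matches five keywords and is accepted, while a title block with only `"Name"` is rejected.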
2.5) table structuring: for the various kinds of tables, classifying each entry in the text, distinguishing the row index, column index and span information of the table, and then extracting the information in all the tables; for the extraction of drawing names and drawing numbers, since the drawing number and drawing name are located at the lower part of each PDF page, the PDF is cropped and table recognition is performed on the cropped region, and the information corresponding to the drawing name and drawing number is then extracted according to rules;
2.6) outputting the structured material table information: after the material table has been structured, the structured information is converted into data that can be transmitted and stored; the output is in JSON format, formatted according to an enterprise-defined standard;
2.7) returning the information of all located regions in the image: all located content in the images is recognized to obtain the formatted information of each image, which is finally packaged into a larger JSON, with each single PDF being one value in the recognition result list;
In the step 2.3), the text recognition uses a CNN + RNN + CTC model to obtain the full character code string of each line of text; the model is a pre-trained whole-line character recognition model, trained as follows: the text corresponding to each whole-line image is labeled and, without labeling any character segmentation information, the labeled whole-line images are fed directly into a CRNN (convolutional recurrent neural network); the loss is then calculated with CTC (connectionist temporal classification) and gradient updates are performed to obtain the whole-line recognition network model;
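Two small pieces of step 2.3) lend themselves to a sketch: mapping a normalized direction score to an angle, and sorting recognized lines by their upper-left corner. The upper bound of the angle range is read here as 2π, which is an assumption because the source text is damaged at that point. The `row_tolerance` parameter and the dict keys `x`, `y` and `text` are hypothetical.

```python
import math


def direction_to_angle(score):
    """Map a direction score in [0, 1] to an angle in [0, 2*pi] (assumed range)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return score * 2.0 * math.pi


def reading_order(lines, row_tolerance=5.0):
    """Sort text lines top-to-bottom, then left-to-right.

    Each line is a dict with the upper-left corner in 'x' and 'y'.
    Lines whose 'y' differs from the first line of the current row by at most
    `row_tolerance` pixels are treated as one row, so small vertical jitter
    from OCR localization does not scramble the left-to-right order.
    """
    ordered = sorted(lines, key=lambda line: (line["y"], line["x"]))
    rows = []
    for line in ordered:
        if rows and abs(line["y"] - rows[-1][0]["y"]) <= row_tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    return [line for row in rows for line in sorted(row, key=lambda l: l["x"])]
```

A plain sort on `(y, x)` would place a cell that sits one pixel higher ahead of everything to its left; grouping into rows first avoids that.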
in the step 4), the manual rechecking comprises the following steps:
4.1) omission check: the quota system checks the material tables in the file recognized by the OCR system for omissions, to see whether any material table was left unrecognized and whether the recognized tables are the required tables;
4.2) format conversion: the quota system lets the user map the columns of the material table recognized by the OCR system to the fields in the system, which solves the problem that material table column headers are inconsistent across files issued by different design institutes, and converts them into uniform names in the system;
4.3) error checking: the quota system checks the detailed data in the material table recognized by the OCR system for errors, to see whether any data was left unrecognized, and the recognized data is manually checked for errors;
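Part of step 4.3), finding data that was left unrecognized, can be pre-screened automatically before the manual check. The sketch below simply flags empty cells; the row representation (one dict per table row) and the field names are hypothetical, and checking whether a recognized value is actually correct remains a manual task as described above.

```python
def find_unidentified_cells(rows, required_fields):
    """Return (row_index, field) pairs whose value is missing or blank.

    rows:            list of dicts, one per recognized table row.
    required_fields: field names every row is expected to carry.
    """
    issues = []
    for index, row in enumerate(rows):
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                issues.append((index, field))
    return issues
```

The returned pairs can be used to highlight the affected cells for the reviewer.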
Compared with the existing quota mode, the invention improves the efficiency and accuracy of quota calculation through rapid extraction of the bill of materials. The invention is applicable to reasonably sharp drawings captured by various means, such as mobile phones, scanners, high-speed cameras, cameras and other equipment, and has the following advantages:
1. Automation: staff scan the drawings and recognition runs automatically, greatly reducing the amount of data entry and improving the user experience.
2. Intelligence: artificial intelligence locates and recognizes the content automatically, without manual entry, reducing labor cost.
3. Informatization: the whole process is digitized and stored as data, so that important follow-up work such as tracing and data statistics can be carried out.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (4)
1. A bridge steel structure quota data identification method is characterized by comprising the following steps:
1) uploading a file: acquiring a document image corresponding to a target drawing, and uploading the document image to an OCR system for analysis, wherein the document image at least comprises a table to be identified; if the uploading is successful, the next step is carried out to analyze the document image specifically, and if the uploading is failed, the document image is uploaded again;
2) the document image enters the OCR system for analysis; if the analysis is fully successful, the step 4) is executed; if the analysis is only partially successful, the step 3) is executed; if the analysis fails, the cause of the error is checked manually and, after it has been handled, the process returns to the step 1);
3) manually completing the unsuccessful part, and then executing the step 4);
4) manually rechecking;
5) submitting the audit, if the audit is successful, executing data file downloading, and if the audit is failed, skipping to the step 4) to re-execute manual recheck.
2. The bridge steel structure quota data identification method as claimed in claim 1, wherein the analysis in the step 2) comprises the following steps:
2.1) establishing a table detector using an RCNN-based neural network and locating each table to be identified, so as to determine its specific position in the document image;
2.2) performing a perspective transformation on each table to be identified to obtain a corrected, independent material table, and splitting the result into independent table images;
2.3) for each table image, computing a character direction score map in which the character direction is normalized to [0, 1], corresponding to an angle in [0, 2π]; performing text recognition on each located line of text; sorting the recognized texts from top to bottom and from left to right by the coordinates of their upper-left points;
2.4) judging, from the recognized content of the table, whether the table is a target table;
2.5) table structuring: for the various kinds of tables, classifying each entry in the text, distinguishing the row index, column index and span information of the table, and then extracting the information in all the tables; for the extraction of drawing names and drawing numbers, since the drawing number and drawing name are located at the lower part of each PDF page, the PDF is cropped and table recognition is performed on the cropped region, and the information corresponding to the drawing name and drawing number is then extracted according to rules;
2.6) outputting the structured material table information: after the material table has been structured, the structured information is converted into data that can be transmitted and stored; the output is in JSON format, formatted according to an enterprise-defined standard;
2.7) returning the information of all located regions in the image: all located content in the images is recognized to obtain the formatted information of each image, which is finally packaged into a larger JSON, with each single PDF being one value in the recognition result list.
3. The bridge steel structure quota data identification method of claim 2, wherein: in the step 2.3), the text recognition uses a CNN + RNN + CTC model to obtain the full character code string of each line of text; the model is a pre-trained whole-line character recognition model, trained as follows: the text corresponding to each whole-line image is labeled and, without labeling any character segmentation information, the labeled whole-line images are fed directly into a CRNN (convolutional recurrent neural network); the loss is then calculated with CTC (connectionist temporal classification) and gradient updates are performed to obtain the whole-line recognition network model.
4. The bridge steel structure quota data identification method according to claim 1, wherein in the step 4), the manual review comprises the following steps:
4.1) omission check: the quota system checks the material tables in the file recognized by the OCR system for omissions, to see whether any material table was left unrecognized and whether the recognized tables are the required tables;
4.2) format conversion: the quota system lets the user map the columns of the material table recognized by the OCR system to the fields in the system, which solves the problem that material table column headers are inconsistent across files issued by different design institutes, and converts them into uniform names in the system;
4.3) error checking: the quota system checks the detailed data in the material table recognized by the OCR system for errors, to see whether any data was left unrecognized, and the recognized data is manually checked for errors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011449004.XA CN114677696A (en) | 2020-12-09 | 2020-12-09 | Bridge steel structure quota data identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011449004.XA CN114677696A (en) | 2020-12-09 | 2020-12-09 | Bridge steel structure quota data identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114677696A (en) | 2022-06-28 |
Family
ID=82070169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011449004.XA Pending CN114677696A (en) | Bridge steel structure quota data identification method | 2020-12-09 | 2020-12-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114677696A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116994282A (en) * | 2023-09-25 | 2023-11-03 | 安徽省交通规划设计研究总院股份有限公司 | Reinforcing steel bar quantity identification and collection method for bridge design drawing |
CN116994282B (en) * | 2023-09-25 | 2023-12-15 | 安徽省交通规划设计研究总院股份有限公司 | Reinforcing steel bar quantity identification and collection method for bridge design drawing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113642088B (en) | Method for feeding back construction progress information and displaying deviation of BIM (building information modeling) model in real time | |
CN107067044A (en) | A kind of finance reimbursement unanimous vote is according to intelligent checks system | |
CN110032654B (en) | Supermarket commodity entry method and system based on artificial intelligence | |
CN114462556B (en) | Enterprise association industry chain classification method, training method, device, equipment and medium | |
CN115034200A (en) | Drawing information extraction method and device, electronic equipment and storage medium | |
CN112836809A (en) | Device characteristic extraction method and fault prediction method of convolutional neural network based on differential feature fusion | |
CN110543475A (en) | financial statement data automatic identification and analysis method based on machine learning | |
CN114219507A (en) | Qualification auditing method and device for traditional Chinese medicine supplier, electronic equipment and storage medium | |
CN112102443A (en) | Marking system and marking method suitable for substation equipment inspection image | |
CN117035810A (en) | Agricultural product traceability system based on multi-source data | |
CN114677696A (en) | Bridge steel structure quota data identification method | |
CN112598142B (en) | Wind turbine maintenance working quality inspection auxiliary method and system | |
CN117592470A (en) | Low-cost gazette data extraction method driven by large language model | |
CN117768618A (en) | Method for analyzing personnel violation based on video image | |
CN116899881A (en) | Logistics cargo identification and sorting method and system based on deep learning | |
CN114708445B (en) | Trademark similarity recognition method and device, electronic equipment and storage medium | |
CN111047731A (en) | AR technology-based telecommunication room inspection method and system | |
CN115601778A (en) | Job correction method, device and equipment based on image recognition and storage medium | |
CN112418652B (en) | Risk identification method and related device | |
CN115311611A (en) | Steel bar counting method for production of prefabricated components of prefabricated building | |
CN114676207A (en) | Financial data audit relation auditing module for financial long text review system | |
JP2001101340A (en) | Character reader and character recognition method | |
CN113963367B (en) | Model-based financial transaction file and money extraction method | |
CN111223109B (en) | Complex form image analysis method | |
CN118097707A (en) | Multi-intermodal document recognition method and system based on artificial intelligence OCR technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||