CN114677696A - Bridge steel structure quota data identification method - Google Patents

Publication number
CN114677696A
Authority
CN
China
Prior art keywords
data
information
quota
document image
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011449004.XA
Other languages
Chinese (zh)
Inventor
刘恒超
张明涛
杨路帆
谢智华
邱守慈
高天
杨超然
徐兵峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Railway Hi Tech Industry Corp Ltd
Original Assignee
China Railway Hi Tech Industry Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Railway Hi Tech Industry Corp Ltd
Priority to CN202011449004.XA
Publication of CN114677696A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/183 Tabulation, i.e. one-dimensional positioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a bridge steel structure quota data identification method, which comprises the following steps: a document image corresponding to a target drawing is acquired and uploaded to an OCR system for analysis; once the analysis succeeds, the result is manually rechecked and then submitted for audit; if the audit succeeds, the data file is downloaded, and if the audit fails, the manual recheck is repeated. By applying OCR recognition technology, the method improves the existing way of calculating bridge steel structure quotas, raises the efficiency and accuracy of quota calculation, and frees quota personnel from repetitive, time-consuming work. Data generated during quota calculation are collected and summarized in real time, providing solid data support for production capacity analysis and decision making by production management departments.

Description

Bridge steel structure quota data identification method
Technical Field
The invention relates to the field of image processing, in particular to a bridge steel structure quota data identification method for positioning and identifying tables and texts in engineering drawings of bridge steel structure projects.
Background
The output of bridge steel structures has been rising year by year, with production capacity exceeding 11 million tons in 2019.
The quota is an important link in bridge steel structure production and is critical to controlling the production cost of a project. Investigation shows that bridge steel structure quota calculation still follows a fairly traditional approach: the main material quota, the coating quota and the welding quota are calculated from the data of the material tables in the bridge design drawings, and these data are conventionally entered into Excel by hand.
As the number of bridge steel structure projects keeps growing, the limitations of the traditional quota calculation approach are increasingly exposed: low efficiency, proneness to error, strong human subjectivity, poor data traceability, difficulty in querying historical data, and inadequate material consumption statistics. The prior art therefore needs to be improved.
Disclosure of Invention
The invention aims to solve the technical problem of providing a bridge steel structure quota data identification method to overcome the defects in the prior art.
The technical scheme for solving the technical problems is as follows:
A bridge steel structure quota data identification method comprises the following steps:
1) uploading a file: acquiring a document image corresponding to a target drawing and uploading it to an OCR system for analysis, wherein the document image comprises at least a table to be identified; if the upload succeeds, proceeding to the next step to analyze the document image, and if the upload fails, uploading the document image again;
2) analyzing the document image in the OCR system: if the analysis succeeds, proceeding to step 4); if the analysis partially succeeds, proceeding to step 3); if the analysis fails, manually checking the cause of the error and, after handling it, returning to step 1);
3) manually completing the part that was not successfully analyzed, and then executing step 4);
4) performing a manual recheck;
5) submitting for audit: if the audit succeeds, downloading the data file; if the audit fails, returning to step 4) to repeat the manual recheck.
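The five steps above form a small state machine. The following is a minimal sketch of its transitions, not the patented implementation. It assumes that the branch leading to step 3) corresponds to a partially successful analysis (the reading implied by "manually completing the unsuccessful part"); the names `ParseResult` and `next_step` are hypothetical.

```python
from enum import Enum, auto


class ParseResult(Enum):
    SUCCESS = auto()   # every table was analyzed
    PARTIAL = auto()   # assumed meaning: some content needs manual completion
    FAILURE = auto()   # nothing usable; fix the cause and upload again


def next_step(current_step: int, outcome=None) -> int:
    """Return the number of the step to run next (0 means finished).

    `outcome` is a bool for steps 1 and 5 and a ParseResult for step 2;
    steps 3 and 4 ignore it.
    """
    if current_step == 1:      # upload: repeat until it succeeds
        return 2 if outcome else 1
    if current_step == 2:      # OCR analysis
        if outcome is ParseResult.SUCCESS:
            return 4
        if outcome is ParseResult.PARTIAL:
            return 3
        return 1               # failure: check the cause, then re-upload
    if current_step == 3:      # manual completion leads to the recheck
        return 4
    if current_step == 4:      # manual recheck leads to the audit
        return 5
    if current_step == 5:      # audit: download on success, else recheck
        return 0 if outcome else 4
    raise ValueError(f"unknown step {current_step}")
```

Keeping the transitions in one pure function makes the two loops in the workflow (re-upload after a failed analysis, re-check after a failed audit) easy to test in isolation.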
Further, in step 2), the analysis comprises:
2.1) building a table detector with an RCNN-based neural network and detecting the table to be identified, so as to determine its specific position in the document image;
2.2) applying a perspective transformation to each table to be identified to obtain a corrected, independent material table, and splitting the result into independent table images;
2.3) for each table image, in the character direction score map the character direction is normalized to [0, 1], corresponding to an angle in [0, 2π]; performing text recognition on each located line of text; sorting the recognized texts from top to bottom and from left to right by the coordinates of their upper-left points;
2.4) judging from the recognized table content whether the table is a target table;
2.5) table structuring: for the various kinds of tables, classifying each text entry, distinguishing the row labels, column labels and span information of the table, and then extracting the information in all the tables; for extracting drawing names and drawing numbers, since the drawing number and drawing name are located at the bottom of every PDF page, the page is cropped, table recognition is performed on the crop, and the drawing name and drawing number are then extracted according to rules;
2.6) outputting the structured material table information: once the material table has been structured, the structured information is converted into data suitable for transmission and storage; the output is in JSON format, formatted according to an enterprise-defined standard;
2.7) returning the information of all located regions in the image: all located content in the images is recognized to obtain formatted information for each image, the information is finally packaged into a larger JSON, and each single PDF is one value in the recognition result list;
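Step 2.2 relies on a perspective transformation to rectify each detected table. As a rough illustration of the underlying computation, the sketch below solves the 3×3 homography from four corner correspondences and applies it to a point. It assumes the four table corners are already available from the detector; the function names are hypothetical, and a production system would more likely use an image library's warp routine than this hand-written solver.

```python
def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Homography mapping four src corners (x, y) onto four dst corners (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)                           # h33 is fixed to 1
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the homography to one point (projective division included)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For example, mapping a skewed quadrilateral onto a 200 × 100 rectangle sends each detected corner to the matching rectangle corner, which is what produces the "corrected independent material table".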
Further, in step 2.3), the text recognition uses a CNN + RNN + CTC model to obtain the full character code string of each line of text; the model is a pre-trained whole-line character recognition model, trained as follows: the text corresponding to each whole-line image is annotated and fed directly into a CRNN (convolutional recurrent neural network) without annotating character segmentation information; the loss is then computed with CTC (connectionist temporal classification), and gradient updates yield the whole-line recognition network model;
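The patent describes training with CTC but does not say how the network output is turned into a character string at inference time. A common choice for CTC-trained models is greedy decoding: take the best class per frame, merge consecutive repeats, and drop the blank symbol. The sketch below illustrates that technique under the assumption that class index 0 is the blank; the function name and the score layout are hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding.

    frame_scores: one list of class scores per time step; class 0 is assumed
                  to be the CTC blank, class i (i >= 1) is alphabet[i - 1].
    """
    blank = 0
    # Best class index for every frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    chars, prev = [], blank
    for idx in best:
        # Emit a character only when it is not blank and not a repeat of the
        # previous frame; a blank between two equal characters keeps both.
        if idx != blank and idx != prev:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)
```

Because repeats are merged before blanks are removed, no per-character segmentation is needed, which matches the whole-line annotation scheme described above.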
Further, in step 4), the manual recheck comprises the following steps:
4.1) omission check: the quota system checks the material tables in the files recognized by the OCR system for omissions, to see whether any material table was not recognized and whether the recognized tables are the required ones;
4.2) format conversion: the quota system maps the columns of the material tables recognized by the OCR system to the fields in the system, which resolves the inconsistent material table column headers in files issued by different design institutes by converting them to the unified names used in the system;
4.3) error check: the quota system checks the detailed data in the material tables recognized by the OCR system for errors, to see whether any data were not recognized, and the recognized data are manually checked for errors;
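The format conversion in step 4.2 amounts to mapping differing column headers onto one set of system field names. A minimal sketch of such a mapping follows. The alias table, the field names and the function name are all hypothetical, since the patent does not list the actual system fields; headers that match no alias are returned separately so that a reviewer can assign them by hand.

```python
# Hypothetical alias table: unified system field -> header variants seen in drawings.
HEADER_ALIASES = {
    "part_number": {"serial number", "material number", "number"},
    "unit_weight": {"unit weight", "unit mass"},
    "total_weight": {"total weight", "total"},
}


def unify_headers(headers):
    """Map recognized column headers to unified field names.

    Returns (mapped, unmatched): a dict from original header to system field,
    and a list of headers left for manual selection.
    """
    mapped, unmatched = {}, []
    for header in headers:
        key = header.strip().lower()
        for field, aliases in HEADER_ALIASES.items():
            if key in aliases:
                mapped[header] = field
                break
        else:
            unmatched.append(header)
    return mapped, unmatched
```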
The beneficial effects of the invention are as follows: by applying OCR recognition technology, the invention improves the existing way of calculating bridge steel structure quotas, raises the efficiency and accuracy of quota calculation, and frees quota personnel from repetitive, time-consuming work. Data generated during quota calculation are collected and summarized in real time, providing solid data support for production capacity analysis and decision making by production management departments.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a diagram illustrating the table position recognition result according to the present invention;
FIG. 3 is a schematic diagram of a single table obtained by perspective transformation of FIG. 2 according to the present invention;
FIG. 4 is a diagram illustrating the recognition result of FIG. 3 according to the present invention;
FIG. 5 is a schematic diagram of the structural result of the present invention;
FIG. 6 is a diagram illustrating a final recognition result according to the present invention;
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth to illustrate, but are not to be construed to limit the scope of the invention.
As shown in FIGS. 1 to 6, a bridge steel structure quota data identification method comprises the following steps:
1) uploading a file: acquiring a document image corresponding to a target drawing and uploading it to an OCR system for analysis, wherein the document image comprises at least a table to be identified; if the upload succeeds, proceeding to the next step to analyze the document image, and if the upload fails, uploading the document image again;
2) analyzing the document image in the OCR system: if the analysis succeeds, proceeding to step 4); if the analysis partially succeeds, proceeding to step 3); if the analysis fails, manually checking the cause of the error and, after handling it, returning to step 1);
3) manually completing the part that was not successfully analyzed, and then executing step 4);
4) performing a manual recheck;
5) submitting for audit: if the audit succeeds, downloading the data file; if the audit fails, returning to step 4) to repeat the manual recheck.
In step 2), the analysis comprises the following steps:
2.1) building a table detector with an RCNN-based neural network and detecting the table to be identified, so as to determine its specific position in the document image;
2.2) applying a perspective transformation to each table to be identified to obtain a corrected, independent material table, and splitting the result into independent table images;
2.3) for each table image, in the character direction score map the character direction is normalized to [0, 1], corresponding to an angle in [0, 2π]; performing text recognition on each located line of text; sorting the recognized texts from top to bottom and from left to right by the coordinates of their upper-left points;
2.4) judging from the recognized table content whether the table is a target table;
Embodiment 1
When the target table is a material table, whether a table is a material table is judged from the characteristics of its column names. Specifically, if any four or more of the words "transmission number, name, portion, structure type, item, rod, type, material, serial number, material number, unit number, total length, number, unit weight, statistical center, unit meter, unit net weight, unit mass, unit weight, net weight, total, material, type, remark, subtotal, steel plate, steel material" appear in a table, that table is determined to be the target table.
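The rule in Embodiment 1 is a simple keyword threshold. The sketch below illustrates it using the word list from the text (duplicates removed). It assumes that a keyword counts when it equals a column name after trimming and lower-casing; the patent does not specify the matching rule, and the function name is hypothetical.

```python
# Keyword list taken from the embodiment above, with duplicates removed.
MATERIAL_TABLE_KEYWORDS = {
    "transmission number", "name", "portion", "structure type", "item", "rod",
    "type", "material", "serial number", "material number", "unit number",
    "total length", "number", "unit weight", "statistical center", "unit meter",
    "unit net weight", "unit mass", "net weight", "total", "remark", "subtotal",
    "steel plate", "steel material",
}


def is_material_table(column_names, min_hits=4):
    """Return True when at least `min_hits` distinct keywords appear.

    Matching is exact after normalization, which is an assumption of this
    sketch; substring or fuzzy matching may suit noisy OCR output better.
    """
    normalized = {name.strip().lower() for name in column_names}
    return len(MATERIAL_TABLE_KEYWORDS & normalized) >= min_hits
```

With this rule, a header row such as "Serial number, Name, Material, Unit weight, Remark" qualifies, while a title block with only "Name" in common does not.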
2.5) table structuring: for the various kinds of tables, classifying each text entry, distinguishing the row labels, column labels and span information of the table, and then extracting the information in all the tables; for extracting drawing names and drawing numbers, since the drawing number and drawing name are located at the bottom of every PDF page, the page is cropped, table recognition is performed on the crop, and the drawing name and drawing number are then extracted according to rules;
2.6) outputting the structured material table information: once the material table has been structured, the structured information is converted into data suitable for transmission and storage; the output is in JSON format, formatted according to an enterprise-defined standard;
2.7) returning the information of all located regions in the image: all located content in the images is recognized to obtain formatted information for each image, the information is finally packaged into a larger JSON, and each single PDF is one value in the recognition result list;
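Steps 2.6 and 2.7 describe serializing each structured table and then wrapping the per-PDF results into one larger JSON. The enterprise-defined format is not given in the patent, so every key name in the sketch below (`file`, `drawing_number`, `drawing_name`, `tables`, `results`) is a hypothetical placeholder, as are the function names.

```python
import json


def table_to_records(header, rows):
    """Turn one structured table into a list of {column: value} records."""
    return [dict(zip(header, row)) for row in rows]


def package_pdf_result(pdf_name, drawing_number, drawing_name, tables):
    """Build the result object for a single PDF.

    tables: list of (header, rows) pairs produced by table structuring.
    """
    return {
        "file": pdf_name,
        "drawing_number": drawing_number,
        "drawing_name": drawing_name,
        "tables": [table_to_records(header, rows) for header, rows in tables],
    }


def package_results(per_pdf_results):
    """Wrap all PDF results in one JSON string; each PDF is one list value."""
    # ensure_ascii=False keeps Chinese column names readable in the output.
    return json.dumps({"results": per_pdf_results}, ensure_ascii=False)
```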
In step 2.3), the text recognition uses a CNN + RNN + CTC model to obtain the full character code string of each line of text; the model is a pre-trained whole-line character recognition model, trained as follows: the text corresponding to each whole-line image is annotated and fed directly into a CRNN (convolutional recurrent neural network) without annotating character segmentation information; the loss is then computed with CTC (connectionist temporal classification), and gradient updates yield the whole-line recognition network model;
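Besides recognition itself, step 2.3 involves two small computations: converting a normalized direction score into an angle, and ordering the recognized lines by their upper-left corners. The sketch below assumes the angle range is [0, 2π] and adds a row tolerance so that lines on the same visual row with slightly different y values stay together; that tolerance and all names are assumptions of this example, not part of the patent.

```python
import math


def score_to_angle(score):
    """Map a direction score in [0, 1] to an angle in [0, 2π] (assumed range)."""
    return score * 2 * math.pi


def reading_order(lines, row_tolerance=10):
    """Sort recognized lines top-to-bottom, then left-to-right.

    lines: list of (x, y, text), where (x, y) is the upper-left corner.
    row_tolerance: maximum y difference (pixels) for two lines to count as
                   the same row; an assumption added for robustness.
    """
    ordered = sorted(lines, key=lambda line: (line[1], line[0]))
    rows = []
    for line in ordered:
        # Compare against the first line of the current row.
        if rows and abs(line[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    return [line[2] for row in rows for line in sorted(row, key=lambda l: l[0])]
```

A plain sort on (y, x) would place a cell that sits two pixels lower than its left neighbor after the whole row; grouping by row first avoids that.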
In step 4), the manual recheck comprises the following steps:
4.1) omission check: the quota system checks the material tables in the files recognized by the OCR system for omissions, to see whether any material table was not recognized and whether the recognized tables are the required ones;
4.2) format conversion: the quota system maps the columns of the material tables recognized by the OCR system to the fields in the system, which resolves the inconsistent material table column headers in files issued by different design institutes by converting them to the unified names used in the system;
4.3) error check: the quota system checks the detailed data in the material tables recognized by the OCR system for errors, to see whether any data were not recognized, and the recognized data are manually checked for errors;
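The error check in step 4.3 is performed by a person, but the system can pre-flag cells that are likely to need attention. The sketch below shows one possible pre-check: it marks cells in numeric columns that are empty or cannot be parsed as numbers. The function name, the field names in the example and the choice of checks are assumptions of this sketch; the patent does not describe automated validation rules.

```python
def find_suspect_cells(records, numeric_fields):
    """Flag cells a reviewer should look at first.

    records: list of {field: value} dicts, one per table row.
    numeric_fields: fields expected to hold numbers (e.g. weights, counts).
    Returns a list of (row_index, field, reason) tuples.
    """
    suspects = []
    for index, record in enumerate(records):
        for field in numeric_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                suspects.append((index, field, "missing"))
                continue
            try:
                # Tolerate thousands separators before parsing.
                float(str(value).replace(",", ""))
            except ValueError:
                suspects.append((index, field, "not numeric"))
    return suspects
```

A typical OCR confusion such as the letter "l" read in place of the digit "1" is caught by the numeric parse, so the reviewer is pointed to that cell directly.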
Targeting the existing quota approach, the invention improves the efficiency and accuracy of quota calculation through rapid extraction of the bill of materials. It is applicable to drawings of reasonably high definition captured by various means, such as mobile phones, scanners, high-speed cameras, cameras and other equipment, and has the following advantages:
1. Automation: staff scan the drawings and recognition is performed automatically, which greatly reduces the data entry workload and improves the user experience.
2. Intelligence: artificial intelligence locates and recognizes the content automatically, without manual input, which reduces labor cost.
3. Informatization: the whole process is digitized and stored as data, so that important follow-up work such as tracing and data statistics can be carried out.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (4)

1. A bridge steel structure quota data identification method, characterized by comprising the following steps:
1) uploading a file: acquiring a document image corresponding to a target drawing and uploading it to an OCR system for analysis, wherein the document image comprises at least a table to be identified; if the upload succeeds, proceeding to the next step to analyze the document image, and if the upload fails, uploading the document image again;
2) analyzing the document image in the OCR system: if the analysis succeeds, proceeding to step 4); if the analysis partially succeeds, proceeding to step 3); if the analysis fails, manually checking the cause of the error and, after handling it, returning to step 1);
3) manually completing the part that was not successfully analyzed, and then executing step 4);
4) performing a manual recheck;
5) submitting for audit: if the audit succeeds, downloading the data file; if the audit fails, returning to step 4) to repeat the manual recheck.
2. The bridge steel structure quota data identification method as claimed in claim 1, wherein the analysis in step 2) comprises the following steps:
2.1) building a table detector with an RCNN-based neural network and detecting the table to be identified, so as to determine its specific position in the document image;
2.2) applying a perspective transformation to each table to be identified to obtain a corrected, independent material table, and splitting the result into independent table images;
2.3) for each table image, in the character direction score map the character direction is normalized to [0, 1], corresponding to an angle in [0, 2π]; performing text recognition on each located line of text; sorting the recognized texts from top to bottom and from left to right by the coordinates of their upper-left points;
2.4) judging from the recognized table content whether the table is a target table;
2.5) table structuring: for the various kinds of tables, classifying each text entry, distinguishing the row labels, column labels and span information of the table, and then extracting the information in all the tables; for extracting drawing names and drawing numbers, since the drawing number and drawing name are located at the bottom of every PDF page, the page is cropped, table recognition is performed on the crop, and the drawing name and drawing number are then extracted according to rules;
2.6) outputting the structured material table information: once the material table has been structured, the structured information is converted into data suitable for transmission and storage; the output is in JSON format, formatted according to an enterprise-defined standard;
2.7) returning the information of all located regions in the image: all located content in the images is recognized to obtain formatted information for each image, the information is finally packaged into a larger JSON, and each single PDF is one value in the recognition result list.
3. The bridge steel structure quota data identification method of claim 2, wherein: in step 2.3), the text recognition uses a CNN + RNN + CTC model to obtain the full character code string of each line of text; the model is a pre-trained whole-line character recognition model, trained as follows: the text corresponding to each whole-line image is annotated and fed directly into a CRNN (convolutional recurrent neural network) without annotating character segmentation information; the loss is then computed with CTC (connectionist temporal classification), and gradient updates yield the whole-line recognition network model.
4. The bridge steel structure quota data identification method according to claim 1, wherein in step 4), the manual recheck comprises the following steps:
4.1) omission check: the quota system checks the material tables in the files recognized by the OCR system for omissions, to see whether any material table was not recognized and whether the recognized tables are the required ones;
4.2) format conversion: the quota system maps the columns of the material tables recognized by the OCR system to the fields in the system, which resolves the inconsistent material table column headers in files issued by different design institutes by converting them to the unified names used in the system;
4.3) error check: the quota system checks the detailed data in the material tables recognized by the OCR system for errors, to see whether any data were not recognized, and the recognized data are manually checked for errors.
CN202011449004.XA 2020-12-09 2020-12-09 Bridge steel structure quota data identification method Pending CN114677696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011449004.XA CN114677696A (en) 2020-12-09 2020-12-09 Bridge steel structure quota data identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011449004.XA CN114677696A (en) 2020-12-09 2020-12-09 Bridge steel structure quota data identification method

Publications (1)

Publication Number Publication Date
CN114677696A true CN114677696A (en) 2022-06-28

Family

ID=82070169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011449004.XA Pending CN114677696A (en) 2020-12-09 2020-12-09 Bridge steel structure quota data identification method

Country Status (1)

Country Link
CN (1) CN114677696A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994282A (en) * 2023-09-25 2023-11-03 安徽省交通规划设计研究总院股份有限公司 Reinforcing steel bar quantity identification and collection method for bridge design drawing
CN116994282B (en) * 2023-09-25 2023-12-15 安徽省交通规划设计研究总院股份有限公司 Reinforcing steel bar quantity identification and collection method for bridge design drawing

Similar Documents

Publication Publication Date Title
CN113642088B (en) Method for feeding back construction progress information and displaying deviation of BIM (building information modeling) model in real time
CN107067044A (en) A kind of finance reimbursement unanimous vote is according to intelligent checks system
CN110032654B (en) Supermarket commodity entry method and system based on artificial intelligence
CN114462556B (en) Enterprise association industry chain classification method, training method, device, equipment and medium
CN115034200A (en) Drawing information extraction method and device, electronic equipment and storage medium
CN112836809A (en) Device characteristic extraction method and fault prediction method of convolutional neural network based on differential feature fusion
CN110543475A (en) financial statement data automatic identification and analysis method based on machine learning
CN114219507A (en) Qualification auditing method and device for traditional Chinese medicine supplier, electronic equipment and storage medium
CN112102443A (en) Marking system and marking method suitable for substation equipment inspection image
CN117035810A (en) Agricultural product traceability system based on multi-source data
CN114677696A (en) Bridge steel structure quota data identification method
CN112598142B (en) Wind turbine maintenance working quality inspection auxiliary method and system
CN117592470A (en) Low-cost gazette data extraction method driven by large language model
CN117768618A (en) Method for analyzing personnel violation based on video image
CN116899881A (en) Logistics cargo identification and sorting method and system based on deep learning
CN114708445B (en) Trademark similarity recognition method and device, electronic equipment and storage medium
CN111047731A (en) AR technology-based telecommunication room inspection method and system
CN115601778A (en) Job correction method, device and equipment based on image recognition and storage medium
CN112418652B (en) Risk identification method and related device
CN115311611A (en) Steel bar counting method for production of prefabricated components of prefabricated building
CN114676207A (en) Financial data audit relation auditing module for financial long text review system
JP2001101340A (en) Character reader and character recognition method
CN113963367B (en) Model-based financial transaction file and money extraction method
CN111223109B (en) Complex form image analysis method
CN118097707A (en) Multi-intermodal document recognition method and system based on artificial intelligence OCR technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination